a68f0cf222
Closes the 2026-05-24 resolution sweep at HEAD `83eba4b`. All 21 open findings from thed692232re-review are now Resolved: Server327e9c5Server-031, -032, -038, -039, -040, -041, -042, -043 Contractsbd1d1f1Contracts-016, -017 Testsd48099fTests-025, -026 IntegrationTests865c22aIntegrationTests-022, -023, -024 Client.Java10bd0c0Client.Java-027, -028, -029, -030, -031 Client.Rust83eba4bClient.Rust-021 Also fixes stale "Open findings" header counts in Client.Java and IntegrationTests findings.md that survived the resolve passes (the agents updated each finding's Status but missed the header sum). `regen-readme.py --check` is now green. Module status: 11 / 11 reviewed, 0 / 276 total open. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
464 lines
60 KiB
Markdown
464 lines
60 KiB
Markdown
# Code Review — IntegrationTests
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Module | `src/ZB.MOM.WW.MxGateway.IntegrationTests` |
|
|
| Reviewer | Claude Code |
|
|
| Review date | 2026-05-24 |
|
|
| Commit reviewed | `d692232` |
|
|
| Status | Reviewed |
|
|
| Open findings | 0 |
|
|
|
|
## Checklist coverage
|
|
|
|
A comprehensive review completes every category, recording "No issues found" where
|
|
a category produced nothing rather than leaving it blank.
|
|
|
|
### 2026-05-20 re-review (commit `a020350`)
|
|
|
|
| # | Category | Result |
|
|
|---|---|---|
|
|
| 1 | Correctness & logic bugs | Issues found: IntegrationTests-017 (teardown-parity test's "no further OnDataChange after UnAdvise" assertion races against in-flight events the provider already published); IntegrationTests-020 (abnormal-exit test's fault-classification keyword list accepts the substring `"worker"`, which matches almost any plausible fault message and dilutes the check to "the description is non-empty"). |
|
|
| 2 | mxaccessgw conventions | No issues found. The five new tests honor live opt-in gating, `[Collection]` serialization, "no synthesized events", and the credential-redaction contract for the assertions they make. |
|
|
| 3 | Concurrency & thread safety | No issues found. `GatewaySession.State`/`FinalFault` access in the abnormal-exit poll loop goes through `_syncRoot`; `RecordingServerStreamWriter.Messages` returns a locked snapshot copy. |
|
|
| 4 | Error handling & resilience | No issues found. `ShutDownAsync`'s opt-in `propagateStreamFaults` correctly threads silent stream-task faults into the Write parity test without re-masking the IntegrationTests-004 path. |
|
|
| 5 | Security | Issue found: IntegrationTests-019 (WriteSecured live test asserts the password is absent from `DiagnosticMessage` only; it does not assert the credential is absent from the accumulated test output, where the worker `stderr`/`stdout` and the gateway log are echoed). |
|
|
| 6 | Performance & resource management | No issues found. All six `RecordingServerStreamWriter<MxEvent>` instantiations use `using` declarations; `using CancellationTokenSource` is the consistent pattern. |
|
|
| 7 | Design-document adherence | No issues found. `docs/GatewayTesting.md` documents all five new parity surfaces and the two new env-var defaults (`MXGATEWAY_LIVE_MXACCESS_WRITE_SECURED_USER`/`_PASSWORD`). |
|
|
| 8 | Code organization & conventions | Issue found: IntegrationTests-018 (`GatewayServiceFixture.TryGetSession` declares `out GatewaySession session` non-nullable while the caller binds it as `out GatewaySession? session`; the null-forgiving operator inside `SessionRegistry.TryGet` propagates a misleading non-null annotation). |
|
|
| 9 | Testing coverage | Issue found: IntegrationTests-021 (abnormal-exit test does not assert the active `StreamEvents` task observed the worker fault; relies entirely on the session-state poll and would silently pass if `MarkFaulted` were ever moved off the stream-consumption path). |
|
|
| 10 | Documentation & comments | No issues found. Test XML comments now match what each assertion verifies (the IntegrationTests-011 fix is intact across both the Write and invalid-handle cases). |
|
|
|
|
### 2026-05-20 review (commit `1cd51bb`)
|
|
|
|
| # | Category | Result |
|
|
|---|---|---|
|
|
| 1 | Correctness & logic bugs | Issue found: IntegrationTests-012 (Write test starts a `StreamEvents` task and never observes it — silent event-stream coverage gap and an unobserved fault path). |
|
|
| 2 | mxaccessgw conventions | Live opt-ins, `[Collection]` serialization, and the "don't synthesize events" rule are honored. No issues found. |
|
|
| 3 | Concurrency & thread safety | `LiveResourcesCollection` serializes all three live classes; `RecordingServerStreamWriter` locks correctly and the semaphore wait is linked to both timeout and external cancellation. No issues found. |
|
|
| 4 | Error handling & resilience | `ShutDownAsync` already isolates cleanup exceptions per category. No issues found. |
|
|
| 5 | Security | The only embedded strings are documented dev GLAuth creds and a localhost ZB connection string, all env-overridable. The wrong-password and unreachable-server tests assert no password leakage. No issues found. |
|
|
| 6 | Performance & resource management | Issue found: IntegrationTests-013 (`RecordingServerStreamWriter.messageArrived` `SemaphoreSlim` is never disposed; the type owns an `IDisposable` field but is not itself disposable). |
|
|
| 7 | Design-document adherence | No issues found. `docs/GatewayTesting.md` now documents the Live LDAP, Live Galaxy, and Write/invalid-handle MXAccess opt-ins added by the prior round of resolutions. |
|
|
| 8 | Code organization & conventions | Issues found: IntegrationTests-015 (`[Trait("Category", ...)]` repeated on every test method instead of declared once at class level); IntegrationTests-016 (the Galaxy default connection string is duplicated between `LiveGalaxyRepositoryFactAttribute` and `GalaxyRepositoryOptions`). |
|
|
| 9 | Testing coverage | Issue found: IntegrationTests-014 (`Unadvise`, `RemoveItem`, `Unregister`, `WriteSecured` ordering, and worker-fault parity still uncovered — IntegrationTests-005's resolution scoped these out). |
|
|
| 10 | Documentation & comments | Issue found: IntegrationTests-011 (the invalid-handle and write test comments describe a non-`Ok` MXAccess failure as `ProtocolStatusCode.Ok`, contradicting both the assertion and `HResultConverter`). |
|
|
|
|
### 2026-05-18 review (commit `6c64030`)
|
|
|
|
| # | Category | Result |
|
|
|---|---|---|
|
|
| 1 | Correctness & logic bugs | Issues found: IntegrationTests-003 (asserts only on first event), IntegrationTests-010 (`WaitForMessageAsync` ignores cancellation). |
|
|
| 2 | mxaccessgw conventions | Live tests correctly gated and skip (not fail) when prerequisites are absent; `LiveGalaxyRepositoryFactAttribute` undocumented in the opt-in matrix. |
|
|
| 3 | Concurrency & thread safety | Issue found: IntegrationTests-007 (no `[Collection]`/parallelism guard for shared MXAccess/ZB/GLAuth). |
|
|
| 4 | Error handling & resilience | Issue found: IntegrationTests-004 (cleanup `WaitAsync` can mask the original failure). |
|
|
| 5 | Security | No production secrets; only documented dev GLAuth creds and a localhost ZB connection string, all env-overridable. No issues found. |
|
|
| 6 | Performance & resource management | Worker process disposed transitively via session disposal; no leaked pipes/COM/processes. No issues found. |
|
|
| 7 | Design-document adherence | Issues found: IntegrationTests-001 (Galaxy live suite absent from the opt-in matrix), IntegrationTests-002 (`GwAdmin` LDAP prerequisite undocumented). |
|
|
| 8 | Code organization & conventions | Issue found: IntegrationTests-008 (three near-identical fact attributes). |
|
|
| 9 | Testing coverage | Issues found: IntegrationTests-005 (thin MXAccess parity coverage), IntegrationTests-006 (thin LDAP failure-path coverage). |
|
|
| 10 | Documentation & comments | Issue found: IntegrationTests-009 (`TestServerCallContext` mislabelled "Mock"). |
|
|
|
|
### 2026-05-24 review (commit d692232)
|
|
|
|
Re-review pass at `d692232` scoped to: the rename (`dc9c0c9`) of the
|
|
namespaces, csproj, and the `IntegrationTestEnvironment.ResolveRepositoryRoot`
|
|
walker; the `DashboardLdapLiveTests` update to inject a `GroupToRole`
|
|
mapping in place of the dropped `LdapOptions.RequiredGroup` default
|
|
(`27ed651`); and the inline `NullDashboardEventBroadcaster` fake added
|
|
to `WorkerLiveMxAccessSmokeTests` for the new `EventStreamService` ctor
|
|
parameter (`d692232`).
|
|
|
|
| # | Category | Result |
|
|
|---|---|---|
|
|
| 1 | Correctness & logic bugs | No issues found in the a020350..d692232 diff. |
|
|
| 2 | mxaccessgw conventions | No issues found — namespaces, env-var names, and live-test attribute discipline preserved. |
|
|
| 3 | Concurrency & thread safety | No issues found in this diff. |
|
|
| 4 | Error handling & resilience | Confirmed: the `RpcException(StatusCode.Cancelled)` sibling catch added in `WorkerLiveMxAccessSmokeTests.ShutDownAsync` covers the only site where a gRPC-mapped `OperationCanceledException` can surface — no other call sites need widening. |
|
|
| 5 | Security | No issues found. |
|
|
| 6 | Performance & resource management | No issues found. |
|
|
| 7 | Design-document adherence | No issues found — `docs/GatewayTesting.md` was updated in the same wave to describe the GroupToRole-fixture pattern. |
|
|
| 8 | Code organization & conventions | Issues found: IntegrationTests-022 (`ResolveRepositoryRoot` silently falls back to `Directory.GetCurrentDirectory()` on exhausted walk, which can hide misconfiguration), IntegrationTests-024 (inline `NullDashboardEventBroadcaster` fake is the only such use today but worth watching for duplication once a second consumer arrives). |
|
|
| 9 | Testing coverage | Issues found: IntegrationTests-023 (`DashboardLdapLiveTests.AuthenticateAsync_AdminInGwAdminGroup_Succeeds` asserts the `ldap_group` claim but does not assert the emitted `Role: Admin` claim, leaving the role-mapping path untested). |
|
|
| 10 | Documentation & comments | No issues found. |
|
|
|
|
## Findings
|
|
|
|
### IntegrationTests-001
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | High |
|
|
| Category | Design-document adherence |
|
|
| Location | `src/MxGateway.IntegrationTests/Galaxy/LiveGalaxyRepositoryFactAttribute.cs:7`, `src/MxGateway.IntegrationTests/Galaxy/GalaxyRepositoryLiveTests.cs` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** The Galaxy Repository live test suite and its gating env var `MXGATEWAY_RUN_LIVE_GALAXY_TESTS` (plus connection-string override `MXGATEWAY_LIVE_GALAXY_CONN`) are completely absent from `docs/GatewayTesting.md`. CLAUDE.md mandates updating docs in the same change as the source. The opt-in matrix documents only the MXAccess and LDAP env vars, so an operator running the documented matrix has no way to know these tests exist or how to enable them.
|
|
|
|
**Recommendation:** Add a "Live Galaxy Repository" section to `docs/GatewayTesting.md` documenting `MXGATEWAY_RUN_LIVE_GALAXY_TESTS=1`, `MXGATEWAY_LIVE_GALAXY_CONN`, the `ZB` database prerequisite, and the covered RPCs, mirroring the existing "Live MXAccess Smoke" section.
|
|
|
|
**Resolution:** Resolved 2026-05-18: Added a "Live Galaxy Repository" section to `docs/GatewayTesting.md` documenting `MXGATEWAY_RUN_LIVE_GALAXY_TESTS`, `MXGATEWAY_LIVE_GALAXY_CONN`, the deployed-`ZB` prerequisite, and the covered `GalaxyRepository` RPCs.
|
|
|
|
### IntegrationTests-002
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | High |
|
|
| Category | Design-document adherence |
|
|
| Location | `src/MxGateway.IntegrationTests/DashboardLdapLiveTests.cs:13`, `src/MxGateway.Server/Configuration/LdapOptions.cs:27` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `DashboardLdapLiveTests` builds the authenticator with `new GatewayOptions()`, so it relies on `LdapOptions.RequiredGroup` defaulting to `GwAdmin` and asserts the `admin` user is a member of a `GwAdmin` LDAP group. `glauth.md` does not list `GwAdmin` as a provisioned group — it lists `admin` only in the five role groups and describes `GwAdmin` as a group to add "when reuse isn't enough." If GLAuth has only the documented baseline groups, `AuthenticateAsync_AdminInGwAdminGroup_Succeeds` fails (not skips) on any box where the env var is set. This is an undocumented hard prerequisite beyond "LDAP is up."
|
|
|
|
**Recommendation:** Either document the required `GwAdmin` GLAuth provisioning step in `glauth.md` and `GatewayTesting.md`, or have the test set `RequiredGroup` to a baseline group `glauth.md` guarantees `admin` belongs to (e.g. `WriteOperate`).
|
|
|
|
**Resolution:** Resolved 2026-05-18: Took the documentation fix — promoted the `glauth.md` "Adding a gw-specific group" section into a concrete "Provisioning the GwAdmin group" step that grants `GwAdmin` to `admin`, cross-referenced it from the groups/verification sections, and added a "Live LDAP" section to `docs/GatewayTesting.md` calling out `GwAdmin` as a hard prerequisite. Alternative considered: weaken the test to a baseline group (`WriteOperate`) — rejected because `GwAdmin` is the real default `LdapOptions.RequiredGroup` and the test should exercise it.
|
|
|
|
### IntegrationTests-003
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Medium |
|
|
| Category | Correctness & logic bugs |
|
|
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:89-97` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** The test asserts only on the first `MxEvent` recorded by `RecordingServerStreamWriter`. A live MXAccess provider can deliver an initial state/quality event whose family or handles differ from the expected `OnDataChange` (e.g. a registration-state or bad-quality bootstrap event). Because `WaitForFirstMessageAsync` returns whatever arrives first, a genuine ordering/family defect could fail spuriously or leave later wrong events unverified.
|
|
|
|
**Recommendation:** Filter for the first event with `Family == OnDataChange` (with a bounded retry/poll) or assert the full recorded sequence, so the test verifies the event the worker is supposed to emit.
|
|
|
|
**Resolution:** Resolved 2026-05-18: Confirmed against source — `WaitForFirstMessageAsync` completed a `TaskCompletionSource` on the very first `WriteAsync`. Replaced it with `RecordingServerStreamWriter.WaitForMessageAsync(predicate, timeout)`, which scans recorded messages, skips earlier non-matching events, and blocks on a `SemaphoreSlim` until a matching one arrives or the timeout elapses (throwing a `TimeoutException` that reports the scanned count). `GatewaySession_WithLiveWorker_RegistersAdvisesStreamsDataAndCloses` now waits for the first `Family == OnDataChange` event. Live execution was not possible in this environment (no MXAccess COM); verified by build.
|
|
|
|
### IntegrationTests-004
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Medium |
|
|
| Category | Error handling & resilience |
|
|
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:108-111` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** In the `finally` block, after `CloseSessionAsync`, the test does `await streamTask.WaitAsync(StreamShutdownTimeout)`. If closing the session does not promptly complete the stream (or `StreamEvents` itself faults), this throws `TimeoutException` from inside `finally`, which replaces/masks any original assertion failure from the `try` block. The diagnostic value of the real failure is lost.
|
|
|
|
**Recommendation:** Wrap the `streamTask.WaitAsync` (and ideally `WaitForProcessesAsync`) in a try/catch that logs the cleanup exception via `output.WriteLine` instead of letting it propagate.
|
|
|
|
**Resolution:** Resolved 2026-05-18: Confirmed — the `finally` block awaited `streamTask.WaitAsync` and `WaitForProcessesAsync` with no exception handling. Extracted a shared `ShutDownAsync` helper that wraps the session-close + stream-drain in one try/catch and the worker-process wait in a second try/catch, logging each cleanup exception via `output.WriteLine` instead of throwing. All three live tests now route shutdown through it, so a cleanup timeout can no longer mask an assertion failure. Live execution was not possible in this environment; verified by build.
|
|
|
|
### IntegrationTests-005
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Medium |
|
|
| Category | Testing coverage |
|
|
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** The only live MXAccess test covers the Register→AddItem→Advise→one-OnDataChange→Close happy path. CLAUDE.md stresses that MXAccess parity is the contract and calls out non-obvious behaviors (`WriteSecured` ordering, `OperationComplete` semantics, invalid-handle exceptions). None of `Write`, `WriteSecured`, `Unadvise`, `RemoveItem`, `Unregister`, `OperationComplete`, an invalid-handle command, or a worker-fault path is exercised against live COM — exactly the paths fake-worker tests cannot validate.
|
|
|
|
**Recommendation:** Add live coverage for at least a `Write` round-trip and an invalid-handle command, plus a worker-fault/abnormal-exit scenario, even if behind additional opt-in env vars.
|
|
|
|
**Resolution:** Resolved 2026-05-18: Added two `[LiveMxAccessFact]`-gated tests to `WorkerLiveMxAccessSmokeTests`. `GatewaySession_WithLiveWorker_WritesValueToAdvisedItem` registers/adds/advises then issues a `Write` of an integer value, asserting the command round-trips with `ProtocolStatusCode.Ok` and `MxCommandKind.Write`. `GatewaySession_WithLiveWorker_InvalidHandleCommand_SurfacesFailureWithoutTransportFault` issues `AddItem` against `int.MaxValue` as the server handle (never issued by MXAccess) and asserts the failure surfaces in the command reply without a usable item handle. Both reuse the existing opt-in env var and the `ShutDownAsync` cleanup helper. A worker-fault/abnormal-exit case was deliberately scoped out — it needs a controlled COM crash injection beyond what the existing harness supports; the two added cases cover the `Write` round-trip and invalid-handle paths the recommendation calls out. Live execution was not possible in this environment; verified by build.
|
|
|
|
### IntegrationTests-006
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Medium |
|
|
| Category | Testing coverage |
|
|
| Location | `src/MxGateway.IntegrationTests/DashboardLdapLiveTests.cs` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** LDAP live coverage is two cases: admin succeeds, readonly is denied for missing group. There is no coverage of a wrong password for a valid user, an unknown username, or the LDAP-server-unreachable path — all of which `DashboardAuthenticator` has distinct branches for (the `LdapException` catch, the `candidate is null` branch). The negative test only proves group-membership denial, not credential rejection.
|
|
|
|
**Recommendation:** Add a live test for `admin` with a wrong password asserting `Succeeded == false` and that the password is not leaked into `FailureMessage`, and a test for an unknown username.
|
|
|
|
**Resolution:** Resolved 2026-05-18: Added three `[LiveLdapFact]`-gated tests to `DashboardLdapLiveTests`. `AuthenticateAsync_AdminWithWrongPassword_FailsWithoutLeakingPassword` exercises the `LdapException` catch via a rejected candidate bind and asserts the wrong password never reaches `FailureMessage`. `AuthenticateAsync_UnknownUsername_Fails` exercises the `candidate is null` branch. `AuthenticateAsync_ServerUnreachable_FailsWithoutThrowing` builds the authenticator with `LdapOptions.Port = 1` (a reserved port no LDAP server listens on) and asserts the connect failure is absorbed into a failed result rather than thrown — covering the generic `catch (Exception)` branch. All three are gated by the existing `MXGATEWAY_RUN_LIVE_LDAP_TESTS` opt-in so they stay opt-in. Live execution was not possible in this environment (no live LDAP); verified by build.
|
|
|
|
### IntegrationTests-007
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Concurrency & thread safety |
|
|
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:20`, `src/MxGateway.IntegrationTests/Galaxy/GalaxyRepositoryLiveTests.cs:5`, `src/MxGateway.IntegrationTests/DashboardLdapLiveTests.cs:9` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** The live test classes contend for genuinely shared singletons — one MXAccess COM provider, one ZB SQL database, one GLAuth instance with a 3-fail/10-minute per-IP lockout. No `[Collection]` annotation or `DisableTestParallelization` is declared, so xUnit's default cross-class parallelism could run the Galaxy tests concurrently or interleave an LDAP failure burst that trips the GLAuth lockout.
|
|
|
|
**Recommendation:** Place the live test classes in a shared `[Collection]`, or set `[assembly: CollectionBehavior(DisableTestParallelization = true)]` for this opt-in project, so live external resources are accessed serially.
|
|
|
|
**Resolution:** Resolved 2026-05-18: Confirmed — no `[Collection]` or assembly-level `CollectionBehavior` existed. Added `LiveResourcesCollection.cs` with a `[CollectionDefinition(Name, DisableParallelization = true)]` and applied `[Collection(LiveResourcesCollection.Name)]` to `WorkerLiveMxAccessSmokeTests`, `GalaxyRepositoryLiveTests`, and `DashboardLdapLiveTests`. A named collection (rather than an assembly-wide `DisableTestParallelization`) was chosen so the live classes serialize against each other and within themselves while non-live tests (`IntegrationTestEnvironmentTests`) keep parallelizing. Verified by build; live tests not executed (no MXAccess COM / live LDAP in this environment).
|
|
|
|
### IntegrationTests-008
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Code organization & conventions |
|
|
| Location | `src/MxGateway.IntegrationTests/LiveLdapFactAttribute.cs`, `src/MxGateway.IntegrationTests/Galaxy/LiveGalaxyRepositoryFactAttribute.cs`, `src/MxGateway.IntegrationTests/LiveMxAccessFactAttribute.cs` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** Three near-identical fact attributes each re-implement the same "compare env var to `1` with `StringComparison.Ordinal`, set `Skip` otherwise" pattern. `LiveMxAccessFactAttribute` delegates to `IntegrationTestEnvironment` while the other two inline the logic, so the project has two divergent styles for the same concern.
|
|
|
|
**Recommendation:** Extract a shared helper (e.g. `IntegrationTestEnvironment.IsEnabled(string variableName)`) and have all three attributes call it.
|
|
|
|
**Resolution:** Resolved 2026-05-18: Confirmed — `LiveLdapFactAttribute.Enabled` and `LiveGalaxyRepositoryFactAttribute.Enabled` each inlined the ordinal `== "1"` comparison while `LiveMxAccessFactAttribute` delegated to `IntegrationTestEnvironment`. Added `IntegrationTestEnvironment.IsEnabled(string variableName)` as the single implementation; `LiveMxAccessTestsEnabled`, `LiveLdapFactAttribute.Enabled`, and `LiveGalaxyRepositoryFactAttribute.Enabled` now all call it. Verified by build.
|
|
|
|
### IntegrationTests-009
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Documentation & comments |
|
|
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:372-375` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `TestServerCallContext` is XML-documented as a "Mock server call context," but it is a hand-written stub/fake with no mocking framework and no verification behavior. Per the style guides (accurate naming; explain why not what), calling it a mock misleads readers who may expect verifiable interactions.
|
|
|
|
**Recommendation:** Reword the summary to "test stub" / "minimal `ServerCallContext` implementation for in-process gRPC calls."
|
|
|
|
**Resolution:** Resolved 2026-05-18: Confirmed — the summary read "Mock server call context for testing gRPC calls." Reworded to "Minimal `ServerCallContext` stub for invoking the gRPC service in-process," noting it is a hand-written fake with no verification behavior. No mocking framework is involved; this is a documentation-only fix. Verified by build.
|
|
|
|
### IntegrationTests-010
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Correctness & logic bugs |
|
|
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:366-369` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `WaitForFirstMessageAsync` accepts only a `timeout` and never observes a `CancellationToken`. There is no per-test cancellation propagation, so if the gateway/worker hangs without writing an event the test relies solely on the 15s `WaitAsync` timeout and gives no contextual diagnostics. Combined with IntegrationTests-004, a hung live worker produces a bare `TimeoutException`.
|
|
|
|
**Recommendation:** Accept a `CancellationToken` (linked to `TestServerCallContext`'s token), pass it to `firstMessage.Task.WaitAsync(timeout, token)`, and on timeout emit the recorded `Messages` count via `output.WriteLine` before throwing.
|
|
|
|
**Re-triage:** The named method `WaitForFirstMessageAsync` no longer exists — IntegrationTests-003's resolution renamed/replaced it with `RecordingServerStreamWriter.WaitForMessageAsync(predicate, timeout)`, which scans recorded messages and blocks on a `SemaphoreSlim`. The underlying defect still held: that replacement method also took only a `timeout` and never observed a `CancellationToken`. The finding remains valid (Low, Correctness) against the renamed method; the recommendation's `firstMessage.Task.WaitAsync` detail is stale but the intent (thread a token, surface a count on timeout) is unchanged.
|
|
|
|
**Resolution:** Resolved 2026-05-18: Added an optional `CancellationToken` parameter to `WaitForMessageAsync`, linked with the existing timeout source via `CancellationTokenSource.CreateLinkedTokenSource`, so a per-test cancellation aborts the wait promptly. `GatewaySession_WithLiveWorker_RegistersAdvisesStreamsDataAndCloses` now creates a `CancellationTokenSource`, passes its token into the `StreamEvents` `TestServerCallContext` and into `WaitForMessageAsync`, so the stream call and the wait share one cancellation source. On timeout the method already throws a `TimeoutException` whose message includes the scanned message count, satisfying the "emit recorded count" intent (the count surfaces in the test failure rather than via a separate `output.WriteLine`). Verified by build; live tests not executed.
|
|
|
|
### IntegrationTests-011
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Documentation & comments |
|
|
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:236-240`, `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:183-187` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** The XML/inline comments on the two new MXAccess parity tests misdescribe how the gateway surfaces an MXAccess failure. The invalid-handle test reads "the gateway protocol status is Ok and the failure shows up in hresult / the status proxies — it must not be reported as a transport fault", then asserts `Assert.NotEqual(ProtocolStatusCode.Ok, addItemReply.ProtocolStatus.Code)`. `HResultConverter.CreateProtocolStatus` (`src/MxGateway.Worker/Conversion/HResultConverter.cs:39`) actually sets `Code = ProtocolStatusCode.MxaccessFailure` whenever the COM call throws (HRESULT ≠ 0), so the assertion is correct but the comment is wrong — the protocol status is *not* `Ok` on an MXAccess failure. The write-round-trip test carries the same misleading framing on lines 183-187 ("MXAccess parity details … belong in hresult / statuses, not in a transport failure") immediately before asserting `Ok`. A reader can reasonably conclude the gateway always reports `Ok` for round-tripped commands and tweak code accordingly. The intended distinction is "this is not a gRPC transport fault" (the RPC reply still arrives) — the protocol status code carries the MXAccess outcome.
|
|
|
|
**Recommendation:** Reword the invalid-handle comment to "the gateway must reply with `ProtocolStatusCode.MxaccessFailure` and a non-zero `Hresult` carrying the COM failure, not a gRPC transport fault." Reword the write-round-trip comment to clarify it is asserting the happy-path Ok and that an MXAccess rejection would surface as `MxaccessFailure` (per `HResultConverter`), not as a `RpcException`.
|
|
|
|
**Resolution:** 2026-05-20 — Reworded the invalid-handle test comment to say the gateway must reply with `ProtocolStatusCode.MxaccessFailure` and a non-zero hresult carrying the COM failure (per `HResultConverter`), and reworded the write-round-trip comment to make explicit it is asserting the happy-path Ok while an MXAccess rejection would surface as `MxaccessFailure`, never as an `RpcException`.
|
|
|
|
### IntegrationTests-012
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Medium |
|
|
| Category | Correctness & logic bugs |
|
|
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:147-151` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `GatewaySession_WithLiveWorker_WritesValueToAdvisedItem` constructs a `RecordingServerStreamWriter<MxEvent>` and starts a `StreamEvents` task, then never reads from it and never asserts anything about the recorded messages. The test verifies only that the `Write` command round-trips at the protocol level — it does not verify that the worker actually emits any event after the write (for example an `OnWriteComplete`, which is the proof of round-trip used by the cross-language client e2e runner). Because the stream task is started with `new TestServerCallContext()` (no cancellation source), any fault raised by the stream task (an exception from `EventStreamService`, a session-not-found, a backpressure overflow) is swallowed — `streamTask` is later awaited in `ShutDownAsync` only inside a broad `catch (Exception ex)`, which logs and continues. The Write test therefore cannot fail on stream-task faults. Two consequences: (a) the live Write parity coverage promised in IntegrationTests-005 is weaker than it appears, and (b) the fixture (`eventWriter`) is dead code in this test that suggests an assertion was intended.
|
|
|
|
**Recommendation:** Either remove the unused `eventWriter`/`StreamEvents` plumbing from the Write test so the test scope matches its assertions, or — preferred — extend the test to wait for an `OnWriteComplete` event for the written item via `eventWriter.WaitForMessageAsync(candidate => candidate.Family == MxEventFamily.OnWriteComplete && candidate.ItemHandle == itemHandle, ...)`, matching the round-trip proof used by `scripts/run-client-e2e-tests.ps1 -VerifyWrite`.
|
|
|
|
**Resolution:** Resolved 2026-05-20: Rewrote `GatewaySession_WithLiveWorker_WritesValueToAdvisedItem` so the previously-dead `eventWriter`/`StreamEvents` plumbing actually drives an assertion. The test now waits for an `OnWriteComplete` event matching the Write's (server, item) handle pair via `eventWriter.WaitForMessageAsync` (using `IntegrationTestEnvironment.LiveMxAccessEventTimeout`), and asserts the recorded event's family, session id, and handles — the same round-trip proof the cross-language client e2e runner uses. The stream call is now bound to a `CancellationTokenSource` and the test asserts `streamTask.IsFaulted == false` before cleanup. `ShutDownAsync` gained an opt-in `propagateStreamFaults` flag so a faulted `StreamEvents` task is rethrown into the test rather than silently swallowed by the broad cleanup catch; the cancellation token is also signalled before the drain so `StreamEvents` observes a clean shutdown instead of a forced timeout. Verified by build and by confirming the test skips cleanly when `MXGATEWAY_RUN_LIVE_MXACCESS_TESTS` is unset.
|
|
|
|
### IntegrationTests-013
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Performance & resource management |
|
|
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:519-609` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `RecordingServerStreamWriter<T>` owns a `SemaphoreSlim messageArrived` (`IDisposable`) but does not itself implement `IDisposable`, so the semaphore's wait handle is never released back to the OS. Each live test allocates one such writer and discards it at scope exit. Live tests run on opt-in only, so the cumulative leak is bounded, but the type holds an `IDisposable` field — the standard hygiene under `Directory.Build.props`'s `TreatWarningsAsErrors=true` is to either dispose the field or document why not. CA2213 does not fire because the owner is not itself `IDisposable`; an analyzer-driven warning is the only reason this is not a build break, not an indication that the leak is acceptable.
|
|
|
|
**Recommendation:** Make `RecordingServerStreamWriter<T>` implement `IDisposable`, dispose `messageArrived` in `Dispose`, and wrap each instantiation in a `using` block (`using RecordingServerStreamWriter<MxEvent> eventWriter = new();`).
|
|
|
|
**Resolution:** 2026-05-20 — `RecordingServerStreamWriter<T>` now implements `IDisposable` and its `Dispose` releases the `messageArrived` semaphore. All six live tests in `WorkerLiveMxAccessSmokeTests` now allocate the writer with a top-of-method `using` declaration so the semaphore's wait handle is released on scope exit even when the test body throws.
|
|
|
|
### IntegrationTests-014
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Medium |
|
|
| Category | Testing coverage |
|
|
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** IntegrationTests-005 was resolved by adding live coverage for `Write` and an invalid-handle `AddItem`, but its resolution explicitly scoped out the worker-fault/abnormal-exit case and silently dropped `Unadvise`, `RemoveItem`, `Unregister`, `OperationComplete`, and `WriteSecured` ordering. CLAUDE.md singles out `WriteSecured` ("`WriteSecured` failing before a value-bearing NMX body") and `OperationComplete` semantics as parity surprises the gateway must not "fix" — exactly the paths fake-worker tests cannot validate. After this commit the live MXAccess smoke still doesn't exercise any teardown command, the secured-write ordering rule, or a deliberately faulted worker. A regression in any of these would only be caught by manual testing.
|
|
|
|
**Recommendation:** Add live MXAccess coverage for the teardown chain (`Unadvise` then `RemoveItem` then `Unregister`, asserting each replies with `ProtocolStatusCode.Ok` and the next operation no longer references the freed handle), and at minimum one `WriteSecured` parity case asserting the documented ordering. A worker-fault test can be deferred to a separate finding once a deterministic COM-crash injection harness exists.
|
|
|
|
**Resolution:** Resolved 2026-05-20: Added three new `[LiveMxAccessFact]`-gated tests to `WorkerLiveMxAccessSmokeTests`, all reusing the existing opt-in env var and `ShutDownAsync` cleanup helper. (1) `GatewaySession_WithLiveWorker_UnadviseRemoveItemUnregister_TeardownOrderingParity` runs Register → AddItem → Advise → wait for one OnDataChange → UnAdvise → RemoveItem → Unregister, asserting each step replies `Ok` with the matching `MxCommandKind`, that no further OnDataChange events for the un-advised (server, item) pair arrive after a 500 ms settle window, and that a second RemoveItem against the freed handle returns a non-`Ok` MXAccess failure (so a regression that left a stale subscription or accepted a stale handle would surface). (2) `GatewaySession_WithLiveWorker_WriteSecured_AuthenticatedRoundTripParity` resolves an ArchestrA user id via `AuthenticateUser` (credentials env-overridable through `MXGATEWAY_LIVE_MXACCESS_WRITE_SECURED_USER` / `..._PASSWORD`, defaulting to the `admin`/`admin123` GLAuth user from `glauth.md`), issues `WriteSecured` against an advised item, and asserts the reply carries `MxCommandKind.WriteSecured`, the protocol status is one of the documented parity outcomes (`Ok` for an unprotected provider, `MxaccessFailure` when the item is not WriteSecured-eligible — never a transport fault), and the credential never leaks into the diagnostic message. (3) `GatewaySession_WithLiveWorker_AbnormalWorkerExit_MarksSessionFaulted` opens a session, kills the worker process tree (via a new `TestWorkerProcessFactory.KillAllAndDetach` helper) without going through CloseSession, and polls the session via a new `GatewayServiceFixture.TryGetSession` accessor until it transitions to `SessionState.Faulted` within the live event timeout; asserts the final state is `Faulted`, that `FinalFault` is non-empty, and that the fault description carries a known worker-client classification (pipe disconnected / worker faulted / heartbeat expired / end-of-stream). `docs/GatewayTesting.md` was updated to list all five parity surfaces and the two new env-var defaults. Verified by build and confirmed all six live tests skip cleanly when `MXGATEWAY_RUN_LIVE_MXACCESS_TESTS` is unset.
|
|
|
|
### IntegrationTests-015
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Code organization & conventions |
|
|
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:30,119,201`, `src/MxGateway.IntegrationTests/DashboardLdapLiveTests.cs:13,32,48,67,84`, `src/MxGateway.IntegrationTests/Galaxy/GalaxyRepositoryLiveTests.cs:10,22,34,52` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** Every live-test method in the three live classes carries an identical `[Trait("Category", "LiveMxAccess")]` (or `LiveLdap` / `LiveGalaxy`) attribute. The trait is uniform within each class and is exactly the information the `[Collection(LiveResourcesCollection.Name)]` class-level attribute also implies. xUnit's `[Trait]` is inheritable from the class to its methods, so the same metadata can be declared once at class scope. The current shape adds maintenance burden — adding a new test in any of these classes requires remembering to add the trait, and the existing pattern's `LiveLdap` includes five copies of the same line.
|
|
|
|
**Recommendation:** Move each `[Trait("Category", ...)]` to the class declaration alongside the existing `[Collection(...)]`, and remove the per-method copies. Verify the trait still surfaces in `--filter Trait=Category=LiveLdap` after the change.
|
|
|
|
**Resolution:** 2026-05-20 — Lifted `[Trait("Category", "LiveMxAccess")]`, `[Trait("Category", "LiveLdap")]`, and `[Trait("Category", "LiveGalaxy")]` to the class declarations of `WorkerLiveMxAccessSmokeTests`, `DashboardLdapLiveTests`, and `GalaxyRepositoryLiveTests` respectively (alongside the existing `[Collection(LiveResourcesCollection.Name)]`), and removed all per-method duplicates. xUnit propagates class-level traits to every method, so `--filter Category=Live*` filters still match.
|
|
|
|
### IntegrationTests-016
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Code organization & conventions |
|
|
| Location | `src/MxGateway.IntegrationTests/Galaxy/LiveGalaxyRepositoryFactAttribute.cs:26`, `src/MxGateway.Server/Galaxy/GalaxyRepositoryOptions.cs:13` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** The default Galaxy Repository connection string `"Server=localhost;Database=ZB;Integrated Security=True;TrustServerCertificate=True;Encrypt=False;"` is duplicated verbatim between the production `GalaxyRepositoryOptions.ConnectionString` initializer and the test-side `LiveGalaxyRepositoryFactAttribute.ConnectionString` fallback. The docs (`docs/GatewayTesting.md`) document the value once and reference it from both places. If the production default changes (e.g. tightening to a named instance, or switching to a SQL-auth template), the test default silently keeps the old string and the live Galaxy tests connect to the wrong server. The drift is invisible to the build.
|
|
|
|
**Recommendation:** Expose the production default through a `public const string` on `GalaxyRepositoryOptions` (e.g. `DefaultConnectionString`) and have `LiveGalaxyRepositoryFactAttribute.ConnectionString` read `Environment.GetEnvironmentVariable(ConnectionStringVariableName) ?? GalaxyRepositoryOptions.DefaultConnectionString`. Single source of truth, build-time guarantee they cannot drift.
|
|
|
|
**Resolution:** 2026-05-20 — Added `public const string GalaxyRepositoryOptions.DefaultConnectionString` carrying the production default, set the `ConnectionString` initializer to reference it, and changed `LiveGalaxyRepositoryFactAttribute.ConnectionString` to fall back to `GalaxyRepositoryOptions.DefaultConnectionString`. The literal now lives in exactly one place and any future change to the production default propagates to the live-test fallback at compile time.
|
|
|
|
### IntegrationTests-017
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Medium |
|
|
| Category | Correctness & logic bugs |
|
|
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:350-407` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `GatewaySession_WithLiveWorker_UnadviseRemoveItemUnregister_TeardownOrderingParity` proves the subscription is live by waiting for one matching `OnDataChange`, snapshots `dataChangeCountBeforeUnadvise`, then sends `UnAdvise`, waits 500 ms, snapshots `dataChangeCountAfterTeardown`, and asserts strict equality. The assertion races against the natural cadence of the live MXAccess provider:
|
|
|
|
1. After `WaitForMessageAsync` returns the first match, any additional `OnDataChange` for the same `(serverHandle, itemHandle)` published by the provider before the worker processes `UnAdvise` is delivered into the recording writer.
|
|
2. The snapshot at line 362 is taken *immediately before* the `UnAdvise` command is sent (line 370). Events that arrive in the window between that snapshot and the worker processing `UnAdvise` (network round-trip + STA dispatch + worker pipe send + gateway channel write) are racing in — they are not "after UnAdvise" but they will be in the post-teardown snapshot.
|
|
3. `MXAccess` providers can publish `OnDataChange` at sub-second cadence; the strict-equality assertion has no slack for in-flight events.
|
|
|
|
The test passes today only because the chosen test item (`TestChildObject.TestInt`) likely changes value rarely. Against a more active item — or on a slower machine where the round-trip widens — the assertion would flap. The intent ("no further events *after the worker stops the subscription*") would be better expressed by capturing the snapshot after `UnAdvise` returns `Ok` rather than before it is issued.
|
|
|
|
**Recommendation:** Move the "before" snapshot to immediately *after* `UnAdvise` returns `Ok` (the point past which the parity rule applies), or weaken the assertion to "no events with a `WorkerSequence` strictly greater than the last sequence observed within `dataChangeCountBeforeUnadvise + N` events arrived in the post-teardown drain" where `N` accounts for the documented in-flight window. Either change moves the test from racing on provider cadence to verifying the actual parity rule.
|
|
|
|
**Resolution:** 2026-05-20 — Removed the `dataChangeCountBeforeUnadvise` snapshot taken just before the `UnAdvise` command (the source of the race) and replaced the strict-equality assertion against a pre-teardown count with a two-window stability check taken *after* the teardown chain completes. The test now waits one 500 ms settle window for in-flight `OnDataChange` events (which the provider already published before the worker acknowledged `UnAdvise`) to drain, captures `dataChangeCountAfterFirstSettle`, waits another 500 ms, and asserts the count is unchanged in `dataChangeCountAfterSecondSettle`. The parity rule under test ("no further `OnDataChange` after the worker stops the subscription") is now expressed as steadiness across the post-teardown window rather than equality with a count snapshotted during the round-trip race, so a slower machine or a more active test item no longer flaps the assertion while a genuine regression (a stale subscription continuing to fire) still surfaces as a count drift between the two settles.
|
|
|
|
### IntegrationTests-018
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Code organization & conventions |
|
|
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:1037`, `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:595` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `GatewayServiceFixture.TryGetSession(string sessionId, out GatewaySession session)` declares `session` as a non-nullable `GatewaySession`, but its implementation is `_registry.TryGet(sessionId, out session)`, which (in `SessionRegistry.cs:43`) uses `session!` to silence the nullability warning on `Dictionary.TryGetValue`. On a `false` return the `out` parameter is null, contradicting the non-nullable annotation. The caller at line 595 binds it as `out GatewaySession? session`, which compiles only because non-nullable-to-nullable variance is permitted — but no callsite tooling will warn that a `false` return yields a null value through what the fixture's contract describes as non-nullable. The repo enforces `Nullable=enable;TreatWarningsAsErrors=true` (`src/Directory.Build.props`), so the convention is for `TryX` patterns to either annotate the out as `T?` or to mirror BCL `Dictionary.TryGetValue<TKey, TValue>` (which uses `[MaybeNullWhen(false)] out TValue`).
|
|
|
|
**Recommendation:** Change the signature to `public bool TryGetSession(string sessionId, [MaybeNullWhen(false)] out GatewaySession session)` (mirroring BCL `TryGetValue`) and propagate the same annotation down through `ISessionRegistry.TryGet` / `SessionRegistry.TryGet` so the `session!` fudge can be removed. The call sites already treat the parameter as nullable; aligning the declaration removes the silent contract gap.
|
|
|
|
**Resolution:** 2026-05-20 — Propagated `[MaybeNullWhen(false)]` through the entire `TryGet*` chain. `GatewayServiceFixture.TryGetSession`, `ISessionManager.TryGetSession` / `SessionManager.TryGetSession`, and `ISessionRegistry.TryGet`/`TryRemove` plus their `SessionRegistry` implementations now carry the BCL `Dictionary.TryGetValue`-style annotation, and the `session!` null-forgiving operator inside `SessionRegistry.TryGet` / `TryRemove` was removed because the annotation makes it redundant. Existing in-tree callers (`SessionManagerTests.cs` line 28) were updated to `out GatewaySession?` to match. The compiler now warns at callsites that read `session` without checking the boolean return, closing the silent contract gap. Verified by `dotnet build src/MxGateway.IntegrationTests/...` (0 warnings), `dotnet build src/MxGateway.Tests/...` (0 warnings), and `dotnet test src/MxGateway.Tests/...` (479 passed).
|
|
|
|
### IntegrationTests-019
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Medium |
|
|
| Category | Security |
|
|
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:497-534` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `GatewaySession_WithLiveWorker_WriteSecured_AuthenticatedRoundTripParity` resolves a credential pair via `ResolveLiveMxAccessSecuredCredentials`, passes the password into `AuthenticateUser` and `WriteSecured`, and asserts the password is absent from `writeSecuredReply.DiagnosticMessage`. CLAUDE.md's secret-handling rule is broader: "API keys, passwords, `WriteSecured` payloads, and `AuthenticateUser` credentials must never reach logs." The test's assertion covers only one of the surfaces the rule protects:
|
|
|
|
- The `TestOutputLoggerProvider` writes every gateway-side `ILogger` entry to `ITestOutputHelper`. A regression that logged the request body (or the `WorkerCommandRequest` envelope) would put the password into test output without failing this test.
|
|
- `WriteWorkerOutput` echoes worker `stdout`/`stderr` lines to `ITestOutputHelper`. A worker-side regression that printed the credential (e.g. a debug log added to `MxAccessCommandExecutor`) would land in test output without failing this test.
|
|
- `output.WriteLine(...)` calls in the test body (`AuthenticateUser status=... user_id=...` and `LogReply("WriteSecured", ...)`) currently don't include the request body, but a future maintenance change that printed `command.WriteSecured.Value` or a similar struct dump would silently leak the credential past the existing assertion.
|
|
|
|
Because `ITestOutputHelper` doesn't expose its accumulated text to the test, the assertion can only be made by buffering output through a recording sink the test owns.
|
|
|
|
**Recommendation:** Replace the bare `ITestOutputHelper` injection (just for the WriteSecured test, or for all live MXAccess tests) with a recording wrapper that mirrors writes both to the xUnit output and to a `StringBuilder`. At the end of the test, assert the buffer does not contain `verifyPassword`. This makes the credential-redaction contract a property of the entire test run, not just the one explicit field. Alternative: route the test through `GatewayLogRedactor` (`src/MxGateway.Server/Diagnostics/GatewayLogRedactor.cs`) so the credential-bearing commands are redacted at the logger sink the test sees.
|
|
|
|
**Resolution:** 2026-05-20 — Added a `RecordingTestOutputHelper` private class that implements `ITestOutputHelper`, mirrors every line to the wrapped xUnit sink, and accumulates the same text into a `StringBuilder` exposed via a `Captured` property. The WriteSecured parity test now constructs this wrapper, passes it to both `TestWorkerProcessFactory` (so worker `stdout`/`stderr` lines flow through it) and `GatewayServiceFixture` (so the `TestOutputLoggerProvider`'s gateway-`ILogger` entries flow through it), and uses it for every direct `WriteLine`. A new `LogReplyTo(ITestOutputHelper sink, …)` static helper underpins the existing `LogReply` instance method so the test body can route reply logging through the recording wrapper. After the cleanup `finally` block completes, the test asserts `recordedOutput.Captured` does not contain the verify password. The credential-redaction contract is now enforced across the gateway-logger sink, worker stdout/stderr echo, and every test-body `WriteLine` — a future regression that dumped the request body, the `WorkerCommandRequest` envelope, or the `WriteSecured` payload would land in the buffer and fail the assertion.
|
|
|
|
### IntegrationTests-020
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Correctness & logic bugs |
|
|
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:616-622` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `GatewaySession_WithLiveWorker_AbnormalWorkerExit_MarksSessionFaulted` asserts the `FinalFault` description contains at least one of these substrings (case-insensitive): `disconnect`, `pipe`, `heartbeat`, `worker`, `end of stream`. The intent (per the IntegrationTests-014 resolution prose) is to verify the gateway surfaces "a known worker-client classification". The `"worker"` substring defeats that intent — the gateway routes through `SetFaulted` with messages like *"Worker pipe disconnected."*, *"Worker shutdown timed out."*, *"Worker was killed by the gateway: …"*, *"Worker heartbeat expired. …"*, *"Worker event channel rejected an event."*, *"Worker pipe write failed."*, *"Worker read loop failed."*, *"Worker sent unexpected envelope body …"* — every classification message begins with the word "Worker". A regression that introduced an entirely new fault path with a generic message containing the word *Worker* would still pass this test.
|
|
|
|
CLAUDE.md singles out "abnormal exit" as one of the parity surfaces (`SessionState.Faulted` with an actionable cause), so the test's documented value is verifying *which* of the WorkerClient error codes drove the transition. Today the assertion is effectively `Assert.NotEmpty(observedFault)`.
|
|
|
|
**Recommendation:** Tighten the keyword set to the specific classifications the abnormal-exit (kill-the-process) path actually drives — `PipeDisconnected` ("pipe", "disconnect") and `EndOfStream` ("end of stream"). Drop the broad `"worker"` term, and drop `"heartbeat"` unless the test deliberately covers the heartbeat path too (it does not — `HeartbeatGraceSeconds = 15` and the poll deadline is `StreamShutdownTimeout = 10` seconds, so a heartbeat-expired transition is impossible inside the wait window). If a more exhaustive matrix is wanted, assert `FinalFault.StartsWith("Worker pipe disconnected")` against the message constant in `WorkerClient.cs:380` so a rename surfaces as a compile-time / test-time failure.
|
|
|
|
**Resolution:** 2026-05-20 — Tightened the keyword set to the specific classifications the kill-the-process path actually drives. The assertion now requires the `FinalFault` description to contain `"pipe disconnected"` (matching the `WorkerClient.cs:378-381` `WorkerFrameProtocolErrorCode.EndOfStream` → `WorkerClientErrorCode.PipeDisconnected` → `"Worker pipe disconnected."` message) or `"end of stream"`, dropping the broad `"worker"` term that previously matched every `WorkerClient` fault message (all of which begin with "Worker"), and dropping `"heartbeat"` because the test's `StreamShutdownTimeout` (10 s) is below `HeartbeatGraceSeconds` (15 s) so a heartbeat-expired transition cannot occur inside the poll window. A regression that routed an unrelated fault classification through the abnormal-exit path would now fail loudly instead of silently passing.
|
|
|
|
### IntegrationTests-021
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Testing coverage |
|
|
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:579-622` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** The abnormal-exit test only polls `session.State` and `session.FinalFault`. It does not assert anything about `streamTask` after the kill. The chain that puts the session into `Faulted` is: the read loop hits EOS → `SetFaulted(PipeDisconnected, …)` → `_events.Writer.TryComplete(fault)` → `ReadEventsAsync` propagates the `WorkerClientException` → `EventStreamService.ProduceEventsAsync`'s `catch (Exception exception) when exception is WorkerClientException` calls `session.MarkFaulted(exception.Message)`. The test verifies the *end state* of that chain but not that the `StreamEvents` call is what produced the transition. If a future change moved the `MarkFaulted` call somewhere else (a session-manager background watcher, for example), the test would still pass — but the stream task could now silently swallow the fault. A direct assertion that `streamTask.IsFaulted` (or that awaiting it throws a `WorkerClientException`) would protect that contract.
|
|
|
|
The Write parity test (IntegrationTests-012's resolution) added exactly this assertion (`Assert.False(streamTask.IsFaulted, …)`). The abnormal-exit test should add the inverse: the stream task *must* be faulted (or at least completed with a `WorkerClientException`) after the kill.
|
|
|
|
**Recommendation:** After the session-state poll succeeds, assert `streamTask.IsCompleted` (the channel has terminated) and inspect `streamTask.Exception?.InnerException` for a `WorkerClientException` (or assert `streamTask.IsFaulted` and await with `ShouldThrowAsync`). This couples the test to the actual fault-propagation path and prevents a future refactor that bypasses the stream from quietly weakening the coverage. Compare to the existing `Assert.False(streamTask.IsFaulted, …)` on line 217 — the abnormal-exit case wants the opposite assertion.
|
|
|
|
**Resolution:** 2026-05-20 — After the session-state poll loop confirms `SessionState.Faulted`, the test now awaits `streamTask.WaitAsync(StreamShutdownTimeout)` (with a try/catch that logs the surfaced exception type/message), then asserts `streamTask.IsCompleted` and `streamTask.IsFaulted`. This couples the test to the actual fault-propagation chain — read loop hits EndOfStream → `WorkerClient.SetFaulted(PipeDisconnected, …)` → `ReadEventsAsync` propagates the fault → `EventStreamService` calls `session.MarkFaulted` → `MxAccessGatewayService.StreamEvents` re-throws the mapped `RpcException`. A future refactor that moved `MarkFaulted` off the stream-consumption path would leave `streamTask` completing cleanly, which the new `IsFaulted` assertion would now catch (inverse of the existing `Assert.False(streamTask.IsFaulted, …)` in the Write parity test on line 217). The inner-exception type assertion was deliberately omitted because the gateway maps `WorkerClientException` to `RpcException` at the public boundary (`MxAccessGatewayService.MapWorkerClientException`); asserting on the surface type alone would be brittle, while the `IsFaulted` check directly tests the contract the recommendation is protecting.
|
|
|
|
### IntegrationTests-022
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Code organization & conventions |
|
|
| Location | `src/ZB.MOM.WW.MxGateway.IntegrationTests/IntegrationTestEnvironment.cs:103-138` (`ResolveRepositoryRoot` / `IsRepositoryRoot`) |
|
|
| Status | Resolved |
|
|
|
|
**Description:** The walker introduced in `dc9c0c9` searches parents for a directory containing `src/` plus either `.git` or a `*.slnx`/`*.sln` file, falling back to `Directory.GetCurrentDirectory()` when nothing matches. The fallback masks misconfiguration: a test run from a subdirectory of an unpacked tree without a `.git` or `.slnx` marker silently uses the wrong root and then the live `MXGATEWAY_LIVE_MXACCESS_WORKER_EXE` lookup produces a misleading "worker exe not found" error pointing at a fabricated path. The current `bin/` layout makes this unlikely in practice (the test host sets a stable working directory), but the failure mode would be confusing if it triggered.
|
|
|
|
**Recommendation:** Throw a clear `InvalidOperationException` from `ResolveRepositoryRoot` when the walk exhausts without finding a root, naming the directories searched and the markers expected. The opt-in env var (`MXGATEWAY_LIVE_MXACCESS_WORKER_EXE`) remains the escape hatch for unusual deployments.
|
|
|
|
**Resolution:** Resolved 2026-05-24 — Replaced the silent `Directory.GetCurrentDirectory()` fallback in `IntegrationTestEnvironment.ResolveRepositoryRoot` with an `InvalidOperationException` whose message names the start directory, the expected markers (`src/`, `.git`, `*.sln`, `*.slnx`), and the `MXGATEWAY_LIVE_MXACCESS_WORKER_EXE` escape hatch. Added regression test `IntegrationTestEnvironmentTests.ResolveRepositoryRoot_NoMarkers_ThrowsInvalidOperationExceptionNamingStartAndMarkers` that creates an isolated temp directory under `Path.GetTempPath()` containing no markers, asserts the throw, and asserts the message carries the start directory, `.git`, `.sln`, and the worker-exe env-var name so a future refactor that dropped any of these from the diagnostic would fail loudly. TDD red/green confirmed: the new test fails against the pre-fix walker (`Assert.Throws() Failure: No exception was thrown`) and passes after the throw is added. The existing `ResolveRepositoryRoot_AcceptsGitWorktreeFile` test still passes, so the happy path is unaffected.
|
|
|
|
### IntegrationTests-023
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Testing coverage |
|
|
| Location | `src/ZB.MOM.WW.MxGateway.IntegrationTests/DashboardLdapLiveTests.cs:14-29` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `AuthenticateAsync_AdminInGwAdminGroup_Succeeds` asserts the principal carries an `LdapGroupClaimType` claim containing `GwAdmin` (line 26-28). After the `27ed651` refactor, `DashboardAuthenticator.CreatePrincipal` also emits a `ClaimTypes.Role` claim with the mapped role (`Admin` for the seeded `GwAdmin` → `Admin` mapping). The test asserts the LDAP-group claim but not the role claim, so a regression that stopped emitting `Role: Admin` (e.g. a future refactor of `MapGroupsToRoles` that returned an empty list) would silently pass.
|
|
|
|
**Recommendation:** Add an `Assert.Contains` asserting the principal holds a `ClaimTypes.Role` claim with value `DashboardRoles.Admin` (or `"Admin"`). Mirror the style of the existing `LdapGroupClaimType` assertion.
|
|
|
|
**Resolution:** Resolved 2026-05-24 — Added an `Assert.Contains` to `AuthenticateAsync_AdminInGwAdminGroup_Succeeds` that mirrors the existing `LdapGroupClaimType` assertion style, asserting the principal carries a `ClaimTypes.Role` claim with value `DashboardRoles.Admin`. Couples the live test to the `GwAdmin` → `Admin` mapping wired through `CreateAuthenticator`'s `GroupToRole` dictionary, so a future regression to `MapGroupsToRoles` (returning an empty list, dropping the RDN-fallback lookup, breaking the case-insensitive comparer) would now surface as a failed live test rather than a silent pass. The test is gated by `MXGATEWAY_RUN_LIVE_LDAP_TESTS`; build is green and the test continues to skip cleanly when the env var is unset.
|
|
|
|
### IntegrationTests-024
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Code organization & conventions |
|
|
| Location | `src/ZB.MOM.WW.MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs` (`NullDashboardEventBroadcaster` private class at end of file) |
|
|
| Status | Resolved |
|
|
|
|
**Description:** The inline `NullDashboardEventBroadcaster` private class is the third copy in the repository (the other two live in `EventStreamServiceTests` and `GatewayEndToEndFakeWorkerSmokeTests` under the Tests module — see Tests-025). Each carries a singleton `Instance` field and a no-op `Publish`. While the integration-tests project doesn't directly share test-support code with the Tests project, a duplicate-everywhere pattern reads as a code smell.
|
|
|
|
**Recommendation:** When Tests-025 lands the shared `TestSupport/NullDashboardEventBroadcaster.cs`, either reference the same shared helper from this project (a project reference if practical) or accept the duplication as a deliberate isolation between the unit-test and integration-test trees. Either choice is fine; the current state is the only one that should not persist.
|
|
|
|
**Resolution:** Resolved 2026-05-24 — Chose option (b) (extract within IntegrationTests) so the unit-test and integration-test projects stay independently buildable without a shared test-helpers project. Moved the inline private class out of `WorkerLiveMxAccessSmokeTests.cs` into a new public type at `src/ZB.MOM.WW.MxGateway.IntegrationTests/TestSupport/NullDashboardEventBroadcaster.cs` (namespace `ZB.MOM.WW.MxGateway.IntegrationTests.TestSupport`), mirroring the layout the unit-test Tests project established for Tests-007/Tests-021. The fixture wires through the same `NullDashboardEventBroadcaster.Instance` singleton, but the constructor is now private so future integration tests can rely on a single shared no-op. A new `using ZB.MOM.WW.MxGateway.IntegrationTests.TestSupport;` in `WorkerLiveMxAccessSmokeTests.cs` is the only change to its `EventStreamService` wiring. Build is green (0 warnings, 0 errors); no regression in the existing tests (4 passed / 15 skipped, identical to the pre-change baseline).
|