Files
mxaccessgw/code-reviews/IntegrationTests/findings.md
T
Joseph Doherty bc55396334 Resolve IntegrationTests-001 and IntegrationTests-002 code-review findings
IntegrationTests-001: documented the live Galaxy Repository test suite and
its MXGATEWAY_RUN_LIVE_GALAXY_TESTS / MXGATEWAY_LIVE_GALAXY_CONN gating in
docs/GatewayTesting.md.

IntegrationTests-002: documented the live LDAP test suite in
docs/GatewayTesting.md and added a concrete "Provisioning the GwAdmin group"
step to glauth.md so DashboardLdapLiveTests' GwAdmin-membership assumption
is reproducible.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 20:46:09 -04:00

178 lines
12 KiB
Markdown

# Code Review — IntegrationTests
| Field | Value |
|---|---|
| Module | `src/MxGateway.IntegrationTests` |
| Reviewer | Claude Code |
| Review date | 2026-05-18 |
| Commit reviewed | `6c64030` |
| Status | Reviewed |
| Open findings | 8 |
## Checklist coverage
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Issues found: IntegrationTests-003 (asserts only on first event), IntegrationTests-010 (`WaitForFirstMessageAsync` ignores cancellation). |
| 2 | mxaccessgw conventions | Live tests correctly gated and skip (not fail) when prerequisites are absent; `LiveGalaxyRepositoryFactAttribute` undocumented in the opt-in matrix. |
| 3 | Concurrency & thread safety | Issue found: IntegrationTests-007 (no `[Collection]`/parallelism guard for shared MXAccess/ZB/GLAuth). |
| 4 | Error handling & resilience | Issue found: IntegrationTests-004 (cleanup `WaitAsync` can mask the original failure). |
| 5 | Security | No production secrets; only documented dev GLAuth creds and a localhost ZB connection string, all env-overridable. No issues found. |
| 6 | Performance & resource management | Worker process disposed transitively via session disposal; no leaked pipes/COM/processes. No issues found. |
| 7 | Design-document adherence | Issues found: IntegrationTests-001 (Galaxy live suite absent from the opt-in matrix), IntegrationTests-002 (`GwAdmin` LDAP prerequisite undocumented). |
| 8 | Code organization & conventions | Issue found: IntegrationTests-008 (three near-identical fact attributes). |
| 9 | Testing coverage | Issues found: IntegrationTests-005 (thin MXAccess parity coverage), IntegrationTests-006 (thin LDAP failure-path coverage). |
| 10 | Documentation & comments | Issue found: IntegrationTests-009 (`TestServerCallContext` mislabelled "Mock"). |
## Findings
### IntegrationTests-001
| Field | Value |
|---|---|
| Severity | High |
| Category | Design-document adherence |
| Location | `src/MxGateway.IntegrationTests/Galaxy/LiveGalaxyRepositoryFactAttribute.cs:7`, `src/MxGateway.IntegrationTests/Galaxy/GalaxyRepositoryLiveTests.cs` |
| Status | Resolved |
**Description:** The Galaxy Repository live test suite and its gating env var `MXGATEWAY_RUN_LIVE_GALAXY_TESTS` (plus connection-string override `MXGATEWAY_LIVE_GALAXY_CONN`) are completely absent from `docs/GatewayTesting.md`. CLAUDE.md mandates updating docs in the same change as the source. The opt-in matrix documents only the MXAccess and LDAP env vars, so an operator running the documented matrix has no way to know these tests exist or how to enable them.
**Recommendation:** Add a "Live Galaxy Repository" section to `docs/GatewayTesting.md` documenting `MXGATEWAY_RUN_LIVE_GALAXY_TESTS=1`, `MXGATEWAY_LIVE_GALAXY_CONN`, the `ZB` database prerequisite, and the covered RPCs, mirroring the existing "Live MXAccess Smoke" section.
**Resolution:** Resolved 2026-05-18: Added a "Live Galaxy Repository" section to `docs/GatewayTesting.md` documenting `MXGATEWAY_RUN_LIVE_GALAXY_TESTS`, `MXGATEWAY_LIVE_GALAXY_CONN`, the deployed-`ZB` prerequisite, and the covered `GalaxyRepository` RPCs.
### IntegrationTests-002
| Field | Value |
|---|---|
| Severity | High |
| Category | Design-document adherence |
| Location | `src/MxGateway.IntegrationTests/DashboardLdapLiveTests.cs:13`, `src/MxGateway.Server/Configuration/LdapOptions.cs:27` |
| Status | Resolved |
**Description:** `DashboardLdapLiveTests` builds the authenticator with `new GatewayOptions()`, so it relies on `LdapOptions.RequiredGroup` defaulting to `GwAdmin` and asserts the `admin` user is a member of a `GwAdmin` LDAP group. `glauth.md` does not list `GwAdmin` as a provisioned group — it lists `admin` only in the five role groups and describes `GwAdmin` as a group to add "when reuse isn't enough." If GLAuth has only the documented baseline groups, `AuthenticateAsync_AdminInGwAdminGroup_Succeeds` fails (not skips) on any box where the env var is set. This is an undocumented hard prerequisite beyond "LDAP is up."
**Recommendation:** Either document the required `GwAdmin` GLAuth provisioning step in `glauth.md` and `GatewayTesting.md`, or have the test set `RequiredGroup` to a baseline group `glauth.md` guarantees `admin` belongs to (e.g. `WriteOperate`).
**Resolution:** Resolved 2026-05-18: Took the documentation fix — promoted the `glauth.md` "Adding a gw-specific group" section into a concrete "Provisioning the GwAdmin group" step that grants `GwAdmin` to `admin`, cross-referenced it from the groups/verification sections, and added a "Live LDAP" section to `docs/GatewayTesting.md` calling out `GwAdmin` as a hard prerequisite. Alternative considered: weaken the test to a baseline group (`WriteOperate`) — rejected because `GwAdmin` is the real default `LdapOptions.RequiredGroup` and the test should exercise it.
### IntegrationTests-003
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:89-97` |
| Status | Open |
**Description:** The test asserts only on the first `MxEvent` recorded by `RecordingServerStreamWriter`. A live MXAccess provider can deliver an initial state/quality event whose family or handles differ from the expected `OnDataChange` (e.g. a registration-state or bad-quality bootstrap event). Because `WaitForFirstMessageAsync` returns whatever arrives first, a genuine ordering/family defect could fail spuriously or leave later wrong events unverified.
**Recommendation:** Filter for the first event with `Family == OnDataChange` (with a bounded retry/poll) or assert the full recorded sequence, so the test verifies the event the worker is supposed to emit.
**Resolution:** _(open)_
### IntegrationTests-004
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:108-111` |
| Status | Open |
**Description:** In the `finally` block, after `CloseSessionAsync`, the test does `await streamTask.WaitAsync(StreamShutdownTimeout)`. If closing the session does not promptly complete the stream (or `StreamEvents` itself faults), this throws `TimeoutException` from inside `finally`, which replaces/masks any original assertion failure from the `try` block. The diagnostic value of the real failure is lost.
**Recommendation:** Wrap the `streamTask.WaitAsync` (and ideally `WaitForProcessesAsync`) in a try/catch that logs the cleanup exception via `output.WriteLine` instead of letting it propagate.
**Resolution:** _(open)_
### IntegrationTests-005
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Testing coverage |
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs` |
| Status | Open |
**Description:** The only live MXAccess test covers the Register→AddItem→Advise→one-OnDataChange→Close happy path. CLAUDE.md stresses that MXAccess parity is the contract and calls out non-obvious behaviors (`WriteSecured` ordering, `OperationComplete` semantics, invalid-handle exceptions). None of `Write`, `WriteSecured`, `Unadvise`, `RemoveItem`, `Unregister`, `OperationComplete`, an invalid-handle command, or a worker-fault path is exercised against live COM — exactly the paths fake-worker tests cannot validate.
**Recommendation:** Add live coverage for at least a `Write` round-trip and an invalid-handle command, plus a worker-fault/abnormal-exit scenario, even if behind additional opt-in env vars.
**Resolution:** _(open)_
### IntegrationTests-006
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Testing coverage |
| Location | `src/MxGateway.IntegrationTests/DashboardLdapLiveTests.cs` |
| Status | Open |
**Description:** LDAP live coverage is two cases: admin succeeds, readonly is denied for missing group. There is no coverage of a wrong password for a valid user, an unknown username, or the LDAP-server-unreachable path — all of which `DashboardAuthenticator` has distinct branches for (the `LdapException` catch, the `candidate is null` branch). The negative test only proves group-membership denial, not credential rejection.
**Recommendation:** Add a live test for `admin` with a wrong password asserting `Succeeded == false` and that the password is not leaked into `FailureMessage`, and a test for an unknown username.
**Resolution:** _(open)_
### IntegrationTests-007
| Field | Value |
|---|---|
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:20`, `src/MxGateway.IntegrationTests/Galaxy/GalaxyRepositoryLiveTests.cs:5`, `src/MxGateway.IntegrationTests/DashboardLdapLiveTests.cs:9` |
| Status | Open |
**Description:** The live test classes contend for genuinely shared singletons — one MXAccess COM provider, one ZB SQL database, one GLAuth instance with a 3-fail/10-minute per-IP lockout. No `[Collection]` annotation or `DisableTestParallelization` is declared, so xUnit's default cross-class parallelism could run the Galaxy tests concurrently or interleave an LDAP failure burst that trips the GLAuth lockout.
**Recommendation:** Place the live test classes in a shared `[Collection]`, or set `[assembly: CollectionBehavior(DisableTestParallelization = true)]` for this opt-in project, so live external resources are accessed serially.
**Resolution:** _(open)_
### IntegrationTests-008
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `src/MxGateway.IntegrationTests/LiveLdapFactAttribute.cs`, `src/MxGateway.IntegrationTests/Galaxy/LiveGalaxyRepositoryFactAttribute.cs`, `src/MxGateway.IntegrationTests/LiveMxAccessFactAttribute.cs` |
| Status | Open |
**Description:** Three near-identical fact attributes each re-implement the same "compare env var to `1` with `StringComparison.Ordinal`, set `Skip` otherwise" pattern. `LiveMxAccessFactAttribute` delegates to `IntegrationTestEnvironment` while the other two inline the logic, so the project has two divergent styles for the same concern.
**Recommendation:** Extract a shared helper (e.g. `IntegrationTestEnvironment.IsEnabled(string variableName)`) and have all three attributes call it.
**Resolution:** _(open)_
### IntegrationTests-009
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:372-375` |
| Status | Open |
**Description:** `TestServerCallContext` is XML-documented as a "Mock server call context," but it is a hand-written stub/fake with no mocking framework and no verification behavior. Per the style guides (accurate naming; explain why not what), calling it a mock misleads readers who may expect verifiable interactions.
**Recommendation:** Reword the summary to "test stub" / "minimal `ServerCallContext` implementation for in-process gRPC calls."
**Resolution:** _(open)_
### IntegrationTests-010
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:366-369` |
| Status | Open |
**Description:** `WaitForFirstMessageAsync` accepts only a `timeout` and never observes a `CancellationToken`. There is no per-test cancellation propagation, so if the gateway/worker hangs without writing an event the test relies solely on the 15s `WaitAsync` timeout and gives no contextual diagnostics. Combined with IntegrationTests-004, a hung live worker produces a bare `TimeoutException`.
**Recommendation:** Accept a `CancellationToken` (linked to `TestServerCallContext`'s token), pass it to `firstMessage.Task.WaitAsync(timeout, token)`, and on timeout emit the recorded `Messages` count via `output.WriteLine` before throwing.
**Resolution:** _(open)_