a0203503a7
Re-reviewed every module/client against the 10-category checklist
(REVIEW-PROCESS.md) at commit 1cd51bb, filed 72 new findings, and
fixed them in three priority waves (3 High, 17 Medium, 52 Low).
Highs
- Server-017: enumerate AcknowledgeAlarm / QueryActiveAlarms in
GatewayGrpcScopeResolver so non-admin keys can use them; document
the mapping in docs/Authorization.md; add interceptor tests.
- Client.Java-013: add the five missing bulk-method stubs to the
CLI FakeSession so the test module compiles on a clean tree.
- Client.Rust-013: fix the clippy::doc_lazy_continuation regression
in generated tonic code by reformatting the ReadBulkCommand proto
comment and scoping a #![allow(...)] to the generated submodules.
Mediums (highlights)
- Server: unify GatewaySession state-lock discipline (-015) and
make DisposeAsync race-safe against in-flight CloseAsync (-016);
add constraint-enforcement test coverage for the bulk-plan path
(-021).
- Worker: introduce StaRuntimeShutdownException so RunAlarmPollLoop
can distinguish graceful shutdown from a real STA-affinity
violation (-016); have the watchdog skip StaHung while
CurrentCommandCorrelationId is non-empty so a legitimate slow
ReadBulk no longer self-faults (-017).
- Tests: add per-method round-trip + cancellation coverage for the
11 GatewaySession bulk methods (-013); replace the real TCP probe
in GalaxyHierarchyCacheTests with an IGalaxyRepository fake
(-016).
- IntegrationTests: drive the StreamEvents writer in the live Write
test and assert OnWriteComplete (-012); add live tests for
Unadvise/RemoveItem/Unregister ordering, WriteSecured, and
abnormal worker exit (-014).
- Worker.Tests: replace MxAccessSession reflection with an internal
CreateForTesting factory (-016); cover WorkerCancel and
unexpected-body envelope branches (-017).
- Client.Java: cancel MxEventStream when close() races
beforeStart() (-014); return a CancellingCompletableFuture that
actually forwards cancellation through .thenApply chains (-015).
- Client.Python: drop the silent localhost-plaintext downgrade in
the CLI; require explicit --plaintext (-013).
- Client.Rust: stop bench-read-bulk from polluting success-latency
histograms with failed-call durations (-015); add coverage for
the five MalformedReply paths, the bulk-write helpers, the
Error::Unavailable mapping, and the unary-fault path (-016).
- Contracts: extend docs/Contracts.md with the bulk read/write
command family (-009).
Lows (highlights)
- Server: cap GalaxyGlobMatcher.RegexCache; align
WorkerAlarmRpcDispatcher missing-session handling; drop the
duplicate dashboard @page routes; refresh IAlarmRpcDispatcher
XML doc.
- Worker: surface SetXmlAlarmQuery COM failures; remove dead
subscriptionExpression / ExecutingCommand arms; preserve
factory-supplied runtime sessions; split MxAlarmSnapshot.cs into
three files.
- Tests: dispose the WebApplication in seven test classes; rebuild
FakeWorkerProcess.WaitForExitAsync against a real TaskCompletion
source; switch the heartbeat-expires test to ManualTimeProvider;
add InvariantCulture to the remaining DateTimeOffset.Parse sites;
document GalaxyFilterInputSafetyTests in GatewayTesting.md.
- IntegrationTests: comment fixes, RecordingServerStreamWriter
IDisposable, class-level [Trait], single-source ZB default
connection string.
- Worker.Tests: replace silent-return gating with LiveMxAccessFact
so absent env vars SKIP not pass; PascalCase rename of probe
[Fact]s; deterministic deadline test; new frame-protocol error
tests; ComputeTransitions diff-coverage; relocate dev-rig probes
to Probes/.
- Contracts: add round-trip coverage and per-field redaction /
Galaxy-identifier comments to the protos.
- Client.Dotnet: introduce clients/dotnet/Directory.Build.props so
TreatWarningsAsErrors / analysers apply; document
DiscoverHierarchyOptions and IMxGatewayCliClient; require typed
bulk-read handles in CLI; surface AcknowledgeAlarm transport
faults through Translate().
- Client.Go: kill dead code in alarms_test / fakeGalaxyServer /
runWriteBulkVariant; document the six new subcommands in
writeUsage; drain galaxy-watch events on limit; switch io.EOF
comparisons to errors.Is.
- Client.Java: shared shutdown helpers + new shutdownTimeout
option; regex-based credential redaction; Long.toUnsignedString
for uint64 sequence; doc fixes.
- Client.Python: combine duplicate imports; add coverage for
_percentile / bench-read-bulk / MAX_AGGREGATE_EVENTS /
_api_key_from_env; populate pyproject metadata and ship py.typed.
- Client.Rust: expose next_correlation_id() so CLI ping/close
stop hard-coding correlation IDs; resync RustClientDesign.md
with the current Session / Error surface and CLI subcommand set.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
290 lines
34 KiB
Markdown
290 lines
34 KiB
Markdown
# Code Review — IntegrationTests
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Module | `src/MxGateway.IntegrationTests` |
|
|
| Reviewer | Claude Code |
|
|
| Review date | 2026-05-20 |
|
|
| Commit reviewed | `1cd51bb` |
|
|
| Status | Reviewed |
|
|
| Open findings | 0 |
|
|
|
|
## Checklist coverage
|
|
|
|
A comprehensive review completes every category, recording "No issues found" where
|
|
a category produced nothing rather than leaving it blank.
|
|
|
|
### 2026-05-20 review (commit `1cd51bb`)
|
|
|
|
| # | Category | Result |
|
|
|---|---|---|
|
|
| 1 | Correctness & logic bugs | Issue found: IntegrationTests-012 (Write test starts a `StreamEvents` task and never observes it — silent event-stream coverage gap and an unobserved fault path). |
|
|
| 2 | mxaccessgw conventions | Live opt-ins, `[Collection]` serialization, and the "don't synthesize events" rule are honored. No issues found. |
|
|
| 3 | Concurrency & thread safety | `LiveResourcesCollection` serializes all three live classes; `RecordingServerStreamWriter` locks correctly and the semaphore wait is linked to both timeout and external cancellation. No issues found. |
|
|
| 4 | Error handling & resilience | `ShutDownAsync` already isolates cleanup exceptions per category. No issues found. |
|
|
| 5 | Security | The only embedded strings are documented dev GLAuth creds and a localhost ZB connection string, all env-overridable. The wrong-password and unreachable-server tests assert no password leakage. No issues found. |
|
|
| 6 | Performance & resource management | Issue found: IntegrationTests-013 (`RecordingServerStreamWriter.messageArrived` `SemaphoreSlim` is never disposed; the type owns an `IDisposable` field but is not itself disposable). |
|
|
| 7 | Design-document adherence | No issues found. `docs/GatewayTesting.md` now documents the Live LDAP, Live Galaxy, and Write/invalid-handle MXAccess opt-ins added by the prior round of resolutions. |
|
|
| 8 | Code organization & conventions | Issues found: IntegrationTests-015 (`[Trait("Category", ...)]` repeated on every test method instead of declared once at class level); IntegrationTests-016 (the Galaxy default connection string is duplicated between `LiveGalaxyRepositoryFactAttribute` and `GalaxyRepositoryOptions`). |
|
|
| 9 | Testing coverage | Issue found: IntegrationTests-014 (`Unadvise`, `RemoveItem`, `Unregister`, `WriteSecured` ordering, and worker-fault parity still uncovered — IntegrationTests-005's resolution scoped these out). |
|
|
| 10 | Documentation & comments | Issue found: IntegrationTests-011 (the invalid-handle and write test comments describe a non-`Ok` MXAccess failure as `ProtocolStatusCode.Ok`, contradicting both the assertion and `HResultConverter`). |
|
|
|
|
### 2026-05-18 review (commit `6c64030`)
|
|
|
|
| # | Category | Result |
|
|
|---|---|---|
|
|
| 1 | Correctness & logic bugs | Issues found: IntegrationTests-003 (asserts only on first event), IntegrationTests-010 (`WaitForMessageAsync` ignores cancellation). |
|
|
| 2 | mxaccessgw conventions | Live tests correctly gated and skip (not fail) when prerequisites are absent; `LiveGalaxyRepositoryFactAttribute` undocumented in the opt-in matrix. |
|
|
| 3 | Concurrency & thread safety | Issue found: IntegrationTests-007 (no `[Collection]`/parallelism guard for shared MXAccess/ZB/GLAuth). |
|
|
| 4 | Error handling & resilience | Issue found: IntegrationTests-004 (cleanup `WaitAsync` can mask the original failure). |
|
|
| 5 | Security | No production secrets; only documented dev GLAuth creds and a localhost ZB connection string, all env-overridable. No issues found. |
|
|
| 6 | Performance & resource management | Worker process disposed transitively via session disposal; no leaked pipes/COM/processes. No issues found. |
|
|
| 7 | Design-document adherence | Issues found: IntegrationTests-001 (Galaxy live suite absent from the opt-in matrix), IntegrationTests-002 (`GwAdmin` LDAP prerequisite undocumented). |
|
|
| 8 | Code organization & conventions | Issue found: IntegrationTests-008 (three near-identical fact attributes). |
|
|
| 9 | Testing coverage | Issues found: IntegrationTests-005 (thin MXAccess parity coverage), IntegrationTests-006 (thin LDAP failure-path coverage). |
|
|
| 10 | Documentation & comments | Issue found: IntegrationTests-009 (`TestServerCallContext` mislabelled "Mock"). |
|
|
|
|
## Findings
|
|
|
|
### IntegrationTests-001
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | High |
|
|
| Category | Design-document adherence |
|
|
| Location | `src/MxGateway.IntegrationTests/Galaxy/LiveGalaxyRepositoryFactAttribute.cs:7`, `src/MxGateway.IntegrationTests/Galaxy/GalaxyRepositoryLiveTests.cs` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** The Galaxy Repository live test suite and its gating env var `MXGATEWAY_RUN_LIVE_GALAXY_TESTS` (plus connection-string override `MXGATEWAY_LIVE_GALAXY_CONN`) are completely absent from `docs/GatewayTesting.md`. CLAUDE.md mandates updating docs in the same change as the source. The opt-in matrix documents only the MXAccess and LDAP env vars, so an operator running the documented matrix has no way to know these tests exist or how to enable them.
|
|
|
|
**Recommendation:** Add a "Live Galaxy Repository" section to `docs/GatewayTesting.md` documenting `MXGATEWAY_RUN_LIVE_GALAXY_TESTS=1`, `MXGATEWAY_LIVE_GALAXY_CONN`, the `ZB` database prerequisite, and the covered RPCs, mirroring the existing "Live MXAccess Smoke" section.
|
|
|
|
**Resolution:** Resolved 2026-05-18: Added a "Live Galaxy Repository" section to `docs/GatewayTesting.md` documenting `MXGATEWAY_RUN_LIVE_GALAXY_TESTS`, `MXGATEWAY_LIVE_GALAXY_CONN`, the deployed-`ZB` prerequisite, and the covered `GalaxyRepository` RPCs.
|
|
|
|
### IntegrationTests-002
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | High |
|
|
| Category | Design-document adherence |
|
|
| Location | `src/MxGateway.IntegrationTests/DashboardLdapLiveTests.cs:13`, `src/MxGateway.Server/Configuration/LdapOptions.cs:27` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `DashboardLdapLiveTests` builds the authenticator with `new GatewayOptions()`, so it relies on `LdapOptions.RequiredGroup` defaulting to `GwAdmin` and asserts the `admin` user is a member of a `GwAdmin` LDAP group. `glauth.md` does not list `GwAdmin` as a provisioned group — it lists `admin` only in the five role groups and describes `GwAdmin` as a group to add "when reuse isn't enough." If GLAuth has only the documented baseline groups, `AuthenticateAsync_AdminInGwAdminGroup_Succeeds` fails (not skips) on any box where the env var is set. This is an undocumented hard prerequisite beyond "LDAP is up."
|
|
|
|
**Recommendation:** Either document the required `GwAdmin` GLAuth provisioning step in `glauth.md` and `GatewayTesting.md`, or have the test set `RequiredGroup` to a baseline group `glauth.md` guarantees `admin` belongs to (e.g. `WriteOperate`).
|
|
|
|
**Resolution:** Resolved 2026-05-18: Took the documentation fix — promoted the `glauth.md` "Adding a gw-specific group" section into a concrete "Provisioning the GwAdmin group" step that grants `GwAdmin` to `admin`, cross-referenced it from the groups/verification sections, and added a "Live LDAP" section to `docs/GatewayTesting.md` calling out `GwAdmin` as a hard prerequisite. Alternative considered: weaken the test to a baseline group (`WriteOperate`) — rejected because `GwAdmin` is the real default `LdapOptions.RequiredGroup` and the test should exercise it.
|
|
|
|
### IntegrationTests-003
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Medium |
|
|
| Category | Correctness & logic bugs |
|
|
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:89-97` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** The test asserts only on the first `MxEvent` recorded by `RecordingServerStreamWriter`. A live MXAccess provider can deliver an initial state/quality event whose family or handles differ from the expected `OnDataChange` (e.g. a registration-state or bad-quality bootstrap event). Because `WaitForFirstMessageAsync` returns whatever arrives first, a genuine ordering/family defect could fail spuriously or leave later wrong events unverified.
|
|
|
|
**Recommendation:** Filter for the first event with `Family == OnDataChange` (with a bounded retry/poll) or assert the full recorded sequence, so the test verifies the event the worker is supposed to emit.
|
|
|
|
**Resolution:** Resolved 2026-05-18: Confirmed against source — `WaitForFirstMessageAsync` completed a `TaskCompletionSource` on the very first `WriteAsync`. Replaced it with `RecordingServerStreamWriter.WaitForMessageAsync(predicate, timeout)`, which scans recorded messages, skips earlier non-matching events, and blocks on a `SemaphoreSlim` until a matching one arrives or the timeout elapses (throwing a `TimeoutException` that reports the scanned count). `GatewaySession_WithLiveWorker_RegistersAdvisesStreamsDataAndCloses` now waits for the first `Family == OnDataChange` event. Live execution was not possible in this environment (no MXAccess COM); verified by build.
|
|
|
|
### IntegrationTests-004
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Medium |
|
|
| Category | Error handling & resilience |
|
|
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:108-111` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** In the `finally` block, after `CloseSessionAsync`, the test does `await streamTask.WaitAsync(StreamShutdownTimeout)`. If closing the session does not promptly complete the stream (or `StreamEvents` itself faults), this throws `TimeoutException` from inside `finally`, which replaces/masks any original assertion failure from the `try` block. The diagnostic value of the real failure is lost.
|
|
|
|
**Recommendation:** Wrap the `streamTask.WaitAsync` (and ideally `WaitForProcessesAsync`) in a try/catch that logs the cleanup exception via `output.WriteLine` instead of letting it propagate.
|
|
|
|
**Resolution:** Resolved 2026-05-18: Confirmed — the `finally` block awaited `streamTask.WaitAsync` and `WaitForProcessesAsync` with no exception handling. Extracted a shared `ShutDownAsync` helper that wraps the session-close + stream-drain in one try/catch and the worker-process wait in a second try/catch, logging each cleanup exception via `output.WriteLine` instead of throwing. All three live tests now route shutdown through it, so a cleanup timeout can no longer mask an assertion failure. Live execution was not possible in this environment; verified by build.
|
|
|
|
### IntegrationTests-005
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Medium |
|
|
| Category | Testing coverage |
|
|
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** The only live MXAccess test covers the Register→AddItem→Advise→one-OnDataChange→Close happy path. CLAUDE.md stresses that MXAccess parity is the contract and calls out non-obvious behaviors (`WriteSecured` ordering, `OperationComplete` semantics, invalid-handle exceptions). None of `Write`, `WriteSecured`, `Unadvise`, `RemoveItem`, `Unregister`, `OperationComplete`, an invalid-handle command, or a worker-fault path is exercised against live COM — exactly the paths fake-worker tests cannot validate.
|
|
|
|
**Recommendation:** Add live coverage for at least a `Write` round-trip and an invalid-handle command, plus a worker-fault/abnormal-exit scenario, even if behind additional opt-in env vars.
|
|
|
|
**Resolution:** Resolved 2026-05-18: Added two `[LiveMxAccessFact]`-gated tests to `WorkerLiveMxAccessSmokeTests`. `GatewaySession_WithLiveWorker_WritesValueToAdvisedItem` registers/adds/advises then issues a `Write` of an integer value, asserting the command round-trips with `ProtocolStatusCode.Ok` and `MxCommandKind.Write`. `GatewaySession_WithLiveWorker_InvalidHandleCommand_SurfacesFailureWithoutTransportFault` issues `AddItem` against `int.MaxValue` as the server handle (never issued by MXAccess) and asserts the failure surfaces in the command reply without a usable item handle. Both reuse the existing opt-in env var and the `ShutDownAsync` cleanup helper. A worker-fault/abnormal-exit case was deliberately scoped out — it needs a controlled COM crash injection beyond what the existing harness supports; the two added cases cover the `Write` round-trip and invalid-handle paths the recommendation calls out. Live execution was not possible in this environment; verified by build.
|
|
|
|
### IntegrationTests-006
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Medium |
|
|
| Category | Testing coverage |
|
|
| Location | `src/MxGateway.IntegrationTests/DashboardLdapLiveTests.cs` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** LDAP live coverage is two cases: admin succeeds, readonly is denied for missing group. There is no coverage of a wrong password for a valid user, an unknown username, or the LDAP-server-unreachable path — all of which `DashboardAuthenticator` has distinct branches for (the `LdapException` catch, the `candidate is null` branch). The negative test only proves group-membership denial, not credential rejection.
|
|
|
|
**Recommendation:** Add a live test for `admin` with a wrong password asserting `Succeeded == false` and that the password is not leaked into `FailureMessage`, and a test for an unknown username.
|
|
|
|
**Resolution:** Resolved 2026-05-18: Added three `[LiveLdapFact]`-gated tests to `DashboardLdapLiveTests`. `AuthenticateAsync_AdminWithWrongPassword_FailsWithoutLeakingPassword` exercises the `LdapException` catch via a rejected candidate bind and asserts the wrong password never reaches `FailureMessage`. `AuthenticateAsync_UnknownUsername_Fails` exercises the `candidate is null` branch. `AuthenticateAsync_ServerUnreachable_FailsWithoutThrowing` builds the authenticator with `LdapOptions.Port = 1` (a reserved port no LDAP server listens on) and asserts the connect failure is absorbed into a failed result rather than thrown — covering the generic `catch (Exception)` branch. All three are gated by the existing `MXGATEWAY_RUN_LIVE_LDAP_TESTS` opt-in so they stay opt-in. Live execution was not possible in this environment (no live LDAP); verified by build.
|
|
|
|
### IntegrationTests-007
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Concurrency & thread safety |
|
|
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:20`, `src/MxGateway.IntegrationTests/Galaxy/GalaxyRepositoryLiveTests.cs:5`, `src/MxGateway.IntegrationTests/DashboardLdapLiveTests.cs:9` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** The live test classes contend for genuinely shared singletons — one MXAccess COM provider, one ZB SQL database, one GLAuth instance with a 3-fail/10-minute per-IP lockout. No `[Collection]` annotation or `DisableTestParallelization` is declared, so xUnit's default cross-class parallelism could run the Galaxy tests concurrently or interleave an LDAP failure burst that trips the GLAuth lockout.
|
|
|
|
**Recommendation:** Place the live test classes in a shared `[Collection]`, or set `[assembly: CollectionBehavior(DisableTestParallelization = true)]` for this opt-in project, so live external resources are accessed serially.
|
|
|
|
**Resolution:** Resolved 2026-05-18: Confirmed — no `[Collection]` or assembly-level `CollectionBehavior` existed. Added `LiveResourcesCollection.cs` with a `[CollectionDefinition(Name, DisableParallelization = true)]` and applied `[Collection(LiveResourcesCollection.Name)]` to `WorkerLiveMxAccessSmokeTests`, `GalaxyRepositoryLiveTests`, and `DashboardLdapLiveTests`. A named collection (rather than an assembly-wide `DisableTestParallelization`) was chosen so the live classes serialize against each other and within themselves while non-live tests (`IntegrationTestEnvironmentTests`) keep parallelizing. Verified by build; live tests not executed (no MXAccess COM / live LDAP in this environment).
|
|
|
|
### IntegrationTests-008
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Code organization & conventions |
|
|
| Location | `src/MxGateway.IntegrationTests/LiveLdapFactAttribute.cs`, `src/MxGateway.IntegrationTests/Galaxy/LiveGalaxyRepositoryFactAttribute.cs`, `src/MxGateway.IntegrationTests/LiveMxAccessFactAttribute.cs` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** Three near-identical fact attributes each re-implement the same "compare env var to `1` with `StringComparison.Ordinal`, set `Skip` otherwise" pattern. `LiveMxAccessFactAttribute` delegates to `IntegrationTestEnvironment` while the other two inline the logic, so the project has two divergent styles for the same concern.
|
|
|
|
**Recommendation:** Extract a shared helper (e.g. `IntegrationTestEnvironment.IsEnabled(string variableName)`) and have all three attributes call it.
|
|
|
|
**Resolution:** Resolved 2026-05-18: Confirmed — `LiveLdapFactAttribute.Enabled` and `LiveGalaxyRepositoryFactAttribute.Enabled` each inlined the ordinal `== "1"` comparison while `LiveMxAccessFactAttribute` delegated to `IntegrationTestEnvironment`. Added `IntegrationTestEnvironment.IsEnabled(string variableName)` as the single implementation; `LiveMxAccessTestsEnabled`, `LiveLdapFactAttribute.Enabled`, and `LiveGalaxyRepositoryFactAttribute.Enabled` now all call it. Verified by build.
|
|
|
|
### IntegrationTests-009
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Documentation & comments |
|
|
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:372-375` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `TestServerCallContext` is XML-documented as a "Mock server call context," but it is a hand-written stub/fake with no mocking framework and no verification behavior. Per the style guides (accurate naming; explain why not what), calling it a mock misleads readers who may expect verifiable interactions.
|
|
|
|
**Recommendation:** Reword the summary to "test stub" / "minimal `ServerCallContext` implementation for in-process gRPC calls."
|
|
|
|
**Resolution:** Resolved 2026-05-18: Confirmed — the summary read "Mock server call context for testing gRPC calls." Reworded to "Minimal `ServerCallContext` stub for invoking the gRPC service in-process," noting it is a hand-written fake with no verification behavior. No mocking framework is involved; this is a documentation-only fix. Verified by build.
|
|
|
|
### IntegrationTests-010
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Correctness & logic bugs |
|
|
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:366-369` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `WaitForFirstMessageAsync` accepts only a `timeout` and never observes a `CancellationToken`. There is no per-test cancellation propagation, so if the gateway/worker hangs without writing an event the test relies solely on the 15s `WaitAsync` timeout and gives no contextual diagnostics. Combined with IntegrationTests-004, a hung live worker produces a bare `TimeoutException`.
|
|
|
|
**Recommendation:** Accept a `CancellationToken` (linked to `TestServerCallContext`'s token), pass it to `firstMessage.Task.WaitAsync(timeout, token)`, and on timeout emit the recorded `Messages` count via `output.WriteLine` before throwing.
|
|
|
|
**Re-triage:** The named method `WaitForFirstMessageAsync` no longer exists — IntegrationTests-003's resolution renamed/replaced it with `RecordingServerStreamWriter.WaitForMessageAsync(predicate, timeout)`, which scans recorded messages and blocks on a `SemaphoreSlim`. The underlying defect still held: that replacement method also took only a `timeout` and never observed a `CancellationToken`. The finding remains valid (Low, Correctness) against the renamed method; the recommendation's `firstMessage.Task.WaitAsync` detail is stale but the intent (thread a token, surface a count on timeout) is unchanged.
|
|
|
|
**Resolution:** Resolved 2026-05-18: Added an optional `CancellationToken` parameter to `WaitForMessageAsync`, linked with the existing timeout source via `CancellationTokenSource.CreateLinkedTokenSource`, so a per-test cancellation aborts the wait promptly. `GatewaySession_WithLiveWorker_RegistersAdvisesStreamsDataAndCloses` now creates a `CancellationTokenSource`, passes its token into the `StreamEvents` `TestServerCallContext` and into `WaitForMessageAsync`, so the stream call and the wait share one cancellation source. On timeout the method already throws a `TimeoutException` whose message includes the scanned message count, satisfying the "emit recorded count" intent (the count surfaces in the test failure rather than via a separate `output.WriteLine`). Verified by build; live tests not executed.
|
|
|
|
### IntegrationTests-011
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Documentation & comments |
|
|
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:236-240`, `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:183-187` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** The XML/inline comments on the two new MXAccess parity tests misdescribe how the gateway surfaces an MXAccess failure. The invalid-handle test reads "the gateway protocol status is Ok and the failure shows up in hresult / the status proxies — it must not be reported as a transport fault", then asserts `Assert.NotEqual(ProtocolStatusCode.Ok, addItemReply.ProtocolStatus.Code)`. `HResultConverter.CreateProtocolStatus` (`src/MxGateway.Worker/Conversion/HResultConverter.cs:39`) actually sets `Code = ProtocolStatusCode.MxaccessFailure` whenever the COM call throws (HRESULT ≠ 0), so the assertion is correct but the comment is wrong — the protocol status is *not* `Ok` on an MXAccess failure. The write-round-trip test carries the same misleading framing on lines 183-187 ("MXAccess parity details … belong in hresult / statuses, not in a transport failure") immediately before asserting `Ok`. A reader can reasonably conclude the gateway always reports `Ok` for round-tripped commands and tweak code accordingly. The intended distinction is "this is not a gRPC transport fault" (the RPC reply still arrives) — the protocol status code carries the MXAccess outcome.
|
|
|
|
**Recommendation:** Reword the invalid-handle comment to "the gateway must reply with `ProtocolStatusCode.MxaccessFailure` and a non-zero `Hresult` carrying the COM failure, not a gRPC transport fault." Reword the write-round-trip comment to clarify it is asserting the happy-path Ok and that an MXAccess rejection would surface as `MxaccessFailure` (per `HResultConverter`), not as a `RpcException`.
|
|
|
|
**Resolution:** 2026-05-20 — Reworded the invalid-handle test comment to say the gateway must reply with `ProtocolStatusCode.MxaccessFailure` and a non-zero hresult carrying the COM failure (per `HResultConverter`), and reworded the write-round-trip comment to make explicit it is asserting the happy-path Ok while an MXAccess rejection would surface as `MxaccessFailure`, never as an `RpcException`.
|
|
|
|
### IntegrationTests-012
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Medium |
|
|
| Category | Correctness & logic bugs |
|
|
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:147-151` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `GatewaySession_WithLiveWorker_WritesValueToAdvisedItem` constructs a `RecordingServerStreamWriter<MxEvent>` and starts a `StreamEvents` task, then never reads from it and never asserts anything about the recorded messages. The test verifies only that the `Write` command round-trips at the protocol level — it does not verify that the worker actually emits any event after the write (for example an `OnWriteComplete`, which is the proof of round-trip used by the cross-language client e2e runner). Because the stream task is started with `new TestServerCallContext()` (no cancellation source), any fault raised by the stream task (an exception from `EventStreamService`, a session-not-found, a backpressure overflow) is swallowed — `streamTask` is later awaited in `ShutDownAsync` only inside a broad `catch (Exception ex)`, which logs and continues. The Write test therefore cannot fail on stream-task faults. Two consequences: (a) the live Write parity coverage promised in IntegrationTests-005 is weaker than it appears, and (b) the fixture (`eventWriter`) is dead code in this test that suggests an assertion was intended.
|
|
|
|
**Recommendation:** Either remove the unused `eventWriter`/`StreamEvents` plumbing from the Write test so the test scope matches its assertions, or — preferred — extend the test to wait for an `OnWriteComplete` event for the written item via `eventWriter.WaitForMessageAsync(candidate => candidate.Family == MxEventFamily.OnWriteComplete && candidate.ItemHandle == itemHandle, ...)`, matching the round-trip proof used by `scripts/run-client-e2e-tests.ps1 -VerifyWrite`.
|
|
|
|
**Resolution:** Resolved 2026-05-20: Rewrote `GatewaySession_WithLiveWorker_WritesValueToAdvisedItem` so the previously-dead `eventWriter`/`StreamEvents` plumbing actually drives an assertion. The test now waits for an `OnWriteComplete` event matching the Write's (server, item) handle pair via `eventWriter.WaitForMessageAsync` (using `IntegrationTestEnvironment.LiveMxAccessEventTimeout`), and asserts the recorded event's family, session id, and handles — the same round-trip proof the cross-language client e2e runner uses. The stream call is now bound to a `CancellationTokenSource` and the test asserts `streamTask.IsFaulted == false` before cleanup. `ShutDownAsync` gained an opt-in `propagateStreamFaults` flag so a faulted `StreamEvents` task is rethrown into the test rather than silently swallowed by the broad cleanup catch; the cancellation token is also signalled before the drain so `StreamEvents` observes a clean shutdown instead of a forced timeout. Verified by build and by confirming the test skips cleanly when `MXGATEWAY_RUN_LIVE_MXACCESS_TESTS` is unset.
|
|
|
|
### IntegrationTests-013
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Performance & resource management |
|
|
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:519-609` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `RecordingServerStreamWriter<T>` owns a `SemaphoreSlim messageArrived` (`IDisposable`) but does not itself implement `IDisposable`, so the semaphore's wait handle is never released back to the OS. Each live test allocates one such writer and discards it at scope exit. Live tests run on opt-in only, so the cumulative leak is bounded, but the type holds an `IDisposable` field — the standard hygiene under `Directory.Build.props`'s `TreatWarningsAsErrors=true` is to either dispose the field or document why not. CA2213 does not fire because the owner is not itself `IDisposable`; an analyzer-driven warning is the only reason this is not a build break, not an indication that the leak is acceptable.
|
|
|
|
**Recommendation:** Make `RecordingServerStreamWriter<T>` implement `IDisposable`, dispose `messageArrived` in `Dispose`, and wrap each instantiation in a `using` block (`using RecordingServerStreamWriter<MxEvent> eventWriter = new();`).
|
|
|
|
**Resolution:** 2026-05-20 — `RecordingServerStreamWriter<T>` now implements `IDisposable` and its `Dispose` releases the `messageArrived` semaphore. All six live tests in `WorkerLiveMxAccessSmokeTests` now allocate the writer with a top-of-method `using` declaration so the semaphore's wait handle is released on scope exit even when the test body throws.
|
|
|
|
### IntegrationTests-014
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Medium |
|
|
| Category | Testing coverage |
|
|
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** IntegrationTests-005 was resolved by adding live coverage for `Write` and an invalid-handle `AddItem`, but its resolution explicitly scoped out the worker-fault/abnormal-exit case and silently dropped `Unadvise`, `RemoveItem`, `Unregister`, `OperationComplete`, and `WriteSecured` ordering. CLAUDE.md singles out `WriteSecured` ("`WriteSecured` failing before a value-bearing NMX body") and `OperationComplete` semantics as parity surprises the gateway must not "fix" — exactly the paths fake-worker tests cannot validate. After this commit the live MXAccess smoke still doesn't exercise any teardown command, the secured-write ordering rule, or a deliberately faulted worker. A regression in any of these would only be caught by manual testing.
|
|
|
|
**Recommendation:** Add live MXAccess coverage for the teardown chain (`Unadvise` then `RemoveItem` then `Unregister`, asserting each replies with `ProtocolStatusCode.Ok` and the next operation no longer references the freed handle), and at minimum one `WriteSecured` parity case asserting the documented ordering. A worker-fault test can be deferred to a separate finding once a deterministic COM-crash injection harness exists.
|
|
|
|
**Resolution:** Resolved 2026-05-20: Added three new `[LiveMxAccessFact]`-gated tests to `WorkerLiveMxAccessSmokeTests`, all reusing the existing opt-in env var and `ShutDownAsync` cleanup helper. (1) `GatewaySession_WithLiveWorker_UnadviseRemoveItemUnregister_TeardownOrderingParity` runs Register → AddItem → Advise → wait for one OnDataChange → UnAdvise → RemoveItem → Unregister, asserting each step replies `Ok` with the matching `MxCommandKind`, that no further OnDataChange events for the un-advised (server, item) pair arrive after a 500 ms settle window, and that a second RemoveItem against the freed handle returns a non-`Ok` MXAccess failure (so a regression that left a stale subscription or accepted a stale handle would surface). (2) `GatewaySession_WithLiveWorker_WriteSecured_AuthenticatedRoundTripParity` resolves an ArchestrA user id via `AuthenticateUser` (credentials env-overridable through `MXGATEWAY_LIVE_MXACCESS_WRITE_SECURED_USER` / `..._PASSWORD`, defaulting to the `admin`/`admin123` GLAuth user from `glauth.md`), issues `WriteSecured` against an advised item, and asserts the reply carries `MxCommandKind.WriteSecured`, the protocol status is one of the documented parity outcomes (`Ok` for an unprotected provider, `MxaccessFailure` when the item is not WriteSecured-eligible — never a transport fault), and the credential never leaks into the diagnostic message. (3) `GatewaySession_WithLiveWorker_AbnormalWorkerExit_MarksSessionFaulted` opens a session, kills the worker process tree (via a new `TestWorkerProcessFactory.KillAllAndDetach` helper) without going through CloseSession, and polls the session via a new `GatewayServiceFixture.TryGetSession` accessor until it transitions to `SessionState.Faulted` within the live event timeout; asserts the final state is `Faulted`, that `FinalFault` is non-empty, and that the fault description carries a known worker-client classification (pipe disconnected / worker faulted / heartbeat expired / end-of-stream). `docs/GatewayTesting.md` was updated to list all five parity surfaces and the two new env-var defaults. Verified by build and confirmed all six live tests skip cleanly when `MXGATEWAY_RUN_LIVE_MXACCESS_TESTS` is unset.
|
|
|
|
### IntegrationTests-015
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Code organization & conventions |
|
|
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:30,119,201`, `src/MxGateway.IntegrationTests/DashboardLdapLiveTests.cs:13,32,48,67,84`, `src/MxGateway.IntegrationTests/Galaxy/GalaxyRepositoryLiveTests.cs:10,22,34,52` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** Every live-test method in the three live classes carries an identical `[Trait("Category", "LiveMxAccess")]` (or `LiveLdap` / `LiveGalaxy`) attribute. The trait is uniform within each class and is exactly the information the `[Collection(LiveResourcesCollection.Name)]` class-level attribute also implies. xUnit's `[Trait]` is inheritable from the class to its methods, so the same metadata can be declared once at class scope. The current shape adds maintenance burden — adding a new test in any of these classes requires remembering to add the trait, and the existing pattern's `LiveLdap` includes five copies of the same line.
|
|
|
|
**Recommendation:** Move each `[Trait("Category", ...)]` to the class declaration alongside the existing `[Collection(...)]`, and remove the per-method copies. Verify the trait still surfaces in `--filter Trait=Category=LiveLdap` after the change.
|
|
|
|
**Resolution:** 2026-05-20 — Lifted `[Trait("Category", "LiveMxAccess")]`, `[Trait("Category", "LiveLdap")]`, and `[Trait("Category", "LiveGalaxy")]` to the class declarations of `WorkerLiveMxAccessSmokeTests`, `DashboardLdapLiveTests`, and `GalaxyRepositoryLiveTests` respectively (alongside the existing `[Collection(LiveResourcesCollection.Name)]`), and removed all per-method duplicates. xUnit propagates class-level traits to every method, so `--filter Category=Live*` filters still match.
|
|
|
|
### IntegrationTests-016
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Code organization & conventions |
|
|
| Location | `src/MxGateway.IntegrationTests/Galaxy/LiveGalaxyRepositoryFactAttribute.cs:26`, `src/MxGateway.Server/Galaxy/GalaxyRepositoryOptions.cs:13` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** The default Galaxy Repository connection string `"Server=localhost;Database=ZB;Integrated Security=True;TrustServerCertificate=True;Encrypt=False;"` is duplicated verbatim between the production `GalaxyRepositoryOptions.ConnectionString` initializer and the test-side `LiveGalaxyRepositoryFactAttribute.ConnectionString` fallback. The docs (`docs/GatewayTesting.md`) document the value once and reference it from both places. If the production default changes (e.g. tightening to a named instance, or switching to a SQL-auth template), the test default silently keeps the old string and the live Galaxy tests connect to the wrong server. The drift is invisible to the build.
|
|
|
|
**Recommendation:** Expose the production default through a `public const string` on `GalaxyRepositoryOptions` (e.g. `DefaultConnectionString`) and have `LiveGalaxyRepositoryFactAttribute.ConnectionString` read `Environment.GetEnvironmentVariable(ConnectionStringVariableName) ?? GalaxyRepositoryOptions.DefaultConnectionString`. Single source of truth, build-time guarantee they cannot drift.
|
|
|
|
**Resolution:** 2026-05-20 — Added `public const string GalaxyRepositoryOptions.DefaultConnectionString` carrying the production default, set the `ConnectionString` initializer to reference it, and changed `LiveGalaxyRepositoryFactAttribute.ConnectionString` to fall back to `GalaxyRepositoryOptions.DefaultConnectionString`. The literal now lives in exactly one place and any future change to the production default propagates to the live-test fallback at compile time.
|