d2d2e5f68f
Restores the `code-reviews/` tree (was unwritten on this working copy)
and re-reviews every module per `REVIEW-PROCESS.md` against HEAD
`d692232`. The diff in scope is the five commits since the last sweep:
`dc9c0c9` (ZB.MOM.WW gateway-side rename + slnx migrate),
`397d3c5` (client SDK rename + the missing alarm-RPC proto types and
the .NET DiscoverHierarchyOptions POCO), `27ed651` (role-based LDAP
auth + HubToken bearer, drop PathBase), `6594359` (sidebar layout +
three SignalR push hubs), and `d692232` (EventsHub publisher + doc
refresh).
Module status
| Module | Open | Total | Delta this pass |
|---|---|---|---|
| Server | 8 | 43 | +6 |
| Contracts | 2 | 17 | +2 |
| Tests | 2 | 26 | +2 |
| IntegrationTests | 3 | 24 | +3 |
| Client.Java | 5 | 31 | +5 |
| Client.Rust | 1 | 21 | +1 |
| Worker | 0 | 25 | 0 (rename-only diff, clean) |
| Worker.Tests | 0 | 30 | 0 (rename-only diff, clean) |
| Client.Dotnet | 0 | 17 | 0 (rename + alarm-fix diff, clean) |
| Client.Python | 0 | 21 | 0 (rename + alarm-fix diff, clean) |
| Client.Go | 0 | 21 | 0 (rename + alarm-fix diff, clean) |
Total new findings: 19. Severity breakdown: 1 Medium-security
(Server-038), 4 Medium-documentation/coverage, 14 Low.
New findings
* Server-038 (Medium / Security) — EventsHub.SubscribeSession accepts
any session id from any Viewer; no per-session ACL guards the
EventsHub group fan-out.
* Server-039 (Low / Error handling) — HubTokenService.Validate
accepts a payload with null Name/NameIdentifier.
* Server-040 (Low / Conventions) — MapGroupsToRoles undocumented
full-vs-RDN lookup precedence.
* Server-041 (Low / Design adherence) — EventStreamService calls
IDashboardEventBroadcaster.Publish without a try/catch — fragile
seam relying on the never-throw contract.
* Server-042 (Low / Performance) — DashboardSnapshotPublisher tight
retry loop with no backoff (vs AlarmsHubPublisher 5s delay).
* Server-043 (Low / Documentation) — HubTokenService singleton
sharing across login + hub-token validation undocumented.
* Contracts-016 (Low / Conventions) — QueryActiveAlarmsRequest.session_id
reserved-for-future-use ambiguity.
* Contracts-017 (Low / Documentation) — rpc QueryActiveAlarms doc
omits the alarm_filter_prefix filter description.
* Tests-025 (Low / Conventions) — duplicate NullDashboardEventBroadcaster
fakes in EventStreamServiceTests and GatewayEndToEndFakeWorkerSmokeTests.
* Tests-026 (Medium / Testing coverage) — no test proves
EventStreamService actually calls IDashboardEventBroadcaster.Publish.
* IntegrationTests-022 (Low / Conventions) — ResolveRepositoryRoot
silent fallback to Directory.GetCurrentDirectory().
* IntegrationTests-023 (Low / Testing coverage) — DashboardLdapLiveTests
success-path asserts ldap_group but not the Role claim.
* IntegrationTests-024 (Low / Conventions) — inline
NullDashboardEventBroadcaster fake duplicates Tests-side copies.
* Client.Java-027 (Medium / Documentation) — README + JavaClientDesign
Gradle task names still use the old short project names.
* Client.Java-028 (Medium / Design adherence) — JavaClientDesign
build-layout shows the old `com/dohertylan/mxgateway/` package paths.
* Client.Java-029 (Low / Documentation) — README installDist path
cites the wrong directory.
* Client.Java-030 (Low / Testing coverage) — no Java test exercises
the regenerated QueryActiveAlarmsRequest RPC.
* Client.Java-031 (Low / Conventions) — README prose uses old short
project names instead of canonical prefixed ones.
* Client.Rust-021 (Low / Design adherence) — RustClientDesign.md
"Crate layout" shows an aspirational nested `crates/zb-mom-ww-mxgateway-client/`
that does not exist; actual layout is the flat top-level crate.
Two pre-existing pending findings (Server-031 lock-contention,
Server-032 bounded event channel) remain unchanged — neither was
touched by this wave of commits.
Process notes
- The `code-reviews/` tree was not in this working copy's git
history (the local extract pre-dates the divergent branch that
carried the reviews). Restored from `dd7ca16` via
`git checkout dd7ca16 -- code-reviews/` before the re-review.
- Some "Resolved" entries in the restored findings.md reference
fixes that landed on the divergent branch (the same one that
carried the reviews) and are not present on the current main
lineage. The re-review treats those statuses as historical;
the new pass only files findings against HEAD's actual state.
- `python code-reviews/regen-readme.py --check` is green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
464 lines
57 KiB
Markdown
464 lines
57 KiB
Markdown
# Code Review — IntegrationTests
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Module | `src/ZB.MOM.WW.MxGateway.IntegrationTests` |
|
|
| Reviewer | Claude Code |
|
|
| Review date | 2026-05-24 |
|
|
| Commit reviewed | `d692232` |
|
|
| Status | Reviewed |
|
|
| Open findings | 3 |
|
|
|
|
## Checklist coverage
|
|
|
|
A comprehensive review completes every category, recording "No issues found" where
|
|
a category produced nothing rather than leaving it blank.
|
|
|
|
### 2026-05-20 re-review (commit `a020350`)
|
|
|
|
| # | Category | Result |
|
|
|---|---|---|
|
|
| 1 | Correctness & logic bugs | Issues found: IntegrationTests-017 (teardown-parity test's "no further OnDataChange after UnAdvise" assertion races against in-flight events the provider already published); IntegrationTests-020 (abnormal-exit test's fault-classification keyword list accepts the substring `"worker"`, which matches almost any plausible fault message and dilutes the check to "the description is non-empty"). |
|
|
| 2 | mxaccessgw conventions | No issues found. The five new tests honor live opt-in gating, `[Collection]` serialization, "no synthesized events", and the credential-redaction contract for the assertions they make. |
|
|
| 3 | Concurrency & thread safety | No issues found. `GatewaySession.State`/`FinalFault` access in the abnormal-exit poll loop goes through `_syncRoot`; `RecordingServerStreamWriter.Messages` returns a locked snapshot copy. |
|
|
| 4 | Error handling & resilience | No issues found. `ShutDownAsync`'s opt-in `propagateStreamFaults` correctly threads silent stream-task faults into the Write parity test without re-masking the IntegrationTests-004 path. |
|
|
| 5 | Security | Issue found: IntegrationTests-019 (WriteSecured live test asserts the password is absent from `DiagnosticMessage` only; it does not assert the credential is absent from the accumulated test output, where the worker `stderr`/`stdout` and the gateway log are echoed). |
|
|
| 6 | Performance & resource management | No issues found. All six `RecordingServerStreamWriter<MxEvent>` instantiations use `using` declarations; `using CancellationTokenSource` is the consistent pattern. |
|
|
| 7 | Design-document adherence | No issues found. `docs/GatewayTesting.md` documents all five new parity surfaces and the two new env-var defaults (`MXGATEWAY_LIVE_MXACCESS_WRITE_SECURED_USER`/`_PASSWORD`). |
|
|
| 8 | Code organization & conventions | Issue found: IntegrationTests-018 (`GatewayServiceFixture.TryGetSession` declares `out GatewaySession session` non-nullable while the caller binds it as `out GatewaySession? session`; the null-forgiving operator inside `SessionRegistry.TryGet` propagates a misleading non-null annotation). |
|
|
| 9 | Testing coverage | Issue found: IntegrationTests-021 (abnormal-exit test does not assert the active `StreamEvents` task observed the worker fault; relies entirely on the session-state poll and would silently pass if `MarkFaulted` were ever moved off the stream-consumption path). |
|
|
| 10 | Documentation & comments | No issues found. Test XML comments now match what each assertion verifies (the IntegrationTests-011 fix is intact across both the Write and invalid-handle cases). |
|
|
|
|
### 2026-05-20 review (commit `1cd51bb`)
|
|
|
|
| # | Category | Result |
|
|
|---|---|---|
|
|
| 1 | Correctness & logic bugs | Issue found: IntegrationTests-012 (Write test starts a `StreamEvents` task and never observes it — silent event-stream coverage gap and an unobserved fault path). |
|
|
| 2 | mxaccessgw conventions | Live opt-ins, `[Collection]` serialization, and the "don't synthesize events" rule are honored. No issues found. |
|
|
| 3 | Concurrency & thread safety | `LiveResourcesCollection` serializes all three live classes; `RecordingServerStreamWriter` locks correctly and the semaphore wait is linked to both timeout and external cancellation. No issues found. |
|
|
| 4 | Error handling & resilience | `ShutDownAsync` already isolates cleanup exceptions per category. No issues found. |
|
|
| 5 | Security | The only embedded strings are documented dev GLAuth creds and a localhost ZB connection string, all env-overridable. The wrong-password and unreachable-server tests assert no password leakage. No issues found. |
|
|
| 6 | Performance & resource management | Issue found: IntegrationTests-013 (`RecordingServerStreamWriter.messageArrived` `SemaphoreSlim` is never disposed; the type owns an `IDisposable` field but is not itself disposable). |
|
|
| 7 | Design-document adherence | No issues found. `docs/GatewayTesting.md` now documents the Live LDAP, Live Galaxy, and Write/invalid-handle MXAccess opt-ins added by the prior round of resolutions. |
|
|
| 8 | Code organization & conventions | Issues found: IntegrationTests-015 (`[Trait("Category", ...)]` repeated on every test method instead of declared once at class level); IntegrationTests-016 (the Galaxy default connection string is duplicated between `LiveGalaxyRepositoryFactAttribute` and `GalaxyRepositoryOptions`). |
|
|
| 9 | Testing coverage | Issue found: IntegrationTests-014 (`Unadvise`, `RemoveItem`, `Unregister`, `WriteSecured` ordering, and worker-fault parity still uncovered — IntegrationTests-005's resolution scoped these out). |
|
|
| 10 | Documentation & comments | Issue found: IntegrationTests-011 (the invalid-handle and write test comments describe a non-`Ok` MXAccess failure as `ProtocolStatusCode.Ok`, contradicting both the assertion and `HResultConverter`). |
|
|
|
|
### 2026-05-18 review (commit `6c64030`)
|
|
|
|
| # | Category | Result |
|
|
|---|---|---|
|
|
| 1 | Correctness & logic bugs | Issues found: IntegrationTests-003 (asserts only on first event), IntegrationTests-010 (`WaitForMessageAsync` ignores cancellation). |
|
|
| 2 | mxaccessgw conventions | Live tests correctly gated and skip (not fail) when prerequisites are absent; `LiveGalaxyRepositoryFactAttribute` undocumented in the opt-in matrix. |
|
|
| 3 | Concurrency & thread safety | Issue found: IntegrationTests-007 (no `[Collection]`/parallelism guard for shared MXAccess/ZB/GLAuth). |
|
|
| 4 | Error handling & resilience | Issue found: IntegrationTests-004 (cleanup `WaitAsync` can mask the original failure). |
|
|
| 5 | Security | No production secrets; only documented dev GLAuth creds and a localhost ZB connection string, all env-overridable. No issues found. |
|
|
| 6 | Performance & resource management | Worker process disposed transitively via session disposal; no leaked pipes/COM/processes. No issues found. |
|
|
| 7 | Design-document adherence | Issues found: IntegrationTests-001 (Galaxy live suite absent from the opt-in matrix), IntegrationTests-002 (`GwAdmin` LDAP prerequisite undocumented). |
|
|
| 8 | Code organization & conventions | Issue found: IntegrationTests-008 (three near-identical fact attributes). |
|
|
| 9 | Testing coverage | Issues found: IntegrationTests-005 (thin MXAccess parity coverage), IntegrationTests-006 (thin LDAP failure-path coverage). |
|
|
| 10 | Documentation & comments | Issue found: IntegrationTests-009 (`TestServerCallContext` mislabelled "Mock"). |
|
|
|
|
### 2026-05-24 review (commit d692232)
|
|
|
|
Re-review pass at `d692232` scoped to: the rename (`dc9c0c9`) of the
|
|
namespaces, csproj, and the `IntegrationTestEnvironment.ResolveRepositoryRoot`
|
|
walker; the `DashboardLdapLiveTests` update to inject a `GroupToRole`
|
|
mapping in place of the dropped `LdapOptions.RequiredGroup` default
|
|
(`27ed651`); and the inline `NullDashboardEventBroadcaster` fake added
|
|
to `WorkerLiveMxAccessSmokeTests` for the new `EventStreamService` ctor
|
|
parameter (`d692232`).
|
|
|
|
| # | Category | Result |
|
|
|---|---|---|
|
|
| 1 | Correctness & logic bugs | No issues found in the a020350..d692232 diff. |
|
|
| 2 | mxaccessgw conventions | No issues found — namespaces, env-var names, and live-test attribute discipline preserved. |
|
|
| 3 | Concurrency & thread safety | No issues found in this diff. |
|
|
| 4 | Error handling & resilience | Confirmed: the `RpcException(StatusCode.Cancelled)` sibling catch added in `WorkerLiveMxAccessSmokeTests.ShutDownAsync` covers the only site where a gRPC-mapped `OperationCanceledException` can surface — no other call sites need widening. |
|
|
| 5 | Security | No issues found. |
|
|
| 6 | Performance & resource management | No issues found. |
|
|
| 7 | Design-document adherence | No issues found — `docs/GatewayTesting.md` was updated in the same wave to describe the GroupToRole-fixture pattern. |
|
|
| 8 | Code organization & conventions | Issues found: IntegrationTests-022 (`ResolveRepositoryRoot` silently falls back to `Directory.GetCurrentDirectory()` on exhausted walk, which can hide misconfiguration), IntegrationTests-024 (inline `NullDashboardEventBroadcaster` fake is the only such use today but worth watching for duplication once a second consumer arrives). |
|
|
| 9 | Testing coverage | Issues found: IntegrationTests-023 (`DashboardLdapLiveTests.AuthenticateAsync_AdminInGwAdminGroup_Succeeds` asserts the `ldap_group` claim but does not assert the emitted `Role: Admin` claim, leaving the role-mapping path untested). |
|
|
| 10 | Documentation & comments | No issues found. |
|
|
|
|
## Findings
|
|
|
|
### IntegrationTests-001
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | High |
|
|
| Category | Design-document adherence |
|
|
| Location | `src/MxGateway.IntegrationTests/Galaxy/LiveGalaxyRepositoryFactAttribute.cs:7`, `src/MxGateway.IntegrationTests/Galaxy/GalaxyRepositoryLiveTests.cs` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** The Galaxy Repository live test suite and its gating env var `MXGATEWAY_RUN_LIVE_GALAXY_TESTS` (plus connection-string override `MXGATEWAY_LIVE_GALAXY_CONN`) are completely absent from `docs/GatewayTesting.md`. CLAUDE.md mandates updating docs in the same change as the source. The opt-in matrix documents only the MXAccess and LDAP env vars, so an operator running the documented matrix has no way to know these tests exist or how to enable them.
|
|
|
|
**Recommendation:** Add a "Live Galaxy Repository" section to `docs/GatewayTesting.md` documenting `MXGATEWAY_RUN_LIVE_GALAXY_TESTS=1`, `MXGATEWAY_LIVE_GALAXY_CONN`, the `ZB` database prerequisite, and the covered RPCs, mirroring the existing "Live MXAccess Smoke" section.
|
|
|
|
**Resolution:** Resolved 2026-05-18: Added a "Live Galaxy Repository" section to `docs/GatewayTesting.md` documenting `MXGATEWAY_RUN_LIVE_GALAXY_TESTS`, `MXGATEWAY_LIVE_GALAXY_CONN`, the deployed-`ZB` prerequisite, and the covered `GalaxyRepository` RPCs.
|
|
|
|
### IntegrationTests-002
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | High |
|
|
| Category | Design-document adherence |
|
|
| Location | `src/MxGateway.IntegrationTests/DashboardLdapLiveTests.cs:13`, `src/MxGateway.Server/Configuration/LdapOptions.cs:27` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `DashboardLdapLiveTests` builds the authenticator with `new GatewayOptions()`, so it relies on `LdapOptions.RequiredGroup` defaulting to `GwAdmin` and asserts the `admin` user is a member of a `GwAdmin` LDAP group. `glauth.md` does not list `GwAdmin` as a provisioned group — it lists `admin` only in the five role groups and describes `GwAdmin` as a group to add "when reuse isn't enough." If GLAuth has only the documented baseline groups, `AuthenticateAsync_AdminInGwAdminGroup_Succeeds` fails (not skips) on any box where the env var is set. This is an undocumented hard prerequisite beyond "LDAP is up."
|
|
|
|
**Recommendation:** Either document the required `GwAdmin` GLAuth provisioning step in `glauth.md` and `GatewayTesting.md`, or have the test set `RequiredGroup` to a baseline group `glauth.md` guarantees `admin` belongs to (e.g. `WriteOperate`).
|
|
|
|
**Resolution:** Resolved 2026-05-18: Took the documentation fix — promoted the `glauth.md` "Adding a gw-specific group" section into a concrete "Provisioning the GwAdmin group" step that grants `GwAdmin` to `admin`, cross-referenced it from the groups/verification sections, and added a "Live LDAP" section to `docs/GatewayTesting.md` calling out `GwAdmin` as a hard prerequisite. Alternative considered: weaken the test to a baseline group (`WriteOperate`) — rejected because `GwAdmin` is the real default `LdapOptions.RequiredGroup` and the test should exercise it.
|
|
|
|
### IntegrationTests-003
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Medium |
|
|
| Category | Correctness & logic bugs |
|
|
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:89-97` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** The test asserts only on the first `MxEvent` recorded by `RecordingServerStreamWriter`. A live MXAccess provider can deliver an initial state/quality event whose family or handles differ from the expected `OnDataChange` (e.g. a registration-state or bad-quality bootstrap event). Because `WaitForFirstMessageAsync` returns whatever arrives first, a genuine ordering/family defect could fail spuriously or leave later wrong events unverified.
|
|
|
|
**Recommendation:** Filter for the first event with `Family == OnDataChange` (with a bounded retry/poll) or assert the full recorded sequence, so the test verifies the event the worker is supposed to emit.
|
|
|
|
**Resolution:** Resolved 2026-05-18: Confirmed against source — `WaitForFirstMessageAsync` completed a `TaskCompletionSource` on the very first `WriteAsync`. Replaced it with `RecordingServerStreamWriter.WaitForMessageAsync(predicate, timeout)`, which scans recorded messages, skips earlier non-matching events, and blocks on a `SemaphoreSlim` until a matching one arrives or the timeout elapses (throwing a `TimeoutException` that reports the scanned count). `GatewaySession_WithLiveWorker_RegistersAdvisesStreamsDataAndCloses` now waits for the first `Family == OnDataChange` event. Live execution was not possible in this environment (no MXAccess COM); verified by build.
|
|
|
|
### IntegrationTests-004
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Medium |
|
|
| Category | Error handling & resilience |
|
|
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:108-111` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** In the `finally` block, after `CloseSessionAsync`, the test does `await streamTask.WaitAsync(StreamShutdownTimeout)`. If closing the session does not promptly complete the stream (or `StreamEvents` itself faults), this throws `TimeoutException` from inside `finally`, which replaces/masks any original assertion failure from the `try` block. The diagnostic value of the real failure is lost.
|
|
|
|
**Recommendation:** Wrap the `streamTask.WaitAsync` (and ideally `WaitForProcessesAsync`) in a try/catch that logs the cleanup exception via `output.WriteLine` instead of letting it propagate.
|
|
|
|
**Resolution:** Resolved 2026-05-18: Confirmed — the `finally` block awaited `streamTask.WaitAsync` and `WaitForProcessesAsync` with no exception handling. Extracted a shared `ShutDownAsync` helper that wraps the session-close + stream-drain in one try/catch and the worker-process wait in a second try/catch, logging each cleanup exception via `output.WriteLine` instead of throwing. All three live tests now route shutdown through it, so a cleanup timeout can no longer mask an assertion failure. Live execution was not possible in this environment; verified by build.
|
|
|
|
### IntegrationTests-005
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Medium |
|
|
| Category | Testing coverage |
|
|
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** The only live MXAccess test covers the Register→AddItem→Advise→one-OnDataChange→Close happy path. CLAUDE.md stresses that MXAccess parity is the contract and calls out non-obvious behaviors (`WriteSecured` ordering, `OperationComplete` semantics, invalid-handle exceptions). None of `Write`, `WriteSecured`, `Unadvise`, `RemoveItem`, `Unregister`, `OperationComplete`, an invalid-handle command, or a worker-fault path is exercised against live COM — exactly the paths fake-worker tests cannot validate.
|
|
|
|
**Recommendation:** Add live coverage for at least a `Write` round-trip and an invalid-handle command, plus a worker-fault/abnormal-exit scenario, even if behind additional opt-in env vars.
|
|
|
|
**Resolution:** Resolved 2026-05-18: Added two `[LiveMxAccessFact]`-gated tests to `WorkerLiveMxAccessSmokeTests`. `GatewaySession_WithLiveWorker_WritesValueToAdvisedItem` registers/adds/advises then issues a `Write` of an integer value, asserting the command round-trips with `ProtocolStatusCode.Ok` and `MxCommandKind.Write`. `GatewaySession_WithLiveWorker_InvalidHandleCommand_SurfacesFailureWithoutTransportFault` issues `AddItem` against `int.MaxValue` as the server handle (never issued by MXAccess) and asserts the failure surfaces in the command reply without a usable item handle. Both reuse the existing opt-in env var and the `ShutDownAsync` cleanup helper. A worker-fault/abnormal-exit case was deliberately scoped out — it needs a controlled COM crash injection beyond what the existing harness supports; the two added cases cover the `Write` round-trip and invalid-handle paths the recommendation calls out. Live execution was not possible in this environment; verified by build.
|
|
|
|
### IntegrationTests-006
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Medium |
|
|
| Category | Testing coverage |
|
|
| Location | `src/MxGateway.IntegrationTests/DashboardLdapLiveTests.cs` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** LDAP live coverage is two cases: admin succeeds, readonly is denied for missing group. There is no coverage of a wrong password for a valid user, an unknown username, or the LDAP-server-unreachable path — all of which `DashboardAuthenticator` has distinct branches for (the `LdapException` catch, the `candidate is null` branch). The negative test only proves group-membership denial, not credential rejection.
|
|
|
|
**Recommendation:** Add a live test for `admin` with a wrong password asserting `Succeeded == false` and that the password is not leaked into `FailureMessage`, and a test for an unknown username.
|
|
|
|
**Resolution:** Resolved 2026-05-18: Added three `[LiveLdapFact]`-gated tests to `DashboardLdapLiveTests`. `AuthenticateAsync_AdminWithWrongPassword_FailsWithoutLeakingPassword` exercises the `LdapException` catch via a rejected candidate bind and asserts the wrong password never reaches `FailureMessage`. `AuthenticateAsync_UnknownUsername_Fails` exercises the `candidate is null` branch. `AuthenticateAsync_ServerUnreachable_FailsWithoutThrowing` builds the authenticator with `LdapOptions.Port = 1` (a reserved port no LDAP server listens on) and asserts the connect failure is absorbed into a failed result rather than thrown — covering the generic `catch (Exception)` branch. All three are gated by the existing `MXGATEWAY_RUN_LIVE_LDAP_TESTS` opt-in so they stay opt-in. Live execution was not possible in this environment (no live LDAP); verified by build.
|
|
|
|
### IntegrationTests-007
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Concurrency & thread safety |
|
|
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:20`, `src/MxGateway.IntegrationTests/Galaxy/GalaxyRepositoryLiveTests.cs:5`, `src/MxGateway.IntegrationTests/DashboardLdapLiveTests.cs:9` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** The live test classes contend for genuinely shared singletons — one MXAccess COM provider, one ZB SQL database, one GLAuth instance with a 3-fail/10-minute per-IP lockout. No `[Collection]` annotation or `DisableTestParallelization` is declared, so xUnit's default cross-class parallelism could run the Galaxy tests concurrently or interleave an LDAP failure burst that trips the GLAuth lockout.
|
|
|
|
**Recommendation:** Place the live test classes in a shared `[Collection]`, or set `[assembly: CollectionBehavior(DisableTestParallelization = true)]` for this opt-in project, so live external resources are accessed serially.
|
|
|
|
**Resolution:** Resolved 2026-05-18: Confirmed — no `[Collection]` or assembly-level `CollectionBehavior` existed. Added `LiveResourcesCollection.cs` with a `[CollectionDefinition(Name, DisableParallelization = true)]` and applied `[Collection(LiveResourcesCollection.Name)]` to `WorkerLiveMxAccessSmokeTests`, `GalaxyRepositoryLiveTests`, and `DashboardLdapLiveTests`. A named collection (rather than an assembly-wide `DisableTestParallelization`) was chosen so the live classes serialize against each other and within themselves while non-live tests (`IntegrationTestEnvironmentTests`) keep parallelizing. Verified by build; live tests not executed (no MXAccess COM / live LDAP in this environment).
|
|
|
|
### IntegrationTests-008
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Code organization & conventions |
|
|
| Location | `src/MxGateway.IntegrationTests/LiveLdapFactAttribute.cs`, `src/MxGateway.IntegrationTests/Galaxy/LiveGalaxyRepositoryFactAttribute.cs`, `src/MxGateway.IntegrationTests/LiveMxAccessFactAttribute.cs` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** Three near-identical fact attributes each re-implement the same "compare env var to `1` with `StringComparison.Ordinal`, set `Skip` otherwise" pattern. `LiveMxAccessFactAttribute` delegates to `IntegrationTestEnvironment` while the other two inline the logic, so the project has two divergent styles for the same concern.
|
|
|
|
**Recommendation:** Extract a shared helper (e.g. `IntegrationTestEnvironment.IsEnabled(string variableName)`) and have all three attributes call it.
|
|
|
|
**Resolution:** Resolved 2026-05-18: Confirmed — `LiveLdapFactAttribute.Enabled` and `LiveGalaxyRepositoryFactAttribute.Enabled` each inlined the ordinal `== "1"` comparison while `LiveMxAccessFactAttribute` delegated to `IntegrationTestEnvironment`. Added `IntegrationTestEnvironment.IsEnabled(string variableName)` as the single implementation; `LiveMxAccessTestsEnabled`, `LiveLdapFactAttribute.Enabled`, and `LiveGalaxyRepositoryFactAttribute.Enabled` now all call it. Verified by build.
|
|
|
|
### IntegrationTests-009
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Documentation & comments |
|
|
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:372-375` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `TestServerCallContext` is XML-documented as a "Mock server call context," but it is a hand-written stub/fake with no mocking framework and no verification behavior. Per the style guides (accurate naming; explain why not what), calling it a mock misleads readers who may expect verifiable interactions.
|
|
|
|
**Recommendation:** Reword the summary to "test stub" / "minimal `ServerCallContext` implementation for in-process gRPC calls."
|
|
|
|
**Resolution:** Resolved 2026-05-18: Confirmed — the summary read "Mock server call context for testing gRPC calls." Reworded to "Minimal `ServerCallContext` stub for invoking the gRPC service in-process," noting it is a hand-written fake with no verification behavior. No mocking framework is involved; this is a documentation-only fix. Verified by build.
|
|
|
|
### IntegrationTests-010
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Correctness & logic bugs |
|
|
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:366-369` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `WaitForFirstMessageAsync` accepts only a `timeout` and never observes a `CancellationToken`. There is no per-test cancellation propagation, so if the gateway/worker hangs without writing an event the test relies solely on the 15s `WaitAsync` timeout and gives no contextual diagnostics. Combined with IntegrationTests-004, a hung live worker produces a bare `TimeoutException`.
|
|
|
|
**Recommendation:** Accept a `CancellationToken` (linked to `TestServerCallContext`'s token), pass it to `firstMessage.Task.WaitAsync(timeout, token)`, and on timeout emit the recorded `Messages` count via `output.WriteLine` before throwing.
|
|
|
|
**Re-triage:** The named method `WaitForFirstMessageAsync` no longer exists — IntegrationTests-003's resolution renamed/replaced it with `RecordingServerStreamWriter.WaitForMessageAsync(predicate, timeout)`, which scans recorded messages and blocks on a `SemaphoreSlim`. The underlying defect still held: that replacement method also took only a `timeout` and never observed a `CancellationToken`. The finding remains valid (Low, Correctness) against the renamed method; the recommendation's `firstMessage.Task.WaitAsync` detail is stale but the intent (thread a token, surface a count on timeout) is unchanged.
|
|
|
|
**Resolution:** Resolved 2026-05-18: Added an optional `CancellationToken` parameter to `WaitForMessageAsync`, linked with the existing timeout source via `CancellationTokenSource.CreateLinkedTokenSource`, so a per-test cancellation aborts the wait promptly. `GatewaySession_WithLiveWorker_RegistersAdvisesStreamsDataAndCloses` now creates a `CancellationTokenSource`, passes its token into the `StreamEvents` `TestServerCallContext` and into `WaitForMessageAsync`, so the stream call and the wait share one cancellation source. On timeout the method already throws a `TimeoutException` whose message includes the scanned message count, satisfying the "emit recorded count" intent (the count surfaces in the test failure rather than via a separate `output.WriteLine`). Verified by build; live tests not executed.
|
|
|
|
### IntegrationTests-011
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Documentation & comments |
|
|
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:236-240`, `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:183-187` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** The XML/inline comments on the two new MXAccess parity tests misdescribe how the gateway surfaces an MXAccess failure. The invalid-handle test reads "the gateway protocol status is Ok and the failure shows up in hresult / the status proxies — it must not be reported as a transport fault", then asserts `Assert.NotEqual(ProtocolStatusCode.Ok, addItemReply.ProtocolStatus.Code)`. `HResultConverter.CreateProtocolStatus` (`src/MxGateway.Worker/Conversion/HResultConverter.cs:39`) actually sets `Code = ProtocolStatusCode.MxaccessFailure` whenever the COM call throws (HRESULT ≠ 0), so the assertion is correct but the comment is wrong — the protocol status is *not* `Ok` on an MXAccess failure. The write-round-trip test carries the same misleading framing on lines 183-187 ("MXAccess parity details … belong in hresult / statuses, not in a transport failure") immediately before asserting `Ok`. A reader can reasonably conclude the gateway always reports `Ok` for round-tripped commands and tweak code accordingly. The intended distinction is "this is not a gRPC transport fault" (the RPC reply still arrives) — the protocol status code carries the MXAccess outcome.
|
|
|
|
**Recommendation:** Reword the invalid-handle comment to "the gateway must reply with `ProtocolStatusCode.MxaccessFailure` and a non-zero `Hresult` carrying the COM failure, not a gRPC transport fault." Reword the write-round-trip comment to clarify it is asserting the happy-path Ok and that an MXAccess rejection would surface as `MxaccessFailure` (per `HResultConverter`), not as a `RpcException`.
|
|
|
|
**Resolution:** 2026-05-20 — Reworded the invalid-handle test comment to say the gateway must reply with `ProtocolStatusCode.MxaccessFailure` and a non-zero hresult carrying the COM failure (per `HResultConverter`), and reworded the write-round-trip comment to make explicit it is asserting the happy-path Ok while an MXAccess rejection would surface as `MxaccessFailure`, never as an `RpcException`.
|
|
|
|
### IntegrationTests-012
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Medium |
|
|
| Category | Correctness & logic bugs |
|
|
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:147-151` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `GatewaySession_WithLiveWorker_WritesValueToAdvisedItem` constructs a `RecordingServerStreamWriter<MxEvent>` and starts a `StreamEvents` task, then never reads from it and never asserts anything about the recorded messages. The test verifies only that the `Write` command round-trips at the protocol level — it does not verify that the worker actually emits any event after the write (for example an `OnWriteComplete`, which is the proof of round-trip used by the cross-language client e2e runner). Because the stream task is started with `new TestServerCallContext()` (no cancellation source), any fault raised by the stream task (an exception from `EventStreamService`, a session-not-found, a backpressure overflow) is swallowed — `streamTask` is later awaited in `ShutDownAsync` only inside a broad `catch (Exception ex)`, which logs and continues. The Write test therefore cannot fail on stream-task faults. Two consequences: (a) the live Write parity coverage promised in IntegrationTests-005 is weaker than it appears, and (b) the fixture (`eventWriter`) is dead code in this test that suggests an assertion was intended.
|
|
|
|
**Recommendation:** Either remove the unused `eventWriter`/`StreamEvents` plumbing from the Write test so the test scope matches its assertions, or — preferred — extend the test to wait for an `OnWriteComplete` event for the written item via `eventWriter.WaitForMessageAsync(candidate => candidate.Family == MxEventFamily.OnWriteComplete && candidate.ItemHandle == itemHandle, ...)`, matching the round-trip proof used by `scripts/run-client-e2e-tests.ps1 -VerifyWrite`.
|
|
|
|
**Resolution:** Resolved 2026-05-20: Rewrote `GatewaySession_WithLiveWorker_WritesValueToAdvisedItem` so the previously-dead `eventWriter`/`StreamEvents` plumbing actually drives an assertion. The test now waits for an `OnWriteComplete` event matching the Write's (server, item) handle pair via `eventWriter.WaitForMessageAsync` (using `IntegrationTestEnvironment.LiveMxAccessEventTimeout`), and asserts the recorded event's family, session id, and handles — the same round-trip proof the cross-language client e2e runner uses. The stream call is now bound to a `CancellationTokenSource` and the test asserts `streamTask.IsFaulted == false` before cleanup. `ShutDownAsync` gained an opt-in `propagateStreamFaults` flag so a faulted `StreamEvents` task is rethrown into the test rather than silently swallowed by the broad cleanup catch; the cancellation token is also signalled before the drain so `StreamEvents` observes a clean shutdown instead of a forced timeout. Verified by build and by confirming the test skips cleanly when `MXGATEWAY_RUN_LIVE_MXACCESS_TESTS` is unset.
|
|
|
|
### IntegrationTests-013
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Performance & resource management |
|
|
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:519-609` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `RecordingServerStreamWriter<T>` owns a `SemaphoreSlim messageArrived` (`IDisposable`) but does not itself implement `IDisposable`, so the semaphore's wait handle is never released back to the OS. Each live test allocates one such writer and discards it at scope exit. Live tests run on opt-in only, so the cumulative leak is bounded, but the type holds an `IDisposable` field — the standard hygiene under `Directory.Build.props`'s `TreatWarningsAsErrors=true` is to either dispose the field or document why not. CA2213 does not fire because the owner is not itself `IDisposable`; an analyzer-driven warning is the only reason this is not a build break, not an indication that the leak is acceptable.
|
|
|
|
**Recommendation:** Make `RecordingServerStreamWriter<T>` implement `IDisposable`, dispose `messageArrived` in `Dispose`, and wrap each instantiation in a `using` block (`using RecordingServerStreamWriter<MxEvent> eventWriter = new();`).
|
|
|
|
**Resolution:** 2026-05-20 — `RecordingServerStreamWriter<T>` now implements `IDisposable` and its `Dispose` releases the `messageArrived` semaphore. All six live tests in `WorkerLiveMxAccessSmokeTests` now allocate the writer with a top-of-method `using` declaration so the semaphore's wait handle is released on scope exit even when the test body throws.
|
|
|
|
### IntegrationTests-014
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Medium |
|
|
| Category | Testing coverage |
|
|
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** IntegrationTests-005 was resolved by adding live coverage for `Write` and an invalid-handle `AddItem`, but its resolution explicitly scoped out the worker-fault/abnormal-exit case and silently dropped `Unadvise`, `RemoveItem`, `Unregister`, `OperationComplete`, and `WriteSecured` ordering. CLAUDE.md singles out `WriteSecured` ("`WriteSecured` failing before a value-bearing NMX body") and `OperationComplete` semantics as parity surprises the gateway must not "fix" — exactly the paths fake-worker tests cannot validate. After this commit the live MXAccess smoke still doesn't exercise any teardown command, the secured-write ordering rule, or a deliberately faulted worker. A regression in any of these would only be caught by manual testing.
|
|
|
|
**Recommendation:** Add live MXAccess coverage for the teardown chain (`Unadvise` then `RemoveItem` then `Unregister`, asserting each replies with `ProtocolStatusCode.Ok` and the next operation no longer references the freed handle), and at minimum one `WriteSecured` parity case asserting the documented ordering. A worker-fault test can be deferred to a separate finding once a deterministic COM-crash injection harness exists.
|
|
|
|
**Resolution:** Resolved 2026-05-20: Added three new `[LiveMxAccessFact]`-gated tests to `WorkerLiveMxAccessSmokeTests`, all reusing the existing opt-in env var and `ShutDownAsync` cleanup helper. (1) `GatewaySession_WithLiveWorker_UnadviseRemoveItemUnregister_TeardownOrderingParity` runs Register → AddItem → Advise → wait for one OnDataChange → UnAdvise → RemoveItem → Unregister, asserting each step replies `Ok` with the matching `MxCommandKind`, that no further OnDataChange events for the un-advised (server, item) pair arrive after a 500 ms settle window, and that a second RemoveItem against the freed handle returns a non-`Ok` MXAccess failure (so a regression that left a stale subscription or accepted a stale handle would surface). (2) `GatewaySession_WithLiveWorker_WriteSecured_AuthenticatedRoundTripParity` resolves an ArchestrA user id via `AuthenticateUser` (credentials env-overridable through `MXGATEWAY_LIVE_MXACCESS_WRITE_SECURED_USER` / `..._PASSWORD`, defaulting to the `admin`/`admin123` GLAuth user from `glauth.md`), issues `WriteSecured` against an advised item, and asserts the reply carries `MxCommandKind.WriteSecured`, the protocol status is one of the documented parity outcomes (`Ok` for an unprotected provider, `MxaccessFailure` when the item is not WriteSecured-eligible — never a transport fault), and the credential never leaks into the diagnostic message. (3) `GatewaySession_WithLiveWorker_AbnormalWorkerExit_MarksSessionFaulted` opens a session, kills the worker process tree (via a new `TestWorkerProcessFactory.KillAllAndDetach` helper) without going through CloseSession, and polls the session via a new `GatewayServiceFixture.TryGetSession` accessor until it transitions to `SessionState.Faulted` within the live event timeout; asserts the final state is `Faulted`, that `FinalFault` is non-empty, and that the fault description carries a known worker-client classification (pipe disconnected / worker faulted / heartbeat expired / end-of-stream). `docs/GatewayTesting.md` was updated to list all five parity surfaces and the two new env-var defaults. Verified by build and confirmed all six live tests skip cleanly when `MXGATEWAY_RUN_LIVE_MXACCESS_TESTS` is unset.
|
|
|
|
### IntegrationTests-015
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Code organization & conventions |
|
|
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:30,119,201`, `src/MxGateway.IntegrationTests/DashboardLdapLiveTests.cs:13,32,48,67,84`, `src/MxGateway.IntegrationTests/Galaxy/GalaxyRepositoryLiveTests.cs:10,22,34,52` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** Every live-test method in the three live classes carries an identical `[Trait("Category", "LiveMxAccess")]` (or `LiveLdap` / `LiveGalaxy`) attribute. The trait is uniform within each class and is exactly the information the `[Collection(LiveResourcesCollection.Name)]` class-level attribute also implies. xUnit's `[Trait]` is inheritable from the class to its methods, so the same metadata can be declared once at class scope. The current shape adds maintenance burden — adding a new test in any of these classes requires remembering to add the trait, and the existing pattern's `LiveLdap` includes five copies of the same line.
|
|
|
|
**Recommendation:** Move each `[Trait("Category", ...)]` to the class declaration alongside the existing `[Collection(...)]`, and remove the per-method copies. Verify the trait still surfaces in `--filter Trait=Category=LiveLdap` after the change.
|
|
|
|
**Resolution:** 2026-05-20 — Lifted `[Trait("Category", "LiveMxAccess")]`, `[Trait("Category", "LiveLdap")]`, and `[Trait("Category", "LiveGalaxy")]` to the class declarations of `WorkerLiveMxAccessSmokeTests`, `DashboardLdapLiveTests`, and `GalaxyRepositoryLiveTests` respectively (alongside the existing `[Collection(LiveResourcesCollection.Name)]`), and removed all per-method duplicates. xUnit propagates class-level traits to every method, so `--filter Category=Live*` filters still match.
|
|
|
|
### IntegrationTests-016
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Code organization & conventions |
|
|
| Location | `src/MxGateway.IntegrationTests/Galaxy/LiveGalaxyRepositoryFactAttribute.cs:26`, `src/MxGateway.Server/Galaxy/GalaxyRepositoryOptions.cs:13` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** The default Galaxy Repository connection string `"Server=localhost;Database=ZB;Integrated Security=True;TrustServerCertificate=True;Encrypt=False;"` is duplicated verbatim between the production `GalaxyRepositoryOptions.ConnectionString` initializer and the test-side `LiveGalaxyRepositoryFactAttribute.ConnectionString` fallback. The docs (`docs/GatewayTesting.md`) document the value once and reference it from both places. If the production default changes (e.g. tightening to a named instance, or switching to a SQL-auth template), the test default silently keeps the old string and the live Galaxy tests connect to the wrong server. The drift is invisible to the build.
|
|
|
|
**Recommendation:** Expose the production default through a `public const string` on `GalaxyRepositoryOptions` (e.g. `DefaultConnectionString`) and have `LiveGalaxyRepositoryFactAttribute.ConnectionString` read `Environment.GetEnvironmentVariable(ConnectionStringVariableName) ?? GalaxyRepositoryOptions.DefaultConnectionString`. Single source of truth, build-time guarantee they cannot drift.
|
|
|
|
**Resolution:** 2026-05-20 — Added `public const string GalaxyRepositoryOptions.DefaultConnectionString` carrying the production default, set the `ConnectionString` initializer to reference it, and changed `LiveGalaxyRepositoryFactAttribute.ConnectionString` to fall back to `GalaxyRepositoryOptions.DefaultConnectionString`. The literal now lives in exactly one place and any future change to the production default propagates to the live-test fallback at compile time.
|
|
|
|
### IntegrationTests-017
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Medium |
|
|
| Category | Correctness & logic bugs |
|
|
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:350-407` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `GatewaySession_WithLiveWorker_UnadviseRemoveItemUnregister_TeardownOrderingParity` proves the subscription is live by waiting for one matching `OnDataChange`, snapshots `dataChangeCountBeforeUnadvise`, then sends `UnAdvise`, waits 500 ms, snapshots `dataChangeCountAfterTeardown`, and asserts strict equality. The assertion races against the natural cadence of the live MXAccess provider:
|
|
|
|
1. After `WaitForMessageAsync` returns the first match, any additional `OnDataChange` for the same `(serverHandle, itemHandle)` published by the provider before the worker processes `UnAdvise` is delivered into the recording writer.
|
|
2. The snapshot at line 362 is taken *immediately before* the `UnAdvise` command is sent (line 370). Events that arrive in the window between that snapshot and the worker processing `UnAdvise` (network round-trip + STA dispatch + worker pipe send + gateway channel write) are racing in — they are not "after UnAdvise" but they will be in the post-teardown snapshot.
|
|
3. `MXAccess` providers can publish `OnDataChange` at sub-second cadence; the strict-equality assertion has no slack for in-flight events.
|
|
|
|
The test passes today only because the chosen test item (`TestChildObject.TestInt`) likely changes value rarely. Against a more active item — or on a slower machine where the round-trip widens — the assertion would flap. The intent ("no further events *after the worker stops the subscription*") would be better expressed by capturing the snapshot after `UnAdvise` returns `Ok` rather than before it is issued.
|
|
|
|
**Recommendation:** Move the "before" snapshot to immediately *after* `UnAdvise` returns `Ok` (the point past which the parity rule applies), or weaken the assertion to "no events with a `WorkerSequence` strictly greater than the last sequence observed within `dataChangeCountBeforeUnadvise + N` events arrived in the post-teardown drain" where `N` accounts for the documented in-flight window. Either change moves the test from racing on provider cadence to verifying the actual parity rule.
|
|
|
|
**Resolution:** 2026-05-20 — Removed the `dataChangeCountBeforeUnadvise` snapshot taken just before the `UnAdvise` command (the source of the race) and replaced the strict-equality assertion against a pre-teardown count with a two-window stability check taken *after* the teardown chain completes. The test now waits one 500 ms settle window for in-flight `OnDataChange` events (which the provider already published before the worker acknowledged `UnAdvise`) to drain, captures `dataChangeCountAfterFirstSettle`, waits another 500 ms, and asserts the count is unchanged in `dataChangeCountAfterSecondSettle`. The parity rule under test ("no further `OnDataChange` after the worker stops the subscription") is now expressed as steadiness across the post-teardown window rather than equality with a count snapshotted during the round-trip race, so a slower machine or a more active test item no longer flaps the assertion while a genuine regression (a stale subscription continuing to fire) still surfaces as a count drift between the two settles.
|
|
|
|
### IntegrationTests-018
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Code organization & conventions |
|
|
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:1037`, `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:595` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `GatewayServiceFixture.TryGetSession(string sessionId, out GatewaySession session)` declares `session` as a non-nullable `GatewaySession`, but its implementation is `_registry.TryGet(sessionId, out session)`, which (in `SessionRegistry.cs:43`) uses `session!` to silence the nullability warning on `Dictionary.TryGetValue`. On a `false` return the `out` parameter is null, contradicting the non-nullable annotation. The caller at line 595 binds it as `out GatewaySession? session`, which compiles only because non-nullable-to-nullable variance is permitted — but no callsite tooling will warn that a `false` return yields a null value through what the fixture's contract describes as non-nullable. The repo enforces `Nullable=enable;TreatWarningsAsErrors=true` (`src/Directory.Build.props`), so the convention is for `TryX` patterns to either annotate the out as `T?` or to mirror BCL `Dictionary.TryGetValue<TKey, TValue>` (which uses `[MaybeNullWhen(false)] out TValue`).
|
|
|
|
**Recommendation:** Change the signature to `public bool TryGetSession(string sessionId, [MaybeNullWhen(false)] out GatewaySession session)` (mirroring BCL `TryGetValue`) and propagate the same annotation down through `ISessionRegistry.TryGet` / `SessionRegistry.TryGet` so the `session!` fudge can be removed. The call sites already treat the parameter as nullable; aligning the declaration removes the silent contract gap.
|
|
|
|
**Resolution:** 2026-05-20 — Propagated `[MaybeNullWhen(false)]` through the entire `TryGet*` chain. `GatewayServiceFixture.TryGetSession`, `ISessionManager.TryGetSession` / `SessionManager.TryGetSession`, and `ISessionRegistry.TryGet`/`TryRemove` plus their `SessionRegistry` implementations now carry the BCL `Dictionary.TryGetValue`-style annotation, and the `session!` null-forgiving operator inside `SessionRegistry.TryGet` / `TryRemove` was removed because the annotation makes it redundant. Existing in-tree callers (`SessionManagerTests.cs` line 28) were updated to `out GatewaySession?` to match. The compiler now warns at callsites that read `session` without checking the boolean return, closing the silent contract gap. Verified by `dotnet build src/MxGateway.IntegrationTests/...` (0 warnings), `dotnet build src/MxGateway.Tests/...` (0 warnings), and `dotnet test src/MxGateway.Tests/...` (479 passed).
|
|
|
|
### IntegrationTests-019
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Medium |
|
|
| Category | Security |
|
|
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:497-534` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `GatewaySession_WithLiveWorker_WriteSecured_AuthenticatedRoundTripParity` resolves a credential pair via `ResolveLiveMxAccessSecuredCredentials`, passes the password into `AuthenticateUser` and `WriteSecured`, and asserts the password is absent from `writeSecuredReply.DiagnosticMessage`. CLAUDE.md's secret-handling rule is broader: "API keys, passwords, `WriteSecured` payloads, and `AuthenticateUser` credentials must never reach logs." The test's assertion covers only one of the surfaces the rule protects:
|
|
|
|
- The `TestOutputLoggerProvider` writes every gateway-side `ILogger` entry to `ITestOutputHelper`. A regression that logged the request body (or the `WorkerCommandRequest` envelope) would put the password into test output without failing this test.
|
|
- `WriteWorkerOutput` echoes worker `stdout`/`stderr` lines to `ITestOutputHelper`. A worker-side regression that printed the credential (e.g. a debug log added to `MxAccessCommandExecutor`) would land in test output without failing this test.
|
|
- `output.WriteLine(...)` calls in the test body (`AuthenticateUser status=... user_id=...` and `LogReply("WriteSecured", ...)`) currently don't include the request body, but a future maintenance change that printed `command.WriteSecured.Value` or a similar struct dump would silently leak the credential past the existing assertion.
|
|
|
|
Because `ITestOutputHelper` doesn't expose its accumulated text to the test, the assertion can only be made by buffering output through a recording sink the test owns.
|
|
|
|
**Recommendation:** Replace the bare `ITestOutputHelper` injection (just for the WriteSecured test, or for all live MXAccess tests) with a recording wrapper that mirrors writes both to the xUnit output and to a `StringBuilder`. At the end of the test, assert the buffer does not contain `verifyPassword`. This makes the credential-redaction contract a property of the entire test run, not just the one explicit field. Alternative: route the test through `GatewayLogRedactor` (`src/MxGateway.Server/Diagnostics/GatewayLogRedactor.cs`) so the credential-bearing commands are redacted at the logger sink the test sees.
|
|
|
|
**Resolution:** 2026-05-20 — Added a `RecordingTestOutputHelper` private class that implements `ITestOutputHelper`, mirrors every line to the wrapped xUnit sink, and accumulates the same text into a `StringBuilder` exposed via a `Captured` property. The WriteSecured parity test now constructs this wrapper, passes it to both `TestWorkerProcessFactory` (so worker `stdout`/`stderr` lines flow through it) and `GatewayServiceFixture` (so the `TestOutputLoggerProvider`'s gateway-`ILogger` entries flow through it), and uses it for every direct `WriteLine`. A new `LogReplyTo(ITestOutputHelper sink, …)` static helper underpins the existing `LogReply` instance method so the test body can route reply logging through the recording wrapper. After the cleanup `finally` block completes, the test asserts `recordedOutput.Captured` does not contain the verify password. The credential-redaction contract is now enforced across the gateway-logger sink, worker stdout/stderr echo, and every test-body `WriteLine` — a future regression that dumped the request body, the `WorkerCommandRequest` envelope, or the `WriteSecured` payload would land in the buffer and fail the assertion.
|
|
|
|
### IntegrationTests-020
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Correctness & logic bugs |
|
|
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:616-622` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `GatewaySession_WithLiveWorker_AbnormalWorkerExit_MarksSessionFaulted` asserts the `FinalFault` description contains at least one of these substrings (case-insensitive): `disconnect`, `pipe`, `heartbeat`, `worker`, `end of stream`. The intent (per the IntegrationTests-014 resolution prose) is to verify the gateway surfaces "a known worker-client classification". The `"worker"` substring defeats that intent — the gateway routes through `SetFaulted` with messages like *"Worker pipe disconnected."*, *"Worker shutdown timed out."*, *"Worker was killed by the gateway: …"*, *"Worker heartbeat expired. …"*, *"Worker event channel rejected an event."*, *"Worker pipe write failed."*, *"Worker read loop failed."*, *"Worker sent unexpected envelope body …"* — every classification message begins with the word "Worker". A regression that introduced an entirely new fault path with a generic message containing the word *Worker* would still pass this test.
|
|
|
|
CLAUDE.md singles out "abnormal exit" as one of the parity surfaces (`SessionState.Faulted` with an actionable cause), so the test's documented value is verifying *which* of the WorkerClient error codes drove the transition. Today the assertion is effectively `Assert.NotEmpty(observedFault)`.
|
|
|
|
**Recommendation:** Tighten the keyword set to the specific classifications the abnormal-exit (kill-the-process) path actually drives — `PipeDisconnected` ("pipe", "disconnect") and `EndOfStream` ("end of stream"). Drop the broad `"worker"` term, and drop `"heartbeat"` unless the test deliberately covers the heartbeat path too (it does not — `HeartbeatGraceSeconds = 15` and the poll deadline is `StreamShutdownTimeout = 10` seconds, so a heartbeat-expired transition is impossible inside the wait window). If a more exhaustive matrix is wanted, assert `FinalFault.StartsWith("Worker pipe disconnected")` against the message constant in `WorkerClient.cs:380` so a rename surfaces as a compile-time / test-time failure.
|
|
|
|
**Resolution:** 2026-05-20 — Tightened the keyword set to the specific classifications the kill-the-process path actually drives. The assertion now requires the `FinalFault` description to contain `"pipe disconnected"` (matching the `WorkerClient.cs:378-381` `WorkerFrameProtocolErrorCode.EndOfStream` → `WorkerClientErrorCode.PipeDisconnected` → `"Worker pipe disconnected."` message) or `"end of stream"`, dropping the broad `"worker"` term that previously matched every `WorkerClient` fault message (all of which begin with "Worker"), and dropping `"heartbeat"` because the test's `StreamShutdownTimeout` (10 s) is below `HeartbeatGraceSeconds` (15 s) so a heartbeat-expired transition cannot occur inside the poll window. A regression that routed an unrelated fault classification through the abnormal-exit path would now fail loudly instead of silently passing.
|
|
|
|
### IntegrationTests-021
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Testing coverage |
|
|
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:579-622` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** The abnormal-exit test only polls `session.State` and `session.FinalFault`. It does not assert anything about `streamTask` after the kill. The chain that puts the session into `Faulted` is: the read loop hits EOS → `SetFaulted(PipeDisconnected, …)` → `_events.Writer.TryComplete(fault)` → `ReadEventsAsync` propagates the `WorkerClientException` → `EventStreamService.ProduceEventsAsync`'s `catch (Exception exception) when exception is WorkerClientException` calls `session.MarkFaulted(exception.Message)`. The test verifies the *end state* of that chain but not that the `StreamEvents` call is what produced the transition. If a future change moved the `MarkFaulted` call somewhere else (a session-manager background watcher, for example), the test would still pass — but the stream task could now silently swallow the fault. A direct assertion that `streamTask.IsFaulted` (or that awaiting it throws a `WorkerClientException`) would protect that contract.
|
|
|
|
The Write parity test (IntegrationTests-012's resolution) added exactly this assertion (`Assert.False(streamTask.IsFaulted, …)`). The abnormal-exit test should add the inverse: the stream task *must* be faulted (or at least completed with a `WorkerClientException`) after the kill.
|
|
|
|
**Recommendation:** After the session-state poll succeeds, assert `streamTask.IsCompleted` (the channel has terminated) and inspect `streamTask.Exception?.InnerException` for a `WorkerClientException` (or assert `streamTask.IsFaulted` and await with `ShouldThrowAsync`). This couples the test to the actual fault-propagation path and prevents a future refactor that bypasses the stream from quietly weakening the coverage. Compare to the existing `Assert.False(streamTask.IsFaulted, …)` on line 217 — the abnormal-exit case wants the opposite assertion.
|
|
|
|
**Resolution:** 2026-05-20 — After the session-state poll loop confirms `SessionState.Faulted`, the test now awaits `streamTask.WaitAsync(StreamShutdownTimeout)` (with a try/catch that logs the surfaced exception type/message), then asserts `streamTask.IsCompleted` and `streamTask.IsFaulted`. This couples the test to the actual fault-propagation chain — read loop hits EndOfStream → `WorkerClient.SetFaulted(PipeDisconnected, …)` → `ReadEventsAsync` propagates the fault → `EventStreamService` calls `session.MarkFaulted` → `MxAccessGatewayService.StreamEvents` re-throws the mapped `RpcException`. A future refactor that moved `MarkFaulted` off the stream-consumption path would leave `streamTask` completing cleanly, which the new `IsFaulted` assertion would now catch (inverse of the existing `Assert.False(streamTask.IsFaulted, …)` in the Write parity test on line 217). The inner-exception type assertion was deliberately omitted because the gateway maps `WorkerClientException` to `RpcException` at the public boundary (`MxAccessGatewayService.MapWorkerClientException`); asserting on the surface type alone would be brittle, while the `IsFaulted` check directly tests the contract the recommendation is protecting.
|
|
|
|
### IntegrationTests-022
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Code organization & conventions |
|
|
| Location | `src/ZB.MOM.WW.MxGateway.IntegrationTests/IntegrationTestEnvironment.cs:103-138` (`ResolveRepositoryRoot` / `IsRepositoryRoot`) |
|
|
| Status | Open |
|
|
|
|
**Description:** The walker introduced in `dc9c0c9` searches parents for a directory containing `src/` plus either `.git` or a `*.slnx`/`*.sln` file, falling back to `Directory.GetCurrentDirectory()` when nothing matches. The fallback masks misconfiguration: a test run from a subdirectory of an unpacked tree without a `.git` or `.slnx` marker silently uses the wrong root and then the live `MXGATEWAY_LIVE_MXACCESS_WORKER_EXE` lookup produces a misleading "worker exe not found" error pointing at a fabricated path. The current `bin/` layout makes this unlikely in practice (the test host sets a stable working directory), but the failure mode would be confusing if it triggered.
|
|
|
|
**Recommendation:** Throw a clear `InvalidOperationException` from `ResolveRepositoryRoot` when the walk exhausts without finding a root, naming the directories searched and the markers expected. The opt-in env var (`MXGATEWAY_LIVE_MXACCESS_WORKER_EXE`) remains the escape hatch for unusual deployments.
|
|
|
|
**Resolution:** _(empty until closed)_
|
|
|
|
### IntegrationTests-023
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Testing coverage |
|
|
| Location | `src/ZB.MOM.WW.MxGateway.IntegrationTests/DashboardLdapLiveTests.cs:14-29` |
|
|
| Status | Open |
|
|
|
|
**Description:** `AuthenticateAsync_AdminInGwAdminGroup_Succeeds` asserts the principal carries an `LdapGroupClaimType` claim containing `GwAdmin` (line 26-28). After the `27ed651` refactor, `DashboardAuthenticator.CreatePrincipal` also emits a `ClaimTypes.Role` claim with the mapped role (`Admin` for the seeded `GwAdmin` → `Admin` mapping). The test asserts the LDAP-group claim but not the role claim, so a regression that stopped emitting `Role: Admin` (e.g. a future refactor of `MapGroupsToRoles` that returned an empty list) would silently pass.
|
|
|
|
**Recommendation:** Add an `Assert.Contains` asserting the principal holds a `ClaimTypes.Role` claim with value `DashboardRoles.Admin` (or `"Admin"`). Mirror the style of the existing `LdapGroupClaimType` assertion.
|
|
|
|
**Resolution:** _(empty until closed)_
|
|
|
|
### IntegrationTests-024
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Code organization & conventions |
|
|
| Location | `src/ZB.MOM.WW.MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs` (`NullDashboardEventBroadcaster` private class at end of file) |
|
|
| Status | Open |
|
|
|
|
**Description:** The inline `NullDashboardEventBroadcaster` private class is the third copy in the repository (the other two live in `EventStreamServiceTests` and `GatewayEndToEndFakeWorkerSmokeTests` under the Tests module — see Tests-025). Each carries a singleton `Instance` field and a no-op `Publish`. While the integration-tests project doesn't directly share test-support code with the Tests project, a duplicate-everywhere pattern reads as a code smell.
|
|
|
|
**Recommendation:** When Tests-025 lands the shared `TestSupport/NullDashboardEventBroadcaster.cs`, either reference the same shared helper from this project (a project reference if practical) or accept the duplication as a deliberate isolation between the unit-test and integration-test trees. Either choice is fine; the current state is the only one that should not persist.
|
|
|
|
**Resolution:** _(empty until closed)_
|