611 lines
82 KiB
Markdown
611 lines
82 KiB
Markdown
# Code Review — IntegrationTests
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Module | `src/ZB.MOM.WW.MxGateway.IntegrationTests` |
|
|
| Reviewer | Claude Code |
|
|
| Review date | 2026-06-15 |
|
|
| Commit reviewed | `410acc9` |
|
|
| Status | Re-reviewed |
|
|
| Open findings | 0 |
|
|
|
|
## Checklist coverage
|
|
|
|
A comprehensive review completes every category, recording "No issues found" where
|
|
a category produced nothing rather than leaving it blank.
|
|
|
|
### 2026-06-15 re-review (commit `410acc9`)
|
|
|
|
Scope: `git diff 42b0037..HEAD -- src/ZB.MOM.WW.MxGateway.IntegrationTests/`
|
|
(5 files). The substantive change is the `DashboardLdapLiveTests` cutover to the
|
|
shared `ZB.MOM.WW.Auth.Ldap.LdapAuthService` + `DashboardGroupRoleMapper`
|
|
(matching the production `DashboardAuthenticator` ctor split); plus the
|
|
`ResolveRepositoryRoot` `stopBoundary` parameter and its new regression test
|
|
(IntegrationTests-025 resolution), and XML-doc backfill on
|
|
`LiveLdapFactAttribute` / `WorkerLiveMxAccessSmokeTests`. NOTE: the review
|
|
brief's "live alarm-subtag smoke test(s)" do not exist in this diff — no new
|
|
alarm-subtag tests landed here. Instead the in-window Server alarm-monitor
|
|
evolution (`ebf1d95`/`9208225`/`410acc9`) changed `GatewayAlarmMonitor`'s
|
|
constructor without updating its IntegrationTests caller, leaving the whole
|
|
module non-compiling (IntegrationTests-026).
|
|
|
|
| # | Category | Result |
|
|
|---|---|---|
|
|
| 1 | Correctness & logic bugs | Issue found: IntegrationTests-026 (the entire IntegrationTests project fails to compile at HEAD — `WorkerLiveMxAccessSmokeTests` constructs `GatewayAlarmMonitor` with the stale 3-arg form `(sessionManager, options, logger)` while the production ctor now requires 5 args `(ISessionManager, IAlarmWatchListResolver, GatewayMetrics, IOptions<GatewayOptions>, ILogger)`; verified by `dotnet build` → CS7036). |
|
|
| 2 | mxaccessgw conventions | No issues found. Live opt-in gating, `[Collection]`/`[Trait]` discipline, "no synthesized events", and the credential-redaction contract for the LDAP failure-path assertions are all preserved; the cutover keeps the existing skip-by-default behaviour. |
|
|
| 3 | Concurrency & thread safety | No issues found in this diff. |
|
|
| 4 | Error handling & resilience | No issues found. The `ServerUnreachable` test still asserts the connect failure is absorbed into a `Fail` result; the fail-closed contract now lives in the shared `LdapAuthService` and the test exercises it via `Port = 1`. |
|
|
| 5 | Security | No issues found. The wrong-password / unknown-user / unreachable tests still assert no credential leak into `FailureMessage`; the cutover adds no new credential surface and writes no secrets to evidence/probe logs. |
|
|
| 6 | Performance & resource management | No issues found. |
|
|
| 7 | Design-document adherence | Issue found: IntegrationTests-028 (the live test hand-rolls a field-by-field `LibraryLdapOptions` from the gateway shadow `LdapOptions` defaults instead of binding `MxGateway:Ldap` the way production's `AddZbLdapAuth(configuration, "MxGateway:Ldap")` does, so the live test no longer exercises the production option-binding path and silently omits `ConnectionTimeoutMs` / `ServerCertificateValidationCallback`). |
|
|
| 8 | Code organization & conventions | Issue found: IntegrationTests-027 (`DashboardLdapLiveTests` directly consumes `LdapAuthService` / `LdapOptions` from `ZB.MOM.WW.Auth.Ldap` but the IntegrationTests `.csproj` has no direct `PackageReference` — it compiles only via transitive flow through the Server `ProjectReference`). |
|
|
| 9 | Testing coverage | No issues found beyond IntegrationTests-026 — the role-claim and stop-boundary assertions added in this window strengthen coverage; but the module cannot build, so none of the IntegrationTests run until IntegrationTests-026 is fixed. |
|
|
| 10 | Documentation & comments | Issue found: IntegrationTests-029 (`docs/GatewayTesting.md` "Live LDAP" still describes the old in-`DashboardAuthenticator` branches — "rejected by the candidate bind", "yields no candidate" — that the library cutover moved into the shared `LdapAuthService`; the test comments were updated in this diff but the doc prose was not, contrary to CLAUDE.md's same-commit doc rule). |
|
|
|
|
### 2026-05-20 re-review (commit `a020350`)
|
|
|
|
| # | Category | Result |
|
|
|---|---|---|
|
|
| 1 | Correctness & logic bugs | Issues found: IntegrationTests-017 (teardown-parity test's "no further OnDataChange after UnAdvise" assertion races against in-flight events the provider already published); IntegrationTests-020 (abnormal-exit test's fault-classification keyword list accepts the substring `"worker"`, which matches almost any plausible fault message and dilutes the check to "the description is non-empty"). |
|
|
| 2 | mxaccessgw conventions | No issues found. The five new tests honor live opt-in gating, `[Collection]` serialization, "no synthesized events", and the credential-redaction contract for the assertions they make. |
|
|
| 3 | Concurrency & thread safety | No issues found. `GatewaySession.State`/`FinalFault` access in the abnormal-exit poll loop goes through `_syncRoot`; `RecordingServerStreamWriter.Messages` returns a locked snapshot copy. |
|
|
| 4 | Error handling & resilience | No issues found. `ShutDownAsync`'s opt-in `propagateStreamFaults` correctly threads silent stream-task faults into the Write parity test without re-masking the IntegrationTests-004 path. |
|
|
| 5 | Security | Issue found: IntegrationTests-019 (WriteSecured live test asserts the password is absent from `DiagnosticMessage` only; it does not assert the credential is absent from the accumulated test output, where the worker `stderr`/`stdout` and the gateway log are echoed). |
|
|
| 6 | Performance & resource management | No issues found. All six `RecordingServerStreamWriter<MxEvent>` instantiations use `using` declarations; `using CancellationTokenSource` is the consistent pattern. |
|
|
| 7 | Design-document adherence | No issues found. `docs/GatewayTesting.md` documents all five new parity surfaces and the two new env-var defaults (`MXGATEWAY_LIVE_MXACCESS_WRITE_SECURED_USER`/`_PASSWORD`). |
|
|
| 8 | Code organization & conventions | Issue found: IntegrationTests-018 (`GatewayServiceFixture.TryGetSession` declares `out GatewaySession session` non-nullable while the caller binds it as `out GatewaySession? session`; the null-forgiving operator inside `SessionRegistry.TryGet` propagates a misleading non-null annotation). |
|
|
| 9 | Testing coverage | Issue found: IntegrationTests-021 (abnormal-exit test does not assert the active `StreamEvents` task observed the worker fault; relies entirely on the session-state poll and would silently pass if `MarkFaulted` were ever moved off the stream-consumption path). |
|
|
| 10 | Documentation & comments | No issues found. Test XML comments now match what each assertion verifies (the IntegrationTests-011 fix is intact across both the Write and invalid-handle cases). |
|
|
|
|
### 2026-05-20 review (commit `1cd51bb`)
|
|
|
|
| # | Category | Result |
|
|
|---|---|---|
|
|
| 1 | Correctness & logic bugs | Issue found: IntegrationTests-012 (Write test starts a `StreamEvents` task and never observes it — silent event-stream coverage gap and an unobserved fault path). |
|
|
| 2 | mxaccessgw conventions | Live opt-ins, `[Collection]` serialization, and the "don't synthesize events" rule are honored. No issues found. |
|
|
| 3 | Concurrency & thread safety | `LiveResourcesCollection` serializes all three live classes; `RecordingServerStreamWriter` locks correctly and the semaphore wait is linked to both timeout and external cancellation. No issues found. |
|
|
| 4 | Error handling & resilience | `ShutDownAsync` already isolates cleanup exceptions per category. No issues found. |
|
|
| 5 | Security | The only embedded strings are documented dev GLAuth creds and a localhost ZB connection string, all env-overridable. The wrong-password and unreachable-server tests assert no password leakage. No issues found. |
|
|
| 6 | Performance & resource management | Issue found: IntegrationTests-013 (`RecordingServerStreamWriter.messageArrived` `SemaphoreSlim` is never disposed; the type owns an `IDisposable` field but is not itself disposable). |
|
|
| 7 | Design-document adherence | No issues found. `docs/GatewayTesting.md` now documents the Live LDAP, Live Galaxy, and Write/invalid-handle MXAccess opt-ins added by the prior round of resolutions. |
|
|
| 8 | Code organization & conventions | Issues found: IntegrationTests-015 (`[Trait("Category", ...)]` repeated on every test method instead of declared once at class level); IntegrationTests-016 (the Galaxy default connection string is duplicated between `LiveGalaxyRepositoryFactAttribute` and `GalaxyRepositoryOptions`). |
|
|
| 9 | Testing coverage | Issue found: IntegrationTests-014 (`Unadvise`, `RemoveItem`, `Unregister`, `WriteSecured` ordering, and worker-fault parity still uncovered — IntegrationTests-005's resolution scoped these out). |
|
|
| 10 | Documentation & comments | Issue found: IntegrationTests-011 (the invalid-handle and write test comments describe a non-`Ok` MXAccess failure as `ProtocolStatusCode.Ok`, contradicting both the assertion and `HResultConverter`). |
|
|
|
|
### 2026-05-18 review (commit `6c64030`)
|
|
|
|
| # | Category | Result |
|
|
|---|---|---|
|
|
| 1 | Correctness & logic bugs | Issues found: IntegrationTests-003 (asserts only on first event), IntegrationTests-010 (`WaitForMessageAsync` ignores cancellation). |
|
|
| 2 | mxaccessgw conventions | Live tests correctly gated and skip (not fail) when prerequisites are absent; `LiveGalaxyRepositoryFactAttribute` undocumented in the opt-in matrix. |
|
|
| 3 | Concurrency & thread safety | Issue found: IntegrationTests-007 (no `[Collection]`/parallelism guard for shared MXAccess/ZB/GLAuth). |
|
|
| 4 | Error handling & resilience | Issue found: IntegrationTests-004 (cleanup `WaitAsync` can mask the original failure). |
|
|
| 5 | Security | No production secrets; only documented dev GLAuth creds and a localhost ZB connection string, all env-overridable. No issues found. |
|
|
| 6 | Performance & resource management | Worker process disposed transitively via session disposal; no leaked pipes/COM/processes. No issues found. |
|
|
| 7 | Design-document adherence | Issues found: IntegrationTests-001 (Galaxy live suite absent from the opt-in matrix), IntegrationTests-002 (`GwAdmin` LDAP prerequisite undocumented). |
|
|
| 8 | Code organization & conventions | Issue found: IntegrationTests-008 (three near-identical fact attributes). |
|
|
| 9 | Testing coverage | Issues found: IntegrationTests-005 (thin MXAccess parity coverage), IntegrationTests-006 (thin LDAP failure-path coverage). |
|
|
| 10 | Documentation & comments | Issue found: IntegrationTests-009 (`TestServerCallContext` mislabelled "Mock"). |
|
|
|
|
### 2026-05-24 re-review (commit `42b0037`)
|
|
|
|
Re-review pass at `42b0037` scoped to commit `865c22a` (the only commit
|
|
touching `src/ZB.MOM.WW.MxGateway.IntegrationTests` between `d692232` and
|
|
`42b0037` — all later commits in the range affect dashboard/clients only).
|
|
`865c22a` resolved IntegrationTests-022..024: the
|
|
`ResolveRepositoryRoot` throw replaces the silent
|
|
`Directory.GetCurrentDirectory()` fallback and ships a regression test;
|
|
`DashboardLdapLiveTests` adds the `Role: Admin` claim assertion; and
|
|
`NullDashboardEventBroadcaster` is extracted into
|
|
`TestSupport/NullDashboardEventBroadcaster.cs`.
|
|
|
|
| # | Category | Result |
|
|
|---|---|---|
|
|
| 1 | Correctness & logic bugs | Issue found: IntegrationTests-025 (the new `ResolveRepositoryRoot_NoMarkers_…` test seeds an "isolated" directory under `Path.GetTempPath()` and walks parents, implicitly assuming no ancestor of the temp dir contains `src/` plus `.git`/`*.sln`/`*.slnx`; the assumption is environment-dependent and can flap on machines whose `TMP` is redirected). |
|
|
| 2 | mxaccessgw conventions | No issues found — file-scoped namespaces, `sealed` types, accurate XML docs, env-var names preserved. |
|
|
| 3 | Concurrency & thread safety | No issues found in this diff. |
|
|
| 4 | Error handling & resilience | No issues found. The new `InvalidOperationException` only fires from the live-MXAccess worker-exe resolution path, which is gated by `MXGATEWAY_RUN_LIVE_MXACCESS_TESTS`; non-live tests are unaffected. |
|
|
| 5 | Security | No issues found — the new live-LDAP claim assertion adds no new credential surface. |
|
|
| 6 | Performance & resource management | No issues found. |
|
|
| 7 | Design-document adherence | No issues found. The diff is internal behavior hardening + a private-claim assertion; no public API / env-var / opt-in surface changes, so the CLAUDE.md "update docs in the same change" rule does not require a `docs/GatewayTesting.md` edit. |
|
|
| 8 | Code organization & conventions | No issues found. `NullDashboardEventBroadcaster`'s placement under `TestSupport/` matches the unit-test project layout; the deliberate duplication is documented in IntegrationTests-024's resolution. |
|
|
| 9 | Testing coverage | No issues found — the new role-claim assertion in `DashboardLdapLiveTests` and the new exception-path test in `IntegrationTestEnvironmentTests` both add coverage; no further gaps surface from this diff. |
|
|
| 10 | Documentation & comments | No issues found. XML docs on `ResolveRepositoryRoot` accurately describe the new throw, and the inline IntegrationTests-023 comment correctly explains *why* the role claim is asserted. |
|
|
|
|
### 2026-05-24 review (commit d692232)
|
|
|
|
Re-review pass at `d692232` scoped to: the rename (`dc9c0c9`) of the
|
|
namespaces, csproj, and the `IntegrationTestEnvironment.ResolveRepositoryRoot`
|
|
walker; the `DashboardLdapLiveTests` update to inject a `GroupToRole`
|
|
mapping in place of the dropped `LdapOptions.RequiredGroup` default
|
|
(`27ed651`); and the inline `NullDashboardEventBroadcaster` fake added
|
|
to `WorkerLiveMxAccessSmokeTests` for the new `EventStreamService` ctor
|
|
parameter (`d692232`).
|
|
|
|
| # | Category | Result |
|
|
|---|---|---|
|
|
| 1 | Correctness & logic bugs | No issues found in the a020350..d692232 diff. |
|
|
| 2 | mxaccessgw conventions | No issues found — namespaces, env-var names, and live-test attribute discipline preserved. |
|
|
| 3 | Concurrency & thread safety | No issues found in this diff. |
|
|
| 4 | Error handling & resilience | Confirmed: the `RpcException(StatusCode.Cancelled)` sibling catch added in `WorkerLiveMxAccessSmokeTests.ShutDownAsync` covers the only site where a gRPC-mapped `OperationCanceledException` can surface — no other call sites need widening. |
|
|
| 5 | Security | No issues found. |
|
|
| 6 | Performance & resource management | No issues found. |
|
|
| 7 | Design-document adherence | No issues found — `docs/GatewayTesting.md` was updated in the same wave to describe the GroupToRole-fixture pattern. |
|
|
| 8 | Code organization & conventions | Issues found: IntegrationTests-022 (`ResolveRepositoryRoot` silently falls back to `Directory.GetCurrentDirectory()` on exhausted walk, which can hide misconfiguration), IntegrationTests-024 (inline `NullDashboardEventBroadcaster` fake is the only such use today but worth watching for duplication once a second consumer arrives). |
|
|
| 9 | Testing coverage | Issues found: IntegrationTests-023 (`DashboardLdapLiveTests.AuthenticateAsync_AdminInGwAdminGroup_Succeeds` asserts the `ldap_group` claim but does not assert the emitted `Role: Admin` claim, leaving the role-mapping path untested). |
|
|
| 10 | Documentation & comments | No issues found. |
|
|
|
|
## Findings
|
|
|
|
### IntegrationTests-001
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | High |
|
|
| Category | Design-document adherence |
|
|
| Location | `src/MxGateway.IntegrationTests/Galaxy/LiveGalaxyRepositoryFactAttribute.cs:7`, `src/MxGateway.IntegrationTests/Galaxy/GalaxyRepositoryLiveTests.cs` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** The Galaxy Repository live test suite and its gating env var `MXGATEWAY_RUN_LIVE_GALAXY_TESTS` (plus connection-string override `MXGATEWAY_LIVE_GALAXY_CONN`) are completely absent from `docs/GatewayTesting.md`. CLAUDE.md mandates updating docs in the same change as the source. The opt-in matrix documents only the MXAccess and LDAP env vars, so an operator running the documented matrix has no way to know these tests exist or how to enable them.
|
|
|
|
**Recommendation:** Add a "Live Galaxy Repository" section to `docs/GatewayTesting.md` documenting `MXGATEWAY_RUN_LIVE_GALAXY_TESTS=1`, `MXGATEWAY_LIVE_GALAXY_CONN`, the `ZB` database prerequisite, and the covered RPCs, mirroring the existing "Live MXAccess Smoke" section.
|
|
|
|
**Resolution:** Resolved 2026-05-18: Added a "Live Galaxy Repository" section to `docs/GatewayTesting.md` documenting `MXGATEWAY_RUN_LIVE_GALAXY_TESTS`, `MXGATEWAY_LIVE_GALAXY_CONN`, the deployed-`ZB` prerequisite, and the covered `GalaxyRepository` RPCs.
|
|
|
|
### IntegrationTests-002
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | High |
|
|
| Category | Design-document adherence |
|
|
| Location | `src/MxGateway.IntegrationTests/DashboardLdapLiveTests.cs:13`, `src/MxGateway.Server/Configuration/LdapOptions.cs:27` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `DashboardLdapLiveTests` builds the authenticator with `new GatewayOptions()`, so it relies on `LdapOptions.RequiredGroup` defaulting to `GwAdmin` and asserts the `admin` user is a member of a `GwAdmin` LDAP group. `glauth.md` does not list `GwAdmin` as a provisioned group — it lists `admin` only in the five role groups and describes `GwAdmin` as a group to add "when reuse isn't enough." If GLAuth has only the documented baseline groups, `AuthenticateAsync_AdminInGwAdminGroup_Succeeds` fails (not skips) on any box where the env var is set. This is an undocumented hard prerequisite beyond "LDAP is up."
|
|
|
|
**Recommendation:** Either document the required `GwAdmin` GLAuth provisioning step in `glauth.md` and `GatewayTesting.md`, or have the test set `RequiredGroup` to a baseline group `glauth.md` guarantees `admin` belongs to (e.g. `WriteOperate`).
|
|
|
|
**Resolution:** Resolved 2026-05-18: Took the documentation fix — promoted the `glauth.md` "Adding a gw-specific group" section into a concrete "Provisioning the GwAdmin group" step that grants `GwAdmin` to `admin`, cross-referenced it from the groups/verification sections, and added a "Live LDAP" section to `docs/GatewayTesting.md` calling out `GwAdmin` as a hard prerequisite. Alternative considered: weaken the test to a baseline group (`WriteOperate`) — rejected because `GwAdmin` is the real default `LdapOptions.RequiredGroup` and the test should exercise it.
|
|
|
|
### IntegrationTests-003
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Medium |
|
|
| Category | Correctness & logic bugs |
|
|
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:89-97` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** The test asserts only on the first `MxEvent` recorded by `RecordingServerStreamWriter`. A live MXAccess provider can deliver an initial state/quality event whose family or handles differ from the expected `OnDataChange` (e.g. a registration-state or bad-quality bootstrap event). Because `WaitForFirstMessageAsync` returns whatever arrives first, a genuine ordering/family defect could fail spuriously or leave later wrong events unverified.
|
|
|
|
**Recommendation:** Filter for the first event with `Family == OnDataChange` (with a bounded retry/poll) or assert the full recorded sequence, so the test verifies the event the worker is supposed to emit.
|
|
|
|
**Resolution:** Resolved 2026-05-18: Confirmed against source — `WaitForFirstMessageAsync` completed a `TaskCompletionSource` on the very first `WriteAsync`. Replaced it with `RecordingServerStreamWriter.WaitForMessageAsync(predicate, timeout)`, which scans recorded messages, skips earlier non-matching events, and blocks on a `SemaphoreSlim` until a matching one arrives or the timeout elapses (throwing a `TimeoutException` that reports the scanned count). `GatewaySession_WithLiveWorker_RegistersAdvisesStreamsDataAndCloses` now waits for the first `Family == OnDataChange` event. Live execution was not possible in this environment (no MXAccess COM); verified by build.
|
|
|
|
### IntegrationTests-004
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Medium |
|
|
| Category | Error handling & resilience |
|
|
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:108-111` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** In the `finally` block, after `CloseSessionAsync`, the test does `await streamTask.WaitAsync(StreamShutdownTimeout)`. If closing the session does not promptly complete the stream (or `StreamEvents` itself faults), this throws `TimeoutException` from inside `finally`, which replaces/masks any original assertion failure from the `try` block. The diagnostic value of the real failure is lost.
|
|
|
|
**Recommendation:** Wrap the `streamTask.WaitAsync` (and ideally `WaitForProcessesAsync`) in a try/catch that logs the cleanup exception via `output.WriteLine` instead of letting it propagate.
|
|
|
|
**Resolution:** Resolved 2026-05-18: Confirmed — the `finally` block awaited `streamTask.WaitAsync` and `WaitForProcessesAsync` with no exception handling. Extracted a shared `ShutDownAsync` helper that wraps the session-close + stream-drain in one try/catch and the worker-process wait in a second try/catch, logging each cleanup exception via `output.WriteLine` instead of throwing. All three live tests now route shutdown through it, so a cleanup timeout can no longer mask an assertion failure. Live execution was not possible in this environment; verified by build.
|
|
|
|
### IntegrationTests-005
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Medium |
|
|
| Category | Testing coverage |
|
|
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** The only live MXAccess test covers the Register→AddItem→Advise→one-OnDataChange→Close happy path. CLAUDE.md stresses that MXAccess parity is the contract and calls out non-obvious behaviors (`WriteSecured` ordering, `OperationComplete` semantics, invalid-handle exceptions). None of `Write`, `WriteSecured`, `Unadvise`, `RemoveItem`, `Unregister`, `OperationComplete`, an invalid-handle command, or a worker-fault path is exercised against live COM — exactly the paths fake-worker tests cannot validate.
|
|
|
|
**Recommendation:** Add live coverage for at least a `Write` round-trip and an invalid-handle command, plus a worker-fault/abnormal-exit scenario, even if behind additional opt-in env vars.
|
|
|
|
**Resolution:** Resolved 2026-05-18: Added two `[LiveMxAccessFact]`-gated tests to `WorkerLiveMxAccessSmokeTests`. `GatewaySession_WithLiveWorker_WritesValueToAdvisedItem` registers/adds/advises then issues a `Write` of an integer value, asserting the command round-trips with `ProtocolStatusCode.Ok` and `MxCommandKind.Write`. `GatewaySession_WithLiveWorker_InvalidHandleCommand_SurfacesFailureWithoutTransportFault` issues `AddItem` against `int.MaxValue` as the server handle (never issued by MXAccess) and asserts the failure surfaces in the command reply without a usable item handle. Both reuse the existing opt-in env var and the `ShutDownAsync` cleanup helper. A worker-fault/abnormal-exit case was deliberately scoped out — it needs a controlled COM crash injection beyond what the existing harness supports; the two added cases cover the `Write` round-trip and invalid-handle paths the recommendation calls out. Live execution was not possible in this environment; verified by build.
|
|
|
|
### IntegrationTests-006
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Medium |
|
|
| Category | Testing coverage |
|
|
| Location | `src/MxGateway.IntegrationTests/DashboardLdapLiveTests.cs` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** LDAP live coverage is two cases: admin succeeds, readonly is denied for missing group. There is no coverage of a wrong password for a valid user, an unknown username, or the LDAP-server-unreachable path — all of which `DashboardAuthenticator` has distinct branches for (the `LdapException` catch, the `candidate is null` branch). The negative test only proves group-membership denial, not credential rejection.
|
|
|
|
**Recommendation:** Add a live test for `admin` with a wrong password asserting `Succeeded == false` and that the password is not leaked into `FailureMessage`, and a test for an unknown username.
|
|
|
|
**Resolution:** Resolved 2026-05-18: Added three `[LiveLdapFact]`-gated tests to `DashboardLdapLiveTests`. `AuthenticateAsync_AdminWithWrongPassword_FailsWithoutLeakingPassword` exercises the `LdapException` catch via a rejected candidate bind and asserts the wrong password never reaches `FailureMessage`. `AuthenticateAsync_UnknownUsername_Fails` exercises the `candidate is null` branch. `AuthenticateAsync_ServerUnreachable_FailsWithoutThrowing` builds the authenticator with `LdapOptions.Port = 1` (a reserved port no LDAP server listens on) and asserts the connect failure is absorbed into a failed result rather than thrown — covering the generic `catch (Exception)` branch. All three are gated by the existing `MXGATEWAY_RUN_LIVE_LDAP_TESTS` opt-in so they stay opt-in. Live execution was not possible in this environment (no live LDAP); verified by build.
|
|
|
|
### IntegrationTests-007
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Concurrency & thread safety |
|
|
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:20`, `src/MxGateway.IntegrationTests/Galaxy/GalaxyRepositoryLiveTests.cs:5`, `src/MxGateway.IntegrationTests/DashboardLdapLiveTests.cs:9` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** The live test classes contend for genuinely shared singletons — one MXAccess COM provider, one ZB SQL database, one GLAuth instance with a 3-fail/10-minute per-IP lockout. No `[Collection]` annotation or `DisableTestParallelization` is declared, so xUnit's default cross-class parallelism could run the Galaxy tests concurrently or interleave an LDAP failure burst that trips the GLAuth lockout.
|
|
|
|
**Recommendation:** Place the live test classes in a shared `[Collection]`, or set `[assembly: CollectionBehavior(DisableTestParallelization = true)]` for this opt-in project, so live external resources are accessed serially.
|
|
|
|
**Resolution:** Resolved 2026-05-18: Confirmed — no `[Collection]` or assembly-level `CollectionBehavior` existed. Added `LiveResourcesCollection.cs` with a `[CollectionDefinition(Name, DisableParallelization = true)]` and applied `[Collection(LiveResourcesCollection.Name)]` to `WorkerLiveMxAccessSmokeTests`, `GalaxyRepositoryLiveTests`, and `DashboardLdapLiveTests`. A named collection (rather than an assembly-wide `DisableTestParallelization`) was chosen so the live classes serialize against each other and within themselves while non-live tests (`IntegrationTestEnvironmentTests`) keep parallelizing. Verified by build; live tests not executed (no MXAccess COM / live LDAP in this environment).
|
|
|
|
### IntegrationTests-008
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Code organization & conventions |
|
|
| Location | `src/MxGateway.IntegrationTests/LiveLdapFactAttribute.cs`, `src/MxGateway.IntegrationTests/Galaxy/LiveGalaxyRepositoryFactAttribute.cs`, `src/MxGateway.IntegrationTests/LiveMxAccessFactAttribute.cs` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** Three near-identical fact attributes each re-implement the same "compare env var to `1` with `StringComparison.Ordinal`, set `Skip` otherwise" pattern. `LiveMxAccessFactAttribute` delegates to `IntegrationTestEnvironment` while the other two inline the logic, so the project has two divergent styles for the same concern.
|
|
|
|
**Recommendation:** Extract a shared helper (e.g. `IntegrationTestEnvironment.IsEnabled(string variableName)`) and have all three attributes call it.
|
|
|
|
**Resolution:** Resolved 2026-05-18: Confirmed — `LiveLdapFactAttribute.Enabled` and `LiveGalaxyRepositoryFactAttribute.Enabled` each inlined the ordinal `== "1"` comparison while `LiveMxAccessFactAttribute` delegated to `IntegrationTestEnvironment`. Added `IntegrationTestEnvironment.IsEnabled(string variableName)` as the single implementation; `LiveMxAccessTestsEnabled`, `LiveLdapFactAttribute.Enabled`, and `LiveGalaxyRepositoryFactAttribute.Enabled` now all call it. Verified by build.
|
|
|
|
### IntegrationTests-009
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Documentation & comments |
|
|
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:372-375` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `TestServerCallContext` is XML-documented as a "Mock server call context," but it is a hand-written stub/fake with no mocking framework and no verification behavior. Per the style guides (accurate naming; explain why not what), calling it a mock misleads readers who may expect verifiable interactions.
|
|
|
|
**Recommendation:** Reword the summary to "test stub" / "minimal `ServerCallContext` implementation for in-process gRPC calls."
|
|
|
|
**Resolution:** Resolved 2026-05-18: Confirmed — the summary read "Mock server call context for testing gRPC calls." Reworded to "Minimal `ServerCallContext` stub for invoking the gRPC service in-process," noting it is a hand-written fake with no verification behavior. No mocking framework is involved; this is a documentation-only fix. Verified by build.
|
|
|
|
### IntegrationTests-010
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Correctness & logic bugs |
|
|
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:366-369` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `WaitForFirstMessageAsync` accepts only a `timeout` and never observes a `CancellationToken`. There is no per-test cancellation propagation, so if the gateway/worker hangs without writing an event the test relies solely on the 15s `WaitAsync` timeout and gives no contextual diagnostics. Combined with IntegrationTests-004, a hung live worker produces a bare `TimeoutException`.
|
|
|
|
**Recommendation:** Accept a `CancellationToken` (linked to `TestServerCallContext`'s token), pass it to `firstMessage.Task.WaitAsync(timeout, token)`, and on timeout emit the recorded `Messages` count via `output.WriteLine` before throwing.
|
|
|
|
**Re-triage:** The named method `WaitForFirstMessageAsync` no longer exists — IntegrationTests-003's resolution renamed/replaced it with `RecordingServerStreamWriter.WaitForMessageAsync(predicate, timeout)`, which scans recorded messages and blocks on a `SemaphoreSlim`. The underlying defect still held: that replacement method also took only a `timeout` and never observed a `CancellationToken`. The finding remains valid (Low, Correctness) against the renamed method; the recommendation's `firstMessage.Task.WaitAsync` detail is stale but the intent (thread a token, surface a count on timeout) is unchanged.
|
|
|
|
**Resolution:** Resolved 2026-05-18: Added an optional `CancellationToken` parameter to `WaitForMessageAsync`, linked with the existing timeout source via `CancellationTokenSource.CreateLinkedTokenSource`, so a per-test cancellation aborts the wait promptly. `GatewaySession_WithLiveWorker_RegistersAdvisesStreamsDataAndCloses` now creates a `CancellationTokenSource`, passes its token into the `StreamEvents` `TestServerCallContext` and into `WaitForMessageAsync`, so the stream call and the wait share one cancellation source. On timeout the method already throws a `TimeoutException` whose message includes the scanned message count, satisfying the "emit recorded count" intent (the count surfaces in the test failure rather than via a separate `output.WriteLine`). Verified by build; live tests not executed.
|
|
|
|
### IntegrationTests-011
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Documentation & comments |
|
|
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:236-240`, `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:183-187` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** The XML/inline comments on the two new MXAccess parity tests misdescribe how the gateway surfaces an MXAccess failure. The invalid-handle test reads "the gateway protocol status is Ok and the failure shows up in hresult / the status proxies — it must not be reported as a transport fault", then asserts `Assert.NotEqual(ProtocolStatusCode.Ok, addItemReply.ProtocolStatus.Code)`. `HResultConverter.CreateProtocolStatus` (`src/MxGateway.Worker/Conversion/HResultConverter.cs:39`) actually sets `Code = ProtocolStatusCode.MxaccessFailure` whenever the COM call throws (HRESULT ≠ 0), so the assertion is correct but the comment is wrong — the protocol status is *not* `Ok` on an MXAccess failure. The write-round-trip test carries the same misleading framing on lines 183-187 ("MXAccess parity details … belong in hresult / statuses, not in a transport failure") immediately before asserting `Ok`. A reader can reasonably conclude the gateway always reports `Ok` for round-tripped commands and tweak code accordingly. The intended distinction is "this is not a gRPC transport fault" (the RPC reply still arrives) — the protocol status code carries the MXAccess outcome.
|
|
|
|
**Recommendation:** Reword the invalid-handle comment to "the gateway must reply with `ProtocolStatusCode.MxaccessFailure` and a non-zero `Hresult` carrying the COM failure, not a gRPC transport fault." Reword the write-round-trip comment to clarify it is asserting the happy-path Ok and that an MXAccess rejection would surface as `MxaccessFailure` (per `HResultConverter`), not as a `RpcException`.
|
|
|
|
**Resolution:** 2026-05-20 — Reworded the invalid-handle test comment to say the gateway must reply with `ProtocolStatusCode.MxaccessFailure` and a non-zero hresult carrying the COM failure (per `HResultConverter`), and reworded the write-round-trip comment to make explicit it is asserting the happy-path Ok while an MXAccess rejection would surface as `MxaccessFailure`, never as an `RpcException`.
|
|
|
|
### IntegrationTests-012
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Medium |
|
|
| Category | Correctness & logic bugs |
|
|
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:147-151` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `GatewaySession_WithLiveWorker_WritesValueToAdvisedItem` constructs a `RecordingServerStreamWriter<MxEvent>` and starts a `StreamEvents` task, then never reads from it and never asserts anything about the recorded messages. The test verifies only that the `Write` command round-trips at the protocol level — it does not verify that the worker actually emits any event after the write (for example an `OnWriteComplete`, which is the proof of round-trip used by the cross-language client e2e runner). Because the stream task is started with `new TestServerCallContext()` (no cancellation source), any fault raised by the stream task (an exception from `EventStreamService`, a session-not-found, a backpressure overflow) is swallowed — `streamTask` is later awaited in `ShutDownAsync` only inside a broad `catch (Exception ex)`, which logs and continues. The Write test therefore cannot fail on stream-task faults. Two consequences: (a) the live Write parity coverage promised in IntegrationTests-005 is weaker than it appears, and (b) the fixture (`eventWriter`) is dead code in this test that suggests an assertion was intended.
|
|
|
|
**Recommendation:** Either remove the unused `eventWriter`/`StreamEvents` plumbing from the Write test so the test scope matches its assertions, or — preferred — extend the test to wait for an `OnWriteComplete` event for the written item via `eventWriter.WaitForMessageAsync(candidate => candidate.Family == MxEventFamily.OnWriteComplete && candidate.ItemHandle == itemHandle, ...)`, matching the round-trip proof used by `scripts/run-client-e2e-tests.ps1 -VerifyWrite`.
|
|
|
|
**Resolution:** Resolved 2026-05-20: Rewrote `GatewaySession_WithLiveWorker_WritesValueToAdvisedItem` so the previously-dead `eventWriter`/`StreamEvents` plumbing actually drives an assertion. The test now waits for an `OnWriteComplete` event matching the Write's (server, item) handle pair via `eventWriter.WaitForMessageAsync` (using `IntegrationTestEnvironment.LiveMxAccessEventTimeout`), and asserts the recorded event's family, session id, and handles — the same round-trip proof the cross-language client e2e runner uses. The stream call is now bound to a `CancellationTokenSource` and the test asserts `streamTask.IsFaulted == false` before cleanup. `ShutDownAsync` gained an opt-in `propagateStreamFaults` flag so a faulted `StreamEvents` task is rethrown into the test rather than silently swallowed by the broad cleanup catch; the cancellation token is also signalled before the drain so `StreamEvents` observes a clean shutdown instead of a forced timeout. Verified by build and by confirming the test skips cleanly when `MXGATEWAY_RUN_LIVE_MXACCESS_TESTS` is unset.
|
|
|
|
### IntegrationTests-013
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Performance & resource management |
|
|
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:519-609` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `RecordingServerStreamWriter<T>` owns a `SemaphoreSlim messageArrived` (`IDisposable`) but does not itself implement `IDisposable`, so the semaphore's wait handle is never released back to the OS. Each live test allocates one such writer and discards it at scope exit. Live tests run on opt-in only, so the cumulative leak is bounded, but the type holds an `IDisposable` field — the standard hygiene under `Directory.Build.props`'s `TreatWarningsAsErrors=true` is to either dispose the field or document why not. CA2213 does not fire because the owner is not itself `IDisposable`; an analyzer-driven warning is the only reason this is not a build break, not an indication that the leak is acceptable.
|
|
|
|
**Recommendation:** Make `RecordingServerStreamWriter<T>` implement `IDisposable`, dispose `messageArrived` in `Dispose`, and wrap each instantiation in a `using` block (`using RecordingServerStreamWriter<MxEvent> eventWriter = new();`).
|
|
|
|
**Resolution:** 2026-05-20 — `RecordingServerStreamWriter<T>` now implements `IDisposable` and its `Dispose` releases the `messageArrived` semaphore. All six live tests in `WorkerLiveMxAccessSmokeTests` now allocate the writer with a top-of-method `using` declaration so the semaphore's wait handle is released on scope exit even when the test body throws.
|
|
|
|
### IntegrationTests-014
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Medium |
|
|
| Category | Testing coverage |
|
|
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** IntegrationTests-005 was resolved by adding live coverage for `Write` and an invalid-handle `AddItem`, but its resolution explicitly scoped out the worker-fault/abnormal-exit case and silently dropped `Unadvise`, `RemoveItem`, `Unregister`, `OperationComplete`, and `WriteSecured` ordering. CLAUDE.md singles out `WriteSecured` ("`WriteSecured` failing before a value-bearing NMX body") and `OperationComplete` semantics as parity surprises the gateway must not "fix" — exactly the paths fake-worker tests cannot validate. After this commit the live MXAccess smoke still doesn't exercise any teardown command, the secured-write ordering rule, or a deliberately faulted worker. A regression in any of these would only be caught by manual testing.
|
|
|
|
**Recommendation:** Add live MXAccess coverage for the teardown chain (`Unadvise` then `RemoveItem` then `Unregister`, asserting each replies with `ProtocolStatusCode.Ok` and the next operation no longer references the freed handle), and at minimum one `WriteSecured` parity case asserting the documented ordering. A worker-fault test can be deferred to a separate finding once a deterministic COM-crash injection harness exists.
|
|
|
|
**Resolution:** Resolved 2026-05-20: Added three new `[LiveMxAccessFact]`-gated tests to `WorkerLiveMxAccessSmokeTests`, all reusing the existing opt-in env var and `ShutDownAsync` cleanup helper. (1) `GatewaySession_WithLiveWorker_UnadviseRemoveItemUnregister_TeardownOrderingParity` runs Register → AddItem → Advise → wait for one OnDataChange → UnAdvise → RemoveItem → Unregister, asserting each step replies `Ok` with the matching `MxCommandKind`, that no further OnDataChange events for the un-advised (server, item) pair arrive after a 500 ms settle window, and that a second RemoveItem against the freed handle returns a non-`Ok` MXAccess failure (so a regression that left a stale subscription or accepted a stale handle would surface). (2) `GatewaySession_WithLiveWorker_WriteSecured_AuthenticatedRoundTripParity` resolves an ArchestrA user id via `AuthenticateUser` (credentials env-overridable through `MXGATEWAY_LIVE_MXACCESS_WRITE_SECURED_USER` / `..._PASSWORD`, defaulting to the `admin`/`admin123` GLAuth user from `glauth.md`), issues `WriteSecured` against an advised item, and asserts the reply carries `MxCommandKind.WriteSecured`, the protocol status is one of the documented parity outcomes (`Ok` for an unprotected provider, `MxaccessFailure` when the item is not WriteSecured-eligible — never a transport fault), and the credential never leaks into the diagnostic message. (3) `GatewaySession_WithLiveWorker_AbnormalWorkerExit_MarksSessionFaulted` opens a session, kills the worker process tree (via a new `TestWorkerProcessFactory.KillAllAndDetach` helper) without going through CloseSession, and polls the session via a new `GatewayServiceFixture.TryGetSession` accessor until it transitions to `SessionState.Faulted` within the live event timeout; asserts the final state is `Faulted`, that `FinalFault` is non-empty, and that the fault description carries a known worker-client classification (pipe disconnected / worker faulted / heartbeat expired / end-of-stream). `docs/GatewayTesting.md` was updated to list all five parity surfaces and the two new env-var defaults. Verified by build and confirmed all six live tests skip cleanly when `MXGATEWAY_RUN_LIVE_MXACCESS_TESTS` is unset.
|
|
|
|
### IntegrationTests-015
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Code organization & conventions |
|
|
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:30,119,201`, `src/MxGateway.IntegrationTests/DashboardLdapLiveTests.cs:13,32,48,67,84`, `src/MxGateway.IntegrationTests/Galaxy/GalaxyRepositoryLiveTests.cs:10,22,34,52` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** Every live-test method in the three live classes carries an identical `[Trait("Category", "LiveMxAccess")]` (or `LiveLdap` / `LiveGalaxy`) attribute. The trait is uniform within each class and is exactly the information the `[Collection(LiveResourcesCollection.Name)]` class-level attribute also implies. xUnit's `[Trait]` is inheritable from the class to its methods, so the same metadata can be declared once at class scope. The current shape adds maintenance burden — adding a new test in any of these classes requires remembering to add the trait, and the existing pattern's `LiveLdap` includes five copies of the same line.
|
|
|
|
**Recommendation:** Move each `[Trait("Category", ...)]` to the class declaration alongside the existing `[Collection(...)]`, and remove the per-method copies. Verify the trait still surfaces in `--filter Trait=Category=LiveLdap` after the change.
|
|
|
|
**Resolution:** 2026-05-20 — Lifted `[Trait("Category", "LiveMxAccess")]`, `[Trait("Category", "LiveLdap")]`, and `[Trait("Category", "LiveGalaxy")]` to the class declarations of `WorkerLiveMxAccessSmokeTests`, `DashboardLdapLiveTests`, and `GalaxyRepositoryLiveTests` respectively (alongside the existing `[Collection(LiveResourcesCollection.Name)]`), and removed all per-method duplicates. xUnit propagates class-level traits to every method, so `--filter Category=Live*` filters still match.
|
|
|
|
### IntegrationTests-016
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Code organization & conventions |
|
|
| Location | `src/MxGateway.IntegrationTests/Galaxy/LiveGalaxyRepositoryFactAttribute.cs:26`, `src/MxGateway.Server/Galaxy/GalaxyRepositoryOptions.cs:13` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** The default Galaxy Repository connection string `"Server=localhost;Database=ZB;Integrated Security=True;TrustServerCertificate=True;Encrypt=False;"` is duplicated verbatim between the production `GalaxyRepositoryOptions.ConnectionString` initializer and the test-side `LiveGalaxyRepositoryFactAttribute.ConnectionString` fallback. The docs (`docs/GatewayTesting.md`) document the value once and reference it from both places. If the production default changes (e.g. tightening to a named instance, or switching to a SQL-auth template), the test default silently keeps the old string and the live Galaxy tests connect to the wrong server. The drift is invisible to the build.
|
|
|
|
**Recommendation:** Expose the production default through a `public const string` on `GalaxyRepositoryOptions` (e.g. `DefaultConnectionString`) and have `LiveGalaxyRepositoryFactAttribute.ConnectionString` read `Environment.GetEnvironmentVariable(ConnectionStringVariableName) ?? GalaxyRepositoryOptions.DefaultConnectionString`. Single source of truth, build-time guarantee they cannot drift.
|
|
|
|
**Resolution:** 2026-05-20 — Added `public const string GalaxyRepositoryOptions.DefaultConnectionString` carrying the production default, set the `ConnectionString` initializer to reference it, and changed `LiveGalaxyRepositoryFactAttribute.ConnectionString` to fall back to `GalaxyRepositoryOptions.DefaultConnectionString`. The literal now lives in exactly one place and any future change to the production default propagates to the live-test fallback at compile time.
|
|
|
|
### IntegrationTests-017
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Medium |
|
|
| Category | Correctness & logic bugs |
|
|
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:350-407` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `GatewaySession_WithLiveWorker_UnadviseRemoveItemUnregister_TeardownOrderingParity` proves the subscription is live by waiting for one matching `OnDataChange`, snapshots `dataChangeCountBeforeUnadvise`, then sends `UnAdvise`, waits 500 ms, snapshots `dataChangeCountAfterTeardown`, and asserts strict equality. The assertion races against the natural cadence of the live MXAccess provider:
|
|
|
|
1. After `WaitForMessageAsync` returns the first match, any additional `OnDataChange` for the same `(serverHandle, itemHandle)` published by the provider before the worker processes `UnAdvise` is delivered into the recording writer.
|
|
2. The snapshot at line 362 is taken *immediately before* the `UnAdvise` command is sent (line 370). Events that arrive in the window between that snapshot and the worker processing `UnAdvise` (network round-trip + STA dispatch + worker pipe send + gateway channel write) are racing in — they are not "after UnAdvise" but they will be in the post-teardown snapshot.
|
|
3. `MXAccess` providers can publish `OnDataChange` at sub-second cadence; the strict-equality assertion has no slack for in-flight events.
|
|
|
|
The test passes today only because the chosen test item (`TestChildObject.TestInt`) likely changes value rarely. Against a more active item — or on a slower machine where the round-trip widens — the assertion would flap. The intent ("no further events *after the worker stops the subscription*") would be better expressed by capturing the snapshot after `UnAdvise` returns `Ok` rather than before it is issued.
|
|
|
|
**Recommendation:** Move the "before" snapshot to immediately *after* `UnAdvise` returns `Ok` (the point past which the parity rule applies), or weaken the assertion to "no events with a `WorkerSequence` strictly greater than the last sequence observed within `dataChangeCountBeforeUnadvise + N` events arrived in the post-teardown drain" where `N` accounts for the documented in-flight window. Either change moves the test from racing on provider cadence to verifying the actual parity rule.
|
|
|
|
**Resolution:** 2026-05-20 — Removed the `dataChangeCountBeforeUnadvise` snapshot taken just before the `UnAdvise` command (the source of the race) and replaced the strict-equality assertion against a pre-teardown count with a two-window stability check taken *after* the teardown chain completes. The test now waits one 500 ms settle window for in-flight `OnDataChange` events (which the provider already published before the worker acknowledged `UnAdvise`) to drain, captures `dataChangeCountAfterFirstSettle`, waits another 500 ms, and asserts the count is unchanged in `dataChangeCountAfterSecondSettle`. The parity rule under test ("no further `OnDataChange` after the worker stops the subscription") is now expressed as steadiness across the post-teardown window rather than equality with a count snapshotted during the round-trip race, so a slower machine or a more active test item no longer flaps the assertion while a genuine regression (a stale subscription continuing to fire) still surfaces as a count drift between the two settles.
|
|
|
|
### IntegrationTests-018
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Code organization & conventions |
|
|
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:1037`, `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:595` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `GatewayServiceFixture.TryGetSession(string sessionId, out GatewaySession session)` declares `session` as a non-nullable `GatewaySession`, but its implementation is `_registry.TryGet(sessionId, out session)`, which (in `SessionRegistry.cs:43`) uses `session!` to silence the nullability warning on `Dictionary.TryGetValue`. On a `false` return the `out` parameter is null, contradicting the non-nullable annotation. The caller at line 595 binds it as `out GatewaySession? session`, which compiles only because non-nullable-to-nullable variance is permitted — but no callsite tooling will warn that a `false` return yields a null value through what the fixture's contract describes as non-nullable. The repo enforces `Nullable=enable;TreatWarningsAsErrors=true` (`src/Directory.Build.props`), so the convention is for `TryX` patterns to either annotate the out as `T?` or to mirror BCL `Dictionary.TryGetValue<TKey, TValue>` (which uses `[MaybeNullWhen(false)] out TValue`).
|
|
|
|
**Recommendation:** Change the signature to `public bool TryGetSession(string sessionId, [MaybeNullWhen(false)] out GatewaySession session)` (mirroring BCL `TryGetValue`) and propagate the same annotation down through `ISessionRegistry.TryGet` / `SessionRegistry.TryGet` so the `session!` fudge can be removed. The call sites already treat the parameter as nullable; aligning the declaration removes the silent contract gap.
|
|
|
|
**Resolution:** 2026-05-20 — Propagated `[MaybeNullWhen(false)]` through the entire `TryGet*` chain. `GatewayServiceFixture.TryGetSession`, `ISessionManager.TryGetSession` / `SessionManager.TryGetSession`, and `ISessionRegistry.TryGet`/`TryRemove` plus their `SessionRegistry` implementations now carry the BCL `Dictionary.TryGetValue`-style annotation, and the `session!` null-forgiving operator inside `SessionRegistry.TryGet` / `TryRemove` was removed because the annotation makes it redundant. Existing in-tree callers (`SessionManagerTests.cs` line 28) were updated to `out GatewaySession?` to match. The compiler now warns at callsites that read `session` without checking the boolean return, closing the silent contract gap. Verified by `dotnet build src/MxGateway.IntegrationTests/...` (0 warnings), `dotnet build src/MxGateway.Tests/...` (0 warnings), and `dotnet test src/MxGateway.Tests/...` (479 passed).
|
|
|
|
### IntegrationTests-019
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Medium |
|
|
| Category | Security |
|
|
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:497-534` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `GatewaySession_WithLiveWorker_WriteSecured_AuthenticatedRoundTripParity` resolves a credential pair via `ResolveLiveMxAccessSecuredCredentials`, passes the password into `AuthenticateUser` and `WriteSecured`, and asserts the password is absent from `writeSecuredReply.DiagnosticMessage`. CLAUDE.md's secret-handling rule is broader: "API keys, passwords, `WriteSecured` payloads, and `AuthenticateUser` credentials must never reach logs." The test's assertion covers only one of the surfaces the rule protects:
|
|
|
|
- The `TestOutputLoggerProvider` writes every gateway-side `ILogger` entry to `ITestOutputHelper`. A regression that logged the request body (or the `WorkerCommandRequest` envelope) would put the password into test output without failing this test.
|
|
- `WriteWorkerOutput` echoes worker `stdout`/`stderr` lines to `ITestOutputHelper`. A worker-side regression that printed the credential (e.g. a debug log added to `MxAccessCommandExecutor`) would land in test output without failing this test.
|
|
- `output.WriteLine(...)` calls in the test body (`AuthenticateUser status=... user_id=...` and `LogReply("WriteSecured", ...)`) currently don't include the request body, but a future maintenance change that printed `command.WriteSecured.Value` or a similar struct dump would silently leak the credential past the existing assertion.
|
|
|
|
Because `ITestOutputHelper` doesn't expose its accumulated text to the test, the assertion can only be made by buffering output through a recording sink the test owns.
|
|
|
|
**Recommendation:** Replace the bare `ITestOutputHelper` injection (just for the WriteSecured test, or for all live MXAccess tests) with a recording wrapper that mirrors writes both to the xUnit output and to a `StringBuilder`. At the end of the test, assert the buffer does not contain `verifyPassword`. This makes the credential-redaction contract a property of the entire test run, not just the one explicit field. Alternative: route the test through `GatewayLogRedactor` (`src/MxGateway.Server/Diagnostics/GatewayLogRedactor.cs`) so the credential-bearing commands are redacted at the logger sink the test sees.
|
|
|
|
**Resolution:** 2026-05-20 — Added a `RecordingTestOutputHelper` private class that implements `ITestOutputHelper`, mirrors every line to the wrapped xUnit sink, and accumulates the same text into a `StringBuilder` exposed via a `Captured` property. The WriteSecured parity test now constructs this wrapper, passes it to both `TestWorkerProcessFactory` (so worker `stdout`/`stderr` lines flow through it) and `GatewayServiceFixture` (so the `TestOutputLoggerProvider`'s gateway-`ILogger` entries flow through it), and uses it for every direct `WriteLine`. A new `LogReplyTo(ITestOutputHelper sink, …)` static helper underpins the existing `LogReply` instance method so the test body can route reply logging through the recording wrapper. After the cleanup `finally` block completes, the test asserts `recordedOutput.Captured` does not contain the verify password. The credential-redaction contract is now enforced across the gateway-logger sink, worker stdout/stderr echo, and every test-body `WriteLine` — a future regression that dumped the request body, the `WorkerCommandRequest` envelope, or the `WriteSecured` payload would land in the buffer and fail the assertion.
|
|
|
|
### IntegrationTests-020
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Correctness & logic bugs |
|
|
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:616-622` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `GatewaySession_WithLiveWorker_AbnormalWorkerExit_MarksSessionFaulted` asserts the `FinalFault` description contains at least one of these substrings (case-insensitive): `disconnect`, `pipe`, `heartbeat`, `worker`, `end of stream`. The intent (per the IntegrationTests-014 resolution prose) is to verify the gateway surfaces "a known worker-client classification". The `"worker"` substring defeats that intent — the gateway routes through `SetFaulted` with messages like *"Worker pipe disconnected."*, *"Worker shutdown timed out."*, *"Worker was killed by the gateway: …"*, *"Worker heartbeat expired. …"*, *"Worker event channel rejected an event."*, *"Worker pipe write failed."*, *"Worker read loop failed."*, *"Worker sent unexpected envelope body …"* — every classification message begins with the word "Worker". A regression that introduced an entirely new fault path with a generic message containing the word *Worker* would still pass this test.
|
|
|
|
CLAUDE.md singles out "abnormal exit" as one of the parity surfaces (`SessionState.Faulted` with an actionable cause), so the test's documented value is verifying *which* of the WorkerClient error codes drove the transition. Today the assertion is effectively `Assert.NotEmpty(observedFault)`.
|
|
|
|
**Recommendation:** Tighten the keyword set to the specific classifications the abnormal-exit (kill-the-process) path actually drives — `PipeDisconnected` ("pipe", "disconnect") and `EndOfStream` ("end of stream"). Drop the broad `"worker"` term, and drop `"heartbeat"` unless the test deliberately covers the heartbeat path too (it does not — `HeartbeatGraceSeconds = 15` and the poll deadline is `StreamShutdownTimeout = 10` seconds, so a heartbeat-expired transition is impossible inside the wait window). If a more exhaustive matrix is wanted, assert `FinalFault.StartsWith("Worker pipe disconnected")` against the message constant in `WorkerClient.cs:380` so a rename surfaces as a compile-time / test-time failure.
|
|
|
|
**Resolution:** 2026-05-20 — Tightened the keyword set to the specific classifications the kill-the-process path actually drives. The assertion now requires the `FinalFault` description to contain `"pipe disconnected"` (matching the `WorkerClient.cs:378-381` `WorkerFrameProtocolErrorCode.EndOfStream` → `WorkerClientErrorCode.PipeDisconnected` → `"Worker pipe disconnected."` message) or `"end of stream"`, dropping the broad `"worker"` term that previously matched every `WorkerClient` fault message (all of which begin with "Worker"), and dropping `"heartbeat"` because the test's `StreamShutdownTimeout` (10 s) is below `HeartbeatGraceSeconds` (15 s) so a heartbeat-expired transition cannot occur inside the poll window. A regression that routed an unrelated fault classification through the abnormal-exit path would now fail loudly instead of silently passing.
|
|
|
|
### IntegrationTests-021
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Testing coverage |
|
|
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:579-622` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** The abnormal-exit test only polls `session.State` and `session.FinalFault`. It does not assert anything about `streamTask` after the kill. The chain that puts the session into `Faulted` is: the read loop hits EOS → `SetFaulted(PipeDisconnected, …)` → `_events.Writer.TryComplete(fault)` → `ReadEventsAsync` propagates the `WorkerClientException` → `EventStreamService.ProduceEventsAsync`'s `catch (Exception exception) when exception is WorkerClientException` calls `session.MarkFaulted(exception.Message)`. The test verifies the *end state* of that chain but not that the `StreamEvents` call is what produced the transition. If a future change moved the `MarkFaulted` call somewhere else (a session-manager background watcher, for example), the test would still pass — but the stream task could now silently swallow the fault. A direct assertion that `streamTask.IsFaulted` (or that awaiting it throws a `WorkerClientException`) would protect that contract.
|
|
|
|
The Write parity test (IntegrationTests-012's resolution) added exactly this assertion (`Assert.False(streamTask.IsFaulted, …)`). The abnormal-exit test should add the inverse: the stream task *must* be faulted (or at least completed with a `WorkerClientException`) after the kill.
|
|
|
|
**Recommendation:** After the session-state poll succeeds, assert `streamTask.IsCompleted` (the channel has terminated) and inspect `streamTask.Exception?.InnerException` for a `WorkerClientException` (or assert `streamTask.IsFaulted` and await with `ShouldThrowAsync`). This couples the test to the actual fault-propagation path and prevents a future refactor that bypasses the stream from quietly weakening the coverage. Compare to the existing `Assert.False(streamTask.IsFaulted, …)` on line 217 — the abnormal-exit case wants the opposite assertion.
|
|
|
|
**Resolution:** 2026-05-20 — After the session-state poll loop confirms `SessionState.Faulted`, the test now awaits `streamTask.WaitAsync(StreamShutdownTimeout)` (with a try/catch that logs the surfaced exception type/message), then asserts `streamTask.IsCompleted` and `streamTask.IsFaulted`. This couples the test to the actual fault-propagation chain — read loop hits EndOfStream → `WorkerClient.SetFaulted(PipeDisconnected, …)` → `ReadEventsAsync` propagates the fault → `EventStreamService` calls `session.MarkFaulted` → `MxAccessGatewayService.StreamEvents` re-throws the mapped `RpcException`. A future refactor that moved `MarkFaulted` off the stream-consumption path would leave `streamTask` completing cleanly, which the new `IsFaulted` assertion would now catch (inverse of the existing `Assert.False(streamTask.IsFaulted, …)` in the Write parity test on line 217). The inner-exception type assertion was deliberately omitted because the gateway maps `WorkerClientException` to `RpcException` at the public boundary (`MxAccessGatewayService.MapWorkerClientException`); asserting on the surface type alone would be brittle, while the `IsFaulted` check directly tests the contract the recommendation is protecting.
|
|
|
|
### IntegrationTests-022
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Code organization & conventions |
|
|
| Location | `src/ZB.MOM.WW.MxGateway.IntegrationTests/IntegrationTestEnvironment.cs:103-138` (`ResolveRepositoryRoot` / `IsRepositoryRoot`) |
|
|
| Status | Resolved |
|
|
|
|
**Description:** The walker introduced in `dc9c0c9` searches parents for a directory containing `src/` plus either `.git` or a `*.slnx`/`*.sln` file, falling back to `Directory.GetCurrentDirectory()` when nothing matches. The fallback masks misconfiguration: a test run from a subdirectory of an unpacked tree without a `.git` or `.slnx` marker silently uses the wrong root and then the live `MXGATEWAY_LIVE_MXACCESS_WORKER_EXE` lookup produces a misleading "worker exe not found" error pointing at a fabricated path. The current `bin/` layout makes this unlikely in practice (the test host sets a stable working directory), but the failure mode would be confusing if it triggered.
|
|
|
|
**Recommendation:** Throw a clear `InvalidOperationException` from `ResolveRepositoryRoot` when the walk exhausts without finding a root, naming the directories searched and the markers expected. The opt-in env var (`MXGATEWAY_LIVE_MXACCESS_WORKER_EXE`) remains the escape hatch for unusual deployments.
|
|
|
|
**Resolution:** Resolved 2026-05-24 — Replaced the silent `Directory.GetCurrentDirectory()` fallback in `IntegrationTestEnvironment.ResolveRepositoryRoot` with an `InvalidOperationException` whose message names the start directory, the expected markers (`src/`, `.git`, `*.sln`, `*.slnx`), and the `MXGATEWAY_LIVE_MXACCESS_WORKER_EXE` escape hatch. Added regression test `IntegrationTestEnvironmentTests.ResolveRepositoryRoot_NoMarkers_ThrowsInvalidOperationExceptionNamingStartAndMarkers` that creates an isolated temp directory under `Path.GetTempPath()` containing no markers, asserts the throw, and asserts the message carries the start directory, `.git`, `.sln`, and the worker-exe env-var name so a future refactor that dropped any of these from the diagnostic would fail loudly. TDD red/green confirmed: the new test fails against the pre-fix walker (`Assert.Throws() Failure: No exception was thrown`) and passes after the throw is added. The existing `ResolveRepositoryRoot_AcceptsGitWorktreeFile` test still passes, so the happy path is unaffected.
|
|
|
|
### IntegrationTests-023
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Testing coverage |
|
|
| Location | `src/ZB.MOM.WW.MxGateway.IntegrationTests/DashboardLdapLiveTests.cs:14-29` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `AuthenticateAsync_AdminInGwAdminGroup_Succeeds` asserts the principal carries an `LdapGroupClaimType` claim containing `GwAdmin` (line 26-28). After the `27ed651` refactor, `DashboardAuthenticator.CreatePrincipal` also emits a `ClaimTypes.Role` claim with the mapped role (`Admin` for the seeded `GwAdmin` → `Admin` mapping). The test asserts the LDAP-group claim but not the role claim, so a regression that stopped emitting `Role: Admin` (e.g. a future refactor of `MapGroupsToRoles` that returned an empty list) would silently pass.
|
|
|
|
**Recommendation:** Add an `Assert.Contains` asserting the principal holds a `ClaimTypes.Role` claim with value `DashboardRoles.Admin` (or `"Admin"`). Mirror the style of the existing `LdapGroupClaimType` assertion.
|
|
|
|
**Resolution:** Resolved 2026-05-24 — Added an `Assert.Contains` to `AuthenticateAsync_AdminInGwAdminGroup_Succeeds` that mirrors the existing `LdapGroupClaimType` assertion style, asserting the principal carries a `ClaimTypes.Role` claim with value `DashboardRoles.Admin`. Couples the live test to the `GwAdmin` → `Admin` mapping wired through `CreateAuthenticator`'s `GroupToRole` dictionary, so a future regression to `MapGroupsToRoles` (returning an empty list, dropping the RDN-fallback lookup, breaking the case-insensitive comparer) would now surface as a failed live test rather than a silent pass. The test is gated by `MXGATEWAY_RUN_LIVE_LDAP_TESTS`; build is green and the test continues to skip cleanly when the env var is unset.
|
|
|
|
### IntegrationTests-024
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Code organization & conventions |
|
|
| Location | `src/ZB.MOM.WW.MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs` (`NullDashboardEventBroadcaster` private class at end of file) |
|
|
| Status | Resolved |
|
|
|
|
**Description:** The inline `NullDashboardEventBroadcaster` private class is the third copy in the repository (the other two live in `EventStreamServiceTests` and `GatewayEndToEndFakeWorkerSmokeTests` under the Tests module — see Tests-025). Each carries a singleton `Instance` field and a no-op `Publish`. While the integration-tests project doesn't directly share test-support code with the Tests project, a duplicate-everywhere pattern reads as a code smell.
|
|
|
|
**Recommendation:** When Tests-025 lands the shared `TestSupport/NullDashboardEventBroadcaster.cs`, either reference the same shared helper from this project (a project reference if practical) or accept the duplication as a deliberate isolation between the unit-test and integration-test trees. Either choice is fine; the current state is the only one that should not persist.
|
|
|
|
**Resolution:** Resolved 2026-05-24 — Chose option (b) (extract within IntegrationTests) so the unit-test and integration-test projects stay independently buildable without a shared test-helpers project. Moved the inline private class out of `WorkerLiveMxAccessSmokeTests.cs` into a new public type at `src/ZB.MOM.WW.MxGateway.IntegrationTests/TestSupport/NullDashboardEventBroadcaster.cs` (namespace `ZB.MOM.WW.MxGateway.IntegrationTests.TestSupport`), mirroring the layout the unit-test Tests project established for Tests-007/Tests-021. The fixture wires through the same `NullDashboardEventBroadcaster.Instance` singleton, but the constructor is now private so future integration tests can rely on a single shared no-op. A new `using ZB.MOM.WW.MxGateway.IntegrationTests.TestSupport;` in `WorkerLiveMxAccessSmokeTests.cs` is the only change to its `EventStreamService` wiring. Build is green (0 warnings, 0 errors); no regression in the existing tests (4 passed / 15 skipped, identical to the pre-change baseline).
|
|
|
|
### IntegrationTests-025
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Correctness & logic bugs |
|
|
| Location | `src/ZB.MOM.WW.MxGateway.IntegrationTests/IntegrationTestEnvironmentTests.cs:57-84` (`ResolveRepositoryRoot_NoMarkers_ThrowsInvalidOperationExceptionNamingStartAndMarkers`) |
|
|
| Status | Resolved |
|
|
|
|
**Description:** The new regression test for IntegrationTests-022 builds an "isolated" start directory under `Path.GetTempPath()` (e.g. `C:\Users\<user>\AppData\Local\Temp\<random>\nested` on Windows) and calls `ResolveRepositoryRoot(isolatedStart)`, asserting an `InvalidOperationException` is thrown. The walker walks every parent — `<random>`, `Temp`, `Local`, `AppData`, `<user>`, `Users`, `C:\` — and stops only when it either finds a repository root marker or runs out of parents. The test silently assumes none of those ancestor directories satisfies `IsRepositoryRoot` (a `src/` subdirectory next to `.git` / `*.sln` / `*.slnx`). The assumption is environment-dependent:
|
|
|
|
1. `Path.GetTempPath()` honors the `TMP` / `TEMP` environment variable. A developer or CI runner that redirects `TMP` to a path under a checkout (e.g. `D:\work\repo\tmp`) — or simply has another git checkout at `C:\` (`C:\src` with a `.git`, common on Windows build agents) — will see the walker stop early at the real repo root and return that path instead of throwing. The `Assert.Throws<InvalidOperationException>` then fails with a misleading "No exception was thrown" message that does not name the path the walker actually returned.
|
|
2. The sibling test `ResolveRepositoryRoot_AcceptsGitWorktreeFile` (line 25) is unaffected because it walks from a directory it controls upward into an enclosing directory it also controls — the worktree file marker is encountered before any ancestor of `Path.GetTempPath()` is reached. The new test has the opposite shape: it walks past every directory the test owns into directories it does not control.
|
|
|
|
The current dev box layout (`C:\Users\dohertj2\Desktop\mxaccessgw`) is safe because Temp is at `C:\Users\dohertj2\AppData\Local\Temp` and the walker exits at `C:\` without ever encountering `src/`. The fragility is invisible on this machine and only surfaces if the test ever runs in CI / on a contributor box with a less hermetic file-system layout.
|
|
|
|
**Recommendation:** Isolate the walker from any ambient ancestor by either (a) constructing an `isolatedRoot` directly under a drive root and pointing the walker at a chain entirely under it (e.g. create `<isolatedRoot>\level1\level2\level3` and start the walk at `level3`, then assert the throw — the walker stops at the drive root regardless of what is on it), (b) refactoring `ResolveRepositoryRoot` to accept an injectable `stopBoundary` parameter for tests and pass `isolatedRoot` as the boundary, or (c) replacing the `Assert.Throws` shape with an explicit upward-walk check that the test owns. Option (a) is the smallest change: prepend a sentinel — e.g. create a dummy `<isolatedRoot>\sentinel-no-markers` and assert nothing about Temp ancestors — and pass the test only when the walker reaches that sentinel without finding a marker. The current shape is acceptable on the documented dev box but should not be the sole regression coverage for IntegrationTests-022.
|
|
|
|
**Resolution:** Resolved 2026-05-24 — Took option (b) (inject a stop-boundary) because option (a) does not actually solve the leak: a sentinel chain under `Path.GetTempPath()` still leaves the walker free to ascend past it into Temp / AppData / Users / C:\, so any ambient ancestor with `src/` + `.git`/`.sln`/`.slnx` still wins. Added an optional `stopBoundary` parameter to `IntegrationTestEnvironment.ResolveRepositoryRoot(string startDirectory, string? stopBoundary = null)`. When supplied, the walker checks the boundary for markers and then stops, refusing to ascend past it; production callers (the `MXGATEWAY_LIVE_MXACCESS_WORKER_EXE` resolution path) continue to pass `null` so the walk to drive-root behavior is unchanged. Updated both existing tests (`ResolveRepositoryRoot_AcceptsGitWorktreeFile` and `ResolveRepositoryRoot_NoMarkers_ThrowsInvalidOperationExceptionNamingStartAndMarkers`) to pass their owned temp directory as the boundary, sealing the walker inside a chain the test fully controls. Added a new regression test `ResolveRepositoryRoot_StopBoundary_IsolatesWalkerFromAmbientAncestorMarkers` that deliberately constructs an outer marker-bearing ancestor (`outerRoot/src` + `outerRoot/.git`), an inner boundary, and an isolated start beneath the boundary; first asserts that without the boundary the walker leaks up to `outerRoot` (the precise IntegrationTests-025 failure mode), then asserts that *with* the boundary the same call throws — proving the boundary is the load-bearing isolation. TDD red/green confirmed: the new regression test fails against the pre-fix walker (`Assert.Throws() Failure: No exception was thrown`) and passes once the boundary handling is restored. Re-ran the full `IntegrationTestEnvironmentTests` slice with `TMP` / `TEMP` redirected under a deliberately constructed `<temp>\fake-repo-ancestor` directory carrying `src/` and a `.git` file — the original flake repro from the finding — and confirmed all 5 tests pass (the same redirection produced `Assert.Throws() Failure` on the pre-fix code). Build: 0 warnings / 0 errors.
|
|
|
|
### IntegrationTests-026
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | High |
|
|
| Category | Correctness & logic bugs |
|
|
| Location | `src/ZB.MOM.WW.MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:1098-1101`, `src/ZB.MOM.WW.MxGateway.Server/Alarms/GatewayAlarmMonitor.cs:55-60` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** The entire IntegrationTests project fails to compile at HEAD (`410acc9`). `GatewayServiceFixture` (in `WorkerLiveMxAccessSmokeTests.cs`) constructs the `GatewayAlarmMonitor` it passes into `MxAccessGatewayService` with the stale three-argument form:
|
|
|
|
```csharp
|
|
new ZB.MOM.WW.MxGateway.Server.Alarms.GatewayAlarmMonitor(
|
|
sessionManager,
|
|
options,
|
|
_loggerFactory.CreateLogger<...GatewayAlarmMonitor>())
|
|
```
|
|
|
|
but the production constructor (evolved in-window by `ebf1d95` "monitor resolves watch-list, sends ForcedMode/failover, reflects provider mode into feed + metrics", with later refinements in `9208225` and `410acc9`) now requires **five** parameters: `GatewayAlarmMonitor(ISessionManager sessionManager, IAlarmWatchListResolver watchListResolver, GatewayMetrics metrics, IOptions<GatewayOptions> options, ILogger<GatewayAlarmMonitor> logger)`. `dotnet build src/ZB.MOM.WW.MxGateway.IntegrationTests/...` fails with `CS7036: There is no argument given that corresponds to the required parameter 'options'`. Because this is the only `MxAccessGatewayService` assembly site in the fixture, the whole module — every live opt-in test *and* the non-live `IntegrationTestEnvironmentTests` — cannot build or run. This is a CLAUDE.md "Source Update Workflow" violation: a cross-component Server alarm-monitor change was not propagated to its IntegrationTests caller in the same commit, and "build each affected component" was not honored for the IntegrationTests project. It also silently masks the verification basis for IntegrationTests-022..025's "build is green" resolution claims at this HEAD.
|
|
|
|
**Recommendation:** Update the `GatewayAlarmMonitor` construction in `GatewayServiceFixture` to the current 5-arg signature: supply an `IAlarmWatchListResolver` (a minimal test stub returning an empty/representative watch list, or the production resolver if cheap to construct), the existing `_metrics` (`GatewayMetrics`), the existing `options` wrapped as `IOptions<GatewayOptions>` (e.g. `Options.Create(...)`), and the logger. Then run `dotnet build src/ZB.MOM.WW.MxGateway.IntegrationTests/...` to confirm 0 errors and `dotnet test ... --filter FullyQualifiedName~IntegrationTestEnvironmentTests` to confirm the non-live tests pass and the live tests still skip cleanly when the env vars are unset. Add a build of the IntegrationTests project to the verification step whenever `GatewayAlarmMonitor` / `MxAccessGatewayService` constructors change.
|
|
|
|
**Resolution:** Resolved 2026-06-15: Confirmed the project failed to build at HEAD (CS7036 on the stale 3-arg `GatewayAlarmMonitor` ctor call in `GatewayServiceFixture`). Updated the construction to the current 5-arg signature — added a new `TestSupport/EmptyAlarmWatchListResolver` singleton stub (`IAlarmWatchListResolver` returning an empty watch-list, avoiding the production resolver's `IGalaxyRepository` dependency), and passed the fixture's existing `_metrics` (`GatewayMetrics`) and `options` (`IOptions<GatewayOptions>`). `dotnet build` now succeeds with 0 errors/warnings; non-live tests pass (5) and all 15 live tests skip cleanly with the env vars unset.
|
|
|
|
### IntegrationTests-027
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Code organization & conventions |
|
|
| Location | `src/ZB.MOM.WW.MxGateway.IntegrationTests/ZB.MOM.WW.MxGateway.IntegrationTests.csproj`, `src/ZB.MOM.WW.MxGateway.IntegrationTests/DashboardLdapLiveTests.cs:4-5,134` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** After the cutover, `DashboardLdapLiveTests` directly consumes `ZB.MOM.WW.Auth.Ldap.LdapAuthService` and `ZB.MOM.WW.Auth.Abstractions.Ldap.LdapOptions` (`using ZB.MOM.WW.Auth.Ldap; using ZB.MOM.WW.Auth.Abstractions.Ldap;` and `new LdapAuthService(ldapOptions)`). But the IntegrationTests `.csproj` declares no direct `PackageReference` to `ZB.MOM.WW.Auth.Ldap` or `ZB.MOM.WW.Auth.Abstractions` — it has only `ProjectReference`s to Contracts and Server. It compiles solely because the Server's `PackageReference`s to those packages flow transitively (the Server csproj sets no `PrivateAssets`). A project that directly references a library's public types should declare a direct dependency on it; the current shape means the build silently depends on the Server never marking those packages `PrivateAssets="compile"` and on the transitive compile-asset flow staying enabled. If either changes, the IntegrationTests build breaks with a confusing CS0246 far from the cause.
|
|
|
|
**Recommendation:** Add explicit `<PackageReference Include="ZB.MOM.WW.Auth.Ldap" Version="0.1.2" />` and `<PackageReference Include="ZB.MOM.WW.Auth.Abstractions" Version="0.1.2" />` (matching the Server's pinned versions, ideally via a shared `Directory.Packages.props` if central package management is in use) to the IntegrationTests project so its direct use of those types is backed by a direct dependency.
|
|
|
|
**Resolution:** Resolved 2026-06-15: Confirmed the csproj had only `ProjectReference`s and pulled `LdapAuthService`/`LdapOptions` transitively. Added direct `PackageReference`s `ZB.MOM.WW.Auth.Abstractions` and `ZB.MOM.WW.Auth.Ldap` at `0.1.2` (matching the Server's pinned versions; no central package management exists in this repo). Build remains clean. (The IntegrationTests-028 fix also added `Microsoft.Extensions.Configuration.Json`/`.Binder` at `10.0.7`, pinned to the resolved transitive version to avoid an NU1605 downgrade.)
|
|
|
|
### IntegrationTests-028
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Design-document adherence |
|
|
| Location | `src/ZB.MOM.WW.MxGateway.IntegrationTests/DashboardLdapLiveTests.cs:120-161`, `src/ZB.MOM.WW.MxGateway.Server/Dashboard/DashboardServiceCollectionExtensions.cs:35` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** Production wires the shared LDAP provider by binding the `MxGateway:Ldap` configuration section straight onto the shared `LdapOptions` via `AddZbLdapAuth(configuration, "MxGateway:Ldap")`. The live test instead hand-rolls a `LibraryLdapOptions` instance by copying the eleven fields of the gateway *shadow* `LdapOptions` defaults (the `LibraryOptions()` helper). Two consequences:
|
|
|
|
1. The shared `LdapOptions` actually exposes **thirteen** settable properties — the hand-copy omits `ConnectionTimeoutMs` and `ServerCertificateValidationCallback` (verified by reflecting `ZB.MOM.WW.Auth.Abstractions` 0.1.2). `ConnectionTimeoutMs` has a non-zero default and directly governs the `AuthenticateAsync_ServerUnreachable_FailsWithoutThrowing` (`Port = 1`) test's timing, so the live test exercises the *shared default* timeout, not whatever an operator (or the gateway config) would set — diverging from the production-bound value.
|
|
2. It adds a third manual copy of the shadow→shared field mapping on top of the documented "Review C2 DRIFT WARNING" seam in `Server/Configuration/LdapOptions.cs`. A field added to the shared type is silently dropped by this test until someone remembers to extend `LibraryOptions()`.
|
|
|
|
The prior `DashboardAuthenticator` ctor took `IOptions<GatewayOptions>`, so the old test shared the same options object production used; the cutover lost that fidelity. CLAUDE.md treats the live tests as the parity check against the real seeded directory — they should bind options the way production does.
|
|
|
|
**Recommendation:** Have the test build the shared `LdapOptions` the same way production does — bind it from the `MxGateway:Ldap` section (e.g. load the gateway `appsettings.json` / a minimal in-memory config and call the same `AddZbLdapAuth` binding path, or resolve the bound `IOptions<LdapOptions>` from a DI container that ran `AddZbLdapAuth`). At minimum, document why the two extra shared fields are intentionally left at their defaults, and add `ConnectionTimeoutMs` to the copy so the unreachable-server test's timeout matches production. Prefer eliminating the hand-copy so the shadow-drift surface does not grow.
|
|
|
|
**Resolution:** Resolved 2026-06-15: Confirmed by reflecting `ZB.MOM.WW.Auth.Abstractions` 0.1.2 that the shared `LdapOptions` exposes 13 settable properties while the hand-copy populated only 11 (omitting `ConnectionTimeoutMs` and `ServerCertificateValidationCallback`). Eliminated the field-by-field hand-copy: `LibraryOptions()` now binds the real `MxGateway:Ldap` section from the Server's `appsettings.json` (resolved via `IntegrationTestEnvironment.ResolveRepositoryRoot`) onto the shared `LdapOptions` with `configuration.GetSection("MxGateway:Ldap").Get<LdapOptions>()` — the same section/binding path production's `AddZbLdapAuth(configuration, "MxGateway:Ldap")` uses. Verified the bind yields `ConnectionTimeoutMs=10000` (the shared default the unreachable-server test relies on) and the dev directory connection (localhost:3893, Transport=None, AllowInsecure). A new shared field is now picked up automatically rather than silently dropped.
|
|
|
|
### IntegrationTests-029
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Documentation & comments |
|
|
| Location | `docs/GatewayTesting.md:218-224` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** The "Live LDAP" section of `docs/GatewayTesting.md` still describes the failure branches in terms of the old `DashboardAuthenticator` internals: "`admin` with a wrong password is rejected by the **candidate bind**" and "an unknown username yields **no candidate**". After the cutover in this diff, the bind/search mechanics (and therefore the "candidate bind" / "candidate is null" branches) live in the shared `LdapAuthService`, not in `DashboardAuthenticator` — which is exactly why the test comments in `DashboardLdapLiveTests.cs` were reworded in this same diff from "Exercises the `LdapException` branch" / "the `candidate is null` branch" to "user-bind-failure branch" / "user-not-found branch". The doc prose was not updated to match. CLAUDE.md requires docs that describe security/auth behavior to change in the same commit as the source; the comments moved but the doc did not, leaving the doc describing branches that no longer exist in `DashboardAuthenticator`.
|
|
|
|
**Recommendation:** Reword the `docs/GatewayTesting.md` "Live LDAP" failure-branch sentences to describe observable behavior without referencing the now-internal "candidate bind" mechanics (e.g. "a wrong password is rejected without leaking the password", "an unknown username fails authentication"), and note that bind/search is delegated to the shared `ZB.MOM.WW.Auth.Ldap` provider so the prose stays accurate after the cutover.
|
|
|
|
**Resolution:** Resolved 2026-06-15: Reworded the "Live LDAP" failure-branch prose to describe observable behavior ("fails authentication without leaking the password", "an unknown username fails authentication") instead of the now-internal "candidate bind" / "no candidate" mechanics, and added a sentence noting `DashboardAuthenticator` delegates the bind/search to the shared `ZB.MOM.WW.Auth.Ldap` provider (`LdapAuthService`) and only maps groups to roles — matching the in-source test-comment cutover. Verified by inspection.
|