fix(integrationtests): repair GatewayAlarmMonitor ctor build break; LDAP bind + docs (IntegrationTests-026..029)

Joseph Doherty
2026-06-15 02:39:11 -04:00
parent 258e09e0de
commit d2c776901b
6 changed files with 197 additions and 27 deletions
+104 -2
@@ -4,8 +4,8 @@
|---|---|
| Module | `src/ZB.MOM.WW.MxGateway.IntegrationTests` |
| Reviewer | Claude Code |
| Review date | 2026-05-24 |
| Commit reviewed | `42b0037` |
| Review date | 2026-06-15 |
| Commit reviewed | `410acc9` |
| Status | Re-reviewed |
| Open findings | 0 |
@@ -14,6 +14,34 @@
A comprehensive review completes every category, recording "No issues found" where
a category produced nothing rather than leaving it blank.
### 2026-06-15 re-review (commit `410acc9`)
Scope: `git diff 42b0037..HEAD -- src/ZB.MOM.WW.MxGateway.IntegrationTests/`
(5 files). The substantive change is the `DashboardLdapLiveTests` cutover to the
shared `ZB.MOM.WW.Auth.Ldap.LdapAuthService` + `DashboardGroupRoleMapper`
(matching the production `DashboardAuthenticator` ctor split); plus the
`ResolveRepositoryRoot` `stopBoundary` parameter and its new regression test
(IntegrationTests-025 resolution), and XML-doc backfill on
`LiveLdapFactAttribute` / `WorkerLiveMxAccessSmokeTests`. NOTE: the "live
alarm-subtag smoke test(s)" named in the review brief do not exist in this diff;
no new alarm-subtag tests landed here. Instead, the in-window Server alarm-monitor
evolution (`ebf1d95`/`9208225`/`410acc9`) changed `GatewayAlarmMonitor`'s
constructor without updating its IntegrationTests caller, leaving the whole
module non-compiling (IntegrationTests-026).
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Issue found: IntegrationTests-026 (the entire IntegrationTests project fails to compile at HEAD — `WorkerLiveMxAccessSmokeTests` constructs `GatewayAlarmMonitor` with the stale 3-arg form `(sessionManager, options, logger)` while the production ctor now requires 5 args `(ISessionManager, IAlarmWatchListResolver, GatewayMetrics, IOptions<GatewayOptions>, ILogger)`; verified by `dotnet build` → CS7036). |
| 2 | mxaccessgw conventions | No issues found. Live opt-in gating, `[Collection]`/`[Trait]` discipline, "no synthesized events", and the credential-redaction contract for the LDAP failure-path assertions are all preserved; the cutover keeps the existing skip-by-default behaviour. |
| 3 | Concurrency & thread safety | No issues found in this diff. |
| 4 | Error handling & resilience | No issues found. The `ServerUnreachable` test still asserts the connect failure is absorbed into a `Fail` result; the fail-closed contract now lives in the shared `LdapAuthService` and the test exercises it via `Port = 1`. |
| 5 | Security | No issues found. The wrong-password / unknown-user / unreachable tests still assert no credential leak into `FailureMessage`; the cutover adds no new credential surface and writes no secrets to evidence/probe logs. |
| 6 | Performance & resource management | No issues found. |
| 7 | Design-document adherence | Issue found: IntegrationTests-028 (the live test hand-rolls a field-by-field `LibraryLdapOptions` from the gateway shadow `LdapOptions` defaults instead of binding `MxGateway:Ldap` the way production's `AddZbLdapAuth(configuration, "MxGateway:Ldap")` does, so the live test no longer exercises the production option-binding path and silently omits `ConnectionTimeoutMs` / `ServerCertificateValidationCallback`). |
| 8 | Code organization & conventions | Issue found: IntegrationTests-027 (`DashboardLdapLiveTests` directly consumes `LdapAuthService` / `LdapOptions` from `ZB.MOM.WW.Auth.Ldap` but the IntegrationTests `.csproj` has no direct `PackageReference` — it compiles only via transitive flow through the Server `ProjectReference`). |
| 9 | Testing coverage | No issues found beyond IntegrationTests-026. The role-claim and stop-boundary assertions added in this window strengthen coverage, but because the module cannot build, none of the IntegrationTests run until IntegrationTests-026 is fixed. |
| 10 | Documentation & comments | Issue found: IntegrationTests-029 (`docs/GatewayTesting.md` "Live LDAP" still describes the old in-`DashboardAuthenticator` branches — "rejected by the candidate bind", "yields no candidate" — that the library cutover moved into the shared `LdapAuthService`; the test comments were updated in this diff but the doc prose was not, contrary to CLAUDE.md's same-commit doc rule). |
### 2026-05-20 re-review (commit `a020350`)
| # | Category | Result |
@@ -506,3 +534,77 @@ The current dev box layout (`C:\Users\dohertj2\Desktop\mxaccessgw`) is safe beca
**Recommendation:** Isolate the walker from any ambient ancestor by either (a) constructing an `isolatedRoot` directly under a drive root and pointing the walker at a chain entirely under it (e.g. create `<isolatedRoot>\level1\level2\level3` and start the walk at `level3`, then assert the throw — the walker stops at the drive root regardless of what is on it), (b) refactoring `ResolveRepositoryRoot` to accept an injectable `stopBoundary` parameter for tests and pass `isolatedRoot` as the boundary, or (c) replacing the `Assert.Throws` shape with an explicit upward-walk check that the test owns. Option (a) is the smallest change: prepend a sentinel — e.g. create a dummy `<isolatedRoot>\sentinel-no-markers` and assert nothing about Temp ancestors — and pass the test only when the walker reaches that sentinel without finding a marker. The current shape is acceptable on the documented dev box but should not be the sole regression coverage for IntegrationTests-022.
**Resolution:** Resolved 2026-05-24 — Took option (b) (inject a stop-boundary) because option (a) does not actually solve the leak: a sentinel chain under `Path.GetTempPath()` still leaves the walker free to ascend past it into Temp / AppData / Users / C:\, so any ambient ancestor with `src/` + `.git`/`.sln`/`.slnx` still wins. Added an optional `stopBoundary` parameter to `IntegrationTestEnvironment.ResolveRepositoryRoot(string startDirectory, string? stopBoundary = null)`. When supplied, the walker checks the boundary for markers and then stops, refusing to ascend past it; production callers (the `MXGATEWAY_LIVE_MXACCESS_WORKER_EXE` resolution path) continue to pass `null` so the walk to drive-root behavior is unchanged. Updated both existing tests (`ResolveRepositoryRoot_AcceptsGitWorktreeFile` and `ResolveRepositoryRoot_NoMarkers_ThrowsInvalidOperationExceptionNamingStartAndMarkers`) to pass their owned temp directory as the boundary, sealing the walker inside a chain the test fully controls. Added a new regression test `ResolveRepositoryRoot_StopBoundary_IsolatesWalkerFromAmbientAncestorMarkers` that deliberately constructs an outer marker-bearing ancestor (`outerRoot/src` + `outerRoot/.git`), an inner boundary, and an isolated start beneath the boundary; first asserts that without the boundary the walker leaks up to `outerRoot` (the precise IntegrationTests-025 failure mode), then asserts that *with* the boundary the same call throws — proving the boundary is the load-bearing isolation. TDD red/green confirmed: the new regression test fails against the pre-fix walker (`Assert.Throws() Failure: No exception was thrown`) and passes once the boundary handling is restored. 
Re-ran the full `IntegrationTestEnvironmentTests` slice with `TMP` / `TEMP` redirected under a deliberately constructed `<temp>\fake-repo-ancestor` directory carrying `src/` and a `.git` file — the original flake repro from the finding — and confirmed all 5 tests pass (the same redirection produced `Assert.Throws() Failure` on the pre-fix code). Build: 0 warnings / 0 errors.
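A minimal sketch of the boundary-aware walk described in this resolution, assuming the marker rule quoted in the finding (a `src/` directory plus a `.git` file or directory, or a `*.sln` / `*.slnx` file). The `RepositoryRootWalker` class name is hypothetical, and the real `IntegrationTestEnvironment.ResolveRepositoryRoot` may differ in details such as its exception message.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical sketch, not the real IntegrationTestEnvironment code: an upward
// marker walk that optionally refuses to ascend past a caller-owned boundary.
public static class RepositoryRootWalker
{
    private static readonly StringComparison PathComparison =
        OperatingSystem.IsWindows() ? StringComparison.OrdinalIgnoreCase : StringComparison.Ordinal;

    public static string ResolveRepositoryRoot(string startDirectory, string? stopBoundary = null)
    {
        string? boundary = stopBoundary is null ? null : Normalize(stopBoundary);
        DirectoryInfo? current = new DirectoryInfo(Path.GetFullPath(startDirectory));

        while (current is not null)
        {
            if (HasRepositoryMarkers(current))
            {
                return current.FullName;
            }

            // The boundary itself was checked for markers above; never walk above it.
            if (boundary is not null && string.Equals(Normalize(current.FullName), boundary, PathComparison))
            {
                break;
            }

            current = current.Parent;
        }

        throw new InvalidOperationException(
            $"No repository root found walking up from '{startDirectory}'" +
            (boundary is null ? "" : $" (stop boundary '{boundary}')") +
            "; expected 'src/' plus one of '.git', '*.sln', '*.slnx'.");
    }

    private static bool HasRepositoryMarkers(DirectoryInfo directory)
    {
        if (!Directory.Exists(Path.Combine(directory.FullName, "src")))
        {
            return false;
        }

        // '.git' is a directory in a normal clone and a file in a git worktree.
        string git = Path.Combine(directory.FullName, ".git");
        return Directory.Exists(git)
            || File.Exists(git)
            || directory.EnumerateFiles("*.sln").Any()
            || directory.EnumerateFiles("*.slnx").Any();
    }

    // Compare boundary and walked paths in one canonical form (no trailing separator).
    private static string Normalize(string path) =>
        Path.TrimEndingDirectorySeparator(Path.GetFullPath(path));
}
```

Callers that pass `null` keep the walk-to-drive-root behavior, while a test passes the temp directory it owns so an ambient marker-bearing ancestor can never win.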
### IntegrationTests-026
| Field | Value |
|---|---|
| Severity | High |
| Category | Correctness & logic bugs |
| Location | `src/ZB.MOM.WW.MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:1098-1101`, `src/ZB.MOM.WW.MxGateway.Server/Alarms/GatewayAlarmMonitor.cs:55-60` |
| Status | Resolved |
**Description:** The entire IntegrationTests project fails to compile at HEAD (`410acc9`). `GatewayServiceFixture` (in `WorkerLiveMxAccessSmokeTests.cs`) constructs the `GatewayAlarmMonitor` it passes into `MxAccessGatewayService` with the stale three-argument form:
```csharp
// Stale three-argument call: fails with CS7036 against the current five-parameter constructor.
new ZB.MOM.WW.MxGateway.Server.Alarms.GatewayAlarmMonitor(
    sessionManager,
    options,
    _loggerFactory.CreateLogger<...GatewayAlarmMonitor>())
```
but the production constructor (evolved in-window by `ebf1d95` "monitor resolves watch-list, sends ForcedMode/failover, reflects provider mode into feed + metrics", with later refinements in `9208225` and `410acc9`) now requires **five** parameters: `GatewayAlarmMonitor(ISessionManager sessionManager, IAlarmWatchListResolver watchListResolver, GatewayMetrics metrics, IOptions<GatewayOptions> options, ILogger<GatewayAlarmMonitor> logger)`. `dotnet build src/ZB.MOM.WW.MxGateway.IntegrationTests/...` fails with `CS7036: There is no argument given that corresponds to the required parameter 'options'`. Because this is the only `MxAccessGatewayService` assembly site in the fixture, the whole module — every live opt-in test *and* the non-live `IntegrationTestEnvironmentTests` — cannot build or run. This is a CLAUDE.md "Source Update Workflow" violation: a cross-component Server alarm-monitor change was not propagated to its IntegrationTests caller in the same commit, and "build each affected component" was not honored for the IntegrationTests project. It also silently masks the verification basis for IntegrationTests-022..025's "build is green" resolution claims at this HEAD.
**Recommendation:** Update the `GatewayAlarmMonitor` construction in `GatewayServiceFixture` to the current 5-arg signature: supply an `IAlarmWatchListResolver` (a minimal test stub returning an empty/representative watch list, or the production resolver if cheap to construct), the existing `_metrics` (`GatewayMetrics`), the existing `options` wrapped as `IOptions<GatewayOptions>` (e.g. `Options.Create(...)`), and the logger. Then run `dotnet build src/ZB.MOM.WW.MxGateway.IntegrationTests/...` to confirm 0 errors and `dotnet test ... --filter FullyQualifiedName~IntegrationTestEnvironmentTests` to confirm the non-live tests pass and the live tests still skip cleanly when the env vars are unset. Add a build of the IntegrationTests project to the verification step whenever `GatewayAlarmMonitor` / `MxAccessGatewayService` constructors change.
**Resolution:** Resolved 2026-06-15: Confirmed the project failed to build at HEAD (CS7036 on the stale 3-arg `GatewayAlarmMonitor` ctor call in `GatewayServiceFixture`). Updated the construction to the current 5-arg signature — added a new `TestSupport/EmptyAlarmWatchListResolver` singleton stub (`IAlarmWatchListResolver` returning an empty watch-list, avoiding the production resolver's `IGalaxyRepository` dependency), and passed the fixture's existing `_metrics` (`GatewayMetrics`) and `options` (`IOptions<GatewayOptions>`). `dotnet build` now succeeds with 0 errors/warnings; non-live tests pass (5) and all 15 live tests skip cleanly with the env vars unset.
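A minimal sketch of the null-object stub pattern this resolution describes. The review does not show the members of `IAlarmWatchListResolver`, so the single `ResolveAsync` method and its `IReadOnlyList<string>` result below are hypothetical stand-ins, placed in a `Sketch` namespace so they are not mistaken for the real interface. Only the pattern itself (a stateless singleton that always resolves an empty watch-list, so the fixture needs no `IGalaxyRepository`) comes from the text.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

namespace Sketch
{
    // HYPOTHETICAL member shape: the real IAlarmWatchListResolver is not shown in this review.
    public interface IAlarmWatchListResolver
    {
        Task<IReadOnlyList<string>> ResolveAsync(CancellationToken cancellationToken);
    }

    // Stateless null object: always resolves an empty watch-list, so a test
    // fixture can construct the alarm monitor without a Galaxy repository.
    public sealed class EmptyAlarmWatchListResolver : IAlarmWatchListResolver
    {
        public static EmptyAlarmWatchListResolver Instance { get; } = new();

        private EmptyAlarmWatchListResolver()
        {
        }

        public Task<IReadOnlyList<string>> ResolveAsync(CancellationToken cancellationToken)
        {
            // Honor cancellation even though there is no real work to do.
            cancellationToken.ThrowIfCancellationRequested();
            return Task.FromResult<IReadOnlyList<string>>(Array.Empty<string>());
        }
    }
}
```

In the fixture, a stub like `Instance` would fill the `watchListResolver` slot of the five-parameter constructor, next to the existing `_metrics` and `options`.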
### IntegrationTests-027
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `src/ZB.MOM.WW.MxGateway.IntegrationTests/ZB.MOM.WW.MxGateway.IntegrationTests.csproj`, `src/ZB.MOM.WW.MxGateway.IntegrationTests/DashboardLdapLiveTests.cs:4-5,134` |
| Status | Resolved |
**Description:** After the cutover, `DashboardLdapLiveTests` directly consumes `ZB.MOM.WW.Auth.Ldap.LdapAuthService` and `ZB.MOM.WW.Auth.Abstractions.Ldap.LdapOptions` (`using ZB.MOM.WW.Auth.Ldap; using ZB.MOM.WW.Auth.Abstractions.Ldap;` and `new LdapAuthService(ldapOptions)`). But the IntegrationTests `.csproj` declares no direct `PackageReference` to `ZB.MOM.WW.Auth.Ldap` or `ZB.MOM.WW.Auth.Abstractions` — it has only `ProjectReference`s to Contracts and Server. It compiles solely because the Server's `PackageReference`s to those packages flow transitively (the Server csproj sets no `PrivateAssets`). A project that directly references a library's public types should declare a direct dependency on it; the current shape means the build silently depends on the Server never marking those packages `PrivateAssets="compile"` and on the transitive compile-asset flow staying enabled. If either changes, the IntegrationTests build breaks with a confusing CS0246 far from the cause.
**Recommendation:** Add explicit `<PackageReference Include="ZB.MOM.WW.Auth.Ldap" Version="0.1.2" />` and `<PackageReference Include="ZB.MOM.WW.Auth.Abstractions" Version="0.1.2" />` (matching the Server's pinned versions, ideally via a shared `Directory.Packages.props` if central package management is in use) to the IntegrationTests project so its direct use of those types is backed by a direct dependency.
**Resolution:** Resolved 2026-06-15: Confirmed the csproj had only `ProjectReference`s and pulled in `LdapAuthService`/`LdapOptions` transitively. Added direct `PackageReference`s to `ZB.MOM.WW.Auth.Abstractions` and `ZB.MOM.WW.Auth.Ldap` at `0.1.2` (matching the Server's pinned versions; no central package management exists in this repo). Build remains clean. (The IntegrationTests-028 fix also added `Microsoft.Extensions.Configuration.Json`/`.Binder` at `10.0.7`, pinned to the resolved transitive version to avoid an NU1605 downgrade.)
### IntegrationTests-028
| Field | Value |
|---|---|
| Severity | Low |
| Category | Design-document adherence |
| Location | `src/ZB.MOM.WW.MxGateway.IntegrationTests/DashboardLdapLiveTests.cs:120-161`, `src/ZB.MOM.WW.MxGateway.Server/Dashboard/DashboardServiceCollectionExtensions.cs:35` |
| Status | Resolved |
**Description:** Production wires the shared LDAP provider by binding the `MxGateway:Ldap` configuration section straight onto the shared `LdapOptions` via `AddZbLdapAuth(configuration, "MxGateway:Ldap")`. The live test instead hand-rolls a `LibraryLdapOptions` instance by copying the eleven fields of the gateway *shadow* `LdapOptions` defaults (the `LibraryOptions()` helper). Two consequences:
1. The shared `LdapOptions` actually exposes **thirteen** settable properties — the hand-copy omits `ConnectionTimeoutMs` and `ServerCertificateValidationCallback` (verified by reflecting `ZB.MOM.WW.Auth.Abstractions` 0.1.2). `ConnectionTimeoutMs` has a non-zero default and directly governs the `AuthenticateAsync_ServerUnreachable_FailsWithoutThrowing` (`Port = 1`) test's timing, so the live test exercises the *shared default* timeout, not whatever an operator (or the gateway config) would set — diverging from the production-bound value.
2. It adds a third manual copy of the shadow→shared field mapping on top of the documented "Review C2 DRIFT WARNING" seam in `Server/Configuration/LdapOptions.cs`. A field added to the shared type is silently dropped by this test until someone remembers to extend `LibraryOptions()`.
The prior `DashboardAuthenticator` ctor took `IOptions<GatewayOptions>`, so the old test shared the same options object production used; the cutover lost that fidelity. CLAUDE.md treats the live tests as the parity check against the real seeded directory — they should bind options the way production does.
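The 13-versus-11 property count in item 1 can be reproduced with plain reflection. The sketch below uses a hypothetical `OptionsCopyDrift` helper that lists the public, settable, non-indexer instance properties of an options type that a hand-written copy does not populate. It illustrates the kind of check involved and is not the reviewer's actual script.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical drift guard for hand-written options copies.
public static class OptionsCopyDrift
{
    public static IReadOnlyList<string> FindUncopiedProperties(
        Type optionsType, IEnumerable<string> copiedPropertyNames)
    {
        var copied = new HashSet<string>(copiedPropertyNames, StringComparer.Ordinal);

        return optionsType
            .GetProperties(BindingFlags.Public | BindingFlags.Instance)
            // Keep only properties a caller can actually set; skip indexers.
            .Where(p => p.SetMethod is { IsPublic: true } && p.GetIndexParameters().Length == 0)
            .Select(p => p.Name)
            .Where(name => !copied.Contains(name))
            .OrderBy(name => name, StringComparer.Ordinal)
            .ToList();
    }
}
```

Against the shared type this would be called as `OptionsCopyDrift.FindUncopiedProperties(typeof(LdapOptions), namesCopiedByLibraryOptions)`. Per this finding, it would have reported `ConnectionTimeoutMs` and `ServerCertificateValidationCallback` for the pre-fix `LibraryOptions()` copy.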
**Recommendation:** Have the test build the shared `LdapOptions` the same way production does — bind it from the `MxGateway:Ldap` section (e.g. load the gateway `appsettings.json` / a minimal in-memory config and call the same `AddZbLdapAuth` binding path, or resolve the bound `IOptions<LdapOptions>` from a DI container that ran `AddZbLdapAuth`). At minimum, document why the two extra shared fields are intentionally left at their defaults, and add `ConnectionTimeoutMs` to the copy so the unreachable-server test's timeout matches production. Prefer eliminating the hand-copy so the shadow-drift surface does not grow.
**Resolution:** Resolved 2026-06-15: Confirmed by reflecting `ZB.MOM.WW.Auth.Abstractions` 0.1.2 that the shared `LdapOptions` exposes 13 settable properties while the hand-copy populated only 11 (omitting `ConnectionTimeoutMs` and `ServerCertificateValidationCallback`). Eliminated the field-by-field hand-copy: `LibraryOptions()` now binds the real `MxGateway:Ldap` section from the Server's `appsettings.json` (resolved via `IntegrationTestEnvironment.ResolveRepositoryRoot`) onto the shared `LdapOptions` with `configuration.GetSection("MxGateway:Ldap").Get<LdapOptions>()` — the same section/binding path production's `AddZbLdapAuth(configuration, "MxGateway:Ldap")` uses. Verified the bind yields `ConnectionTimeoutMs=10000` (the shared default the unreachable-server test relies on) and the dev directory connection (localhost:3893, Transport=None, AllowInsecure). A new shared field is now picked up automatically rather than silently dropped.
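A minimal sketch of the section-binding approach this resolution describes. It assumes the `Microsoft.Extensions.Configuration.Json` and `.Binder` packages (the ones the IntegrationTests-027 resolution notes were added), which bring in the core `Microsoft.Extensions.Configuration` package. `LdapOptionsStandIn` is a hypothetical three-property stand-in for the shared `LdapOptions`, and the `appsettings.json` path under the repository root is an assumption.

```csharp
using System;
using System.IO;
using Microsoft.Extensions.Configuration;

// HYPOTHETICAL stand-in for ZB.MOM.WW.Auth.Abstractions.Ldap.LdapOptions; only
// three property names mentioned in the review are modelled here.
public sealed class LdapOptionsStandIn
{
    public int Port { get; set; }
    public bool AllowInsecure { get; set; }

    // Mirrors the 10000 ms shared default cited in the review; unset keys keep it.
    public int ConnectionTimeoutMs { get; set; } = 10000;
}

public static class LdapSectionBinding
{
    public const string SectionPath = "MxGateway:Ldap";

    // Bind the same section production passes to AddZbLdapAuth, so every property
    // present in configuration is picked up without a field-by-field copy.
    public static LdapOptionsStandIn Bind(IConfiguration configuration) =>
        configuration.GetSection(SectionPath).Get<LdapOptionsStandIn>()
        ?? throw new InvalidOperationException($"Configuration section '{SectionPath}' is missing or empty.");

    // Assumed location of the Server's appsettings.json relative to the repository root.
    public static LdapOptionsStandIn LoadFromServerAppSettings(string repositoryRoot)
    {
        string path = Path.Combine(repositoryRoot, "src", "ZB.MOM.WW.MxGateway.Server", "appsettings.json");
        IConfiguration configuration = new ConfigurationBuilder()
            .AddJsonFile(path, optional: false)
            .Build();
        return Bind(configuration);
    }
}
```

Because the binder creates the instance and only overwrites keys that are present, defaults such as `ConnectionTimeoutMs` survive when the section omits them, and new properties need no test change.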
### IntegrationTests-029
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `docs/GatewayTesting.md:218-224` |
| Status | Resolved |
**Description:** The "Live LDAP" section of `docs/GatewayTesting.md` still describes the failure branches in terms of the old `DashboardAuthenticator` internals: "`admin` with a wrong password is rejected by the **candidate bind**" and "an unknown username yields **no candidate**". After the cutover in this diff, the bind/search mechanics (and therefore the "candidate bind" / "candidate is null" branches) live in the shared `LdapAuthService`, not in `DashboardAuthenticator` — which is exactly why the test comments in `DashboardLdapLiveTests.cs` were reworded in this same diff from "Exercises the `LdapException` branch" / "the `candidate is null` branch" to "user-bind-failure branch" / "user-not-found branch". The doc prose was not updated to match. CLAUDE.md requires docs that describe security/auth behavior to change in the same commit as the source; the comments moved but the doc did not, leaving the doc describing branches that no longer exist in `DashboardAuthenticator`.
**Recommendation:** Reword the `docs/GatewayTesting.md` "Live LDAP" failure-branch sentences to describe observable behavior without referencing the now-internal "candidate bind" mechanics (e.g. "a wrong password is rejected without leaking the password", "an unknown username fails authentication"), and note that bind/search is delegated to the shared `ZB.MOM.WW.Auth.Ldap` provider so the prose stays accurate after the cutover.
**Resolution:** Resolved 2026-06-15: Reworded the "Live LDAP" failure-branch prose to describe observable behavior ("fails authentication without leaking the password", "an unknown username fails authentication") instead of the now-internal "candidate bind" / "no candidate" mechanics, and added a sentence noting `DashboardAuthenticator` delegates the bind/search to the shared `ZB.MOM.WW.Auth.Ldap` provider (`LdapAuthService`) and only maps groups to roles — matching the in-source test-comment cutover. Verified by inspection.