feat(security): cookie session idle-timeout + LDAP-free role-mapping refresh (#15, M2.19)

Spike outcome: the shared ILdapAuthService (ZB.MOM.WW.Auth.Abstractions, an external
NuGet package) exposes ONLY AuthenticateAsync(username, password, ct) — no passwordless
service-account group-search. A live LDAP group re-query for an active session therefore
requires a new lib method and is OUT OF SCOPE (cannot modify the external package).
Implemented the always-achievable layers (cookie-only; no embedded JWT for cookie principals):

- /auth/login now stores the user's raw LDAP groups (one zb:group claim each) plus a
  zb:lastrolerefresh anchor (login time, UTC), seeding the LastActivity idle anchor too.
- SessionClaimBuilder: single shared DRY claim-builder used by BOTH /auth/login AND the
  refresh path, so the two claim shapes cannot drift (canonical identity/role/scope claims
  with nameType/roleType pinned, plus the M2.19 group + refresh-anchor additions).
- CookieSessionValidator (TimeProvider-injected, unit-testable) + a thin
  CookieAuthenticationEvents.OnValidatePrincipal adapter:
    * idle-timeout: a session past IdleTimeoutMinutes (default 30) is RejectPrincipal+SignOut;
      consistent with the cookie ExpireTimeSpan+SlidingExpiration window (same value).
    * role refresh WITHOUT LDAP: when older than RoleRefreshThresholdMinutes (new option,
      default 15) the DB-backed RoleMapper re-runs on the STORED groups, claims are rebuilt
      via the shared builder, the anchor advances, principal is replaced + cookie renewed.
      Revoked DB mappings drop the user's roles mid-session.
    * fail-soft: any refresh error KEEPS the existing principal (no sign-out, never throws)
      — mirrors the documented "LDAP failure: active sessions continue with current roles".
- Documented residual limitation in Component-Security.md: central role-mapping/scope
  changes apply within ~15 min without LDAP; live directory group-membership changes are
  picked up only at next login (needs a passwordless group-search on the external
  ZB.MOM.WW.Auth.Ldap lib — tracked follow-up).

Tests (Security.Tests, all green): CookieSessionValidatorTests + SessionClaimBuilderParityTests
— idle reject/keep, LDAP-free remap-from-stored-groups, revoked-roles loss, sub-threshold
no-refresh, refresh-throws-keeps-session, and login/refresh claim-parity.
This commit is contained in:
Joseph Doherty
2026-06-16 07:54:23 -04:00
parent a0d9379a4f
commit 8fe7f46df6
9 changed files with 852 additions and 40 deletions
@@ -19,8 +19,8 @@
{"id": 47, "ref": "M2.15", "subject": "M2.15 #29: register site active-node purge gate (DI)", "class": "small", "status": "completed", "commits": ["e1ee37e"]},
{"id": 48, "ref": "M2.16", "subject": "M2.16 #30: Health Monitoring consumes FailedWriteCount", "class": "small", "status": "completed", "commits": ["d81f747", "c9244d8"]},
{"id": 49, "ref": "M2.17", "subject": "M2.17 #31: reconcile StateTransitionValidator delete-from-NotDeployed", "class": "small", "status": "completed", "commits": ["c104356"]},
{"id": 50, "ref": "M2.18", "subject": "M2.18 #26: debug-stream stream-first ordering + replay/dedup", "class": "high-risk", "status": "completed", "commits": ["d8519cb"]},
{"id": 51, "ref": "M2.19", "subject": "M2.19 #15: LDAP periodic re-query for interactive sessions (spike+impl)", "class": "high-risk", "status": "pending"}
{"id": 50, "ref": "M2.18", "subject": "M2.18 #26: debug-stream stream-first ordering + replay/dedup", "class": "high-risk", "status": "completed", "commits": ["d8519cb", "a0d9379"]},
{"id": 51, "ref": "M2.19", "subject": "M2.19 #15: LDAP periodic re-query for interactive sessions (spike+impl)", "class": "high-risk", "status": "completed", "note": "Spike outcome: shared ILdapAuthService exposes only AuthenticateAsync (no passwordless group-search) -> live LDAP group re-query out of scope (external pkg, tracked follow-up). Implemented always-achievable layers: stored zb:group + zb:lastrolerefresh claims at login, shared SessionClaimBuilder (DRY login+refresh), CookieSessionValidator + OnValidatePrincipal (idle-timeout reject@30m, DB-only role-mapping refresh@15m, fail-soft keep-session on refresh error). Residual limitation documented in Component-Security.md.", "commits": ["9cfa660"]}
],
"deferred": [
{"ref": "#16", "subject": "Transport stale-instance enumeration", "to": "M8 (Transport)"},
+27 -5
View File
@@ -32,9 +32,31 @@ Central cluster. Sites do not have user-facing interfaces and do not perform ind
- **JWT claims**: User display name, username, list of roles (Admin, Design, Deployment), and for site-scoped Deployment, the list of permitted site IDs.
### Token Lifecycle
- **JWT expiry**: 15 minutes. On each request, if the cookie-embedded JWT is near expiry, the app re-queries LDAP for current group memberships and issues a fresh JWT, writing an updated cookie. Roles are never more than 15 minutes stale.
- **Idle timeout**: Configurable, default **30 minutes**. If no requests are made within the idle window, the token is not refreshed and the user must re-login. Tracked via a last-activity timestamp in the token.
- **Sliding refresh**: Active users stay logged in indefinitely — the token refreshes every 15 minutes as long as requests are made within the 30-minute idle window.
> **Implementation note (M2.19, #15).** The interactive Central UI login path signs in
> with **bare cookie claims**, not a cookie-embedded JWT. The session lifecycle below is
> therefore enforced by the cookie middleware (`ExpireTimeSpan` + `SlidingExpiration`) plus
> a `CookieAuthenticationEvents.OnValidatePrincipal` handler — see **Session Validation
> (`OnValidatePrincipal`)** below. The embedded-JWT model remains the documented design
> intent and is the mechanism for any non-cookie bearer surface (e.g. `/auth/token`), but
> it is **not** the transport for the cookie principal.
- **Idle timeout**: Configurable, default **30 minutes**. If no requests are made within the idle window, the session is rejected and the user must re-login. Tracked via a `LastActivity` last-activity timestamp claim. The cookie's `ExpireTimeSpan` is set to the idle timeout and `SlidingExpiration` renews it on activity, so the cookie window and the explicit `OnValidatePrincipal` idle check use the **same** value and cannot contradict each other.
- **Role-mapping refresh (LDAP-free)**: Configurable, default **15 minutes** (`SecurityOptions.RoleRefreshThresholdMinutes`). At login the session stores the user's raw LDAP groups (one `zb:group` claim each) plus a `zb:lastrolerefresh` anchor. Once the anchor is older than the threshold, `OnValidatePrincipal` re-runs the **DB-backed** `RoleMapper` on the stored groups — **with no LDAP call** — rebuilds the role/scope claims via the shared claim-builder, advances the anchor, and re-issues the cookie. Central role-mapping (DB) changes — including a **revoked** mapping that drops the user's roles, and changed site-scope rules — take effect within this window. Roles derived from central mappings are never more than ~15 minutes stale.
#### Session Validation (`OnValidatePrincipal`)
- The cookie principal is built at login by a **single shared claim-builder** (`SessionClaimBuilder`). The `OnValidatePrincipal` role-refresh path rebuilds the principal through the **same** builder, so the login and refresh claim shapes cannot drift.
- **Failure policy**: the refresh is best-effort. Any error during the refresh (e.g. the configuration database is unreachable) **keeps the existing principal with its current roles** — it never signs the user out and never throws out of the request pipeline. This mirrors the **Active sessions** stance under *LDAP Connection Failure* below. Only the explicit idle-timeout path rejects the principal.
> **Residual limitation — live LDAP group-membership changes (follow-up).** The
> mid-session refresh re-maps the **stored** groups against the central database; it does
> **not** re-query LDAP, so a change to the user's actual **group membership** in the
> directory is picked up only at **next login**. A live group re-query for an active
> session would require a new passwordless service-account group-search method on the
> shared `ZB.MOM.WW.Auth.Ldap` library, which is an **external NuGet package** and exposes
> only `AuthenticateAsync(username, password, ct)` (no standalone group search). Adding
> that method is tracked as a follow-up. Until then: central role-mapping/scope changes are
> reflected within ~15 minutes; directory group-membership changes require re-login.
### Load Balancer Compatibility
- The authentication cookie carries a self-contained JWT — no server-side session state. A load balancer in front of the central cluster can route requests to either node without sticky sessions or a shared session store.
@@ -43,8 +65,8 @@ Central cluster. Sites do not have user-facing interfaces and do not perform ind
## LDAP Connection Failure
- **New logins**: If the LDAP/AD server is unreachable, login attempts **fail**. Users cannot be authenticated without LDAP.
- **Active sessions**: Users with valid (not-yet-expired) JWTs can **continue operating** with their current roles. The token refresh is skipped until LDAP is available again. This avoids disrupting engineers mid-work during a brief LDAP outage.
- **Recovery**: When LDAP becomes reachable again, the next token refresh cycle re-queries group memberships and issues a fresh token with current roles.
- **Active sessions**: Users with a valid (not-idle-timed-out) session can **continue operating** with their current roles during an LDAP outage. Interactive cookie sessions never re-query LDAP mid-session (the mid-session role-mapping refresh is DB-only — see *Session Validation* above), so a brief LDAP outage does not disrupt engineers mid-work; central role-mapping changes still apply within the refresh window regardless of LDAP availability.
- **Recovery (group-membership changes)**: Because the mid-session refresh is LDAP-free, a change to a user's **directory group membership** is picked up at the user's **next login** (when LDAP is queried again), not mid-session — see the *Residual limitation* note above.
## Roles