diff --git a/Component-Security.md b/Component-Security.md
index fa095ae..018f9fa 100644
--- a/Component-Security.md
+++ b/Component-Security.md
@@ -17,9 +17,29 @@ Central cluster. Sites do not have user-facing interfaces and do not perform ind
 
 ## Authentication
 
-- **Mechanism**: Windows Integrated Authentication (Kerberos/NTLM) against Active Directory.
-- **Session**: Authenticated user identity is maintained for the duration of the UI session.
-- **No local user store**: All identity and group information comes from AD.
+- **Mechanism**: The Central UI presents a username/password login form. The app validates credentials by binding to the LDAP/AD server with the provided credentials, then queries the user's group memberships.
+- **No local user store**: All identity and group information comes from AD. No credentials are cached locally.
+- **No Windows Integrated Authentication**: The app authenticates directly against LDAP/AD, not via Kerberos/NTLM.
+
+## Session Management
+
+### JWT Tokens
+- On successful authentication, the app issues a **JWT** signed with a shared symmetric key (HMAC-SHA256). Both central cluster nodes use the same signing key from configuration, so either node can issue and validate tokens.
+- **JWT claims**: User display name, username, list of roles (Admin, Design, Deployment), and for site-scoped Deployment, the list of permitted site IDs. All authorization decisions are made from token claims without hitting the database.
+
+### Token Lifecycle
+- **JWT expiry**: 15 minutes. On each request, if the token is near expiry, the app re-queries LDAP for current group memberships and issues a fresh token with updated claims. Roles are never more than 15 minutes stale.
+- **Idle timeout**: Configurable, default **30 minutes**. If no requests are made within the idle window, the token is not refreshed and the user must re-login. Tracked via a last-activity timestamp in the token.
+- **Sliding refresh**: Active users stay logged in indefinitely — the token refreshes every 15 minutes as long as requests are made within the 30-minute idle window.
+
+### Load Balancer Compatibility
+- JWT tokens are self-contained — no server-side session state. A load balancer in front of the central cluster can route requests to either node without sticky sessions or a shared session store. Central failover is transparent to users with valid tokens.
+
+## LDAP Connection Failure
+
+- **New logins**: If the LDAP/AD server is unreachable, login attempts **fail**. Users cannot be authenticated without LDAP.
+- **Active sessions**: Users with valid (not-yet-expired) JWTs can **continue operating** with their current roles. The token refresh is skipped until LDAP is available again. This avoids disrupting engineers mid-work during a brief LDAP outage.
+- **Recovery**: When LDAP becomes reachable again, the next token refresh cycle re-queries group memberships and issues a fresh token with current roles.
 
 ## Roles
 
diff --git a/HighLevelReqs.md b/HighLevelReqs.md
index 8e4c1d9..217b883 100644
--- a/HighLevelReqs.md
+++ b/HighLevelReqs.md
@@ -375,7 +375,7 @@ The central cluster hosts a **configuration and management UI** (no live machine
 ## 9. Security & Access Control
 
 ### 9.1 Authentication
-- **UI users** authenticate via **LDAP/Active Directory** directly (Windows Integrated Authentication).
+- **UI users** authenticate via **username/password** validated directly against **LDAP/Active Directory**. Sessions are maintained via JWT tokens.
 - **External system API callers** authenticate via **API key** (see Section 7).
 
 ### 9.2 Authorization
diff --git a/docs/plans/2026-03-16-security-auth-refinement-design.md b/docs/plans/2026-03-16-security-auth-refinement-design.md
new file mode 100644
index 0000000..e81a193
--- /dev/null
+++ b/docs/plans/2026-03-16-security-auth-refinement-design.md
@@ -0,0 +1,51 @@
+# Security & Auth Refinement — Design
+
+**Date**: 2026-03-16
+**Component**: Security & Auth (`Component-Security.md`)
+**Status**: Approved
+
+## Problem
+
+The Security & Auth doc defined roles and LDAP mapping but lacked specification for the authentication mechanism (previously stated Kerberos/NTLM, changed to direct LDAP bind), session management, token format, idle timeout, LDAP failure handling, and load balancer compatibility.
+
+## Decisions
+
+### Authentication Mechanism
+- **Direct LDAP bind** with username/password. No Windows Integrated Authentication (Kerberos/NTLM).
+- User provides credentials in a login form. App validates against LDAP/AD and retrieves group memberships.
+- No local credential store or caching.
+
+### Session Management — JWT
+- **JWT with shared symmetric signing key** (HMAC-SHA256). Both central nodes use the same key from configuration.
+- **Claims**: user display name, username, roles, permitted site IDs (for site-scoped Deployment). All authorization from token claims — no per-request database lookup.
+- **Load balancer compatible** — no server-side session state, no sticky sessions needed.
+
+### Token Lifecycle
+- **15-minute JWT expiry with sliding refresh**. On refresh, app re-queries LDAP for current group memberships and reissues token with updated claims. Roles never more than 15 minutes stale.
+- **30-minute idle timeout** (configurable). If no requests within the idle window, user must re-login.
+- Active users stay logged in indefinitely via sliding refresh.
+
+### LDAP Failure Handling
+- **Fail closed for new logins** — can't authenticate without LDAP.
+- **Grace period for active sessions** — valid JWTs continue to work with current roles. Token refresh skipped until LDAP recovers. Avoids disrupting active work during brief outages.
+
+### Signing Key
+- **Shared symmetric key** (HMAC-SHA256) in configuration. Both nodes are trusted issuers. Asymmetric keys rejected as unnecessary complexity for a two-node trusted cluster.
+
+## Affected Documents
+
+| Document | Change |
+|----------|--------|
+| `Component-Security.md` | Replaced Windows Integrated Auth with direct LDAP bind. Added Session Management, Token Lifecycle, Load Balancer Compatibility, and LDAP Connection Failure sections. |
+| `HighLevelReqs.md` | Updated authentication description (Section 9.1) to reflect username/password with JWT. |
+
+## Alternatives Considered
+
+- **Windows Integrated Authentication (Kerberos/NTLM)**: Rejected by user — app authenticates directly against LDAP/AD.
+- **Server-side sessions with cookie**: Rejected — doesn't work with load balancer without sticky sessions or shared session store.
+- **Asymmetric JWT signing (RSA/ECDSA)**: Rejected — both nodes are trusted issuers, no third-party validation needed.
+- **Two-token pattern (access + refresh)**: Rejected — sliding single JWT with short expiry is simpler and achieves the same goal.
+- **No idle timeout (rely on JWT expiry)**: Rejected — user wanted explicit idle timeout separate from token refresh cycle.
+- **Fail closed for active sessions on LDAP outage**: Rejected — would disrupt engineers mid-deployment during brief LDAP outages.
+- **Credentials cached for LDAP outage resilience**: Rejected — adds local credential store complexity; correct behavior is to deny new logins when identity can't be verified.
+- **Per-request role lookup from database**: Rejected — unnecessary DB query on every request when roles refresh every 15 minutes via LDAP.
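The token lifecycle decided above (HMAC-SHA256 signing, 15-minute expiry, 30-minute idle timeout, sliding refresh, and the LDAP-outage grace period) can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: the claim names (`sub`, `roles`, `exp`, `last_activity`), the `REFRESH_MARGIN` threshold for "near expiry", the `lookup_roles` callback, and the `LdapUnavailable` exception are all hypothetical. It also assumes the token is re-signed on every request so the last-activity timestamp stays current, and that an expired token still inside the idle window may be refreshed when LDAP is reachable; the design does not spell out either point. The HS256 encoding is hand-rolled with the standard library only to keep the sketch self-contained; a vetted JWT library would be the safer choice in practice.

```python
import base64
import hashlib
import hmac
import json

JWT_TTL = 15 * 60        # 15-minute token expiry (from the design)
IDLE_TIMEOUT = 30 * 60   # 30-minute idle window (configurable in the design)
REFRESH_MARGIN = 60      # hypothetical definition of "near expiry"


class LdapUnavailable(Exception):
    """Hypothetical error raised by lookup_roles when LDAP is unreachable."""


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _unb64(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign(claims: dict, key: bytes) -> str:
    """Encode claims as a compact HS256-signed token."""
    header = _b64(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64(json.dumps(claims).encode())
    mac = hmac.new(key, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64(mac)}"


def verify(token: str, key: bytes) -> dict:
    """Check the signature only; expiry is handled in handle_request."""
    header, payload, sig = token.split(".")
    expected = hmac.new(key, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _unb64(sig)):
        raise PermissionError("bad signature")
    return json.loads(_unb64(payload))


def issue(username: str, roles: list, key: bytes, now: int) -> str:
    """Issue a token after a successful LDAP bind (the bind itself is not shown)."""
    claims = {"sub": username, "roles": roles,
              "exp": now + JWT_TTL, "last_activity": now}
    return sign(claims, key)


def handle_request(token: str, key: bytes, now: int, lookup_roles) -> str:
    """Return the token to send back, or raise PermissionError to force re-login."""
    claims = verify(token, key)
    if now - claims["last_activity"] > IDLE_TIMEOUT:
        raise PermissionError("idle timeout: re-login required")
    if claims["exp"] - now <= REFRESH_MARGIN:
        try:
            # Sliding refresh: pick up current group memberships from LDAP.
            claims["roles"] = lookup_roles(claims["sub"])
            claims["exp"] = now + JWT_TTL
        except LdapUnavailable:
            # Grace period: a still-valid token keeps its current roles.
            if now >= claims["exp"]:
                raise PermissionError("token expired and LDAP unreachable")
    claims["last_activity"] = now
    return sign(claims, key)
```

Because `verify` and `sign` depend only on the shared key, either central node could run `handle_request` for any token, which is the property the load-balancer decision relies on.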