scadalink-design/docs/plans/2026-03-16-security-auth-refinement-design.md
Joseph Doherty cbc78465e0 Refine Security & Auth: LDAP bind, JWT sessions, idle timeout, failure handling
Replace Windows Integrated Auth with direct LDAP bind (username/password login form).
Add JWT-based sessions with HMAC-SHA256 shared key for load balancer compatibility.
15-minute token refresh re-queries LDAP for current group memberships. 30-minute
configurable idle timeout. LDAP failure: new logins fail, active sessions continue
with current roles until LDAP recovers.
2026-03-16 08:16:29 -04:00


Security & Auth Refinement — Design

Date: 2026-03-16
Component: Security & Auth (Component-Security.md)
Status: Approved

Problem

The Security & Auth doc defined roles and LDAP mapping but did not specify the authentication mechanism (previously stated as Kerberos/NTLM, now changed to direct LDAP bind), session management, token format, idle timeout, LDAP failure handling, or load balancer compatibility.

Decisions

Authentication Mechanism

  • Direct LDAP bind with username/password. No Windows Integrated Authentication (Kerberos/NTLM).
  • User provides credentials in a login form. App validates against LDAP/AD and retrieves group memberships.
  • No local credential store or caching.
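
The login flow above can be sketched as follows. This is a minimal illustration, not the app's implementation: the `Directory` interface, `FakeDirectory`, and all exception and function names are hypothetical, and a real deployment would back `Directory` with an actual LDAP client. The empty-password check reflects the fact that many LDAP servers treat a bind with an empty password as an unauthenticated bind that succeeds.

```python
from dataclasses import dataclass
from typing import Protocol


class DirectoryUnavailable(Exception):
    """Raised when the LDAP/AD server cannot be reached."""


class InvalidCredentials(Exception):
    """Raised when the directory rejects the bind."""


class Directory(Protocol):
    # Hypothetical interface: bind as the user, then return group names.
    def bind_and_get_groups(self, username: str, password: str) -> list[str]: ...


@dataclass(frozen=True)
class AuthenticatedUser:
    username: str
    groups: tuple[str, ...]


def login(directory: Directory, username: str, password: str) -> AuthenticatedUser:
    # Reject empty values up front so an empty password can never be
    # mistaken for a successful (unauthenticated) bind.
    if not username or not password:
        raise InvalidCredentials("username and password are required")
    groups = directory.bind_and_get_groups(username, password)
    # Nothing is stored or cached: the password goes out of scope here.
    return AuthenticatedUser(username=username, groups=tuple(sorted(groups)))


class FakeDirectory:
    """In-memory stand-in for LDAP/AD, for illustration only."""

    def __init__(self, users: dict[str, tuple[str, list[str]]], available: bool = True):
        self._users = users
        self.available = available

    def bind_and_get_groups(self, username: str, password: str) -> list[str]:
        if not self.available:
            raise DirectoryUnavailable("LDAP unreachable")
        record = self._users.get(username)
        if record is None or record[0] != password:
            raise InvalidCredentials("bind rejected")
        return list(record[1])
```

Because `login` lets `DirectoryUnavailable` propagate, a caller can distinguish "wrong password" from "directory down" and report each appropriately.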

Session Management — JWT

  • JWT with shared symmetric signing key (HMAC-SHA256). Both central nodes use the same key from configuration.
  • Claims: user display name, username, roles, permitted site IDs (for site-scoped Deployment). All authorization from token claims — no per-request database lookup.
  • Load balancer compatible — no server-side session state, no sticky sessions needed.
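
To make the stateless property concrete, here is a small sketch of HMAC-SHA256 (HS256) token issuing and verification using only the Python standard library. The claim names (`sub`, `name`, `roles`, `site_ids`) and the function names are assumptions chosen for illustration; the design only states which pieces of information the claims carry. A production system would normally use a maintained JWT library rather than hand-rolled code.

```python
import base64
import hashlib
import hmac
import json


def _b64url(data: bytes) -> str:
    # JWTs use unpadded URL-safe base64.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that the JWT encoding strips.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_token(key: bytes, claims: dict, now: int, ttl_seconds: int = 15 * 60) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    payload = dict(claims, iat=now, exp=now + ttl_seconds)
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":"), sort_keys=True).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + _b64url(signature)


def verify_token(key: bytes, token: str, now: int) -> dict:
    # Raises ValueError for malformed, tampered, or expired tokens.
    header_b64, payload_b64, signature_b64 = token.split(".")
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    expected = hmac.new(key, signing_input, hashlib.sha256).digest()
    # Constant-time comparison, done before trusting any token content.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("signature mismatch")
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    claims = json.loads(_b64url_decode(payload_b64))
    if now >= claims["exp"]:
        raise ValueError("token expired")
    return claims
```

Any node holding the same key can call `verify_token` on a token issued by the other node, which is why no sticky sessions or shared session store are needed.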

Token Lifecycle

  • 15-minute JWT expiry with sliding refresh. On refresh, the app re-queries LDAP for current group memberships and reissues the token with updated claims. While LDAP is reachable, roles are never more than 15 minutes stale.
  • 30-minute idle timeout (configurable). If no requests within the idle window, user must re-login.
  • Active users stay logged in indefinitely via sliding refresh.
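
The interaction between the 15-minute expiry and the 30-minute idle window can be sketched as a single decision function. This assumes the app can determine when the user's last request occurred (the design does not say where that timestamp lives), and the names `Action` and `next_action` are hypothetical.

```python
from enum import Enum

TOKEN_TTL_SECONDS = 15 * 60
DEFAULT_IDLE_TIMEOUT_SECONDS = 30 * 60  # configurable per the design


class Action(Enum):
    ACCEPT = "accept"    # token still valid, serve the request
    REFRESH = "refresh"  # token expired but user is active: re-query LDAP, reissue
    RELOGIN = "relogin"  # idle window exceeded: send the user to the login form


def next_action(
    now: int,
    token_expires_at: int,
    last_activity_at: int,
    idle_timeout: int = DEFAULT_IDLE_TIMEOUT_SECONDS,
) -> Action:
    # The idle check comes first: an idle user must log in again even if
    # the token itself would still be eligible for refresh.
    if now - last_activity_at > idle_timeout:
        return Action.RELOGIN
    if now >= token_expires_at:
        return Action.REFRESH
    return Action.ACCEPT
```

For example, a user who logs in at t=0 and sends a request at t=600 is accepted; a request at t=1000 triggers a refresh; a request that arrives more than 1800 seconds after the previous one forces a re-login.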

LDAP Failure Handling

  • Fail closed for new logins — can't authenticate without LDAP.
  • Grace period for active sessions — valid JWTs continue to work with current roles. Token refresh skipped until LDAP recovers. Avoids disrupting active work during brief outages.
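
A sketch of the grace-period behavior at refresh time, under the assumption that the directory lookup signals an outage with a dedicated exception. `DirectoryUnavailable`, `refresh_roles`, and the `username`/`roles` claim names are hypothetical. The sketch covers only which roles a session keeps; how the app treats a token whose expiry passes during an outage is not specified here.

```python
from typing import Callable


class DirectoryUnavailable(Exception):
    """Raised when the LDAP/AD server cannot be reached."""


def refresh_roles(
    current_claims: dict,
    lookup_roles: Callable[[str], list[str]],
) -> tuple[dict, bool]:
    """Return (claims, refreshed).

    When LDAP is reachable, the roles are replaced with the current ones.
    When it is not, the session keeps its existing claims and the refresh
    is skipped until a later attempt succeeds.
    """
    try:
        roles = lookup_roles(current_claims["username"])
    except DirectoryUnavailable:
        return dict(current_claims), False
    return dict(current_claims, roles=sorted(roles)), True
```

New logins take the opposite path: with no existing claims to fall back on, the same exception simply fails the login.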

Signing Key

  • Shared symmetric key (HMAC-SHA256) in configuration. Both nodes are trusted issuers. Asymmetric keys rejected as unnecessary complexity for a two-node trusted cluster.
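
One way to load and sanity-check the shared key on each node is sketched below. The setting name `JwtSigningKey` and the base64 encoding of the configured value are assumptions, not part of the design. The 32-byte minimum follows the JWA requirement (RFC 7518) that an HS256 key be at least as long as the 256-bit hash output.

```python
import base64
from typing import Mapping

MIN_HS256_KEY_BYTES = 32  # 256 bits, the SHA-256 output size


def load_signing_key(config: Mapping[str, str], setting: str = "JwtSigningKey") -> bytes:
    # The setting name and base64 encoding are illustrative assumptions.
    encoded = config.get(setting, "")
    if not encoded:
        raise ValueError(f"{setting} is not configured")
    # validate=True rejects stray characters instead of silently skipping them.
    key = base64.b64decode(encoded, validate=True)
    if len(key) < MIN_HS256_KEY_BYTES:
        raise ValueError(f"{setting} must decode to at least {MIN_HS256_KEY_BYTES} bytes")
    return key
```

Both central nodes would load the same configured value, so a token signed by one verifies on the other.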

Affected Documents

Document | Change
Component-Security.md | Replaced Windows Integrated Auth with direct LDAP bind. Added Session Management, Token Lifecycle, Load Balancer Compatibility, and LDAP Connection Failure sections.
HighLevelReqs.md | Updated authentication description (Section 9.1) to reflect username/password with JWT.

Alternatives Considered

  • Windows Integrated Authentication (Kerberos/NTLM): Rejected by user — app authenticates directly against LDAP/AD.
  • Server-side sessions with cookie: Rejected — doesn't work with load balancer without sticky sessions or shared session store.
  • Asymmetric JWT signing (RSA/ECDSA): Rejected — both nodes are trusted issuers, no third-party validation needed.
  • Two-token pattern (access + refresh): Rejected — sliding single JWT with short expiry is simpler and achieves the same goal.
  • No idle timeout (rely on JWT expiry): Rejected — user wanted explicit idle timeout separate from token refresh cycle.
  • Fail closed for active sessions on LDAP outage: Rejected — would disrupt engineers mid-deployment during brief LDAP outages.
  • Credentials cached for LDAP outage resilience: Rejected — adds local credential store complexity; correct behavior is to deny new logins when identity can't be verified.
  • Per-request role lookup from database: Rejected — unnecessary DB query on every request when roles refresh every 15 minutes via LDAP.