Files
scadaproj/docs/plans/2026-06-02-auth-audit-normalization-phase1.md
T

146 lines
14 KiB
Markdown

# Phase 1 (Auth adoption) — elaborated steps + Task 1.0 findings
Companion to `2026-06-02-auth-audit-normalization.md`. Produced by the Task 1.0 read-only
exploration gate (4 parallel explorers: library surface + 3 repos). All paths verified
2026-06-02 against source.
## Cutover target — `ZB.MOM.WW.Auth` public surface
| Package | Consumer entry points |
|---|---|
| `.Abstractions` | **NB: `IGroupRoleMapper<TRole>`/`GroupRoleMapping<TRole>`/`CanonicalRole` live in namespace `ZB.MOM.WW.Auth.Abstractions.Roles`** (verified during Task 1.1). `ILdapAuthService`, `LdapOptions` (`Transport: LdapTransport{Ldaps,StartTls,None}`, `AllowInsecure`, `UserNameAttribute`, `GroupAttribute`, `ServiceAccountDn/Password`, `SearchBase`, `ConnectionTimeoutMs`, `ServerCertificateValidationCallback`), `LdapAuthResult(Succeeded,Username,DisplayName,Groups,Failure)`, `LdapAuthFailure`, `CanonicalRole{Viewer,Operator,Engineer,Designer,Deployer,Administrator}`, `IGroupRoleMapper<TRole>` (**no default impl — consumer writes it**) → `GroupRoleMapping<TRole>(Roles, Scope:object?)`, plus API-key abstractions (`IApiKeyVerifier`, `ApiKeyVerification`, `ApiKeyIdentity`, `IApiKeyStore`/`IApiKeyAdminStore`/`IApiKeyAuditStore`, `ApiKeyOptions{TokenPrefix,PepperSecretName,SqlitePath,RunMigrationsOnStartup}`) |
| `.Ldap` | `LdapAuthService(LdapOptions)` : `ILdapAuthService`. Bind-then-search, fail-closed, never throws. `LdapOptionsValidator` (TLS-or-AllowInsecure) auto-registered. |
| `.ApiKeys` | `ApiKeyVerifier(ApiKeyOptions, IApiKeyStore, IApiKeyPepperProvider, TimeProvider?)`, `ApiKeyParser.TryParse` (`<prefix>_<keyId>_<secret>`), `ApiKeySecretGenerator.NewSecret()`, default SQLite stores, `ConfigurationApiKeyPepperProvider`. **Extracted from MxGateway — near-1:1 with its pipeline.** |
| `.AspNetCore` | `ZbClaimTypes{Name,Role,DisplayName,Username,ScopeId}`, `ZbCookieDefaults.Apply(opts, requireHttps, idleTimeout)`, DI: `AddZbLdapAuth(services, config, sectionPath)`, `AddZbApiKeyAuth(services, config, sectionPath)`. |
## Per-app current state (verified) and elaborated cutover
### OtOpcUa — packages: Abstractions + Ldap + AspNetCore (no ApiKeys)
Current LDAP: `src/Server/ZB.MOM.WW.OtOpcUa.Security/Ldap/LdapAuthService.cs` (impl), `ILdapAuthService.cs`,
`LdapOptions.cs` (**section `Security:Ldap`**, `UseTls` bool, `Enabled`, `DevStubMode`, embedded `GroupToRole` dict),
`LdapAuthResult.cs` (already carries `Roles`). Role mapping is **config + DB**: `RoleMapper.Map` (config
`GroupToRole`) + `RoleMapper.Merge` with DB `LdapGroupRoleMappingService`/`LdapGroupRoleMapping` (system-wide rows).
Native roles `AdminRole{ConfigViewer,ConfigEditor,FleetAdmin}` (control-plane only; data-plane is a separate
`NodePermissions` bitmask). DI: two `TryAddSingleton<ILdapAuthService,LdapAuthService>` sites
(`Security/ServiceCollectionExtensions.cs:42` + `Host/Program.cs:106`). Cookie `ZB.MOM.WW.OtOpcUa.Auth`,
single Cookie scheme (JWT inside cookie). **Second LDAP consumer:** OPC UA data-plane
`LdapOpcUaUserAuthenticator` + `OpcUaApplicationHost.HandleImpersonation` call the LDAP service too.
- **1.1 mapper:** implement `IGroupRoleMapper<AdminRole>` (or `<string>`) wrapping `RoleMapper.Map` + DB `Merge`.
- **1.2 Ldap:** replace `LdapAuthService` with `Auth.Ldap`; restructure flow to `ILdapAuthService → Groups → IGroupRoleMapper → roles → claims`; **preserve `DevStubMode` app-side** (library has no stub); wire BOTH consumers (login endpoint + OPC UA impersonation).
- **1.4 config:** `UseTls``Transport` enum (section already `Security:Ldap` — see Finding #1).
- **1.5 cookie/claims:** use `ZbClaimTypes` + `ZbCookieDefaults.Apply`; keep cookie name.
- **1.7 roles:** `ConfigViewer→Viewer`, `ConfigEditor→Designer`, `FleetAdmin→Administrator(+Deployer; publish⊂FleetAdmin)`. Data-plane `NodePermissions` unaffected.
### MxAccessGateway — packages: all 4 (ApiKeys **source**, cuts over first)
Current API keys (`src/ZB.MOM.WW.MxGateway.Server/Security/Authentication/`): `ApiKeyParser` (`mxgw_<id>_<secret>`),
`ApiKeySecretHasher` (HMAC-SHA256 + pepper `MxGateway:ApiKeyPepper`), `ApiKeySecretGenerator`, `ApiKeyVerifier`
(`FixedTimeEquals`), SQLite stores, `ConstraintEnforcer` + rich `ApiKeyConstraints`, gRPC
`GatewayGrpcAuthorizationInterceptor` + `GatewayScopes`. DI `AddSqliteAuthStore()`. → **near-1:1 with `Auth.ApiKeys`.**
LDAP: `Dashboard/DashboardAuthenticator.cs` (`MxGateway:Ldap`, `UseTls`), `GroupToRole` under `MxGateway:Dashboard`,
roles `Admin`/`Viewer`, cookie `MxGatewayDashboard`.
- **1.1 mapper:** `IGroupRoleMapper<string>` wrapping `DashboardAuthenticator.MapGroupsToRoles`.
- **1.2 Ldap:** replace `DashboardAuthenticator`'s LDAP internals with `Auth.Ldap` (keep dashboard claims/principal build).
- **1.3 ApiKeys:** delete the local parser/hasher/generator/verifier/stores; re-point to `Auth.ApiKeys`; **keep** `ConstraintEnforcer` + gRPC interceptor + scopes on top (constraints carried as the opaque blob). Lowest-risk ApiKeys cutover (it's the donor).
- **1.4 config:** `UseTls``Transport`.
- **1.5/1.7:** `ZbClaimTypes`/cookie defaults; `Viewer→Viewer`, `Admin→Administrator`.
### ScadaBridge — packages: all 4 (Ldap **source**; ApiKeys consumer)
Current LDAP (`src/ZB.MOM.WW.ScadaBridge.Security/LdapAuthService.cs`): the hardened reference (RFC-4514 DN escape,
filter escape, per-op timeout, fail-closed group lookup, username trim, service-account-bind distinction). Config is
**flat** `ScadaBridge:Security:Ldap*` in `SecurityOptions.cs` with **`LdapTransport` enum already** (`Ldaps/StartTls/None`),
`AllowInsecureLdap`, `LdapUserIdAttribute`, `LdapGroupAttribute`, validated by `SecurityOptionsValidator : OptionsValidatorBase`.
Role mapping **DB-backed** with **site-scoping**: `RoleMapper.MapGroupsToRolesAsync``RoleMappingResult(Roles, PermittedSiteIds, IsSystemWideDeployment)` over `LdapGroupMapping` + `SiteScopeRule` (SQL Server). Roles
`Admin/Design/Deployment/Audit/AuditReadOnly`; SoD via `OperationalAudit{Admin,Audit,AuditReadOnly}` + `AuditExport{Admin,Audit}`.
Cookie `ZB.MOM.WW.ScadaBridge.Auth`; JWT-in-cookie via `JwtTokenService`.
**Inbound API keys** (`InboundAPI/ApiKeyValidator.cs`): **raw `X-API-Key`**, **deterministic** HMAC (`ApiKeyHasher`, no per-row salt, by-value lookup), `ApiKey{Name,KeyHash,IsEnabled}` in **SQL Server**, **per-method approval** via `ApiMethod.ApprovedApiKeyIds`**architecturally different from the library's keyId/scope/SQLite model.**
- **1.1 mapper:** `IGroupRoleMapper<string>` wrapping `RoleMapper.MapGroupsToRolesAsync`, carrying `PermittedSiteIds`/`IsSystemWideDeployment` in `GroupRoleMapping.Scope`.
- **1.2 Ldap:** ScadaBridge is the donor — confirm `Auth.Ldap` behaviour-matches, then re-point `LdapAuthService` usages to the library type. Lowest-risk Ldap cutover.
- **1.3 ApiKeys:** **see Finding #3 — bigger than a token reformat; needs a scope decision.**
- **1.4 config:** nest flat `Security:Ldap*` under a sub-section + rename `LdapUserIdAttribute→UserNameAttribute`, `LdapGroupAttribute→GroupAttribute`, `LdapTransport→Transport` (+ `SecurityOptionsValidator` + appsettings). Enum already matches.
- **1.7 roles:** `Admin→Administrator`, `Design→Designer`, `Deployment→Deployer`, `Audit→Administrator` (collapse), `AuditReadOnly→Viewer` (collapse) — removes the `OperationalAudit`/`AuditExport` SoD (accepted).
## Key findings that change the plan
1. **OtOpcUa LDAP section is `Security:Ldap`, not `Authentication:Ldap`.** Both `components/auth/GAPS.md §1`
and the auth current-state doc are wrong; the code (and the prior fix in memory) use `Security:Ldap`.
→ Task 1.4 for OtOpcUa is only `UseTls``Transport`, not a section move.
2. **OtOpcUa "double-singleton bug" is already mitigated.** Both registration sites use `TryAddSingleton`
(dedupes); the `Enabled` flag is an intentional fail-closed master switch. → Not a blocking fix; verify and
keep `Enabled`. Removes a risk the plan flagged.
3. **ScadaBridge inbound API keys are a re-architecture, not a token reformat.** The library's ApiKeys model
(`<prefix>_<keyId>_<secret>` Bearer, keyId lookup + constant-time compare, SQLite store, scopes + opaque
constraints) is fundamentally different from ScadaBridge's (raw `X-API-Key`, deterministic by-value HMAC
lookup, SQL Server `ApiKey{Name,KeyHash}`, per-method approval list). Wholesale adoption means re-architecting
inbound-API auth AND resolving a SQLite-vs-SQL-Server storage mismatch. **Needs a scope decision (Decision A).**
4. **OtOpcUa role mapping is config + DB**, not just config (`RoleMapper.Map` baseline + DB `Merge`). The
`IGroupRoleMapper` impl must combine both. OtOpcUa also has `DevStubMode` (no library equivalent — keep app-side)
and a **second LDAP consumer** (OPC UA data-plane impersonation) that must be re-wired too.
5. **MxGateway ApiKeys cutover is the donor path — lowest risk** (delete locals, re-point to library; keep
`ConstraintEnforcer`/gRPC/scopes on top). Confirms the GAPS sequencing (gateway first).
## Task 1.2 (LDAP cutover) — implemented + reviewed (2026-06-02)
Commits: OtOpcUa `257caa7`, MxGateway `c3b466e`, ScadaBridge `ac34dac`. All targeted tests green.
Security review verdict: **sound, no credential-leak regression** in any repo (insecure-transport
guards fire correctly; DevStubMode cannot leak to prod; claim shapes preserved). All three returned
CHANGES-REQUESTED for fixable issues:
- **OtOpcUa** (no Critical): (I1) insecure-transport guard is login-time only — add startup
validation gated on `Enabled` for defense-in-depth, verify prod overlays still boot; (I2) integration
stub pre-populates `Roles` so the Groups→mapper path isn't actually exercised — fix the stub; (I3)
document/test the zero-role fail-closed fallback.
- **MxGateway** (2 Critical): (C1) library strips group DNs to short RDN names before the
`LdapGroupClaimType` claim → verify prior behaviour, document, drop the now-dead full-DN branch in the
mapper, add a claim-value assertion; (C2) gateway's local `LdapOptions` is now a shadow copy (validated
but unused at runtime) → fold to the shared type or document the drift. (I1) shared `LdapOptionsValidator`
has **no `Enabled=false` guard** → validates even when LDAP is disabled (real for MxGateway, which can
disable dashboard LDAP).
- **ScadaBridge** (2 Critical): (C1) `ConfigSecretsTests` still checks the OLD flat key → passes
vacuously, no longer guards secret-in-config — repoint to nested key; (C2) `production-checklist.md`
still lists deleted flat keys → update; (I) unsafe `(RoleMappingResult)Scope!` cast → null-guard.
**Cross-cutting decision — shared library `LdapOptionsValidator` `Enabled` guard:** the validator runs
regardless of `Enabled`, requiring Server/SearchBase/ServiceAccountDn even when LDAP is off. Correct fix =
add an `if (!Enabled) return Success` guard to the shared validator and republish `0.1.1`, re-pinning all
consumers. (Alternative: each consumer always supplies those fields. The library fix is the principled one.)
## Task 1.2/1.4 — DONE (reviewed + fixed, 2026-06-02)
Library hardened to **`0.1.1`** (`LdapOptionsValidator` skips when `Enabled=false`), republished, re-pinned in all 3 repos.
Fix commits: OtOpcUa `c4f315e` (startup insecure-transport guard gated on Enabled/DevStub + `Transport: Ldaps`
declared in the 3 prod overlays + test fidelity), MxGateway `f4dc11b` (group-claim shape documented as
non-breaking — claim read nowhere in prod; shadow `LdapOptions` kept with a drift-warning doc), ScadaBridge
`4db8c37` (secret-test repointed to nested key, prod checklist updated, `Scope` cast guarded). All targeted
suites green. **1.2 (LDAP) + 1.4 (config) complete across all 3 repos.**
Remaining Phase 1: **1.3 ApiKeys** (MxGateway donor cutover — low risk; ScadaBridge full re-architecture —
largest single item: SQLite store + Bearer format + scopes + key re-issuance), **1.5** claims/cookies,
**1.6** dev base DN, **1.7** canonical roles.
## Resolved decisions (2026-06-02)
- **Decision A — ScadaBridge inbound API keys depth → (a) FULL ADOPT.** Re-architect inbound-API auth to the
library's model: `<prefix>_<keyId>_<secret>` Bearer token format, keyId lookup + constant-time compare,
scopes/constraints, and **move inbound API keys into the library's SQLite store** (separate from the SQL Server
config DB). This is the largest, highest-risk item in Phase 1. Implications to handle in Task 1.3:
- New SQLite auth DB for ScadaBridge inbound keys (path via `ApiKeyOptions.SqlitePath`); migrate/retire the
SQL Server `ApiKey{Name,KeyHash}` table + `ApiMethod.ApprovedApiKeyIds` linkage.
- Re-model **per-method approval** as the library's scopes/constraints (or the opaque constraint blob) — the
`ApiMethod.ApprovedApiKeyIds` set becomes per-key scope grants.
- Switch the inbound transport from `X-API-Key` header to `Authorization: Bearer <token>` (a client-visible
contract change — extends the already-accepted token-format change; needs the interop check + a doc/CHANGELOG note).
- Existing raw keys cannot be migrated (deterministic-by-value hash, no keyId/secret split) → **re-issue** all
inbound API keys; call this out in the cutover runbook.
- **Decision B — canonical role mappings → confirmed as tabled above** (OtOpcUa `ConfigViewer→Viewer`,
`ConfigEditor→Designer`, `FleetAdmin→Administrator+Deployer`; MxGateway `Viewer/Admin`; ScadaBridge
`Admin→Administrator`, `Design→Designer`, `Deployment→Deployer`, `Audit→Administrator`, `AuditReadOnly→Viewer`).
- **Decision C — dev escape hatches → keep app-side, unchanged.** OtOpcUa `DevStubMode` and MxGateway
`AllowAnonymousLocalhost`/loopback bypass have no library equivalent; preserve them in each app outside the
shared `Auth.Ldap` path.