Phase 1 (Auth adoption) — elaborated steps + Task 1.0 findings

Companion to 2026-06-02-auth-audit-normalization.md. Produced by the Task 1.0 read-only exploration gate (4 parallel explorers: library surface + 3 repos). All paths verified 2026-06-02 against source.

Cutover target — ZB.MOM.WW.Auth public surface

| Package | Consumer entry points |
|---|---|
| .Abstractions | NB: IGroupRoleMapper<TRole>/GroupRoleMapping<TRole>/CanonicalRole live in namespace ZB.MOM.WW.Auth.Abstractions.Roles (verified during Task 1.1). ILdapAuthService, LdapOptions (Transport: LdapTransport{Ldaps,StartTls,None}, AllowInsecure, UserNameAttribute, GroupAttribute, ServiceAccountDn/Password, SearchBase, ConnectionTimeoutMs, ServerCertificateValidationCallback), LdapAuthResult(Succeeded,Username,DisplayName,Groups,Failure), LdapAuthFailure, CanonicalRole{Viewer,Operator,Engineer,Designer,Deployer,Administrator}, IGroupRoleMapper<TRole> (no default impl; consumer writes it) → GroupRoleMapping<TRole>(Roles, Scope:object?), plus API-key abstractions (IApiKeyVerifier, ApiKeyVerification, ApiKeyIdentity, IApiKeyStore/IApiKeyAdminStore/IApiKeyAuditStore, ApiKeyOptions{TokenPrefix,PepperSecretName,SqlitePath,RunMigrationsOnStartup}) |
| .Ldap | LdapAuthService(LdapOptions) : ILdapAuthService. Bind-then-search, fail-closed, never throws. LdapOptionsValidator (TLS-or-AllowInsecure) auto-registered. |
| .ApiKeys | ApiKeyVerifier(ApiKeyOptions, IApiKeyStore, IApiKeyPepperProvider, TimeProvider?), ApiKeyParser.TryParse (<prefix>_<keyId>_<secret>), ApiKeySecretGenerator.NewSecret(), default SQLite stores, ConfigurationApiKeyPepperProvider. Extracted from MxGateway; near-1:1 with its pipeline. |
| .AspNetCore | ZbClaimTypes{Name,Role,DisplayName,Username,ScopeId}, ZbCookieDefaults.Apply(opts, requireHttps, idleTimeout), DI: AddZbLdapAuth(services, config, sectionPath), AddZbApiKeyAuth(services, config, sectionPath). |
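
The <prefix>_<keyId>_<secret> token shape handled by ApiKeyParser.TryParse can be illustrated with a minimal Python sketch. This is not the library's C# implementation: the function name, the return type, and the rule that only the secret may contain underscores are assumptions made for illustration.

```python
from typing import NamedTuple, Optional


class ParsedKey(NamedTuple):
    key_id: str
    secret: str


def try_parse(token: str, expected_prefix: str) -> Optional[ParsedKey]:
    """Split '<prefix>_<keyId>_<secret>'; return None for malformed input."""
    # maxsplit=2 keeps any underscores inside the secret intact.
    # Assumption: the prefix and the keyId never contain '_'.
    parts = token.split("_", 2)
    if len(parts) != 3:
        return None
    prefix, key_id, secret = parts
    if prefix != expected_prefix or not key_id or not secret:
        return None
    return ParsedKey(key_id, secret)
```

Returning None instead of raising mirrors the TryParse naming: a malformed token is an expected input, so the caller can answer with a generic 401 without exception handling.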

Per-app current state (verified) and elaborated cutover

OtOpcUa — packages: Abstractions + Ldap + AspNetCore (no ApiKeys)

Current LDAP: src/Server/ZB.MOM.WW.OtOpcUa.Security/Ldap/LdapAuthService.cs (impl), ILdapAuthService.cs, LdapOptions.cs (section Security:Ldap, UseTls bool, Enabled, DevStubMode, embedded GroupToRole dict), LdapAuthResult.cs (already carries Roles). Role mapping is config + DB: RoleMapper.Map (config GroupToRole) + RoleMapper.Merge with DB LdapGroupRoleMappingService/LdapGroupRoleMapping (system-wide rows). Native roles AdminRole{ConfigViewer,ConfigEditor,FleetAdmin} (control-plane only; data-plane is a separate NodePermissions bitmask). DI: two TryAddSingleton<ILdapAuthService,LdapAuthService> sites (Security/ServiceCollectionExtensions.cs:42 + Host/Program.cs:106). Cookie ZB.MOM.WW.OtOpcUa.Auth, single Cookie scheme (JWT inside cookie). Second LDAP consumer: OPC UA data-plane LdapOpcUaUserAuthenticator + OpcUaApplicationHost.HandleImpersonation call the LDAP service too.

  • 1.1 mapper: implement IGroupRoleMapper<AdminRole> (or <string>) wrapping RoleMapper.Map + DB Merge.
  • 1.2 Ldap: replace LdapAuthService with Auth.Ldap; restructure flow to ILdapAuthService → Groups → IGroupRoleMapper → roles → claims; preserve DevStubMode app-side (library has no stub); wire BOTH consumers (login endpoint + OPC UA impersonation).
  • 1.4 config: UseTls→Transport enum (section already Security:Ldap — see Finding #1).
  • 1.5 cookie/claims: use ZbClaimTypes + ZbCookieDefaults.Apply; keep cookie name.
  • 1.7 roles: ConfigViewer→Viewer, ConfigEditor→Designer, FleetAdmin→Administrator(+Deployer; publish⊂FleetAdmin). Data-plane NodePermissions unaffected.
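
Step 1.1 for OtOpcUa has to combine the config GroupToRole baseline with the DB rows. A minimal Python sketch of that shape follows. The merge is assumed to be a plain union and group names are assumed to compare case-insensitively; neither is confirmed by the source, and the names below are hypothetical stand-ins for the C# types.

```python
from dataclasses import dataclass
from typing import Iterable, Mapping, Optional, Tuple


@dataclass(frozen=True)
class GroupRoleMapping:
    roles: frozenset
    scope: Optional[object] = None  # unused here; system-wide rows only


def map_groups(
    groups: Iterable[str],
    config_map: Mapping[str, str],
    db_rows: Iterable[Tuple[str, str]],
) -> GroupRoleMapping:
    """Map LDAP groups to roles from config first, then merge DB rows."""
    wanted = {g.lower() for g in groups}
    # Baseline: the config GroupToRole dictionary.
    roles = {role for group, role in config_map.items() if group.lower() in wanted}
    # Merge (assumed union): DB rows can add roles but never remove them.
    roles |= {role for group, role in db_rows if group.lower() in wanted}
    return GroupRoleMapping(frozenset(roles))
```

A user whose groups match nothing gets an empty role set, which the caller should treat as a denied login (the zero-role fail-closed case).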

MxAccessGateway — packages: all 4 (ApiKeys source, cuts over first)

Current API keys (src/ZB.MOM.WW.MxGateway.Server/Security/Authentication/): ApiKeyParser (mxgw_<id>_<secret>), ApiKeySecretHasher (HMAC-SHA256 + pepper MxGateway:ApiKeyPepper), ApiKeySecretGenerator, ApiKeyVerifier (FixedTimeEquals), SQLite stores, ConstraintEnforcer + rich ApiKeyConstraints, gRPC GatewayGrpcAuthorizationInterceptor + GatewayScopes. DI AddSqliteAuthStore(). → near-1:1 with Auth.ApiKeys. LDAP: Dashboard/DashboardAuthenticator.cs (MxGateway:Ldap, UseTls), GroupToRole under MxGateway:Dashboard, roles Admin/Viewer, cookie MxGatewayDashboard.

  • 1.1 mapper: IGroupRoleMapper<string> wrapping DashboardAuthenticator.MapGroupsToRoles.
  • 1.2 Ldap: replace DashboardAuthenticator's LDAP internals with Auth.Ldap (keep dashboard claims/principal build).
  • 1.3 ApiKeys: delete the local parser/hasher/generator/verifier/stores; re-point to Auth.ApiKeys; keep ConstraintEnforcer + gRPC interceptor + scopes on top (constraints carried as the opaque blob). Lowest-risk ApiKeys cutover (it's the donor).
  • 1.4 config: UseTls→Transport.
  • 1.5/1.7: ZbClaimTypes/cookie defaults; Viewer→Viewer, Admin→Administrator.

ScadaBridge — packages: all 4 (Ldap source; ApiKeys consumer)

Current LDAP (src/ZB.MOM.WW.ScadaBridge.Security/LdapAuthService.cs): the hardened reference (RFC-4514 DN escape, filter escape, per-op timeout, fail-closed group lookup, username trim, service-account-bind distinction). Config is flat ScadaBridge:Security:Ldap* in SecurityOptions.cs with LdapTransport enum already (Ldaps/StartTls/None), AllowInsecureLdap, LdapUserIdAttribute, LdapGroupAttribute, validated by SecurityOptionsValidator : OptionsValidatorBase. Role mapping DB-backed with site-scoping: RoleMapper.MapGroupsToRolesAsync → RoleMappingResult(Roles, PermittedSiteIds, IsSystemWideDeployment) over LdapGroupMapping + SiteScopeRule (SQL Server). Roles Admin/Design/Deployment/Audit/AuditReadOnly; SoD via OperationalAudit{Admin,Audit,AuditReadOnly} + AuditExport{Admin,Audit}. Cookie ZB.MOM.WW.ScadaBridge.Auth; JWT-in-cookie via JwtTokenService. Inbound API keys (InboundAPI/ApiKeyValidator.cs): raw X-API-Key, deterministic HMAC (ApiKeyHasher, no per-row salt, by-value lookup), ApiKey{Name,KeyHash,IsEnabled} in SQL Server, per-method approval via ApiMethod.ApprovedApiKeyIds → architecturally different from the library's keyId/scope/SQLite model.

  • 1.1 mapper: IGroupRoleMapper<string> wrapping RoleMapper.MapGroupsToRolesAsync, carrying PermittedSiteIds/IsSystemWideDeployment in GroupRoleMapping.Scope.
  • 1.2 Ldap: ScadaBridge is the donor — confirm Auth.Ldap behaviour-matches, then re-point LdapAuthService usages to the library type. Lowest-risk Ldap cutover.
  • 1.3 ApiKeys: see Finding #3 — bigger than a token reformat; needs a scope decision.
  • 1.4 config: nest flat Security:Ldap* under a sub-section + rename LdapUserIdAttribute→UserNameAttribute, LdapGroupAttribute→GroupAttribute, LdapTransport→Transport (+ SecurityOptionsValidator + appsettings). Enum already matches.
  • 1.7 roles: Admin→Administrator, Design→Designer, Deployment→Deployer, Audit→Administrator (collapse), AuditReadOnly→Viewer (collapse) — removes the OperationalAudit/AuditExport SoD (accepted).

Key findings that change the plan

  1. OtOpcUa LDAP section is Security:Ldap, not Authentication:Ldap. Both components/auth/GAPS.md §1 and the auth current-state doc are wrong; the code (and the prior fix in memory) use Security:Ldap. → Task 1.4 for OtOpcUa is only UseTls→Transport, not a section move.
  2. OtOpcUa "double-singleton bug" is already mitigated. Both registration sites use TryAddSingleton (dedupes); the Enabled flag is an intentional fail-closed master switch. → Not a blocking fix; verify and keep Enabled. Removes a risk the plan flagged.
  3. ScadaBridge inbound API keys are a re-architecture, not a token reformat. The library's ApiKeys model (<prefix>_<keyId>_<secret> Bearer, keyId lookup + constant-time compare, SQLite store, scopes + opaque constraints) is fundamentally different from ScadaBridge's (raw X-API-Key, deterministic by-value HMAC lookup, SQL Server ApiKey{Name,KeyHash}, per-method approval list). Wholesale adoption means re-architecting inbound-API auth AND resolving a SQLite-vs-SQL-Server storage mismatch. Needs a scope decision (Decision A).
  4. OtOpcUa role mapping is config + DB, not just config (RoleMapper.Map baseline + DB Merge). The IGroupRoleMapper impl must combine both. OtOpcUa also has DevStubMode (no library equivalent — keep app-side) and a second LDAP consumer (OPC UA data-plane impersonation) that must be re-wired too.
  5. MxGateway ApiKeys cutover is the donor path — lowest risk (delete locals, re-point to library; keep ConstraintEnforcer/gRPC/scopes on top). Confirms the GAPS sequencing (gateway first).
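
The contrast in Finding #3 can be made concrete with a small Python sketch of the two lookup styles. It is a model only: the pepper value, the dictionary-based stores, and the function names are hypothetical, and the real implementations are C# classes backed by SQL Server and SQLite.

```python
import hashlib
import hmac

PEPPER = b"demo-pepper"  # hypothetical; a real deployment loads this from a secret


def hash_secret(secret: str) -> str:
    """Deterministic HMAC-SHA256 of the secret under the pepper."""
    return hmac.new(PEPPER, secret.encode(), hashlib.sha256).hexdigest()


def verify_by_value(raw_key: str, rows_by_hash: dict):
    """ScadaBridge style: hash the whole key and look the hash up directly."""
    return rows_by_hash.get(hash_secret(raw_key))


def verify_by_key_id(key_id: str, secret: str, rows_by_id: dict):
    """Library style: find the row by keyId, then compare in constant time."""
    row = rows_by_id.get(key_id)
    if row is None:
        return None
    ok = hmac.compare_digest(row["hash"], hash_secret(secret))
    return row if ok else None
```

The by-value store has no keyId to look up, so an existing raw key cannot be split into the keyId and secret halves the second function needs. That is why the plan calls for re-issuing keys rather than migrating them.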

Task 1.2 (LDAP cutover) — implemented + reviewed (2026-06-02)

Commits: OtOpcUa 257caa7, MxGateway c3b466e, ScadaBridge ac34dac. All targeted tests green. Security review verdict: sound, no credential-leak regression in any repo (insecure-transport guards fire correctly; DevStubMode cannot leak to prod; claim shapes preserved). All three returned CHANGES-REQUESTED for fixable issues:

  • OtOpcUa (no Critical): (I1) insecure-transport guard is login-time only — add startup validation gated on Enabled for defense-in-depth, verify prod overlays still boot; (I2) integration stub pre-populates Roles so the Groups→mapper path isn't actually exercised — fix the stub; (I3) document/test the zero-role fail-closed fallback.
  • MxGateway (2 Critical): (C1) library strips group DNs to short RDN names before the LdapGroupClaimType claim → verify prior behaviour, document, drop the now-dead full-DN branch in the mapper, add a claim-value assertion; (C2) gateway's local LdapOptions is now a shadow copy (validated but unused at runtime) → fold to the shared type or document the drift. (I1) shared LdapOptionsValidator has no Enabled=false guard → validates even when LDAP is disabled (real for MxGateway, which can disable dashboard LDAP).
  • ScadaBridge (2 Critical): (C1) ConfigSecretsTests still checks the OLD flat key → passes vacuously, no longer guards secret-in-config — repoint to nested key; (C2) production-checklist.md still lists deleted flat keys → update; (I) unsafe (RoleMappingResult)Scope! cast → null-guard.

Cross-cutting decision — shared library LdapOptionsValidator Enabled guard: the validator runs regardless of Enabled, requiring Server/SearchBase/ServiceAccountDn even when LDAP is off. Correct fix = add an if (!Enabled) return Success guard to the shared validator and republish 0.1.1, re-pinning all consumers. (Alternative: each consumer always supplies those fields. The library fix is the principled one.)
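
The proposed guard can be sketched in Python as follows. This models the intended behaviour only; the field set, the error strings, and the list-of-errors return shape are assumptions, not the shared validator's actual C# signature.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LdapOptions:
    enabled: bool = True
    server: str = ""
    search_base: str = ""
    service_account_dn: str = ""
    transport: str = "Ldaps"  # one of: Ldaps, StartTls, None
    allow_insecure: bool = False


def validate(options: LdapOptions) -> List[str]:
    """Return validation errors; an empty list means success."""
    # The guard: a disabled LDAP section is never validated.
    if not options.enabled:
        return []
    errors = []
    for name in ("server", "search_base", "service_account_dn"):
        if not getattr(options, name):
            errors.append(f"{name} is required")
    # TLS-or-AllowInsecure rule from the library validator.
    if options.transport == "None" and not options.allow_insecure:
        errors.append("Transport=None requires AllowInsecure")
    return errors
```

With the guard in place, a consumer that turns LDAP off (as MxGateway can for its dashboard) no longer has to supply placeholder values for the required fields.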

Task 1.2/1.4 — DONE (reviewed + fixed, 2026-06-02)

Library hardened to 0.1.1 (LdapOptionsValidator skips when Enabled=false), republished, re-pinned in all 3 repos. Fix commits: OtOpcUa c4f315e (startup insecure-transport guard gated on Enabled/DevStub + Transport: Ldaps declared in the 3 prod overlays + test fidelity), MxGateway f4dc11b (group-claim shape documented as non-breaking — claim read nowhere in prod; shadow LdapOptions kept with a drift-warning doc), ScadaBridge 4db8c37 (secret-test repointed to nested key, prod checklist updated, Scope cast guarded). All targeted suites green. 1.2 (LDAP) + 1.4 (config) complete across all 3 repos.

Remaining Phase 1: 1.3 ApiKeys (MxGateway donor cutover — low risk; ScadaBridge full re-architecture — largest single item: SQLite store + Bearer format + scopes + key re-issuance), 1.5 claims/cookies, 1.6 dev base DN, 1.7 canonical roles.

Task 1.3 ApiKeys — MxGateway DONE; ScadaBridge pending (2026-06-02)

Library bumped to 0.1.2: Auth.ApiKeys SQLite migrator now stamps schema version 2 (was 1) to match the donor gateway's deployed gateway-auth.db — without it the gateway would fail to boot (migrator threw on a newer on-disk version). Final schema byte-identical since v1; no key re-issuance. Republished, re-pinned in MxGateway. (+2 migrator tests.)
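
The boot failure described above follows from a common migrator rule: refuse to run against a database stamped with a newer version than the code knows. A minimal Python sketch of that rule is below. Storing the version in SQLite's PRAGMA user_version is an assumption made for illustration; the source does not say where the library keeps its schema version.

```python
import sqlite3


def migrate(conn: sqlite3.Connection, code_version: int) -> int:
    """Stamp the DB with code_version; return the version found on disk."""
    on_disk = conn.execute("PRAGMA user_version").fetchone()[0]
    if on_disk > code_version:
        # A migrator that only knows version 1 hits this on a version 2 database.
        raise RuntimeError(
            f"on-disk schema v{on_disk} is newer than supported v{code_version}"
        )
    # Real migration steps would run here; per the note above the final
    # schema is unchanged since v1, so only the stamp differs.
    conn.execute(f"PRAGMA user_version = {int(code_version)}")
    return on_disk
```

Under this rule, a migrator that stamps version 2 accepts both fresh databases and already-deployed ones, which matches the description of the 0.1.2 fix.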

MxGateway 1.3 — DONE + APPROVED (commit 05009d7): deleted 28 local pipeline files, adopted Auth.ApiKeys 0.1.2 via AddZbApiKeyAuth; kept ConstraintEnforcer/gRPC interceptor/scopes/CLI/dashboard on top via a GatewayApiKeyIdentityMapper (library identity → gateway identity-with-EffectiveConstraints). Review: no Critical; no auth bypass, schema compat + crypto parity + gRPC status mapping verified. Non-blocking follow-ups: (a) dashboard mutations now write two audit rows (library + dashboard-*) — fine, note for Phase 2 audit bridging; (b) nit: GatewayApiKeyIdentityMapper uses Constraints as string (opaque coupling) — consider a guard/contract test.

ScadaBridge 1.3 — PENDING: the full inbound-API re-architecture (SQL Server → SQLite store, X-API-Key → Bearer, per-method-approval → scopes/constraints, all inbound keys re-issued). Largest/highest-risk single item in the program; warrants its own focused pass (likely decomposed).

ScadaBridge ApiKeys re-architecture — spec (FULL ADOPT, 2026-06-02)

Decision: full adopt the library SQLite store + scopes model. Single consistent contract all layers build to:

  • Token format: Authorization: Bearer sbk_<keyId>_<secret> (prefix sbk). Replaces the raw X-API-Key header.
  • Scope model = method name. A key's Scopes set = the API-method names it may call. ApiMethod.ApprovedApiKeyIds (CSV of key int IDs) is retired; per-method approval moves to the key's scopes. Auth check at the endpoint: identity.Scopes.Contains(methodName).
  • Storage: inbound keys move to the library's SQLite store (new ScadaBridge:InboundApi:ApiKeyStore sqlite path + pepper via ApiKeyOptions.PepperSecretName, RunMigrationsOnStartup). The SQL Server ApiKey entity is retired; ApiMethod is KEPT minus ApprovedApiKeyIds (EF migration drops the column). InboundApiRepository loses its ApiKey methods + GetApprovedKeysForMethodAsync.
  • Auth path (InboundAPI): endpoint reads Bearer, calls library IApiKeyVerifier.VerifyAsync, then the scope check. PRESERVE the security invariants: 401 (missing/invalid/disabled), 403 identical message for both "method not found" and "not in scope" (enumeration-safety, InboundAPI-011), constant-time compare (library does it), active-node 503 + body-cap 413 filters unchanged, audit actor = key DisplayName. Delete ApiKeyValidator hashing + ApiKeyHasher.
  • Management (ManagementActor + CLI security api-key + Commons messages): drive the library IApiKeyAdminStore + ApiKeySecretGenerator. create returns sbk_<keyId>_<secret> once (plaintext-once preserved); methods a key may call = its scopes, set on create/update (e.g. --methods a,b or grant/revoke-method commands). list returns id/name/enabled (no secret), update --enabled, delete/revoke. Audit preserved.
  • CentralUI: ApiKeys.razor (list/create/toggle/delete via admin store; show token once), ApiKeyForm.razor (edit the key's method-scopes), ApiMethodForm.razor (method-side "approved keys" now reads/writes key scopes across keys).
  • Breaking change: all inbound keys re-issued (new format); clients switch X-API-Key → Authorization: Bearer. Needs a runbook + CHANGELOG. Re-pin ScadaBridge Auth packages to 0.1.2.
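
The auth-path contract in the list above can be sketched as one decision function. This is a Python model under stated assumptions: the verify callback stands in for IApiKeyVerifier.VerifyAsync and returns a key's scope set or None, the response bodies are placeholders, and Bearer scheme matching is simplified to a case-sensitive prefix check.

```python
from typing import Callable, Optional, Set, Tuple

FORBIDDEN_BODY = "Forbidden"  # one literal for both 403 causes


def authorize(
    auth_header: Optional[str],
    method_name: str,
    verify: Callable[[str], Optional[Set[str]]],
    known_methods: Set[str],
) -> Tuple[int, str]:
    """Return (status, body) for an inbound API call."""
    if not auth_header or not auth_header.startswith("Bearer "):
        return 401, "Unauthorized"  # missing or wrong scheme
    scopes = verify(auth_header[len("Bearer "):])
    if scopes is None:
        return 401, "Unauthorized"  # invalid or disabled key
    # Scope check first, method lookup second; both failures share one
    # body so a caller cannot tell which methods exist.
    if method_name not in scopes or method_name not in known_methods:
        return 403, FORBIDDEN_BODY
    return 200, "OK"
```

Sharing one constant for the 403 body makes the enumeration-safety invariant easy to pin in a test: both failure causes must return the same literal.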

Sub-tasks (sequential where files overlap): (A) storage retire + EF migration + library wiring/options; (B) auth-path rewrite (Bearer + verifier + scope check); (C) management (ManagementActor + CLI + messages); (D) CentralUI pages; (E) runbook/CHANGELOG + integration test sweep. A→(B,C)→D→E. Sequencing note: doing it additively (add library path, switch auth, rewire mgmt/UI, retire SQL Server entity LAST) keeps the build green at each step.

Re-arch progress

  • A+B foundation — DONE + reviewed+fixed (commits a94558c, 1fcc4f5; re-pinned to 0.1.2). Library AddZbApiKeyAuth wired additively (ScadaBridge:InboundApi:ApiKeyStore, prefix sbk, reuses inbound pepper); inbound endpoint now uses the library verifier + Bearer + Scopes.Contains(methodName). Security invariants preserved: 401 generic / 403 identical body for not-found AND not-in-scope (enumeration-safe, pinned to a literal in tests), scope-check-before-DB (no timing oracle), fail-fast pepper preflight (Central), audit actor = DisplayName. Old SQL Server path still compiles (retired in E). 163/163 InboundAPI tests green. NOTE for E: the library's ApiKeySecretGenerator.NewSecret() is internal — seed/create keys via the public ApiKeyAdminCommands.CreateKeyAsync seam (returns the assembled sbk_… token).
  • C (management), D (CentralUI), E (retire SQL Server ApiKey + ApiMethod.ApprovedApiKeyIds migration + runbook/CHANGELOG) — PENDING.

Resolved decisions (2026-06-02)

  • Decision A — ScadaBridge inbound API keys depth → (a) FULL ADOPT. Re-architect inbound-API auth to the library's model: <prefix>_<keyId>_<secret> Bearer token format, keyId lookup + constant-time compare, scopes/constraints, and move inbound API keys into the library's SQLite store (separate from the SQL Server config DB). This is the largest, highest-risk item in Phase 1. Implications to handle in Task 1.3:
    • New SQLite auth DB for ScadaBridge inbound keys (path via ApiKeyOptions.SqlitePath); migrate/retire the SQL Server ApiKey{Name,KeyHash} table + ApiMethod.ApprovedApiKeyIds linkage.
    • Re-model per-method approval as the library's scopes/constraints (or the opaque constraint blob) — the ApiMethod.ApprovedApiKeyIds set becomes per-key scope grants.
    • Switch the inbound transport from X-API-Key header to Authorization: Bearer <token> (a client-visible contract change — extends the already-accepted token-format change; needs the interop check + a doc/CHANGELOG note).
    • Existing raw keys cannot be migrated (deterministic-by-value hash, no keyId/secret split) → re-issue all inbound API keys; call this out in the cutover runbook.
  • Decision B — canonical role mappings → confirmed as tabled above (OtOpcUa ConfigViewer→Viewer, ConfigEditor→Designer, FleetAdmin→Administrator+Deployer; MxGateway Viewer/Admin; ScadaBridge Admin→Administrator, Design→Designer, Deployment→Deployer, Audit→Administrator, AuditReadOnly→Viewer).
  • Decision C — dev escape hatches → keep app-side, unchanged. OtOpcUa DevStubMode and MxGateway AllowAnonymousLocalhost/loopback bypass have no library equivalent; preserve them in each app outside the shared Auth.Ldap path.
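
The Decision B mappings can be written down as a lookup table. The Python sketch below copies the role names from the decision; the table layout and the function are hypothetical, and dropping unknown native roles is an assumed fail-closed choice.

```python
CANONICAL = {
    "OtOpcUa": {
        "ConfigViewer": {"Viewer"},
        "ConfigEditor": {"Designer"},
        "FleetAdmin": {"Administrator", "Deployer"},
    },
    "MxGateway": {
        "Viewer": {"Viewer"},
        "Admin": {"Administrator"},
    },
    "ScadaBridge": {
        "Admin": {"Administrator"},
        "Design": {"Designer"},
        "Deployment": {"Deployer"},
        "Audit": {"Administrator"},   # collapse
        "AuditReadOnly": {"Viewer"},  # collapse
    },
}


def to_canonical(app: str, native_roles) -> set:
    """Translate an app's native roles to canonical roles."""
    table = CANONICAL[app]
    result = set()
    for role in native_roles:
        # Unknown native roles contribute nothing (assumed fail-closed).
        result |= table.get(role, set())
    return result
```

The ScadaBridge rows show the accepted loss of separation of duties: Audit becomes indistinguishable from Admin, and AuditReadOnly from any other Viewer.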