scadaproj/docs/plans/2026-06-02-auth-audit-normalization-design.md
Joseph Doherty 6ec1ea7d65 docs: design for full Auth+Audit normalization across 3 sister projects
Approved brainstorming output: two-library program (publish + adopt
ZB.MOM.WW.Auth then ZB.MOM.WW.Audit across OtOpcUa, MxAccessGateway,
ScadaBridge), library-major waterfall, ending with audit Actor wired
from the Auth principal. Local-only delivery; verified feed/source state.
2026-06-02 00:04:33 -04:00

Design — Auth + Audit normalization across all three sister projects

Date: 2026-06-02
Status: Approved (brainstorming complete); handing off to writing-plans.
Scope owner decision: full two-library normalization (see Scope decisions).

Summary

Bring two shared libraries that already live in scadaproj but are unpublished and adopted by no app (ZB.MOM.WW.Auth, 4 packages, and ZB.MOM.WW.Audit, 1 package) to full adoption across OtOpcUa, MxAccessGateway, and ScadaBridge, ending with every audit emit site carrying the genuine Auth-resolved principal as AuditEvent.Actor.

The original request was "implement the audit component in all sister projects." Because audit GAPS #4 (Actor = the ZB.MOM.WW.Auth principal) requires an authenticated principal at every emit site, and because the owner chose the maximal scope at every fork, the job expands to a two-library program: full Auth adoption (auth GAPS #1-#8) first, then full Audit adoption (audit GAPS #1-#6), with #4 wiring Actor from the now-live principal.

Verified starting state (source-checked 2026-06-02)

  • Both libraries exist and are pack-ready in scadaproj/ZB.MOM.WW.Auth/ (4 csproj + build/pack.sh + build/push.sh, 172 tests) and scadaproj/ZB.MOM.WW.Audit/ (build/pack.sh, 19 tests). Both are at version 0.1.0 and both use central package management.
  • Neither is on the Gitea feed. All five package registration endpoints return HTTP 404. No .nupkg is built locally.
  • Adopted by zero apps. No sibling repo references ZB.MOM.WW.Auth* or ZB.MOM.WW.Audit.
  • Feed source-mapping is missing in all three repos. Each NuGet.config packageSourceMapping lists Health/Telemetry/Configuration but not Auth or Audit, so each repo needs mapping lines added (mirror MxGateway commit 437ab65, which did this for Configuration).
  • The MxGateway audit coordination gate (audit GAPS #2) is CLEAR. MxGateway.Server already references ZB.MOM.WW.Telemetry.Serilog 0.1.0; the Serilog/Telemetry/Configuration work is merged to main. MxGateway audit adoption is unblocked.
  • Established adoption rhythm (Telemetry, Configuration): publish lib to feed → add feed mapping + version pin → behaviour-preserving consumer cutover → land on the repo's local default branch (not pushed to remote).

Per repo memory, prior "published"/"adopted" claims in this workspace have repeatedly been optimistic; every claim above was re-verified against the feed and source on 2026-06-02.
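
The missing feed source-mapping noted above lends itself to a mechanical check. The following is a minimal Python sketch, assuming the source key is `gitea` and the standard NuGet.config `packageSourceMapping` layout. The helper `missing_patterns` and the sample config (including its three existing patterns) are hypothetical illustrations, not code or content taken from any of the repos.

```python
import xml.etree.ElementTree as ET

# Patterns that Phase 0 says each repo must map to the gitea source.
REQUIRED_PATTERNS = ("ZB.MOM.WW.Auth", "ZB.MOM.WW.Auth.*", "ZB.MOM.WW.Audit")


def missing_patterns(nuget_config_xml: str, source_key: str = "gitea") -> list[str]:
    """Return the required patterns not yet mapped to `source_key`."""
    root = ET.fromstring(nuget_config_xml)
    mapped = set()
    # <packageSource key="..."> elements only appear under <packageSourceMapping>.
    for source in root.iter("packageSource"):
        if source.get("key") == source_key:
            mapped.update(p.get("pattern") for p in source.iter("package"))
    return [p for p in REQUIRED_PATTERNS if p not in mapped]


# Hypothetical config mirroring the verified starting state: Health, Telemetry,
# and Configuration are mapped, Auth and Audit are not.
SAMPLE = """
<configuration>
  <packageSourceMapping>
    <packageSource key="gitea">
      <package pattern="ZB.MOM.WW.Health" />
      <package pattern="ZB.MOM.WW.Telemetry.*" />
      <package pattern="ZB.MOM.WW.Configuration" />
    </packageSource>
  </packageSourceMapping>
</configuration>
"""

if __name__ == "__main__":
    print(missing_patterns(SAMPLE))
```

Running the same function against each repo's real NuGet.config before and after Phase 0 would give a before/after record of the mapping change.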

Scope decisions

| Fork | Decision |
|---|---|
| How deep into the audit GAPS backlog? | Everything incl. #4 Actor→Auth (all of #1-#6). |
| How to satisfy #4 given Auth is unadopted? | Adopt Auth first, then audit (two-library program). |
| How much of the Auth backlog? | Full Auth normalization (auth GAPS #1-#8, all 3 repos). |
| How to walk the work matrix? | Library-major waterfall (Phase 1 Auth → Phase 2 Audit → Phase 3 wiring). |
| Remote integration model | Local-only; no git push, no PRs (safest for production auth paths; flip per repo later if desired). |

Architecture — four phases

Phase 0  Publish & feed-map      pack + push both libs to Gitea feed (fix the 404s);
         (foundation)            add NuGet.config source-mappings + version pins in all 3 repos.

Phase 1  Auth adoption           auth GAPS #1-#8 across all 3 repos, in GAPS sequence:
         (largest, sec-sensitive) #3 IGroupRoleMapper seam → #1 Ldap + #2 ApiKeys cutover →
                                  #4 config schema (A1/A2) + #5 claims/cookies → #6 dev base DN →
                                  #8 canonical roles. Each lands behind tests.

Phase 2  Audit adoption          audit GAPS #1-#3 core + #5/#6 cleanups across all 3 repos.
         (behaviour-preserving)

Phase 3  Actor→Auth wiring        audit GAPS #4: route the now-live Auth principal into Actor
         (the payoff)            at every emit site. Closes the loop Audit.Actor == Auth principal.

The waterfall is enforced by task dependencies (Phase 0 → 1 → 2 → 3). Phase 1 must fully land before Phase 3 can wire a stable principal; Phase 2 sits after Phase 1 so emit sites aren't touched twice.

Delivery model

  • One feature branch per repo per library phase (feat/adopt-zb-auth, then feat/adopt-zb-audit), behaviour-preserving except where a GAPS item is explicitly net-new.
  • Publish-first: both packages on the feed and verified resolvable before any consumer edit.
  • Land on each repo's local default branch, gated by that repo's tests + new contract tests.
  • Local-only (no push). Each phase is a revertable branch merge.
  • The libraries themselves are plain files in scadaproj (not nested git repos) — publishing is pack + push only; no commits to the libs unless a parity gap forces a fix.

Phase 0 — publish & feed-map (task #7)

  1. dotnet pack -c Release both libraries; push.sh to the Gitea feed (https://gitea.dohertylan.com/api/packages/dohertj2/nuget/index.json).
  2. Verify all five packages return HTTP 200 from the registration endpoint.
  3. In each repo: add packageSourceMapping patterns (ZB.MOM.WW.Auth, ZB.MOM.WW.Auth.*, ZB.MOM.WW.Audit) to the gitea source, and version pins (Directory.Packages.props for OtOpcUa/ScadaBridge; inline Version="0.1.0" for MxGateway).
  4. Confirm that dotnet restore resolves the new patterns in all three repos.
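
Step 2 above (all five packages return HTTP 200) could be scripted along these lines. This is a sketch under stated assumptions: the four Auth package IDs are inferred from the part names used in this design (Abstractions, Ldap, ApiKeys, AspNetCore), and the registration URL shape is assumed rather than confirmed against the feed. The status fetcher is injectable so the logic can be exercised without network access; a private feed would also need credentials, which are omitted here.

```python
import urllib.error
import urllib.request
from typing import Callable

FEED_BASE = "https://gitea.dohertylan.com/api/packages/dohertj2/nuget"

# Assumed package IDs: the design names the parts, not the exact IDs.
PACKAGES = (
    "ZB.MOM.WW.Auth.Abstractions",
    "ZB.MOM.WW.Auth.Ldap",
    "ZB.MOM.WW.Auth.ApiKeys",
    "ZB.MOM.WW.Auth.AspNetCore",
    "ZB.MOM.WW.Audit",
)


def registration_url(package_id: str) -> str:
    # Assumed URL shape; NuGet v3 registration URLs use the lowercased ID.
    return f"{FEED_BASE}/registration/{package_id.lower()}/index.json"


def http_status(url: str) -> int:
    """Return the HTTP status of a GET, treating 4xx/5xx as data, not errors."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def unpublished(fetch: Callable[[str], int] = http_status) -> list[str]:
    """Return the package IDs whose registration endpoint is not HTTP 200."""
    return [p for p in PACKAGES if fetch(registration_url(p)) != 200]
```

An empty list from `unpublished()` would be the evidence for "Phase 0 publish done"; any returned ID is a package still missing from the feed.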

Phase 1 — Auth adoption (task #8, blocked by #7)

Consumer cutover (libs are already extracted). GAPS order: #3 seam → #1 Ldap + #2 ApiKeys → #4 config schema + #5 claims/cookies → #6 dev base DN → #8 canonical roles.

| | OtOpcUa | MxAccessGateway | ScadaBridge |
|---|---|---|---|
| Packages | Abstractions + Ldap + AspNetCore (no ApiKeys: OPC UA transport security) | all 4 (source for ApiKeys; cuts over first) | all 4 (source for Ldap; ApiKeys consumer after gw) |
| Role mapper (#3) | config-backed (GroupToRole) | config-backed | DB-backed (LdapGroupMapping) |
| Config migration (#4) | A1: UseTlsTransport enum (section already nested) | A1: UseTlsTransport enum | A2 (biggest): flat Security:Ldap*→nested; rename LdapUserIdAttribute→UserNameAttribute, LdapGroupAttribute→GroupAttribute |
| Cookies/claims (#5) | Blazor Admin control-plane cookie | keep MxGatewayDashboard name, share claims | keep ZB.MOM.WW.ScadaBridge.Auth name, share claims |
| Canonical roles (#8) | no first-class Deployer (publish ⊂ FleetAdmin) | no Designer/Deployer roles | collapse: AuditReadOnly→Viewer, Audit→Administrator (auditor/admin SoD loss, GAPS-accepted) |
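
To make the role-mapper seam (#3) and the canonical-roles collapse (#8) concrete, here is a minimal Python stand-in. The real seam is the C# IGroupRoleMapper, whose signature is not shown in this document, so the method name `roles_for`, the case-insensitive group match, and the sample group name are all assumptions made for illustration.

```python
from typing import Iterable, Mapping, Protocol


class GroupRoleMapper(Protocol):
    """Python stand-in for the IGroupRoleMapper seam (shape assumed)."""

    def roles_for(self, groups: Iterable[str]) -> set[str]: ...


class ConfigGroupRoleMapper:
    """Config-backed mapper, the variant planned for OtOpcUa and MxAccessGateway."""

    def __init__(self, group_to_role: Mapping[str, str]) -> None:
        # Group names are matched case-insensitively here (an assumption).
        self._map = {g.casefold(): r for g, r in group_to_role.items()}

    def roles_for(self, groups: Iterable[str]) -> set[str]:
        # Unmapped groups grant nothing, so the lookup fails closed.
        return {self._map[g.casefold()] for g in groups if g.casefold() in self._map}


# ScadaBridge canonical-roles collapse, as listed in the table above.
CANONICAL = {"AuditReadOnly": "Viewer", "Audit": "Administrator"}


def canonicalize(roles: Iterable[str]) -> set[str]:
    """Map legacy role names onto canonical ones; other names pass through."""
    return {CANONICAL.get(r, r) for r in roles}
```

A DB-backed mapper (the ScadaBridge variant) would satisfy the same protocol while loading its dictionary from the LdapGroupMapping table instead of configuration.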

Two deliberate behaviour changes (accepted):

  1. ScadaBridge API-key token format (D2): raw X-API-Key → structured <prefix>_<id>_<secret>. A genuine wire change for inbound API clients — acceptable pre-prod, requires an interop check.
  2. Canonical-roles collapse in ScadaBridge removes auditor/admin separation-of-duties.
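
The structured token format in item 1 can be illustrated with a small parser. This is a hedged sketch: it assumes the prefix and id never contain an underscore while the secret may, which the design does not state, and the names `ApiKeyToken` and `parse_api_key` are hypothetical.

```python
from typing import NamedTuple, Optional


class ApiKeyToken(NamedTuple):
    prefix: str
    key_id: str
    secret: str


def parse_api_key(header_value: str) -> Optional[ApiKeyToken]:
    """Parse `<prefix>_<id>_<secret>`; return None for anything else (fail closed)."""
    # Split at most twice so underscores inside the secret are preserved.
    parts = header_value.split("_", 2)
    if len(parts) != 3 or not all(parts):
        return None
    return ApiKeyToken(*parts)
```

A legacy raw key with no separators parses to `None`, which is the reason inbound API clients need the interop check before cutover.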

Known live issue to fix during OtOpcUa cutover: LdapAuthService Enabled/double-singleton wiring is still open even though the Security:Ldap section binding was fixed — fold the fix into the OtOpcUa LDAP cutover.

Risk gate: parity tests reproducing each app's current authn decisions (bind-then-search, fail-closed group lookup, RFC-4514 + filter escaping, constant-time compare, peppered HMAC-SHA256) must be green before any cutover merges.
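
Three of the behaviours named in the risk gate (peppered HMAC-SHA256, constant-time compare, and search-filter escaping) can be sketched in a few lines of Python. How the libraries actually key the HMAC is not described here, so using the pepper as the HMAC key is an assumption; the filter escaping follows the RFC 4515 rules for search filters (RFC 4514 covers distinguished names, which this sketch does not attempt).

```python
import hashlib
import hmac


def hash_secret(secret: str, pepper: bytes) -> bytes:
    """Peppered HMAC-SHA256 of an API-key secret (pepper used as the HMAC key)."""
    return hmac.new(pepper, secret.encode("utf-8"), hashlib.sha256).digest()


def secret_matches(presented: str, stored_hash: bytes, pepper: bytes) -> bool:
    """Compare digests in constant time so timing does not leak a partial match."""
    return hmac.compare_digest(hash_secret(presented, pepper), stored_hash)


def escape_ldap_filter(value: str) -> str:
    """Escape a value for use inside an LDAP search filter (RFC 4515)."""
    out = []
    for ch in value:
        if ch in "\\*()\x00":
            # Special characters become a backslash plus two hex digits.
            out.append(f"\\{ord(ch):02x}")
        else:
            out.append(ch)
    return "".join(out)
```

Parity tests in this spirit would feed the same inputs to the bespoke and the library implementation and assert identical decisions, including on hostile inputs such as `*)(uid=*`.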

Phase 2 — Audit adoption (task #9, blocked by #8)

Behaviour-preserving seam/record/enum adoption.

| Repo | Core work (GAPS #1-#3) | Keep bespoke |
|---|---|---|
| OtOpcUa (#1, #5) | Replace Commons/.../AuditEvent.cs with canonical record; AuditWriterActor : IAuditWriter; derive Outcome at emit sites (OpcUaAccessDenied/CrossClusterNamespaceAttempt→Denied, config verbs→Success); bridge NodeId/CorrelationId value-types | Akka singleton transport, 500/5s batching, two-layer dedup, ConfigAuditLog EF entity + idempotency index |
| MxGateway (#2, #6) | Map IApiKeyAuditStore/ApiKeyAuditEntry→IAuditWriter/AuditEvent; generate EventId; "system"/"cli" Actor fallback; Category="ApiKey"; constraint-denied→Denied | SQLite store, 3 producer call sites (only injected type changes), append-only table |
| ScadaBridge (#3) | Outright rename IAuditPayloadFilter→IAuditRedactor; adopt canonical AuditOutcome enum; confirm writer contract (byte-identical); keep bespoke ~25-field record as storage shape | Entire Site/Central pipeline, 4 domain enums, CLI export/verify, Blazor UI, redaction policy |
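
The OtOpcUa "derive Outcome at emit sites" rule can be sketched as a small lookup. Only the two outcomes named in this design are modelled (the canonical AuditOutcome enum may well have more members), and treating every kind that is not explicitly a denial as Success is an assumption that generalizes "config verbs→Success". The event kind `ConfigPublished` in the usage line is a hypothetical name.

```python
from enum import Enum


class AuditOutcome(Enum):
    # Only the two members named in the design; the real enum may have more.
    SUCCESS = "Success"
    DENIED = "Denied"


# Event kinds the design maps to Denied at OtOpcUa emit sites.
DENIED_KINDS = frozenset({"OpcUaAccessDenied", "CrossClusterNamespaceAttempt"})


def derive_outcome(event_kind: str) -> AuditOutcome:
    """Derive Outcome from the event kind; non-denial kinds map to Success."""
    if event_kind in DENIED_KINDS:
        return AuditOutcome.DENIED
    return AuditOutcome.SUCCESS


if __name__ == "__main__":
    print(derive_outcome("ConfigPublished").value)
```

Keeping the mapping in one table rather than at each emit site makes the new nullable Outcome column easy to backfill consistently.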

Resolved open GAPS decisions:

  1. ScadaBridge rename vs. alias: outright rename (compiler-verified across the HIGH blast radius).
  2. MxGateway Details→DetailsJson: wrap as a small JSON object (keeps the field valid JSON).
  3. OtOpcUa Outcome storage: new nullable Outcome column + EF migration (first-class, queryable).
  4. OtOpcUa SP path: leave bespoke + document; do fix the ClusterId-filter/actor mismatch in ClusterAudit.razor so structured rows are visible.
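
Decision 2 (wrapping the MxGateway free-text Details as a small JSON object) might look like the following. The key name `message` is hypothetical; the design only requires that the resulting field be valid JSON.

```python
import json
from typing import Optional


def wrap_details(details: Optional[str]) -> Optional[str]:
    """Wrap a free-text Details string as a small JSON object for DetailsJson."""
    if details is None:
        return None
    # "message" is a hypothetical key; json.dumps handles quoting and escaping.
    return json.dumps({"message": details}, ensure_ascii=False)
```

Serializing through a JSON encoder rather than string concatenation is what keeps details containing quotes or backslashes valid.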

Cleanups in scope: #5 (OtOpcUa SP reconcile + ClusterId visibility fix), #6 (MxGateway CorrelationId capture + structured Target).

Behaviour fix: MxGateway's AppendAsync currently may propagate; wrap it so the adopted IAuditWriter never throws (honors the best-effort contract).
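
The never-throw wrap described above is a plain decorator around the writer. The sketch below uses a Python stand-in for IAuditWriter with a single synchronous `write` method, which is an assumed shape (the real MxGateway method is the asynchronous AppendAsync); the class name `NeverThrowAuditWriter` is hypothetical.

```python
import logging
from typing import Protocol

log = logging.getLogger("audit")


class AuditWriter(Protocol):
    """Python stand-in for IAuditWriter (method shape assumed)."""

    def write(self, event: dict) -> None: ...


class NeverThrowAuditWriter:
    """Decorator honoring the best-effort contract: failures are logged, not raised."""

    def __init__(self, inner: AuditWriter) -> None:
        self._inner = inner

    def write(self, event: dict) -> None:
        try:
            self._inner.write(event)
        except Exception:
            # Deliberately broad: an audit failure must never break the caller.
            log.exception("audit write failed; event dropped")
```

Because the wrapper implements the same interface as the store it wraps, the three producer call sites would not need to change beyond the injected type.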

Phase 3 — Actor→Auth wiring (task #10, blocked by #8 + #9)

With Auth live (Phase 1) and the canonical record adopted (Phase 2), route the resolved principal into AuditEvent.Actor everywhere:

  • Seam: one small IAuditActorAccessor — HTTP paths read HttpContext.User; non-HTTP paths (Akka actors, CLI) thread the operation principal or fall back. The single place that changes if the principal source ever changes again.
  • OtOpcUa → LDAP-resolved user. MxGateway → API-key name (system/cli fallback retained for keyless CLI events). ScadaBridge → principal at ManagementActor/inbound boundary.
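
The accessor's precedence can be sketched as a single function: HTTP user first, then a threaded operation principal, then the fallback. The parameter names and the treatment of blank names as absent are assumptions; in the real seam the HTTP value would come from HttpContext.User rather than a string argument.

```python
from typing import Optional

FALLBACK_ACTOR = "system"  # the design also names "cli" for keyless CLI events


def resolve_actor(
    http_user: Optional[str] = None,
    operation_principal: Optional[str] = None,
    fallback: str = FALLBACK_ACTOR,
) -> str:
    """Pick the audit Actor: HTTP user, else threaded principal, else fallback."""
    for candidate in (http_user, operation_principal):
        # Blank or whitespace-only names are treated as absent (an assumption).
        if candidate and candidate.strip():
            return candidate
    return fallback
```

Concentrating the precedence in one function is what makes the accessor the single place to change if the principal source changes again.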

Contracts, testing & risk gates

Hard seam contracts:

  • IAuditWriter: best-effort, MUST NOT throw, swallow internal failures. The OtOpcUa actor and ScadaBridge already comply; MxGateway needs the never-throw wrap (above).
  • IAuditRedactor: pure, never throws, over-redacts on failure. ScadaBridge's SafeDefaultAuditPayloadFilter is the reference; the rename preserves it.
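
The redactor contract (pure, never throws, over-redacts on failure) can be illustrated with a minimal Python sketch. It is not a port of SafeDefaultAuditPayloadFilter: the sensitive-key list and the `[REDACTED]` marker are hypothetical, and a real policy would be far richer.

```python
SENSITIVE_KEYS = frozenset({"password", "secret", "apikey", "token"})  # hypothetical
REDACTED = "[REDACTED]"


def redact(payload: dict) -> dict:
    """Return a redacted copy; on any failure, redact everything instead of raising."""
    try:
        # Build a new dict so the input is never mutated (purity).
        return {
            k: REDACTED if str(k).casefold() in SENSITIVE_KEYS else v
            for k, v in payload.items()
        }
    except Exception:
        try:
            # Over-redact: keep the keys but hide every value.
            return {str(k): REDACTED for k in payload}
        except Exception:
            # Nothing usable survived; an empty payload is the safest output.
            return {}
```

The failure path trades information for safety: a payload the redactor cannot understand is emitted with every value hidden rather than passed through.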

Cross-boundary surface: Auth/Audit adoption is in-process and does not touch the cross-repo wire contracts (gateway .proto files, OPC UA address-space shape) — except the ScadaBridge API-key token-format change, the one item needing an interop check rather than just a green unit build. A green build in one repo does not prove interop.

Per-phase verification (evidence before "done"):

  • Phase 0: all 5 packages HTTP 200; dotnet restore green in all 3 repos.
  • Phase 1: existing auth tests + new parity tests green per repo before merge; SB token-format integration check.
  • Phase 2: existing audit tests + new Outcome/EventId/rename tests; OtOpcUa Outcome migration applies forward.
  • Phase 3: Actor == authenticated principal on authenticated paths; fallback retained on keyless/system paths.
  • Library suites (Audit 19, Auth 172) re-run if any lib is touched. If a parity gap forces a lib fix, bump 0.1.0→0.1.1 and re-publish rather than editing a published version.

Tasks

| Task | Item | Blocked by |
|---|---|---|
| #7 | Phase 0: publish both libs + feed-map all 3 repos | |
| #8 | Phase 1: adopt ZB.MOM.WW.Auth across all 3 repos (auth GAPS #1-#8) | #7 |
| #9 | Phase 2: adopt ZB.MOM.WW.Audit across all 3 repos (audit GAPS #1-#3, #5, #6) | #8 |
| #10 | Phase 3: wire Actor from the Auth principal (audit GAPS #4) | #8, #9 |
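
The blocked-by column defines a small dependency graph, and the waterfall order falls out of a topological sort. A minimal sketch using Python's standard `graphlib`; the dictionary simply restates the Tasks table, and `execution_order` is a hypothetical helper name.

```python
from graphlib import TopologicalSorter

# Task -> the tasks it is blocked by, restating the Tasks table.
BLOCKED_BY = {
    "#7": set(),
    "#8": {"#7"},
    "#9": {"#8"},
    "#10": {"#8", "#9"},
}


def execution_order() -> list[str]:
    """Return an order respecting every blocked-by edge (CycleError on a cycle)."""
    return list(TopologicalSorter(BLOCKED_BY).static_order())


if __name__ == "__main__":
    print(execution_order())  # ['#7', '#8', '#9', '#10']
```

Because the dependencies form a single chain, this is the only valid order, which matches the Phase 0 → 1 → 2 → 3 waterfall.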

References

  • components/auth/GAPS.md, components/auth/spec/, components/auth/current-state/*
  • components/audit/GAPS.md, components/audit/shared-contract/ZB.MOM.WW.Audit.md, components/audit/current-state/*
  • Libraries: ZB.MOM.WW.Auth/, ZB.MOM.WW.Audit/
  • Prior adoption precedent: components/configuration/GAPS.md, components/observability/GAPS.md