diff --git a/docs/plans/2026-06-02-auth-audit-normalization-design.md b/docs/plans/2026-06-02-auth-audit-normalization-design.md
new file mode 100644
index 0000000..77ca5f1
--- /dev/null
+++ b/docs/plans/2026-06-02-auth-audit-normalization-design.md
@@ -0,0 +1,195 @@
+# Design — Auth + Audit normalization across all three sister projects
+
+**Date:** 2026-06-02
+**Status:** Approved (brainstorming complete) — handing off to writing-plans.
+**Scope owner decision:** full two-library normalization (see [Scope decisions](#scope-decisions)).
+
+## Summary
+
+Bring two shared libraries that already live in `scadaproj` but are **unpublished and
+adopted by no app** — `ZB.MOM.WW.Auth` (4 packages) and `ZB.MOM.WW.Audit` (1 package) —
+to **full adoption across OtOpcUa, MxAccessGateway, and ScadaBridge**, ending with every
+audit emit site carrying the genuine Auth-resolved principal as `AuditEvent.Actor`.
+
+The original request was "implement the audit component in all sister projects." Because
+audit GAPS #4 (Actor = the `ZB.MOM.WW.Auth` principal) requires an authenticated principal
+at every emit site, and because the owner chose the maximal scope at every fork, the job
+expands to a **two-library program**: full Auth adoption (auth GAPS #1–#8) first, then full
+Audit adoption (audit GAPS #1–#6) with #4 wiring `Actor` from the now-live principal.
+
+## Verified starting state (source-checked 2026-06-02)
+
+- **Both libraries exist and are pack-ready** in `scadaproj/ZB.MOM.WW.Auth/` (4 csproj +
+  `build/pack.sh` + `build/push.sh`, 172 tests) and `scadaproj/ZB.MOM.WW.Audit/`
+  (`build/pack.sh`, 19 tests). Both at version `0.1.0`, both central-package-management.
+- **Neither is on the Gitea feed.** All five package registration endpoints return
+  **HTTP 404**. No `.nupkg` is built locally.
+- **Adopted by zero apps.** No sibling repo references `ZB.MOM.WW.Auth*` or `ZB.MOM.WW.Audit`.
+- **Feed source-mapping is missing in all three repos.** Each `NuGet.config`
+  `packageSourceMapping` lists Health/Telemetry/Configuration but **not** Auth or Audit, so
+  each repo needs mapping lines added (mirror MxGateway commit `437ab65`, which did this for
+  Configuration).
+- **The MxGateway audit coordination gate (audit GAPS #2) is CLEAR.** `MxGateway.Server`
+  already references `ZB.MOM.WW.Telemetry.Serilog 0.1.0`; the Serilog/Telemetry/Configuration
+  work is merged to `main`. MxGateway audit adoption is unblocked.
+- Established adoption rhythm (Telemetry, Configuration): publish lib to feed → add feed
+  mapping + version pin → behaviour-preserving consumer cutover → land on the repo's local
+  default branch (not pushed to remote).
+
+> Per repo memory, prior "published"/"adopted" claims in this workspace have repeatedly been
+> optimistic; every claim above was re-verified against the feed and source on 2026-06-02.
+
+## Scope decisions
+
+| Fork | Decision |
+|---|---|
+| How deep into the audit GAPS backlog? | **Everything incl. #4 Actor→Auth** (all of #1–#6). |
+| How to satisfy #4 given Auth is unadopted? | **Adopt Auth first, then audit** (two-library program). |
+| How much of the Auth backlog? | **Full Auth normalization** (auth GAPS #1–#8, all 3 repos). |
+| How to walk the work matrix? | **Library-major waterfall** (Phase 1 Auth → Phase 2 Audit → Phase 3 wiring). |
+| Remote integration model | **Local-only**; no `git push`, no PRs (safest for production auth paths; flip per repo later if desired). |
+
+## Architecture — four phases
+
+```
+Phase 0  Publish & feed-map        pack + push both libs to Gitea feed (fix the 404s);
+         (foundation)              add NuGet.config source-mappings + version pins in all 3 repos.
+
+Phase 1  Auth adoption             auth GAPS #1–#8 across all 3 repos, in GAPS sequence:
+         (largest, sec-sensitive)  #3 IGroupRoleMapper seam → #1 Ldap + #2 ApiKeys cutover →
+                                   #4 config schema (A1/A2) + #5 claims/cookies → #6 dev base DN →
+                                   #8 canonical roles.
+                                   Each lands behind tests.
+
+Phase 2  Audit adoption            audit GAPS #1–#3 core + #5/#6 cleanups across all 3 repos.
+         (behaviour-preserving)
+
+Phase 3  Actor→Auth wiring         audit GAPS #4: route the now-live Auth principal into Actor
+         (the payoff)              at every emit site. Closes the loop Audit.Actor == Auth principal.
+```
+
+The waterfall is enforced by task dependencies (Phase 0 → 1 → 2 → 3). Phase 1 must fully
+land before Phase 3 can wire a *stable* principal; Phase 2 sits after Phase 1 so emit sites
+aren't touched twice.
+
+### Delivery model
+
+- One **feature branch per repo per library phase** (`feat/adopt-zb-auth`, then
+  `feat/adopt-zb-audit`), behaviour-preserving except where a GAPS item is explicitly net-new.
+- **Publish-first**: both packages on the feed and verified resolvable before any consumer edit.
+- **Land on each repo's local default branch**, gated by that repo's tests + new contract tests.
+- **Local-only** (no push). Each phase is a revertable branch merge.
+- The libraries themselves are plain files in `scadaproj` (not nested git repos) — publishing
+  is `pack` + `push` only; no commits to the libs unless a parity gap forces a fix.
+
+## Phase 0 — publish & feed-map *(task #7)*
+
+1. `dotnet pack -c Release` both libraries; `push.sh` to the Gitea feed
+   (`https://gitea.dohertylan.com/api/packages/dohertj2/nuget/index.json`).
+2. Verify all five packages return HTTP 200 from the registration endpoint.
+3. In each repo: add `packageSourceMapping` patterns (`ZB.MOM.WW.Auth`, `ZB.MOM.WW.Auth.*`,
+   `ZB.MOM.WW.Audit`) to the gitea source, and version pins (`Directory.Packages.props` for
+   OtOpcUa/ScadaBridge; inline `Version="0.1.0"` for MxGateway).
+4. `dotnet restore` resolves the new patterns in all three repos.
+
+## Phase 1 — Auth adoption *(task #8, blocked by #7)*
+
+Consumer cutover (libs are already extracted). GAPS order: #3 seam → #1 Ldap + #2 ApiKeys →
+#4 config schema + #5 claims/cookies → #6 dev base DN → #8 canonical roles.
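+
+The #3 seam that leads this order might look roughly like the sketch below. The
+`IGroupRoleMapper` member shown here is an assumption made for illustration (the real
+contract in `ZB.MOM.WW.Auth` may differ), and `ConfigGroupRoleMapper` is a hypothetical
+name for a config-backed implementation.
+
+```csharp
+using System;
+using System.Collections.Generic;
+
+// Hypothetical seam shape; the real IGroupRoleMapper in ZB.MOM.WW.Auth may differ.
+public interface IGroupRoleMapper
+{
+    IReadOnlySet<string> MapRoles(IEnumerable<string> groups);
+}
+
+// Config-backed variant: a static group-to-role table loaded from configuration.
+public sealed class ConfigGroupRoleMapper : IGroupRoleMapper
+{
+    private readonly Dictionary<string, string> _groupToRole;
+
+    public ConfigGroupRoleMapper(IDictionary<string, string> groupToRole)
+    {
+        // Assumption: group names are matched case-insensitively.
+        _groupToRole = new Dictionary<string, string>(groupToRole, StringComparer.OrdinalIgnoreCase);
+    }
+
+    public IReadOnlySet<string> MapRoles(IEnumerable<string> groups)
+    {
+        var roles = new HashSet<string>(StringComparer.Ordinal);
+        foreach (var group in groups)
+        {
+            // Unmapped groups grant nothing, so an unknown group never widens access.
+            if (_groupToRole.TryGetValue(group, out var role))
+            {
+                roles.Add(role);
+            }
+        }
+        return roles;
+    }
+}
+```
+
+A DB-backed mapper would implement the same interface over a lookup table, which is why
+the seam lands before the Ldap and ApiKeys cutovers.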
+
+| | OtOpcUa | MxAccessGateway | ScadaBridge |
+|---|---|---|---|
+| Packages | Abstractions + Ldap + AspNetCore (no ApiKeys — OPC UA transport security) | all 4 (**source** for ApiKeys — cuts over first) | all 4 (**source** for Ldap; ApiKeys consumer after gw) |
+| Role mapper (#3) | config-backed (`GroupToRole`) | config-backed | **DB-backed** (`LdapGroupMapping`) |
+| Config migration (#4) | A1: `UseTls`→`Transport` enum (section already nested) | A1: `UseTls`→`Transport` enum | **A2 (biggest)**: flat `Security:Ldap*`→nested; rename `LdapUserIdAttribute`→`UserNameAttribute`, `LdapGroupAttribute`→`GroupAttribute` |
+| Cookies/claims (#5) | Blazor Admin control-plane cookie | keep `MxGatewayDashboard` name, share claims | keep `ZB.MOM.WW.ScadaBridge.Auth` name, share claims |
+| Canonical roles (#8) | no first-class `Deployer` (publish ⊂ `FleetAdmin`) | no `Designer`/`Deployer` | **roles collapse**: `AuditReadOnly`→Viewer, `Audit`→Administrator (auditor/admin SoD loss — GAPS-accepted) |
+
+**Two deliberate behaviour changes (accepted):**
+1. **ScadaBridge API-key token format** (D2): raw `X-API-Key` → structured
+   `__`. A genuine wire change for inbound API clients — acceptable
+   pre-prod, requires an interop check.
+2. **Canonical-roles collapse** in ScadaBridge removes auditor/admin separation-of-duties.
+
+**Known live issue to fix during OtOpcUa cutover:** `LdapAuthService` `Enabled`/double-singleton
+wiring is still open even though the `Security:Ldap` section binding was fixed — fold the fix
+into the OtOpcUa LDAP cutover.
+
+**Risk gate:** parity tests reproducing each app's current authn decisions (bind-then-search,
+fail-closed group lookup, RFC-4514 + filter escaping, constant-time compare, peppered
+HMAC-SHA256) must be green before any cutover merges.
+
+## Phase 2 — Audit adoption *(task #9, blocked by #8)*
+
+Behaviour-preserving seam/record/enum adoption.
+
+| Repo | Core work (GAPS #1–#3) | Keep bespoke |
+|---|---|---|
+| **OtOpcUa** (#1, #5) | Replace `Commons/.../AuditEvent.cs` with canonical record; `AuditWriterActor : IAuditWriter`; derive `Outcome` at emit sites (`OpcUaAccessDenied`/`CrossClusterNamespaceAttempt`→Denied, config verbs→Success); bridge `NodeId`/`CorrelationId` value-types | Akka singleton transport, 500/5s batching, two-layer dedup, `ConfigAuditLog` EF entity + idempotency index |
+| **MxGateway** (#2, #6) | Map `IApiKeyAuditStore`/`ApiKeyAuditEntry`→`IAuditWriter`/`AuditEvent`; generate `EventId`; `"system"`/`"cli"` Actor fallback; `Category="ApiKey"`; `constraint-denied`→Denied | SQLite store, 3 producer call sites (only injected type changes), append-only table |
+| **ScadaBridge** (#3) | Outright rename `IAuditPayloadFilter`→`IAuditRedactor`; adopt canonical `AuditOutcome` enum; confirm writer contract (byte-identical) — keep bespoke ~25-field record as storage shape | Entire Site/Central pipeline, 4 domain enums, CLI export/verify, Blazor UI, redaction policy |
+
+**Resolved open GAPS decisions:**
+1. **ScadaBridge rename vs. alias** → **outright rename** (compiler-verified across the HIGH blast radius).
+2. **MxGateway `Details`→`DetailsJson`** → **wrap as a small JSON object** (keeps the field valid JSON).
+3. **OtOpcUa `Outcome` storage** → **new nullable `Outcome` column + EF migration** (first-class, queryable).
+4. **OtOpcUa SP path** → **leave bespoke + document**; *do* fix the `ClusterId`-filter/actor
+   mismatch in `ClusterAudit.razor` so structured rows are visible.
+
+**Cleanups in scope:** #5 (OtOpcUa SP reconcile + `ClusterId` visibility fix), #6 (MxGateway
+`CorrelationId` capture + structured `Target`).
+
+**Behaviour fix:** MxGateway's `AppendAsync` currently may propagate; wrap it so the adopted
+`IAuditWriter` never throws (honors the best-effort contract).
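+
+One way to satisfy that never-throw requirement is a thin decorator around the existing
+store. This is a minimal sketch, not the planned implementation: the `AuditEvent` fields,
+the `WriteAsync` method name, and the `NeverThrowAuditWriter` class are all assumptions
+made for illustration, and the real `ZB.MOM.WW.Audit` contracts may differ.
+
+```csharp
+using System;
+using System.Threading;
+using System.Threading.Tasks;
+
+// Hypothetical shapes; the canonical record and writer contract may differ.
+public sealed record AuditEvent(Guid EventId, string Actor, string Category, string Outcome);
+
+public interface IAuditWriter
+{
+    Task WriteAsync(AuditEvent auditEvent, CancellationToken cancellationToken = default);
+}
+
+// Decorator that swallows every failure from the wrapped writer.
+public sealed class NeverThrowAuditWriter : IAuditWriter
+{
+    private readonly IAuditWriter _inner;
+    private readonly Action<Exception> _onFailure;
+
+    public NeverThrowAuditWriter(IAuditWriter inner, Action<Exception> onFailure)
+    {
+        _inner = inner ?? throw new ArgumentNullException(nameof(inner));
+        _onFailure = onFailure ?? throw new ArgumentNullException(nameof(onFailure));
+    }
+
+    public async Task WriteAsync(AuditEvent auditEvent, CancellationToken cancellationToken = default)
+    {
+        try
+        {
+            await _inner.WriteAsync(auditEvent, cancellationToken).ConfigureAwait(false);
+        }
+        catch (Exception ex)
+        {
+            // Best-effort: report the failure, but never let it reach the caller.
+            try { _onFailure(ex); }
+            catch { /* a failing failure handler must not break the contract either */ }
+        }
+    }
+}
+```
+
+Because the wrapper implements the same interface, the three producer call sites would
+only see a different injected instance; `onFailure` would typically log the exception.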
+
+## Phase 3 — Actor→Auth wiring *(task #10, blocked by #8 + #9)*
+
+With Auth live (Phase 1) and the canonical record adopted (Phase 2), route the resolved
+principal into `AuditEvent.Actor` everywhere:
+
+- **Seam:** one small `IAuditActorAccessor` — HTTP paths read `HttpContext.User`; non-HTTP
+  paths (Akka actors, CLI) thread the operation principal or fall back. The single place that
+  changes if the principal source ever changes again.
+- OtOpcUa → LDAP-resolved user. MxGateway → API-key name (system/cli fallback retained for
+  keyless CLI events). ScadaBridge → principal at `ManagementActor`/inbound boundary.
+
+## Contracts, testing & risk gates
+
+**Hard seam contracts:**
+- `IAuditWriter` — best-effort, MUST NOT throw, swallow internal failures. OtOpcUa actor ✅,
+  ScadaBridge ✅; MxGateway needs the never-throw wrap (above).
+- `IAuditRedactor` — pure, never throws, over-redacts on failure. ScadaBridge's
+  `SafeDefaultAuditPayloadFilter` is the reference; rename preserves it.
+
+**Cross-boundary surface:** Auth/Audit adoption is in-process and does **not** touch the
+cross-repo wire contracts (gateway `.proto` files, OPC UA address-space shape) — **except** the
+ScadaBridge API-key token-format change, the one item needing an interop check rather than just
+a green unit build. A green build in one repo does not prove interop.
+
+**Per-phase verification (evidence before "done"):**
+- **Phase 0:** all 5 packages HTTP 200; `dotnet restore` green in all 3 repos.
+- **Phase 1:** existing auth tests + new parity tests green per repo before merge; SB
+  token-format integration check.
+- **Phase 2:** existing audit tests + new `Outcome`/`EventId`/rename tests; OtOpcUa `Outcome`
+  migration applies forward.
+- **Phase 3:** `Actor == authenticated principal` on authenticated paths; fallback retained on
+  keyless/system paths.
+- **Library suites** (Audit 19, Auth 172) re-run if any lib is touched.
+  If a parity gap forces a lib fix, bump `0.1.0`→`0.1.1` and re-publish rather than editing
+  a published version.
+
+## Tasks
+
+| Task | Item | Blocked by |
+|---|---|---|
+| #7 | Phase 0 — publish both libs + feed-map all 3 repos | — |
+| #8 | Phase 1 — adopt ZB.MOM.WW.Auth across all 3 repos (auth GAPS #1–#8) | #7 |
+| #9 | Phase 2 — adopt ZB.MOM.WW.Audit across all 3 repos (audit GAPS #1–#3, #5, #6) | #8 |
+| #10 | Phase 3 — wire Actor from the Auth principal (audit GAPS #4) | #8, #9 |
+
+## References
+
+- `components/auth/GAPS.md`, `components/auth/spec/`, `components/auth/current-state/*`
+- `components/audit/GAPS.md`, `components/audit/shared-contract/ZB.MOM.WW.Audit.md`,
+  `components/audit/current-state/*`
+- Libraries: `ZB.MOM.WW.Auth/`, `ZB.MOM.WW.Audit/`
+- Prior adoption precedent: `components/configuration/GAPS.md`,
+  `components/observability/GAPS.md`