scadaproj/docs/plans/2026-06-02-auth-audit-normalization-design.md
Joseph Doherty 6ec1ea7d65 docs: design for full Auth+Audit normalization across 3 sister projects
Approved brainstorming output: two-library program (publish + adopt
ZB.MOM.WW.Auth then ZB.MOM.WW.Audit across OtOpcUa, MxAccessGateway,
ScadaBridge), library-major waterfall, ending with audit Actor wired
from the Auth principal. Local-only delivery; verified feed/source state.
2026-06-02 00:04:33 -04:00

# Design — Auth + Audit normalization across all three sister projects
**Date:** 2026-06-02
**Status:** Approved (brainstorming complete) — handing off to writing-plans.
**Scope owner decision:** full two-library normalization (see [Scope decisions](#scope-decisions)).
## Summary
Bring two shared libraries that already live in `scadaproj` but are **unpublished and
adopted by no app** — `ZB.MOM.WW.Auth` (4 packages) and `ZB.MOM.WW.Audit` (1 package) —
to **full adoption across OtOpcUa, MxAccessGateway, and ScadaBridge**, ending with every
audit emit site carrying the genuine Auth-resolved principal as `AuditEvent.Actor`.
The original request was "implement the audit component in all sister projects." Because
audit GAPS #4 (Actor = the `ZB.MOM.WW.Auth` principal) requires an authenticated principal
at every emit site, and because the owner chose the maximal scope at every fork, the job
expands to a **two-library program**: full Auth adoption (auth GAPS #1–#8) first, then full
Audit adoption (audit GAPS #1–#6) with #4 wiring `Actor` from the now-live principal.
## Verified starting state (source-checked 2026-06-02)
- **Both libraries exist and are pack-ready** in `scadaproj/ZB.MOM.WW.Auth/` (4 csproj +
`build/pack.sh` + `build/push.sh`, 172 tests) and `scadaproj/ZB.MOM.WW.Audit/`
(`build/pack.sh`, 19 tests). Both at version `0.1.0`, both central-package-management.
- **Neither is on the Gitea feed.** All five package registration endpoints return
**HTTP 404**. No `.nupkg` is built locally.
- **Adopted by zero apps.** No sibling repo references `ZB.MOM.WW.Auth*` or `ZB.MOM.WW.Audit`.
- **Feed source-mapping is missing in all three repos.** Each `NuGet.config`
`packageSourceMapping` lists Health/Telemetry/Configuration but **not** Auth or Audit, so
each repo needs mapping lines added (mirror MxGateway commit `437ab65`, which did this for
Configuration).
- **The MxGateway audit coordination gate (audit GAPS #2) is CLEAR.** `MxGateway.Server`
already references `ZB.MOM.WW.Telemetry.Serilog 0.1.0`; the Serilog/Telemetry/Configuration
work is merged to `main`. MxGateway audit adoption is unblocked.
- Established adoption rhythm (Telemetry, Configuration): publish lib to feed → add feed
mapping + version pin → behaviour-preserving consumer cutover → land on the repo's local
default branch (not pushed to remote).
> Per repo memory, prior "published"/"adopted" claims in this workspace have repeatedly been
> optimistic; every claim above was re-verified against the feed and source on 2026-06-02.
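
The feed re-verification described above can be scripted so it is repeatable. The following is a minimal sketch, not the project's tooling: it assumes Gitea serves NuGet v3 registration indexes under `/registration/{id}/index.json` beneath the feed root, and the four Auth package ids in the usage note are inferred from the package names mentioned later in this document rather than confirmed.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable, List

# Feed root taken from the Phase 0 feed URL (without the trailing /index.json).
FEED_ROOT = "https://gitea.dohertylan.com/api/packages/dohertj2/nuget"


def registration_url(package_id: str) -> str:
    """Registration index URL for one package id.

    Assumption: the path layout is /registration/{lower-cased id}/index.json;
    confirm it against the feed's service index before relying on it.
    """
    return f"{FEED_ROOT}/registration/{package_id.lower()}/index.json"


def http_status(url: str) -> int:
    """Fetch a URL and return its HTTP status code, including error statuses."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def check_feed(package_ids: Iterable[str],
               fetch_status: Callable[[str], int] = http_status) -> Dict[str, int]:
    """Map each package id to the status of its registration endpoint.

    fetch_status is injectable so the check can be exercised offline.
    """
    return {pid: fetch_status(registration_url(pid)) for pid in package_ids}


def unpublished(statuses: Dict[str, int]) -> List[str]:
    """Package ids whose registration endpoint did not answer 200."""
    return sorted(pid for pid, status in statuses.items() if status != 200)
```

Calling `unpublished(check_feed(ids))` with the five package ids (for example `ZB.MOM.WW.Audit` plus the assumed `ZB.MOM.WW.Auth.Abstractions`, `.Ldap`, `.ApiKeys`, `.AspNetCore`) should list all five today and return an empty list once publishing is complete.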
## Scope decisions
| Fork | Decision |
|---|---|
| How deep into the audit GAPS backlog? | **Everything incl. #4 Actor→Auth** (all of #1–#6). |
| How to satisfy #4 given Auth is unadopted? | **Adopt Auth first, then audit** (two-library program). |
| How much of the Auth backlog? | **Full Auth normalization** (auth GAPS #1–#8, all 3 repos). |
| How to walk the work matrix? | **Library-major waterfall** (Phase 1 Auth → Phase 2 Audit → Phase 3 wiring). |
| Remote integration model | **Local-only**; no `git push`, no PRs (safest for production auth paths; flip per repo later if desired). |
## Architecture — four phases
```
Phase 0  Publish & feed-map        pack + push both libs to Gitea feed (fix the 404s);
         (foundation)              add NuGet.config source-mappings + version pins in all 3 repos.

Phase 1  Auth adoption             auth GAPS #1–#8 across all 3 repos, in GAPS sequence:
         (largest, sec-sensitive)  #3 IGroupRoleMapper seam → #1 Ldap + #2 ApiKeys cutover →
                                   #4 config schema (A1/A2) + #5 claims/cookies → #6 dev base DN →
                                   #8 canonical roles. Each lands behind tests.

Phase 2  Audit adoption            audit GAPS #1–#3 core + #5/#6 cleanups across all 3 repos.
         (behaviour-preserving)

Phase 3  Actor→Auth wiring         audit GAPS #4: route the now-live Auth principal into Actor
         (the payoff)              at every emit site. Closes the loop Audit.Actor == Auth principal.
```
The waterfall is enforced by task dependencies (Phase 0 → 1 → 2 → 3). Phase 1 must fully
land before Phase 3 can wire a *stable* principal; Phase 2 sits after Phase 1 so emit sites
aren't touched twice.
### Delivery model
- One **feature branch per repo per library phase** (`feat/adopt-zb-auth`, then
`feat/adopt-zb-audit`), behaviour-preserving except where a GAPS item is explicitly net-new.
- **Publish-first**: both packages on the feed and verified resolvable before any consumer edit.
- **Land on each repo's local default branch**, gated by that repo's tests + new contract tests.
- **Local-only** (no push). Each phase is a revertable branch merge.
- The libraries themselves are plain files in `scadaproj` (not nested git repos) — publishing
is `pack` + `push` only; no commits to the libs unless a parity gap forces a fix.
## Phase 0 — publish & feed-map *(task #7)*
1. `dotnet pack -c Release` both libraries; `push.sh` to the Gitea feed
(`https://gitea.dohertylan.com/api/packages/dohertj2/nuget/index.json`).
2. Verify all five packages return HTTP 200 from the registration endpoint.
3. In each repo: add `packageSourceMapping` patterns (`ZB.MOM.WW.Auth`, `ZB.MOM.WW.Auth.*`,
`ZB.MOM.WW.Audit`) to the gitea source, and version pins (`Directory.Packages.props` for
OtOpcUa/ScadaBridge; inline `Version="0.1.0"` for MxGateway).
4. `dotnet restore` resolves the new patterns in all three repos.
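
Step 3 can be checked mechanically before running a restore. This is a rough sketch under two assumptions: that the source key in each `NuGet.config` is literally `gitea`, and that the file follows the standard `packageSourceMapping` layout. The `ZB.MOM.WW.Health` pattern in the usage note is only an illustrative stand-in for the existing mappings.

```python
import xml.etree.ElementTree as ET
from typing import Set

# Patterns that Phase 0 step 3 says must be mapped to the gitea source.
REQUIRED_PATTERNS = {"ZB.MOM.WW.Auth", "ZB.MOM.WW.Auth.*", "ZB.MOM.WW.Audit"}


def missing_patterns(nuget_config_xml: str, source_key: str = "gitea") -> Set[str]:
    """Return the required patterns not yet mapped to the given package source.

    Assumption: source_key matches the key attribute of the <packageSource>
    element; adjust it if a repo names the feed differently.
    """
    root = ET.fromstring(nuget_config_xml)  # root element is <configuration>
    present = set()
    for source in root.findall("./packageSourceMapping/packageSource"):
        if source.get("key") == source_key:
            present.update(pkg.get("pattern") for pkg in source.findall("package"))
    return REQUIRED_PATTERNS - present
```

For a config whose gitea source maps only `ZB.MOM.WW.Health` and `ZB.MOM.WW.Auth.*`, the function reports `ZB.MOM.WW.Auth` and `ZB.MOM.WW.Audit` as still missing. An empty result is a cheap precondition for step 4.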
## Phase 1 — Auth adoption *(task #8, blocked by #7)*
Consumer cutover (libs are already extracted). GAPS order: #3 seam → #1 Ldap + #2 ApiKeys →
#4 config schema + #5 claims/cookies → #6 dev base DN → #8 canonical roles.
| | OtOpcUa | MxAccessGateway | ScadaBridge |
|---|---|---|---|
| Packages | Abstractions + Ldap + AspNetCore (no ApiKeys — OPC UA transport security) | all 4 (**source** for ApiKeys — cuts over first) | all 4 (**source** for Ldap; ApiKeys consumer after gw) |
| Role mapper (#3) | config-backed (`GroupToRole`) | config-backed | **DB-backed** (`LdapGroupMapping`) |
| Config migration (#4) | A1: `UseTls` → `Transport` enum (section already nested) | A1: `UseTls` → `Transport` enum | **A2 (biggest)**: flat `Security:Ldap*` → nested; rename `LdapUserIdAttribute` → `UserNameAttribute`, `LdapGroupAttribute` → `GroupAttribute` |
| Cookies/claims (#5) | Blazor Admin control-plane cookie | keep `MxGatewayDashboard` name, share claims | keep `ZB.MOM.WW.ScadaBridge.Auth` name, share claims |
| Canonical roles (#8) | no first-class `Deployer` (publish ⊂ `FleetAdmin`) | no `Designer`/`Deployer` | **roles collapse**: `AuditReadOnly`→Viewer, `Audit`→Administrator (auditor/admin SoD loss — GAPS-accepted) |
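
To make the A2 migration in the table concrete, here is a hedged sketch of the key reshaping on a plain dictionary. Only the two renames come from this design; the rule that every other flat `Ldap*` key simply drops its prefix, and the `LdapServer` key in the usage note, are assumptions for illustration and not the real ScadaBridge schema.

```python
from typing import Any, Dict

# The two renames called out for the A2 migration.
EXPLICIT_RENAMES = {
    "LdapUserIdAttribute": "UserNameAttribute",
    "LdapGroupAttribute": "GroupAttribute",
}


def migrate_a2(config: Dict[str, Any]) -> Dict[str, Any]:
    """Move flat Security:Ldap* keys into a nested Security:Ldap section.

    Assumption: keys without an explicit rename just lose the 'Ldap' prefix.
    The input dictionary is left unmodified.
    """
    security = dict(config.get("Security", {}))
    ldap = dict(security.get("Ldap") or {})  # keep any already-nested values
    for key in list(security):
        if key.startswith("Ldap") and key != "Ldap":
            new_key = EXPLICIT_RENAMES.get(key, key[len("Ldap"):])
            ldap[new_key] = security.pop(key)
    security["Ldap"] = ldap
    return {**config, "Security": security}
```

Given `{"Security": {"LdapServer": "dc1", "LdapUserIdAttribute": "uid"}}`, the result nests both values under `Security.Ldap` as `Server` and `UserNameAttribute`. A real migration would also need a decision on what happens when a flat key and a nested key collide.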
**Two deliberate behaviour changes (accepted):**
1. **ScadaBridge API-key token format** (D2): raw `X-API-Key` → structured
`<prefix>_<id>_<secret>`. A genuine wire change for inbound API clients — acceptable
pre-prod, requires an interop check.
2. **Canonical-roles collapse** in ScadaBridge removes auditor/admin separation-of-duties.
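
The structured token format in item 1 can be illustrated with a small parser and verifier. This is a sketch only: it assumes the prefix and id never contain underscores (the secret may), that secrets are stored as peppered HMAC-SHA256 digests keyed by id, and the `mxgw` prefix in the usage note is invented. The library's actual parsing rules may differ.

```python
import hashlib
import hmac
from typing import Dict, NamedTuple, Optional


class ApiKeyToken(NamedTuple):
    prefix: str
    key_id: str
    secret: str


def parse_token(raw: str) -> Optional[ApiKeyToken]:
    """Split '<prefix>_<id>_<secret>'; return None for anything else.

    Assumption: only the first two underscores are separators, so the
    secret itself may contain underscores.
    """
    parts = raw.split("_", 2)
    if len(parts) != 3 or not all(parts):
        return None
    return ApiKeyToken(*parts)


def secret_digest(secret: str, pepper: bytes) -> bytes:
    """Peppered HMAC-SHA256 of the secret (the pepper acts as the HMAC key)."""
    return hmac.new(pepper, secret.encode("utf-8"), hashlib.sha256).digest()


def verify(raw: str, stored_digests: Dict[str, bytes], pepper: bytes) -> bool:
    """Fail closed on malformed tokens and unknown ids; compare in constant time."""
    token = parse_token(raw)
    if token is None:
        return False
    expected = stored_digests.get(token.key_id)
    if expected is None:
        return False
    return hmac.compare_digest(secret_digest(token.secret, pepper), expected)
```

A legacy raw key such as `rawkey` fails `parse_token` and is rejected, which is exactly the wire change the interop check has to cover for inbound clients.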
**Known live issue to fix during OtOpcUa cutover:** `LdapAuthService` `Enabled`/double-singleton
wiring is still open even though the `Security:Ldap` section binding was fixed — fold the fix
into the OtOpcUa LDAP cutover.
**Risk gate:** parity tests reproducing each app's current authn decisions (bind-then-search,
fail-closed group lookup, RFC-4514 + filter escaping, constant-time compare, peppered
HMAC-SHA256) must be green before any cutover merges.
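
One of the parity behaviours named in the risk gate, LDAP search-filter escaping, is easy to pin down with a table-driven check. The sketch below implements the value escaping defined in RFC 4515 and a generic parity helper; it is not the library's code, and the `legacy`/`candidate` callables stand in for each app's current implementation and the `ZB.MOM.WW.Auth` replacement.

```python
from typing import Callable, Iterable, List

# RFC 4515 search-filter escapes for the characters that must not appear raw.
_FILTER_ESCAPES = {
    "\\": r"\5c",
    "*": r"\2a",
    "(": r"\28",
    ")": r"\29",
    "\x00": r"\00",
}


def escape_filter_value(value: str) -> str:
    """Escape a user-supplied value for safe use inside an LDAP search filter."""
    return "".join(_FILTER_ESCAPES.get(ch, ch) for ch in value)


def user_filter(attribute: str, username: str) -> str:
    """Build an equality filter such as (uid=jdoe) with the value escaped."""
    return f"({attribute}={escape_filter_value(username)})"


def parity_failures(inputs: Iterable[str],
                    legacy: Callable[[str], object],
                    candidate: Callable[[str], object]) -> List[str]:
    """Inputs on which the legacy and candidate implementations disagree."""
    return [x for x in inputs if legacy(x) != candidate(x)]
```

A parity test then asserts that `parity_failures(...)` is empty over a corpus that includes hostile inputs such as `*)(uid=*`. The same harness shape would apply to the other listed behaviours (RFC 4514 DN escaping, fail-closed group lookup), which are not shown here.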
## Phase 2 — Audit adoption *(task #9, blocked by #8)*
Behaviour-preserving seam/record/enum adoption.
| Repo | Core work (GAPS #1–#3) | Keep bespoke |
|---|---|---|
| **OtOpcUa** (#1, #5) | Replace `Commons/.../AuditEvent.cs` with canonical record; `AuditWriterActor : IAuditWriter`; derive `Outcome` at emit sites (`OpcUaAccessDenied`/`CrossClusterNamespaceAttempt`→Denied, config verbs→Success); bridge `NodeId`/`CorrelationId` value-types | Akka singleton transport, 500/5s batching, two-layer dedup, `ConfigAuditLog` EF entity + idempotency index |
| **MxGateway** (#2, #6) | Map `IApiKeyAuditStore`/`ApiKeyAuditEntry` → `IAuditWriter`/`AuditEvent`; generate `EventId`; `"system"`/`"cli"` Actor fallback; `Category="ApiKey"`; `constraint-denied`→Denied | SQLite store, 3 producer call sites (only injected type changes), append-only table |
| **ScadaBridge** (#3) | Outright rename `IAuditPayloadFilter` → `IAuditRedactor`; adopt canonical `AuditOutcome` enum; confirm writer contract (byte-identical); keep bespoke ~25-field record as storage shape | Entire Site/Central pipeline, 4 domain enums, CLI export/verify, Blazor UI, redaction policy |
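
The OtOpcUa row asks emit sites to derive `Outcome` from the event type. A minimal sketch of that derivation follows. Only the two `Denied` event types and the `Success` default for config verbs come from the table; the enum here models just the two members this design mentions, and `ConfigUpdated` in the usage note is a made-up verb.

```python
from enum import Enum


class AuditOutcome(Enum):
    # Only the two outcomes named in this design; the canonical enum may have more.
    SUCCESS = "Success"
    DENIED = "Denied"


# OtOpcUa event types that the table maps to Denied.
DENIED_EVENT_TYPES = frozenset({"OpcUaAccessDenied", "CrossClusterNamespaceAttempt"})


def derive_outcome(event_type: str) -> AuditOutcome:
    """Derive the outcome at the emit site.

    Assumption: every event type not listed as denied is a config verb
    and therefore maps to Success.
    """
    if event_type in DENIED_EVENT_TYPES:
        return AuditOutcome.DENIED
    return AuditOutcome.SUCCESS
```

`derive_outcome("OpcUaAccessDenied")` yields `AuditOutcome.DENIED`, while a config verb such as `ConfigUpdated` yields `AuditOutcome.SUCCESS`. Keeping the denied set in one place makes the mapping easy to cover with a single contract test.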
**Resolved open GAPS decisions:**
1. **ScadaBridge rename vs. alias** → **outright rename** (compiler-verified across the HIGH blast radius).
2. **MxGateway `Details`→`DetailsJson`** → **wrap as a small JSON object** (keeps the field valid JSON).
3. **OtOpcUa `Outcome` storage** → **new nullable `Outcome` column + EF migration** (first-class, queryable).
4. **OtOpcUa SP path** → **leave bespoke + document**; *do* fix the `ClusterId`-filter/actor
mismatch in `ClusterAudit.razor` so structured rows are visible.
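
Decision 2 and the MxGateway mapping rules can be sketched together. Everything about the shape below is illustrative: the input parameters stand in for whatever `ApiKeyAuditEntry` really carries, the `AuditEvent` fields are a reduced guess at the canonical record, and the `"details"` wrapper key is an assumption. Only the `Category`, the fallback actor, the `constraint-denied` mapping, and the generated `EventId` come from this design.

```python
import json
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AuditEvent:
    # Reduced, hypothetical view of the canonical record.
    event_id: str
    actor: str
    category: str
    action: str
    outcome: str
    details_json: str


def to_audit_event(action: str, key_name: Optional[str], details: str,
                   fallback_actor: str = "system") -> AuditEvent:
    """Map one legacy API-key audit entry onto the canonical event shape."""
    return AuditEvent(
        event_id=str(uuid.uuid4()),          # EventId is generated at mapping time
        actor=key_name or fallback_actor,    # "system" or "cli" when no key is present
        category="ApiKey",
        action=action,
        outcome="Denied" if action == "constraint-denied" else "Success",
        # Wrap free text so the field is always valid JSON (wrapper key assumed).
        details_json=json.dumps({"details": details}),
    )
```

Wrapping the old free-text `Details` means a value containing quotes or braces still round-trips through `json.loads`, which is the point of decision 2.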
**Cleanups in scope:** #5 (OtOpcUa SP reconcile + `ClusterId` visibility fix), #6 (MxGateway
`CorrelationId` capture + structured `Target`).
**Behaviour fix:** MxGateway's `AppendAsync` can currently propagate exceptions; wrap it so the
adopted `IAuditWriter` never throws (honoring the best-effort contract).
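
The never-throw wrap amounts to a thin decorator around the existing append call. The sketch below shows the pattern in a language-neutral way; the class name, the `dropped` counter, and the logging are illustrative choices, not the library's API.

```python
import asyncio
import logging
from typing import Awaitable, Callable

log = logging.getLogger("audit")


class BestEffortAuditWriter:
    """Wraps an async append so audit failures never reach the caller."""

    def __init__(self, append: Callable[[dict], Awaitable[None]]):
        self._append = append
        self.dropped = 0  # hypothetical counter so failures remain observable

    async def write(self, event: dict) -> None:
        try:
            await self._append(event)
        except Exception:
            # Swallow and record. Cancellation is not an Exception subclass
            # in Python 3.8+, so it still propagates to the caller.
            self.dropped += 1
            log.exception("audit write failed; event dropped")


async def failing_append(event: dict) -> None:
    """Stand-in for a store that is currently unavailable."""
    raise OSError("store unavailable")
```

Running `asyncio.run(writer.write({...}))` against `failing_append` returns normally and increments `dropped`. Surfacing the failure through a counter or log matters because a silent swallow would otherwise hide a broken audit store.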
## Phase 3 — Actor→Auth wiring *(task #10, blocked by #8 + #9)*
With Auth live (Phase 1) and the canonical record adopted (Phase 2), route the resolved
principal into `AuditEvent.Actor` everywhere:
- **Seam:** one small `IAuditActorAccessor`. HTTP paths read `HttpContext.User`; non-HTTP
paths (Akka actors, CLI) thread the operation principal or fall back. This is the single
place that changes if the principal source ever changes again.
- OtOpcUa → LDAP-resolved user. MxGateway → API-key name (system/cli fallback retained for
keyless CLI events). ScadaBridge → principal at `ManagementActor`/inbound boundary.
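
The accessor seam described in the list above might look roughly like this. It is a sketch of the pattern only: the class names and the `build_event` helper are invented, and the callable passed to the HTTP variant stands in for reading the name from `HttpContext.User`.

```python
from typing import Callable, Optional

SYSTEM_FALLBACK = "system"  # fallback actor retained for keyless/system paths


class HttpActorAccessor:
    """HTTP paths: ask the request context for the authenticated user name."""

    def __init__(self, get_user_name: Callable[[], Optional[str]]):
        self._get_user_name = get_user_name

    def current_actor(self) -> str:
        return self._get_user_name() or SYSTEM_FALLBACK


class OperationActorAccessor:
    """Non-HTTP paths (actors, CLI): use the threaded principal or fall back."""

    def __init__(self, principal: Optional[str], fallback: str = SYSTEM_FALLBACK):
        self._principal = principal
        self._fallback = fallback

    def current_actor(self) -> str:
        return self._principal or self._fallback


def build_event(accessor, action: str) -> dict:
    """Emit sites depend only on current_actor(), never on the principal source."""
    return {"Action": action, "Actor": accessor.current_actor()}
```

Because emit sites call only `current_actor()`, swapping the principal source later touches the accessor implementations and nothing else.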
## Contracts, testing & risk gates
**Hard seam contracts:**
- `IAuditWriter` — best-effort, MUST NOT throw, swallow internal failures. OtOpcUa actor ✅,
ScadaBridge ✅; MxGateway needs the never-throw wrap (above).
- `IAuditRedactor` — pure, never throws, over-redacts on failure. ScadaBridge's
`SafeDefaultAuditPayloadFilter` is the reference; rename preserves it.
**Cross-boundary surface:** Auth/Audit adoption is in-process and does **not** touch the
cross-repo wire contracts (gateway `.proto` files, OPC UA address-space shape) — **except** the
ScadaBridge API-key token-format change, the one item needing an interop check rather than just
a green unit build. A green build in one repo does not prove interop.
**Per-phase verification (evidence before "done"):**
- **Phase 0:** all 5 packages HTTP 200; `dotnet restore` green in all 3 repos.
- **Phase 1:** existing auth tests + new parity tests green per repo before merge; SB
token-format integration check.
- **Phase 2:** existing audit tests + new `Outcome`/`EventId`/rename tests; OtOpcUa `Outcome`
migration applies forward.
- **Phase 3:** `Actor == authenticated principal` on authenticated paths; fallback retained on
keyless/system paths.
- **Library suites** (Audit 19, Auth 172) re-run if any lib is touched. If a parity gap forces
a lib fix, bump `0.1.0` → `0.1.1` and re-publish rather than editing a published version.
## Tasks
| Task | Item | Blocked by |
|---|---|---|
| #7 | Phase 0 — publish both libs + feed-map all 3 repos | — |
| #8 | Phase 1 — adopt ZB.MOM.WW.Auth across all 3 repos (auth GAPS #1–#8) | #7 |
| #9 | Phase 2 — adopt ZB.MOM.WW.Audit across all 3 repos (audit GAPS #1–#3, #5, #6) | #8 |
| #10 | Phase 3 — wire Actor from the Auth principal (audit GAPS #4) | #8, #9 |
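
The "Blocked by" column fully determines the waterfall order. As a small illustration, the table can be encoded as a dependency graph and sorted with Python's standard `graphlib` (3.9+). The helper names are invented for this sketch.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Task number -> tasks it is blocked by, copied from the table above.
BLOCKED_BY: Dict[int, Set[int]] = {7: set(), 8: {7}, 9: {8}, 10: {8, 9}}


def execution_order(blocked_by: Dict[int, Set[int]]) -> List[int]:
    """Return an order in which every task comes after all of its blockers."""
    return list(TopologicalSorter(blocked_by).static_order())


def can_start(task: int, done: Set[int]) -> bool:
    """A task may start only once all of its blockers are done."""
    return BLOCKED_BY[task] <= done
```

With these dependencies the only valid order is 7, 8, 9, 10, and `can_start(10, {7, 8})` is false until Phase 2 (task #9) has landed.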
## References
- `components/auth/GAPS.md`, `components/auth/spec/`, `components/auth/current-state/*`
- `components/audit/GAPS.md`, `components/audit/shared-contract/ZB.MOM.WW.Audit.md`,
`components/audit/current-state/*`
- Libraries: `ZB.MOM.WW.Auth/`, `ZB.MOM.WW.Audit/`
- Prior adoption precedent: `components/configuration/GAPS.md`,
`components/observability/GAPS.md`