docs: design for full Auth+Audit normalization across 3 sister projects

Approved brainstorming output: two-library program (publish + adopt
ZB.MOM.WW.Auth then ZB.MOM.WW.Audit across OtOpcUa, MxAccessGateway,
ScadaBridge), library-major waterfall, ending with audit Actor wired
from the Auth principal. Local-only delivery; verified feed/source state.
Joseph Doherty
2026-06-02 00:04:33 -04:00
parent c3ab37523a
commit 6ec1ea7d65
@@ -0,0 +1,195 @@
# Design — Auth + Audit normalization across all three sister projects
**Date:** 2026-06-02
**Status:** Approved (brainstorming complete) — handing off to writing-plans.
**Scope owner decision:** full two-library normalization (see [Scope decisions](#scope-decisions)).
## Summary
Bring two shared libraries that already live in `scadaproj` but are **unpublished and
adopted by no app** — `ZB.MOM.WW.Auth` (4 packages) and `ZB.MOM.WW.Audit` (1 package) —
to **full adoption across OtOpcUa, MxAccessGateway, and ScadaBridge**, ending with every
audit emit site carrying the genuine Auth-resolved principal as `AuditEvent.Actor`.
The original request was "implement the audit component in all sister projects." Because
audit GAPS #4 (Actor = the `ZB.MOM.WW.Auth` principal) requires an authenticated principal
at every emit site, and because the owner chose the maximal scope at every fork, the job
expands to a **two-library program**: full Auth adoption (auth GAPS #1–#8) first, then full
Audit adoption (audit GAPS #1–#6) with #4 wiring `Actor` from the now-live principal.
## Verified starting state (source-checked 2026-06-02)
- **Both libraries exist and are pack-ready** in `scadaproj/ZB.MOM.WW.Auth/` (4 csproj +
`build/pack.sh` + `build/push.sh`, 172 tests) and `scadaproj/ZB.MOM.WW.Audit/`
(`build/pack.sh`, 19 tests). Both are at version `0.1.0`, and both use central package management.
- **Neither is on the Gitea feed.** All five package registration endpoints return
**HTTP 404**. No `.nupkg` is built locally.
- **Adopted by zero apps.** No sibling repo references `ZB.MOM.WW.Auth*` or `ZB.MOM.WW.Audit`.
- **Feed source-mapping is missing in all three repos.** Each `NuGet.config`
`packageSourceMapping` lists Health/Telemetry/Configuration but **not** Auth or Audit, so
each repo needs mapping lines added (mirror MxGateway commit `437ab65`, which did this for
Configuration).
- **The MxGateway audit coordination gate (audit GAPS #2) is CLEAR.** `MxGateway.Server`
already references `ZB.MOM.WW.Telemetry.Serilog 0.1.0`; the Serilog/Telemetry/Configuration
work is merged to `main`. MxGateway audit adoption is unblocked.
- Established adoption rhythm (Telemetry, Configuration): publish lib to feed → add feed
mapping + version pin → behaviour-preserving consumer cutover → land on the repo's local
default branch (not pushed to remote).
> Per repo memory, prior "published"/"adopted" claims in this workspace have repeatedly been
> optimistic; every claim above was re-verified against the feed and source on 2026-06-02.
## Scope decisions
| Fork | Decision |
|---|---|
| How deep into the audit GAPS backlog? | **Everything incl. #4 Actor→Auth** (all of #1–#6). |
| How to satisfy #4 given Auth is unadopted? | **Adopt Auth first, then audit** (two-library program). |
| How much of the Auth backlog? | **Full Auth normalization** (auth GAPS #1–#8, all 3 repos). |
| How to walk the work matrix? | **Library-major waterfall** (Phase 1 Auth → Phase 2 Audit → Phase 3 wiring). |
| Remote integration model | **Local-only**; no `git push`, no PRs (safest for production auth paths; flip per repo later if desired). |
## Architecture — four phases
```
Phase 0  Publish & feed-map        pack + push both libs to Gitea feed (fix the 404s);
         (foundation)              add NuGet.config source-mappings + version pins in all 3 repos.
Phase 1  Auth adoption             auth GAPS #1–#8 across all 3 repos, in GAPS sequence:
         (largest, sec-sensitive)  #3 IGroupRoleMapper seam → #1 Ldap + #2 ApiKeys cutover →
                                   #4 config schema (A1/A2) + #5 claims/cookies → #6 dev base DN →
                                   #8 canonical roles. Each lands behind tests.
Phase 2  Audit adoption            audit GAPS #1–#3 core + #5/#6 cleanups across all 3 repos.
         (behaviour-preserving)
Phase 3  Actor→Auth wiring         audit GAPS #4: route the now-live Auth principal into Actor
         (the payoff)              at every emit site. Closes the loop Audit.Actor == Auth principal.
```
The waterfall is enforced by task dependencies (Phase 0 → 1 → 2 → 3). Phase 1 must fully
land before Phase 3 can wire a *stable* principal; Phase 2 sits after Phase 1 so emit sites
aren't touched twice.
### Delivery model
- One **feature branch per repo per library phase** (`feat/adopt-zb-auth`, then
`feat/adopt-zb-audit`), behaviour-preserving except where a GAPS item is explicitly net-new.
- **Publish-first**: both packages on the feed and verified resolvable before any consumer edit.
- **Land on each repo's local default branch**, gated by that repo's tests + new contract tests.
- **Local-only** (no push). Each phase is a revertable branch merge.
- The libraries themselves are plain files in `scadaproj` (not nested git repos) — publishing
is `pack` + `push` only; no commits to the libs unless a parity gap forces a fix.
## Phase 0 — publish & feed-map *(task #7)*
1. `dotnet pack -c Release` both libraries; `push.sh` to the Gitea feed
(`https://gitea.dohertylan.com/api/packages/dohertj2/nuget/index.json`).
2. Verify all five packages return HTTP 200 from the registration endpoint.
3. In each repo: add `packageSourceMapping` patterns (`ZB.MOM.WW.Auth`, `ZB.MOM.WW.Auth.*`,
`ZB.MOM.WW.Audit`) to the gitea source, and version pins (`Directory.Packages.props` for
OtOpcUa/ScadaBridge; inline `Version="0.1.0"` for MxGateway).
4. `dotnet restore` resolves the new patterns in all three repos.
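Step 3 can be checked mechanically before running `dotnet restore`. The following is a minimal sketch in Python (used here only as a neutral scripting language; the repos themselves are .NET). It assumes the feed's `packageSource` key is literally `gitea` and that the existing mappings look like the hypothetical `BEFORE` fragment; neither detail is confirmed by this design.

```python
import xml.etree.ElementTree as ET

# Patterns named in step 3 above.
REQUIRED_PATTERNS = {"ZB.MOM.WW.Auth", "ZB.MOM.WW.Auth.*", "ZB.MOM.WW.Audit"}


def missing_patterns(nuget_config_xml: str, source_key: str = "gitea") -> set:
    """Return the required patterns that are not yet mapped to `source_key`."""
    root = ET.fromstring(nuget_config_xml)
    mapped = {
        pkg.get("pattern")
        for src in root.iterfind("./packageSourceMapping/packageSource")
        if src.get("key") == source_key  # "gitea" is an assumed key name
        for pkg in src.iterfind("package")
    }
    return REQUIRED_PATTERNS - mapped


# Hypothetical NuGet.config resembling the verified starting state:
# other ZB.MOM.WW packages are mapped, Auth and Audit are not.
BEFORE = """<configuration>
  <packageSourceMapping>
    <packageSource key="gitea">
      <package pattern="ZB.MOM.WW.Health" />
      <package pattern="ZB.MOM.WW.Configuration" />
    </packageSource>
  </packageSourceMapping>
</configuration>"""
```

An empty result from `missing_patterns` means the three patterns are mapped; it does not prove the packages resolve, which is what step 4 covers.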
## Phase 1 — Auth adoption *(task #8, blocked by #7)*
Consumer cutover (libs are already extracted). GAPS order: #3 seam → #1 Ldap + #2 ApiKeys →
#4 config schema + #5 claims/cookies → #6 dev base DN → #8 canonical roles.
| | OtOpcUa | MxAccessGateway | ScadaBridge |
|---|---|---|---|
| Packages | Abstractions + Ldap + AspNetCore (no ApiKeys — OPC UA transport security) | all 4 (**source** for ApiKeys — cuts over first) | all 4 (**source** for Ldap; ApiKeys consumer after gw) |
| Role mapper (#3) | config-backed (`GroupToRole`) | config-backed | **DB-backed** (`LdapGroupMapping`) |
| Config migration (#4) | A1: `UseTls`→`Transport` enum (section already nested) | A1: `UseTls`→`Transport` enum | **A2 (biggest)**: flat `Security:Ldap*`→nested; rename `LdapUserIdAttribute`→`UserNameAttribute`, `LdapGroupAttribute`→`GroupAttribute` |
| Cookies/claims (#5) | Blazor Admin control-plane cookie | keep `MxGatewayDashboard` name, share claims | keep `ZB.MOM.WW.ScadaBridge.Auth` name, share claims |
| Canonical roles (#8) | no first-class `Deployer` (publish ⊂ `FleetAdmin`) | no `Designer`/`Deployer` | **roles collapse**: `AuditReadOnly`→Viewer, `Audit`→Administrator (auditor/admin SoD loss — GAPS-accepted) |
**Two deliberate behaviour changes (accepted):**
1. **ScadaBridge API-key token format** (D2): raw `X-API-Key` → structured
`<prefix>_<id>_<secret>`. A genuine wire change for inbound API clients — acceptable
pre-prod, requires an interop check.
2. **Canonical-roles collapse** in ScadaBridge removes auditor/admin separation-of-duties.
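To make the wire change in item 1 concrete, here is a hedged Python sketch of parsing the structured token. The design only gives the shape `<prefix>_<id>_<secret>`; the rule that prefix and id contain no underscore (so the secret may) is an assumption, and `ApiKeyToken` and `parse_api_key` are hypothetical names, not the library's API.

```python
from typing import NamedTuple, Optional


class ApiKeyToken(NamedTuple):
    prefix: str
    key_id: str
    secret: str


def parse_api_key(token: str) -> Optional[ApiKeyToken]:
    """Split `<prefix>_<id>_<secret>`; return None for anything malformed.

    Assumption: only the first two underscores are separators, so the
    secret itself may contain underscores.
    """
    parts = token.split("_", 2)
    if len(parts) != 3 or not all(parts):
        return None  # a legacy raw key, or an empty segment
    return ApiKeyToken(*parts)
```

Under this sketch a legacy raw `X-API-Key` value with no separators fails to parse, which is exactly the case the interop check needs to exercise against real inbound clients.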
**Known live issue to fix during OtOpcUa cutover:** `LdapAuthService` `Enabled`/double-singleton
wiring is still open even though the `Security:Ldap` section binding was fixed — fold the fix
into the OtOpcUa LDAP cutover.
**Risk gate:** parity tests reproducing each app's current authn decisions (bind-then-search,
fail-closed group lookup, RFC-4514 + filter escaping, constant-time compare, peppered
HMAC-SHA256) must be green before any cutover merges.
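Three of the properties named in the risk gate can be illustrated with standard-library primitives. This is a sketch of the techniques only, not the libraries' implementation: using the pepper as the HMAC key is an assumption, the function names are hypothetical, and RFC 4514 DN escaping is not shown.

```python
import hashlib
import hmac


def hash_secret(secret: str, pepper: bytes) -> bytes:
    """Peppered HMAC-SHA256. Assumption: the pepper is the HMAC key."""
    return hmac.new(pepper, secret.encode("utf-8"), hashlib.sha256).digest()


def verify_secret(presented: str, stored_hash: bytes, pepper: bytes) -> bool:
    """Constant-time comparison, so timing does not reveal how much matched."""
    return hmac.compare_digest(hash_secret(presented, pepper), stored_hash)


def escape_ldap_filter(value: str) -> str:
    """Escape the special characters of an LDAP search filter (RFC 4515)."""
    replacements = {
        "\\": r"\5c",
        "*": r"\2a",
        "(": r"\28",
        ")": r"\29",
        "\x00": r"\00",
    }
    # Per-character substitution avoids double-escaping the backslash.
    return "".join(replacements.get(ch, ch) for ch in value)
```

A parity test in this style would feed the same inputs to the current app code and to the library and assert identical decisions, including for hostile inputs such as `*)(uid=*`.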
## Phase 2 — Audit adoption *(task #9, blocked by #8)*
Behaviour-preserving seam/record/enum adoption.
| Repo | Core work (GAPS #1–#3) | Keep bespoke |
|---|---|---|
| **OtOpcUa** (#1, #5) | Replace `Commons/.../AuditEvent.cs` with canonical record; `AuditWriterActor : IAuditWriter`; derive `Outcome` at emit sites (`OpcUaAccessDenied`/`CrossClusterNamespaceAttempt`→Denied, config verbs→Success); bridge `NodeId`/`CorrelationId` value-types | Akka singleton transport, 500/5s batching, two-layer dedup, `ConfigAuditLog` EF entity + idempotency index |
| **MxGateway** (#2, #6) | Map `IApiKeyAuditStore`/`ApiKeyAuditEntry`→`IAuditWriter`/`AuditEvent`; generate `EventId`; `"system"`/`"cli"` Actor fallback; `Category="ApiKey"`; `constraint-denied`→Denied | SQLite store, 3 producer call sites (only injected type changes), append-only table |
| **ScadaBridge** (#3) | Outright rename `IAuditPayloadFilter``IAuditRedactor`; adopt canonical `AuditOutcome` enum; confirm writer contract (byte-identical) — keep bespoke ~25-field record as storage shape | Entire Site/Central pipeline, 4 domain enums, CLI export/verify, Blazor UI, redaction policy |
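The "derive `Outcome` at emit sites" rule in the table amounts to a small mapping. A minimal Python sketch, under stated assumptions: only the two outcomes named in the table are modelled (the canonical `AuditOutcome` enum may define more), and treating every unlisted kind as a successful config verb is an assumption.

```python
from enum import Enum


class AuditOutcome(Enum):
    # Only the two members named in the table above.
    SUCCESS = "Success"
    DENIED = "Denied"


# Event kinds the table maps to Denied (OtOpcUa and MxGateway rows).
DENIED_KINDS = frozenset({
    "OpcUaAccessDenied",
    "CrossClusterNamespaceAttempt",
    "constraint-denied",
})


def derive_outcome(event_kind: str) -> AuditOutcome:
    """Derive Outcome from the event kind at the emit site.

    Assumption: any kind not listed as denied is a config verb that succeeded.
    """
    if event_kind in DENIED_KINDS:
        return AuditOutcome.DENIED
    return AuditOutcome.SUCCESS
```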
**Resolved open GAPS decisions:**
1. **ScadaBridge rename vs. alias** → **outright rename** (compiler-verified across the HIGH blast radius).
2. **MxGateway `Details`→`DetailsJson`** → **wrap as a small JSON object** (keeps the field valid JSON).
3. **OtOpcUa `Outcome` storage** → **new nullable `Outcome` column + EF migration** (first-class, queryable).
4. **OtOpcUa SP path** → **leave bespoke + document**; *do* fix the `ClusterId`-filter/actor
mismatch in `ClusterAudit.razor` so structured rows are visible.
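Decision 2 can be pictured as follows: a sketch in Python in which the wrapper key `details` is a hypothetical choice, since the design fixes only that the result must be a small, valid JSON object.

```python
import json
from typing import Optional


def wrap_details(details: Optional[str]) -> str:
    """Wrap free-text Details so DetailsJson always holds a valid JSON object.

    The key name "details" is an assumption made for illustration.
    """
    return json.dumps({"details": details}, ensure_ascii=False)
```

Letting the JSON serializer do the quoting means text containing quotes, backslashes, or newlines cannot break the field.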
**Cleanups in scope:** #5 (OtOpcUa SP reconcile + `ClusterId` visibility fix), #6 (MxGateway
`CorrelationId` capture + structured `Target`).
**Behaviour fix:** MxGateway's `AppendAsync` can currently propagate exceptions to the caller;
wrap it so the adopted `IAuditWriter` never throws (honors the best-effort contract).
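The never-throw wrap is a plain decorator around the existing store. A minimal Python sketch of the pattern; `NeverThrowAuditWriter`, its `dropped` counter, and the logger name are hypothetical, and the real wrapper would sit around `AppendAsync` in .NET.

```python
import asyncio
import logging

log = logging.getLogger("audit")


class NeverThrowAuditWriter:
    """Best-effort writer: failures of the inner append are logged, not raised."""

    def __init__(self, inner_append):
        self._inner_append = inner_append  # async callable taking one event
        self.dropped = 0  # hypothetical counter, useful for health checks

    async def write(self, event) -> None:
        try:
            await self._inner_append(event)
        except Exception:
            # Swallow and record; the business operation must not fail
            # because auditing failed. Cancellation is not an Exception
            # subclass, so it still propagates.
            self.dropped += 1
            log.exception("audit write failed; event dropped")
```

The three producer call sites stay unchanged apart from the injected type; only the wrapper knows that the store can fail.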
## Phase 3 — Actor→Auth wiring *(task #10, blocked by #8 + #9)*
With Auth live (Phase 1) and the canonical record adopted (Phase 2), route the resolved
principal into `AuditEvent.Actor` everywhere:
- **Seam:** one small `IAuditActorAccessor` — HTTP paths read `HttpContext.User`; non-HTTP
paths (Akka actors, CLI) thread the operation principal or fall back. The single place that
changes if the principal source ever changes again.
- OtOpcUa → LDAP-resolved user. MxGateway → API-key name (system/cli fallback retained for
keyless CLI events). ScadaBridge → principal at `ManagementActor`/inbound boundary.
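The resolution order the seam describes can be sketched as below. This is an illustration in Python, not the `IAuditActorAccessor` contract itself: the `Principal` shape, the parameter names, and the `"system"` default are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Principal:
    name: str
    authenticated: bool = True


class AuditActorAccessor:
    """Resolve Actor: HTTP user first, then the threaded principal, then fallback."""

    def __init__(self, fallback: str = "system"):
        self._fallback = fallback

    def resolve(
        self,
        http_user: Optional[Principal] = None,
        operation_principal: Optional[Principal] = None,
    ) -> str:
        for candidate in (http_user, operation_principal):
            if candidate is not None and candidate.authenticated and candidate.name:
                return candidate.name
        # Keyless or system paths (for example CLI events) land here.
        return self._fallback
```

Emit sites call `resolve` and never inspect the principal source directly, which is what keeps a future change of principal source confined to one class.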
## Contracts, testing & risk gates
**Hard seam contracts:**
- `IAuditWriter` — best-effort, MUST NOT throw, swallow internal failures. OtOpcUa actor ✅,
ScadaBridge ✅; MxGateway needs the never-throw wrap (above).
- `IAuditRedactor` — pure, never throws, over-redacts on failure. ScadaBridge's
`SafeDefaultAuditPayloadFilter` is the reference; rename preserves it.
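The "over-redacts on failure" rule is the part of the redactor contract that is easiest to get wrong, so a small Python sketch may help. The key list and the placeholder are hypothetical; ScadaBridge's actual redaction policy is kept bespoke and is not reproduced here.

```python
REDACTED = "[REDACTED]"
SENSITIVE_KEYS = frozenset({"password", "secret", "apikey"})  # hypothetical policy


def redact_fields(payload: dict) -> dict:
    """Pure policy step: builds a new dict and never mutates its input."""
    return {
        key: (REDACTED if key.lower() in SENSITIVE_KEYS else value)
        for key, value in payload.items()
    }


def safe_redact(payload, policy=redact_fields):
    """Never raises; if the policy fails, redact everything rather than leak."""
    try:
        return policy(payload)
    except Exception:
        return {"payload": REDACTED}
```

Failing closed means a malformed payload costs some audit detail but cannot leak a secret into the audit store.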
**Cross-boundary surface:** Auth/Audit adoption is in-process and does **not** touch the
cross-repo wire contracts (gateway `.proto` files, OPC UA address-space shape) — **except** the
ScadaBridge API-key token-format change, the one item needing an interop check rather than just
a green unit build. A green build in one repo does not prove interop.
**Per-phase verification (evidence before "done"):**
- **Phase 0:** all 5 packages HTTP 200; `dotnet restore` green in all 3 repos.
- **Phase 1:** existing auth tests + new parity tests green per repo before merge; SB
token-format integration check.
- **Phase 2:** existing audit tests + new `Outcome`/`EventId`/rename tests; OtOpcUa `Outcome`
migration applies forward.
- **Phase 3:** `Actor == authenticated principal` on authenticated paths; fallback retained on
keyless/system paths.
- **Library suites** (Audit 19, Auth 172) re-run if any lib is touched. If a parity gap forces
a lib fix, bump `0.1.0`→`0.1.1` and re-publish rather than editing a published version.
## Tasks
| Task | Item | Blocked by |
|---|---|---|
| #7 | Phase 0 — publish both libs + feed-map all 3 repos | — |
| #8 | Phase 1 — adopt ZB.MOM.WW.Auth across all 3 repos (auth GAPS #1–#8) | #7 |
| #9 | Phase 2 — adopt ZB.MOM.WW.Audit across all 3 repos (audit GAPS #1–#3, #5, #6) | #8 |
| #10 | Phase 3 — wire Actor from the Auth principal (audit GAPS #4) | #8, #9 |
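The "Blocked by" column defines a dependency graph, and the waterfall is its only valid ordering. A short Python sketch using the standard-library `graphlib` module (Python 3.9+); the dictionary simply transcribes the table, and `execution_order` is a hypothetical helper name.

```python
from graphlib import TopologicalSorter

# Task -> set of tasks it is blocked by, transcribed from the table above.
BLOCKED_BY = {
    "#7": set(),
    "#8": {"#7"},
    "#9": {"#8"},
    "#10": {"#8", "#9"},
}


def execution_order(blocked_by):
    """Return tasks in an order that respects every blocker.

    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(blocked_by).static_order())
```

Because the tasks form a chain, `execution_order(BLOCKED_BY)` has exactly one valid result: `#7`, `#8`, `#9`, `#10`.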
## References
- `components/auth/GAPS.md`, `components/auth/spec/`, `components/auth/current-state/*`
- `components/audit/GAPS.md`, `components/audit/shared-contract/ZB.MOM.WW.Audit.md`,
`components/audit/current-state/*`
- Libraries: `ZB.MOM.WW.Auth/`, `ZB.MOM.WW.Audit/`
- Prior adoption precedent: `components/configuration/GAPS.md`,
`components/observability/GAPS.md`