# Phase 2 (Audit adoption): Task 2.0 gate findings + DEEP re-scope (for review)

Companion to `2026-06-02-auth-audit-normalization.md`. Produced by the **Task 2.0 read-only verification gate** (3 parallel explorers, all paths verified 2026-06-02 against live code on each repo's `feat/adopt-zb-auth` HEAD).

**Status: PAUSED for user review before any audit code is written.**

**Decisions taken (2026-06-02):**

- **Depth = DEEP adopt (canonical record).** Each app's audit record becomes the library's 9-field `ZB.MOM.WW.Audit.AuditEvent`; domain-specific fields relocate into `DetailsJson`; each app consumes the library's `IAuditWriter`/`IAuditRedactor`/`AuditOutcome` types. (User chose this over the gate-recommended lighter "Align", consistent with the standing maximal/full-adopt directive.)
- **Cadence = re-scope + PAUSE for review.** This doc is the review artifact; implementation does not start until the user signs off (especially on the ScadaBridge cost, below).

> **Why a re-scope was needed:** the plan's Phase 2 task specs were written from optimistic
> `components/audit/current-state/*` docs (see [[component-status-claims-are-optimistic]]). The gate
> found all three repos' specs are materially off: file refs moved (MxGateway), the target path is
> dormant (OtOpcUa), and the "outright rename" is structurally impossible (ScadaBridge).

---

## The canonical contract (shared `ZB.MOM.WW.Audit` 0.1.0)

`AuditEvent` (sealed record): REQUIRED `EventId:Guid`, `OccurredAtUtc:DateTimeOffset` (UTC-normalized on set), `Actor:string`, `Action:string`, `Outcome:AuditOutcome`; OPTIONAL `Category:string?`, `Target:string?`, `SourceNode:string?`, `CorrelationId:Guid?`, `DetailsJson:string?`. **Nine fields plus the `DetailsJson` payload.**
`AuditOutcome { Success, Failure, Denied }`. `IAuditWriter.WriteAsync(AuditEvent, CancellationToken)`: best-effort, never throws.
`IAuditRedactor.Apply(AuditEvent) -> AuditEvent`: pure, never throws.
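
For orientation, the contract above can be pictured as the following C# sketch. It is reconstructed only from the field list in this section and is not the `ZB.MOM.WW.Audit` source: the property style, the parameter names, and the way "UTC-normalized on set" is implemented are all assumptions. It also assumes C# 11 or later for `required`.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

// Illustrative reconstruction of the shared contract, NOT the library source.
public enum AuditOutcome { Success, Failure, Denied }

public sealed record AuditEvent
{
    private readonly DateTimeOffset _occurredAtUtc;

    public required Guid EventId { get; init; }

    // Assumption: "UTC-normalized on set" is modelled with ToUniversalTime().
    public required DateTimeOffset OccurredAtUtc
    {
        get => _occurredAtUtc;
        init => _occurredAtUtc = value.ToUniversalTime();
    }

    public required string Actor { get; init; }
    public required string Action { get; init; }
    public required AuditOutcome Outcome { get; init; }

    // Optional members default to null.
    public string? Category { get; init; }
    public string? Target { get; init; }
    public string? SourceNode { get; init; }
    public Guid? CorrelationId { get; init; }
    public string? DetailsJson { get; init; }
}

public interface IAuditWriter
{
    // Best-effort: implementations are expected to swallow their own failures.
    Task WriteAsync(AuditEvent auditEvent, CancellationToken cancellationToken);
}

public interface IAuditRedactor
{
    // Pure: returns a (possibly modified) copy and never throws.
    AuditEvent Apply(AuditEvent auditEvent);
}
```

Under this reading, a timestamp supplied with a `+02:00` offset is stored with a zero offset, and domain-specific data travels only as a JSON string in `DetailsJson`.
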
The package is pinned (central PM / explicit) + feed-mapped in all three repos; **referenced by none yet.**

---

## OtOpcUa: DEEP (Tasks 2.1 + 2.2) · risk: LOW–MEDIUM

**Verified current state:** Commons `AuditEvent` is an **8-field positional record**, `(Guid EventId, string Category, string Action, string Actor, DateTime OccurredAtUtc, string? DetailsJson, NodeId SourceNode, CorrelationId CorrelationId)`, where `NodeId`/`CorrelationId` are `readonly record struct` newtypes over `string`/`Guid`. It is an **Akka message** delivered via `DistributedPubSub` (`provider=cluster`) with **default (reflection) serialization** (no custom serializer). **The structured actor path is DORMANT: zero production emit sites** construct/`Tell` an `AuditEvent` today (only the tests do); all live audit goes through the bespoke **stored-procedure path** (`sp_NodeApplied`/`sp_PublishGeneration`/`sp_ValidateDraft`/`sp_RollbackToGeneration` INSERT directly with `ClusterId`/`GenerationId`, NULL `EventId`). `AuditWriterActor` (`ControlPlane/Audit/AuditWriterActor.cs`): 500/5s batching, two-layer dedup (in-buffer `Dictionary` + DB filtered-unique `UX_ConfigAuditLog_EventId`), mapping at `:75-84`. `ConfigAuditLog` (10 cols, no `Outcome`; `ISJSON` CHECK on `DetailsJson`). `ClusterAudit.razor:78` filters `a.ClusterId == ClusterId`, but the actor sets `NodeId` not `ClusterId`, so structured rows are invisible. Package pinned `0.1.0` in `Directory.Packages.props`, feed-mapped, unreferenced.

**Deep design, the easy one (the record is already ~canonical):**

- **2.1 (high-risk: actor + contract):** Delete Commons `AuditEvent.cs`; reference `ZB.MOM.WW.Audit.AuditEvent` from `ZB.MOM.WW.OtOpcUa.Commons` + `…ControlPlane`.
  Field map: `EventId`→`EventId`; `OccurredAtUtc` `DateTime`→`DateTimeOffset` (widen at construction); `Actor`/`Action`/`Category`/`DetailsJson` direct; `SourceNode` (unwrap `NodeId.Value`→`string?`); `CorrelationId` (unwrap `.Value`→`Guid?`); `Target` unused (null). OtOpcUa has no extra domain fields to push into `DetailsJson`, so **no field relocation**. Add the NEW required `Outcome` (derive: `OpcUaAccessDenied`/`CrossClusterNamespaceAttempt`→`Denied`; config verbs→`Success`; no `Failure` in OtOpcUa's vocabulary). `AuditWriterActor : IAuditWriter` (`WriteAsync` wraps the fire-and-forget `Tell` and returns `Task.CompletedTask`, so it is trivially best-effort). Keep batching/dedup. Mapping at `:75-84` becomes `NodeId = evt.SourceNode`, `CorrelationId = evt.CorrelationId`, `Outcome = evt.Outcome`, `EventType = $"{evt.Category}:{evt.Action}"` (storage keeps the composite). Value-type unwrap happens at the (test + future) construction sites. **Akka wire note:** the message type changes shape → a rolling-deploy wire break IN PRINCIPLE, but **moot** (no live emit traffic). Flag in the commit; no dual-accept window needed.
- **2.2 (high-risk: EF migration + UI query):** add nullable `Outcome` to `ConfigAuditLog` (+ DbContext mapping `:429-463`) + EF migration `AddConfigAuditLogOutcome` (chains after `20260602112419_CanonicalizeAdminRoles`). Fix `ClusterAudit.razor:78` so `ClusterId == null && NodeId` resolves to the cluster (OR-predicate joining `ClusterNodes`, or populate `ClusterId` at flush). SP path stays bespoke (documented).
- **Package refs:** `…Commons` (record + `AuditOutcome`), `…ControlPlane` (`IAuditWriter`), `…Configuration` (only if `Outcome` is stored as the enum type; otherwise store `string?`/`int?` and skip).
- **Effort:** ~record swap 5m + actor seam 5m + Outcome derivation 5m (2.1); column+migration+query 5m (2.2).
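
The `Outcome` derivation, the `DateTime` widening, and the composite `EventType` mapping from 2.1 can be sketched as below. This is a minimal illustration, not the repo's code. The name `AuditOutcomeMapper.FromAction` appears in the implementation status further down, but its body here is a guess; `ConfigAuditLogRow`, `ConfigAuditLogMapping`, `ToRow`, and `Widen` are hypothetical names. The `AuditEvent`/`AuditOutcome` types are trimmed local stand-ins so the block compiles on its own. The sketch assumes every action other than the two denial actions is a config verb, and that existing `OccurredAtUtc` values really are UTC.

```csharp
#nullable enable
using System;

// Trimmed stand-ins for the library types (not the library source).
public enum AuditOutcome { Success, Failure, Denied }

public sealed record AuditEvent(
    Guid EventId, DateTimeOffset OccurredAtUtc, string Actor, string Action,
    AuditOutcome Outcome, string? Category = null, string? SourceNode = null,
    Guid? CorrelationId = null, string? DetailsJson = null);

// Hypothetical row shape: only the columns named in the mapping above.
public sealed record ConfigAuditLogRow(
    Guid EventId, string EventType, string? NodeId, Guid? CorrelationId,
    string? Outcome, string? DetailsJson);

public static class AuditOutcomeMapper
{
    // The two denial actions map to Denied; everything else is treated as a
    // config verb and maps to Success. Failure is never produced here.
    public static AuditOutcome FromAction(string action) => action switch
    {
        "OpcUaAccessDenied" or "CrossClusterNamespaceAttempt" => AuditOutcome.Denied,
        _ => AuditOutcome.Success,
    };
}

public static class ConfigAuditLogMapping
{
    // Widen the old DateTime to DateTimeOffset, treating the value as UTC.
    public static DateTimeOffset Widen(DateTime occurredAtUtc) =>
        new(DateTime.SpecifyKind(occurredAtUtc, DateTimeKind.Utc));

    // Storage keeps the composite "Category:Action" event type. Outcome is
    // stored as a nullable string, one of the options listed under Package refs.
    public static ConfigAuditLogRow ToRow(AuditEvent evt) => new(
        evt.EventId,
        $"{evt.Category}:{evt.Action}",
        evt.SourceNode,
        evt.CorrelationId,
        evt.Outcome.ToString(),
        evt.DetailsJson);
}
```

Storing `Outcome` as a string keeps the `…Configuration` project free of a package reference, which is the trade-off the **Package refs** bullet describes.
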

---

## MxGateway: DEEP (Task 2.3, re-scoped) · risk: MEDIUM–HIGH (was "standard")

**Verified current state (the plan's file refs are STALE):** Phase 1 (Task 1.3) **moved** `IApiKeyAuditStore` + `ApiKeyAuditEntry` + `SqliteApiKeyAuditStore` **into the shared library** (`ZB.MOM.WW.Auth.Abstractions`/`…ApiKeys` 0.1.2); they no longer exist in MxGateway. `ApiKeyAuditEntry` = **5 fields** `(string? KeyId, string EventType, string? RemoteAddress, DateTimeOffset CreatedUtc, string? Details)`, persisted to the SQLite `api_key_audit` table (5 cols). `IApiKeyAuditStore` = `AppendAsync` + `ListRecentAsync` (the dashboard "recent audit" view reads via `ListRecentAsync`). **Three producers, but one is library-internal:**

- `ApiKeyAdminCommands` (**library-internal**, in `ZB.MOM.WW.Auth.ApiKeys`): emits CLI/admin verbs (`init-db`/`create-key`/`revoke-key`/`rotate-key`/`delete-key`/`set-scopes`/`enable-key`/`disable-key`), keyless for `init-db`, `RemoteAddress` null on the CLI path. **MxGateway cannot edit these call sites.**
- `DashboardApiKeyManagementService` (MxGateway-local): `dashboard-*` verbs, real `KeyId` + `RemoteAddress`.
- `ConstraintEnforcer.RecordDenialAsync` (MxGateway-local): single `constraint-denied` EventType, `RemoteAddress` hardcoded null, `Details = "{commandKind}: {target}: {ConstraintName}: {Message}"`.

`AppendAsync` currently **propagates** exceptions (no best-effort wrap). Serilog migration **landed** (no blocker). `ZB.MOM.WW.Audit` unreferenced; `nuget.config` already maps the package.

**Deep design (the library-internal CLI producer forces an adapter):**

- Add a package reference to `ZB.MOM.WW.Audit` in `…Server`.
- New **MxGateway-owned canonical store** `audit_event` (SQLite, 9 canonical columns + `details_json`) with its own migrator. The existing `api_key_audit` lives in the **library-owned** auth DB schema, so we do NOT alter that schema. Implement `IAuditWriter` over the new store (best-effort try/catch, which fixes the no-wrap gap).
- **Adapter for the library-internal CLI events:** register an MxGateway `IApiKeyAuditStore` impl whose `AppendAsync(ApiKeyAuditEntry)` maps → canonical `AuditEvent` (`EventId=NewGuid`; `KeyId`→`Actor` with `"cli"`/`"system"` fallback; `EventType`→`Action`; `CreatedUtc`→`OccurredAtUtc`; `RemoteAddress`→`SourceNode`; `Outcome=Success`; `Category="ApiKey"`; `Target=KeyId`; `Details`→`DetailsJson` wrapped `{"detail":"…"}`) and forwards to `IAuditWriter`. Its `ListRecentAsync` reads the canonical store and maps back to `ApiKeyAuditEntry` (so the existing dashboard recent-audit view keeps working) **or** the dashboard view is repointed to canonical.
- **Local producers** (`DashboardApiKeyManagementService`, `ConstraintEnforcer`) rewritten to build canonical `AuditEvent`s directly via `IAuditWriter` (`constraint-denied`→`Outcome.Denied`; capture `CorrelationId` from `MxCommandRequest.ClientCorrelationId` (constraint path, needs threading down) / `HttpContext.TraceIdentifier` (dashboard); structured `Target` from `commandKind`/`target` (GAPS #6)).
- **Open question for review:** retire `api_key_audit` (canonical store becomes the sole audit table) vs keep it coexisting. Retiring is cleaner-deep but touches the library's store wiring; coexisting is lower-risk.
- **Effort/classification:** re-scoped from "standard ~5m" to **high-risk** (new store + migrator + adapter + producer rewrites + dashboard read path + DI + tests). Realistically 2–3 sub-commits.

---

## ScadaBridge: DEEP (Task 2.5, re-scoped) · risk: **VERY HIGH (audit-subsystem re-architecture)**

**This is the one to scrutinize at review.** The gate established definitively that the plan's central claim (an "outright rename") is FALSE.

**Verified current state:** ScadaBridge's `AuditEvent` (`…Commons/Entities/Audit/AuditEvent.cs`) is a **24-field** record: `EventId, OccurredAtUtc(DateTime), IngestedAtUtc, Channel(AuditChannel), Kind(AuditKind), CorrelationId, ExecutionId, ParentExecutionId, SourceSiteId, SourceNode, SourceInstanceId, SourceScript, Actor, Target, Status(AuditStatus), HttpStatus, DurationMs, ErrorMessage, ErrorDetail, RequestSummary, ResponseSummary, PayloadTruncated, Extra, ForwardState(AuditForwardState?)`. It is the **storage shape of a partitioned SQL Server audit table** with these as **queryable columns**. `IAuditPayloadFilter.Apply(ScadaBridgeAuditEvent) -> ScadaBridgeAuditEvent` takes and returns ScadaBridge's own record, NOT the library's (a reflection contract test, `PayloadFilterContractTests`, pins the typing). `IAuditWriter`/`ICentralAuditWriter` are likewise typed to the 24-field record. **`AuditStatus` drives the site→central forwarding STATE MACHINE** (`Pending→Submitted→Forwarded→Reconciled`; `Delivered`/`Failed`/`Parked`/`Discarded`) and the **filter's error-cap logic** (`IsErrorStatus`). The Central reporting/UI queries by `Channel`/`Kind`/`Status`/`Site`. **Phase 1 did NOT touch any audit-pipeline file** (zero drift). Blast radius of just the interface rename: ~10 files / ~20 sites; the contract test pins it.

**What DEEP adoption concretely requires here (full honesty):** Replacing the 24-field record with the 9-field canonical + pushing ~15 domain fields into `DetailsJson` means **re-architecting the entire audit subsystem**, because those fields are not decorative; they are load-bearing:

1. **Storage:** migrate the partitioned SQL Server audit table from ~24 typed columns to the 9 canonical columns + a JSON `DetailsJson` column. Massive, lossy-on-queryability data migration; partitioning scheme likely must change; `IngestedAtUtc`/`ForwardState` are operational columns the forwarder UPDATEs.
3. **Forwarding state machine breaks:** `Status`/`ForwardState` move into opaque JSON, so you cannot `UPDATE` them as columns, and the reconciliation queries `WHERE Status/ForwardState = …` stop working. The site→central forwarder would have to be redesigned (e.g., promote Status back out of JSON, defeating the point).
4. **Redactor breaks:** `DefaultAuditPayloadFilter` reads `Channel`/`Status`/`RequestSummary`/`ResponseSummary`/`ErrorDetail`/`Extra`/`PayloadTruncated` to choose truncation caps. On a 9-field canonical record those are gone (opaque in `DetailsJson`), so the filter must be rewritten to parse JSON.
5. **Reporting/UI breaks:** Central audit-log queries/filters by Channel/Kind/Status/Site lose SQL queryability.
6. ~Dozens of call sites + the contract test + the perf hot-path test.

**Honest assessment:** ScadaBridge DEEP ≈ the **largest single undertaking in the whole program** (bigger than the Phase-1 ApiKeys re-arch). The audit component's own GAPS doc says *"Align, don't replace"* for exactly this reason.

**Bounded alternative to weigh at review (recommended if "deep" is to be kept tractable):** make the canonical `ZB.MOM.WW.Audit.AuditEvent` the **seam/transport + cross-project reporting** shape (the redactor and an `IAuditWriter` operate on the canonical record; domain richness rides in `DetailsJson`), while the **SQL storage keeps its typed queryable columns** populated by a storage-side projection (canonical+DetailsJson → columns) and the forwarding state machine continues to key on the `Status`/`ForwardState` columns. This delivers "deep" at the seam/record level (library types consumed; domain fields in `DetailsJson` for the canonical view) **without** gutting the partitioned store, the state machine, the filter, or the reporting, which makes it a far safer "deep."
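
To make the bounded alternative concrete, here is a rough sketch of the storage-side projection it describes: the canonical record crosses the seam, and the store lifts a few domain fields back out of `DetailsJson` into typed columns at insert time. Everything named here is hypothetical (`AuditLogRow`, `AuditRowProjection`, the JSON property names, and the choice of which columns to lift). Only the `DetailsJson` → typed-columns idea and the `Channel`/`Kind`/`Status`/`HttpStatus` names come from the text above. The `AuditEvent` type is a trimmed local stand-in.

```csharp
#nullable enable
using System;
using System.Text.Json;

public enum AuditOutcome { Success, Failure, Denied }

// Trimmed stand-in for the canonical record (not the library source).
public sealed record AuditEvent(
    Guid EventId, DateTimeOffset OccurredAtUtc, string Actor, string Action,
    AuditOutcome Outcome, string? SourceNode = null, string? DetailsJson = null);

// Hypothetical typed row: a small subset of the queryable columns.
public sealed record AuditLogRow(
    Guid EventId, DateTimeOffset OccurredAtUtc, string Actor, string Action,
    string? SourceNode, string? Channel, string? Kind, string? Status,
    int? HttpStatus, string? DetailsJson);

public static class AuditRowProjection
{
    public static AuditLogRow Project(AuditEvent evt)
    {
        string? channel = null, kind = null, status = null;
        int? httpStatus = null;

        if (!string.IsNullOrWhiteSpace(evt.DetailsJson))
        {
            try
            {
                using var doc = JsonDocument.Parse(evt.DetailsJson);
                var root = doc.RootElement;
                if (root.ValueKind == JsonValueKind.Object)
                {
                    channel = ReadString(root, "channel");
                    kind = ReadString(root, "kind");
                    status = ReadString(root, "status");
                    if (root.TryGetProperty("httpStatus", out var hs)
                        && hs.ValueKind == JsonValueKind.Number
                        && hs.TryGetInt32(out var code))
                    {
                        httpStatus = code;
                    }
                }
            }
            catch (JsonException)
            {
                // Malformed details: keep the row, leave typed columns null.
            }
        }

        return new AuditLogRow(
            evt.EventId, evt.OccurredAtUtc, evt.Actor, evt.Action,
            evt.SourceNode, channel, kind, status, httpStatus, evt.DetailsJson);
    }

    // Returns the property value only when it exists and is a JSON string.
    private static string? ReadString(JsonElement obj, string name) =>
        obj.TryGetProperty(name, out var p) && p.ValueKind == JsonValueKind.String
            ? p.GetString()
            : null;
}
```

In this shape the forwarder would keep issuing `UPDATE`s against the typed `Status`/`ForwardState` columns after insert. The projection only seeds them, so the canonical record never has to carry mutable forwarding state.
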

---

## Cross-cutting

- **Branch model:** `feat/adopt-zb-audit` per app, **stacked on `feat/adopt-zb-auth` HEAD** (Phase 3 wires the audit `Actor` from the Phase-1 Auth principal, so audit must build on auth). Local-only, never pushed.
- **No library change / republish** needed for the chosen designs (MxGateway adapts in-repo), so no Gitea token required unless the user later wants the canonical mapping pushed into a shared lib.
- **Phase 3 (unchanged in intent):** `IAuditActorAccessor` seam + wire `AuditEvent.Actor` from the Auth principal at every authenticated emit site; keep `"system"`/`"cli"` fallbacks for keyless paths.

## Re-scoped task list (for review)

| # | Repo | Re-scoped scope | Class | Risk |
|---|---|---|---|---|
| 2.1 | OtOpcUa | Commons record → canonical `AuditEvent`; `AuditWriterActor : IAuditWriter`; `Outcome` derivation; Akka-wire note (dormant) | high-risk | Low–Med |
| 2.2 | OtOpcUa | `ConfigAuditLog.Outcome` column + EF migration + `ClusterAudit` visibility fix; SP path bespoke | high-risk | Low–Med |
| 2.3 | MxGateway | new canonical SQLite `audit_event` store + migrator; `IAuditWriter`; `IApiKeyAuditStore`→canonical adapter (for library-internal CLI events) incl. `ListRecentAsync`; rewrite local producers; CorrelationId/Target capture; DI; tests | **high-risk** (↑ from standard) | Med–High |
| 2.5 | ScadaBridge | **DEEP = audit-subsystem re-arch** (24-field→9-field record everywhere; domain fields→`DetailsJson`; SQL partitioned-table migration; forwarding state machine + filter + reporting rewrite; contract/perf tests), **OR** the bounded "deep-at-the-seam" alternative above | **very-high-risk** | **VERY HIGH** |

## Implementation status (2026-06-02, deep adoption underway)

- **✅ OtOpcUa 2.1 + 2.2 DONE** (`feat/adopt-zb-audit`, spec ✅ + code ✅): `933dd1a`: deleted bespoke Commons `AuditEvent`, adopted library `ZB.MOM.WW.Audit.AuditEvent`, `AuditWriterActor : IAuditWriter` (best-effort `WriteAsync` wraps `Self.Tell`), `AuditOutcomeMapper.FromAction` derivation, batching/dedup intact; `b7f5e88`: nullable `Outcome` column + migration `20260602135350_AddConfigAuditLogOutcome` (additive, chains after CanonicalizeAdminRoles, no pending model changes) + `ClusterAudit` fix via shared `ClusterAuditQuery` (OR-predicate joining `ClusterNode` membership). SP path untouched. ControlPlane 45/45, Configuration 80/80 (+3), AdminUI 121/121. Minor backlog: no `IX_ConfigAuditLog_NodeId` (irrelevant while structured path dormant).
- **✅ MxGateway 2.3 DONE** (`feat/adopt-zb-audit`, spec ✅ + code ✅): `a5944bb`: new MxGateway-owned canonical SQLite `audit_event` store (same auth DB file via the library's `AuthSqliteConnectionFactory`; library tables untouched), `CanonicalAuditWriter : IAuditWriter` (best-effort, never throws, which closes the library's no-wrap gap), `CanonicalForwardingApiKeyAuditStore : IApiKeyAuditStore` adapter (maps `ApiKeyAuditEntry`→canonical w/ system/cli fallback + constraint-denied→Denied + DetailsJson wrap; `ListRecent` round-trips for the dashboard view), DI overrides the library's `TryAddSingleton`'d store; `7ea8358`: Dashboard + ConstraintEnforcer rewritten to emit canonical `AuditEvent` directly via `IAuditWriter` with structured `Target` + (dashboard) `CorrelationId`. 587 pass, 3 pre-existing FakeWorker reds, +10 tests. `api_key_audit` left unused (documented). Minor backlog: dup `WrapDetail`, per-op `EnsureTable`, a test temp-dir leak, unfiltered `ListRecent` category.
- **✅ ScadaBridge 2.5 DONE (FULL re-arch, user-chosen).** Decomposed into C1–C7 (design in `2026-06-02-scadabridge-audit-rearch.md`), all spec+code reviewed, MSSQL-verified, local-only on `feat/adopt-zb-audit`. Canonical record everywhere; site SQLite two-table (canonical + forwarding sidecar); central `dbo.AuditLog` collapsed to 10 canonical cols + persisted computed cols (`CollapseAuditLogToCanonical` migration); redactor/outcome/UI/export/CLI all canonical. Forwarding state machine preserved (sidecar) + queryability preserved (persisted computed columns); the design's key insight, that central is append-only, made pure-9-col central feasible without gutting forwarding.

## Open items to confirm at review

1. **ScadaBridge:** full audit re-architecture (pure 9-col storage) vs the **bounded "deep-at-the-seam"** variant (canonical record at the seam/reporting boundary; keep typed storage columns + state machine). Strongly recommend the bounded variant.
2. **MxGateway:** retire `api_key_audit` (canonical store is sole) vs keep it coexisting.
3. **OtOpcUa:** confirm leaving the SP path bespoke (structured path is dormant; canonicalization is forward-looking prep) is acceptable, and the `ClusterAudit` fix approach (OR-predicate vs populate `ClusterId`).
4. **Sequencing:** OtOpcUa (2.1→2.2) and MxGateway (2.3) are independent + tractable; ScadaBridge (2.5) is the gating risk, so do it last, and as staged reviewed sub-commits regardless of variant.