From abe06a216309548645c242add2f8b2db7516ed28 Mon Sep 17 00:00:00 2001 From: Joseph Doherty Date: Tue, 2 Jun 2026 09:13:09 -0400 Subject: [PATCH] =?UTF-8?q?plan(phase2):=20Task=202.0=20gate=20DONE=20?= =?UTF-8?q?=E2=80=94=20verified=20plan=20specs=20materially=20off=20(MxGw?= =?UTF-8?q?=20store=20moved=20to=20lib,=20OtOpcUa=20path=20dormant,=20SB?= =?UTF-8?q?=20rename=20structurally=20impossible);=20user=20chose=20DEEP?= =?UTF-8?q?=20adopt=20+=20pause;=20corrected=20deep=20design=20in=20-phase?= =?UTF-8?q?2-deep.md;=20PAUSED=20for=20review?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- ...02-auth-audit-normalization-phase2-deep.md | 183 ++++++++++++++++++ .../2026-06-02-auth-audit-normalization.md | 10 + ...-02-auth-audit-normalization.md.tasks.json | 2 +- 3 files changed, 194 insertions(+), 1 deletion(-) create mode 100644 docs/plans/2026-06-02-auth-audit-normalization-phase2-deep.md diff --git a/docs/plans/2026-06-02-auth-audit-normalization-phase2-deep.md b/docs/plans/2026-06-02-auth-audit-normalization-phase2-deep.md new file mode 100644 index 0000000..502c120 --- /dev/null +++ b/docs/plans/2026-06-02-auth-audit-normalization-phase2-deep.md @@ -0,0 +1,183 @@ +# Phase 2 (Audit adoption) — Task 2.0 gate findings + DEEP re-scope (for review) + +Companion to `2026-06-02-auth-audit-normalization.md`. Produced by the **Task 2.0 read-only +verification gate** (3 parallel explorers, all paths verified 2026-06-02 against live code on each +repo's `feat/adopt-zb-auth` HEAD). **Status: PAUSED for user review before any audit code is written.** + +**Decisions taken (2026-06-02):** +- **Depth = DEEP adopt (canonical record).** Each app's audit record becomes the library's 9-field + `ZB.MOM.WW.Audit.AuditEvent`; domain-specific fields relocate into `DetailsJson`; each app consumes + the library's `IAuditWriter`/`IAuditRedactor`/`AuditOutcome` types. (User chose this over the + gate-recommended lighter "Align" — consistent with the standing maximal/full-adopt directive.) +- **Cadence = re-scope + PAUSE for review.** This doc is the review artifact; implementation does not + start until the user signs off (especially on the ScadaBridge cost, below). + +> **Why a re-scope was needed:** the plan's Phase 2 task specs were written from optimistic +> `components/audit/current-state/*` docs (see [[component-status-claims-are-optimistic]]). The gate +> found all three repos' specs are materially off — file refs moved (MxGateway), the target path is +> dormant (OtOpcUa), and the "outright rename" is structurally impossible (ScadaBridge). + +--- + +## The canonical contract (shared `ZB.MOM.WW.Audit` 0.1.0) + +`AuditEvent` (sealed record): REQUIRED `EventId:Guid`, `OccurredAtUtc:DateTimeOffset` (UTC-normalized +on set), `Actor:string`, `Action:string`, `Outcome:AuditOutcome`; OPTIONAL `Category:string?`, +`Target:string?`, `SourceNode:string?`, `CorrelationId:Guid?`, `DetailsJson:string?`. **Nine fields.** +`AuditOutcome { Success, Failure, Denied }`. `IAuditWriter.WriteAsync(AuditEvent, CancellationToken)` — +best-effort, never throws. `IAuditRedactor.Apply(AuditEvent) -> AuditEvent` — pure, never throws. +The package is pinned (central PM / explicit) + feed-mapped in all three repos; **referenced by none yet.** + +--- + +## OtOpcUa — DEEP (Tasks 2.1 + 2.2) · risk: LOW–MEDIUM + +**Verified current state:** Commons `AuditEvent` is an **8-field positional record** — +`(Guid EventId, string Category, string Action, string Actor, DateTime OccurredAtUtc, string? DetailsJson, +NodeId SourceNode, CorrelationId CorrelationId)` — where `NodeId`/`CorrelationId` are `readonly record +struct` newtypes over `string`/`Guid`. It is an **Akka message** delivered via `DistributedPubSub` +(`provider=cluster`) with **default (reflection) serialization** — no custom serializer. **The structured +actor path is DORMANT: zero production emit sites** construct/`Tell` an `AuditEvent` today (only the tests +do); all live audit goes through the bespoke **stored-procedure path** (`sp_NodeApplied`/`sp_PublishGeneration`/ +`sp_ValidateDraft`/`sp_RollbackToGeneration` INSERT directly with `ClusterId`/`GenerationId`, NULL `EventId`). +`AuditWriterActor` (`ControlPlane/Audit/AuditWriterActor.cs`): 500/5s batching, two-layer dedup (in-buffer +`Dictionary` + DB filtered-unique `UX_ConfigAuditLog_EventId`), mapping at `:75-84`. +`ConfigAuditLog` (10 cols, no `Outcome`; `ISJSON` CHECK on `DetailsJson`). `ClusterAudit.razor:78` filters +`a.ClusterId == ClusterId`, but the actor sets `NodeId` not `ClusterId`, so structured rows are invisible. +Package pinned `0.1.0` in `Directory.Packages.props`, feed-mapped, unreferenced. + +**Deep design — this is the easy one (the record is already ~canonical):** +- **2.1 (high-risk: actor + contract):** Delete Commons `AuditEvent.cs`; reference `ZB.MOM.WW.Audit.AuditEvent` + from `ZB.MOM.WW.OtOpcUa.Commons` + `…ControlPlane`. Field map: `EventId`→`EventId`; `OccurredAtUtc` + `DateTime`→`DateTimeOffset` (widen at construction); `Actor`/`Action`/`Category`/`DetailsJson` direct; + `SourceNode` (unwrap `NodeId.Value`→`string?`); `CorrelationId` (unwrap `.Value`→`Guid?`); `Target` unused + (null) — OtOpcUa has no extra domain fields to push into `DetailsJson`, so **no field relocation**. Add the + NEW required `Outcome` (derive: `OpcUaAccessDenied`/`CrossClusterNamespaceAttempt`→`Denied`; config verbs→ + `Success`; no `Failure` in OtOpcUa's vocabulary). `AuditWriterActor : IAuditWriter` (`WriteAsync` wraps the + fire-and-forget `Tell`, returns `Task.CompletedTask` — trivially best-effort). Keep batching/dedup. Mapping + at `:75-84` becomes `NodeId = evt.SourceNode`, `CorrelationId = evt.CorrelationId`, `Outcome = evt.Outcome`, + `EventType = $"{evt.Category}:{evt.Action}"` (storage keeps the composite). Value-type unwrap happens at the + (test + future) construction sites. **Akka wire note:** the message type changes shape → a rolling-deploy + wire break IN PRINCIPLE, but **moot** (no live emit traffic). Flag in the commit; no dual-accept window needed. +- **2.2 (high-risk: EF migration + UI query):** add nullable `Outcome` to `ConfigAuditLog` (+ DbContext mapping + `:429-463`) + EF migration `AddConfigAuditLogOutcome` (chains after `20260602112419_CanonicalizeAdminRoles`). + Fix `ClusterAudit.razor:78` so `ClusterId == null && NodeId` resolves to the cluster (OR-predicate joining + `ClusterNodes`, or populate `ClusterId` at flush). SP path stays bespoke (documented). +- **Package refs:** `…Commons` (record + `AuditOutcome`), `…ControlPlane` (`IAuditWriter`), `…Configuration` + (only if `Outcome` is stored as the enum type; otherwise store `string?`/`int?` and skip). +- **Effort:** ~record swap 5m + actor seam 5m + Outcome derivation 5m (2.1); column+migration+query 5m (2.2). + +--- + +## MxGateway — DEEP (Task 2.3, re-scoped) · risk: MEDIUM–HIGH (was "standard") + +**Verified current state — the plan's file refs are STALE:** Phase 1 (Task 1.3) **moved** +`IApiKeyAuditStore` + `ApiKeyAuditEntry` + `SqliteApiKeyAuditStore` **into the shared library** +(`ZB.MOM.WW.Auth.Abstractions`/`…ApiKeys` 0.1.2) — they no longer exist in MxGateway. `ApiKeyAuditEntry` = +**5 fields** `(string? KeyId, string EventType, string? RemoteAddress, DateTimeOffset CreatedUtc, string? Details)`, +persisted to the SQLite `api_key_audit` table (5 cols). `IApiKeyAuditStore` = `AppendAsync` + `ListRecentAsync` +(the dashboard "recent audit" view reads via `ListRecentAsync`). **Three producers, but one is library-internal:** +- `ApiKeyAdminCommands` (**library-internal**, in `ZB.MOM.WW.Auth.ApiKeys`) — emits CLI/admin verbs + (`init-db`/`create-key`/`revoke-key`/`rotate-key`/`delete-key`/`set-scopes`/`enable-key`/`disable-key`), + keyless for `init-db`, `RemoteAddress` null on the CLI path. **MxGateway cannot edit these call sites.** +- `DashboardApiKeyManagementService` (MxGateway-local) — `dashboard-*` verbs, real `KeyId` + `RemoteAddress`. +- `ConstraintEnforcer.RecordDenialAsync` (MxGateway-local) — single `constraint-denied` EventType, `RemoteAddress` + hardcoded null, `Details = "{commandKind}: {target}: {ConstraintName}: {Message}"`. +`AppendAsync` currently **propagates** exceptions (no best-effort wrap). Serilog migration **landed** (no blocker). +`ZB.MOM.WW.Audit` unreferenced; `nuget.config` already maps the package. + +**Deep design — the library-internal CLI producer forces an adapter:** +- Add `` to `…Server`. +- New **MxGateway-owned canonical store** `audit_event` (SQLite, 9 canonical columns + `details_json`) with its own + migrator — the existing `api_key_audit` lives in the **library-owned** auth DB schema, so we do NOT alter that + schema. Implement `IAuditWriter` over the new store (best-effort try/catch — fixes the no-wrap gap). +- **Adapter for the library-internal CLI events:** register a MxGateway `IApiKeyAuditStore` impl whose + `AppendAsync(ApiKeyAuditEntry)` maps → canonical `AuditEvent` (`EventId=NewGuid`; `KeyId`→`Actor` with + `"cli"`/`"system"` fallback; `EventType`→`Action`; `CreatedUtc`→`OccurredAtUtc`; `RemoteAddress`→`SourceNode`; + `Outcome=Success`; `Category="ApiKey"`; `Target=KeyId`; `Details`→`DetailsJson` wrapped `{"detail":"…"}`) and + forwards to `IAuditWriter`. Its `ListRecentAsync` reads the canonical store and maps back to `ApiKeyAuditEntry` + (so the existing dashboard recent-audit view keeps working) **or** the dashboard view is repointed to canonical. +- **Local producers** (`DashboardApiKeyManagementService`, `ConstraintEnforcer`) rewritten to build canonical + `AuditEvent`s directly via `IAuditWriter` (`constraint-denied`→`Outcome.Denied`; capture `CorrelationId` from + `MxCommandRequest.ClientCorrelationId` (constraint path — needs threading down) / `HttpContext.TraceIdentifier` + (dashboard); structured `Target` from `commandKind`/`target` (GAPS #6)). +- **Open question for review:** retire `api_key_audit` (canonical store becomes the sole audit table) vs keep it + coexisting. Retiring is cleaner-deep but touches the library's store wiring; coexisting is lower-risk. +- **Effort/classification:** re-scoped from "standard ~5m" to **high-risk** (new store + migrator + adapter + + producer rewrites + dashboard read path + DI + tests). Realistically 2–3 sub-commits. + +--- + +## ScadaBridge — DEEP (Task 2.5, re-scoped) · risk: **VERY HIGH — audit-subsystem re-architecture** + +**This is the one to scrutinize at review.** The gate definitively answered the plan's central claim is FALSE. + +**Verified current state:** ScadaBridge's `AuditEvent` (`…Commons/Entities/Audit/AuditEvent.cs`) is a +**24-field** record — `EventId, OccurredAtUtc(DateTime), IngestedAtUtc, Channel(AuditChannel), Kind(AuditKind), +CorrelationId, ExecutionId, ParentExecutionId, SourceSiteId, SourceNode, SourceInstanceId, SourceScript, Actor, +Target, Status(AuditStatus), HttpStatus, DurationMs, ErrorMessage, ErrorDetail, RequestSummary, ResponseSummary, +PayloadTruncated, Extra, ForwardState(AuditForwardState?)`. It is the **storage shape of a partitioned SQL Server +audit table** with these as **queryable columns**. `IAuditPayloadFilter.Apply(ScadaBridgeAuditEvent) -> +ScadaBridgeAuditEvent` (NOT the library's record — a reflection contract test `PayloadFilterContractTests` pins +the typing). `IAuditWriter`/`ICentralAuditWriter` are likewise typed to the 24-field record. **`AuditStatus` +drives the site→central forwarding STATE MACHINE** (`Pending→Submitted→Forwarded→Reconciled`; +`Delivered`/`Failed`/`Parked`/`Discarded`) and the **filter's error-cap logic** (`IsErrorStatus`). The Central +reporting/UI queries by `Channel`/`Kind`/`Status`/`Site`. **Phase 1 did NOT touch any audit-pipeline file** (zero +drift). Blast radius of just the interface rename: ~10 files / ~20 sites; the contract test pins it. + +**What DEEP adoption concretely requires here (full honesty):** +Replacing the 24-field record with the 9-field canonical + pushing ~15 domain fields into `DetailsJson` means +**re-architecting the entire audit subsystem**, because those fields are not decorative — they are load-bearing: +1. **Storage:** migrate the partitioned SQL Server audit table from ~24 typed columns to the 9 canonical columns + + a JSON `DetailsJson` column. Massive, lossy-on-queryability data migration; partitioning scheme likely must + change; `IngestedAtUtc`/`ForwardState` are operational columns the forwarder UPDATEs. +3. **Forwarding state machine breaks:** `Status`/`ForwardState` move into opaque JSON — you cannot `UPDATE` a + JSON-embedded field as a column, and the reconciliation queries `WHERE Status/ForwardState = …` stop working. + The site→central forwarder would have to be redesigned (e.g., promote Status back out of JSON, defeating the + point). +4. **Redactor breaks:** `DefaultAuditPayloadFilter` reads `Channel`/`Status`/`RequestSummary`/`ResponseSummary`/ + `ErrorDetail`/`Extra`/`PayloadTruncated` to choose truncation caps — on a 9-field canonical record those are + gone (opaque in `DetailsJson`), so the filter must be rewritten to parse JSON. +5. **Reporting/UI breaks:** Central audit-log queries/filters by Channel/Kind/Status/Site lose SQL queryability. +6. ~Dozens of call sites + the contract test + the perf hot-path test. + +**Honest assessment:** ScadaBridge DEEP ≈ the **largest single undertaking in the whole program** (bigger than the +Phase-1 ApiKeys re-arch). The audit component's own GAPS doc says *"Align, don't replace"* for exactly this reason. + +**Bounded alternative to weigh at review (recommended if "deep" is to be kept tractable):** make the canonical +`ZB.MOM.WW.Audit.AuditEvent` the **seam/transport + cross-project reporting** shape (the redactor and an +`IAuditWriter` operate on the canonical record; domain richness rides in `DetailsJson`), while the **SQL storage +keeps its typed queryable columns** populated by a storage-side projection (canonical+DetailsJson → columns) and +the forwarding state machine continues to key on the `Status`/`ForwardState` columns. This delivers "deep" at the +seam/record level (library types consumed; domain fields in `DetailsJson` for the canonical view) **without** +gutting the partitioned store, the state machine, the filter, or the reporting — a far safer "deep." + +--- + +## Cross-cutting + +- **Branch model:** `feat/adopt-zb-audit` per app, **stacked on `feat/adopt-zb-auth` HEAD** (Phase 3 wires the + audit `Actor` from the Phase-1 Auth principal, so audit must build on auth). Local-only, never pushed. +- **No library change / republish** needed for the chosen designs (MxGateway adapts in-repo) — so no Gitea token + required unless the user later wants the canonical mapping pushed into a shared lib. +- **Phase 3 (unchanged in intent):** `IAuditActorAccessor` seam + wire `AuditEvent.Actor` from the Auth principal + at every authenticated emit site; keep `"system"`/`"cli"` fallbacks for keyless paths. + +## Re-scoped task list (for review) + +| # | Repo | Re-scoped scope | Class | Risk | +|---|---|---|---|---| +| 2.1 | OtOpcUa | Commons record → canonical `AuditEvent`; `AuditWriterActor : IAuditWriter`; `Outcome` derivation; Akka-wire note (dormant) | high-risk | Low–Med | +| 2.2 | OtOpcUa | `ConfigAuditLog.Outcome` column + EF migration + `ClusterAudit` visibility fix; SP path bespoke | high-risk | Low–Med | +| 2.3 | MxGateway | new canonical SQLite `audit_event` store + migrator; `IAuditWriter`; `IApiKeyAuditStore`→canonical adapter (for library-internal CLI events) incl. `ListRecentAsync`; rewrite local producers; CorrelationId/Target capture; DI; tests | **high-risk** (↑ from standard) | Med–High | +| 2.5 | ScadaBridge | **DEEP = audit-subsystem re-arch** (24-field→9-field record everywhere; domain fields→`DetailsJson`; SQL partitioned-table migration; forwarding state machine + filter + reporting rewrite; contract/perf tests) — **OR** the bounded "deep-at-the-seam" alternative above | **very-high-risk** | **VERY HIGH** | + +## Open items to confirm at review +1. **ScadaBridge:** full audit re-architecture (pure 9-col storage) vs the **bounded "deep-at-the-seam"** variant + (canonical record at the seam/reporting boundary; keep typed storage columns + state machine). Strongly + recommend the bounded variant. +2. **MxGateway:** retire `api_key_audit` (canonical store is sole) vs keep it coexisting. +3. **OtOpcUa:** confirm leaving the SP path bespoke (structured path is dormant; canonicalization is forward-looking + prep) is acceptable, and the `ClusterAudit` fix approach (OR-predicate vs populate `ClusterId`). +4. **Sequencing:** OtOpcUa (2.1→2.2) and MxGateway (2.3) are independent + tractable; ScadaBridge (2.5) is the + gating risk — do it last, and as staged reviewed sub-commits regardless of variant. diff --git a/docs/plans/2026-06-02-auth-audit-normalization.md b/docs/plans/2026-06-02-auth-audit-normalization.md index 3f41097..d791daa 100644 --- a/docs/plans/2026-06-02-auth-audit-normalization.md +++ b/docs/plans/2026-06-02-auth-audit-normalization.md @@ -239,6 +239,16 @@ Order within the phase (per `components/auth/GAPS.md` sequencing): **#3 seam → ## PHASE 2 — Audit adoption (audit GAPS #1–#3, #5, #6) +> ⚠️ **RE-SCOPED 2026-06-02 — the task specs below are SUPERSEDED.** The Task 2.0 gate (verified against +> live code) found these specs materially wrong: MxGateway's audit files moved into the shared library +> (Phase 1), OtOpcUa's structured audit path is dormant (zero emit sites), and the ScadaBridge +> "outright rename" is structurally impossible (its filter is typed to its own 24-field record, not the +> library's 9-field one). The user chose **DEEP adopt (canonical record)** + **pause for review**. The +> corrected, gate-grounded deep design is in +> [`2026-06-02-auth-audit-normalization-phase2-deep.md`](2026-06-02-auth-audit-normalization-phase2-deep.md) +> — **implementation is PAUSED pending user review of that doc (esp. the ScadaBridge audit-subsystem +> re-architecture cost).** The original specs below are kept for historical context only. + Branch `feat/adopt-zb-audit` per repo. Behaviour-preserving except the OtOpcUa `Outcome` column + `ClusterId` visibility fix. Concrete paths below come from `components/audit/current-state/*`. ### Task 2.0: Explore audit source + confirm elaboration *(GATE — light, paths already known)* diff --git a/docs/plans/2026-06-02-auth-audit-normalization.md.tasks.json b/docs/plans/2026-06-02-auth-audit-normalization.md.tasks.json index d9899a3..4cfbabd 100644 --- a/docs/plans/2026-06-02-auth-audit-normalization.md.tasks.json +++ b/docs/plans/2026-06-02-auth-audit-normalization.md.tasks.json @@ -23,7 +23,7 @@ {"id": 23, "subject": "Task 1.6: Unify dev base DN (#6) — DONE all 3 to dc=zb,dc=local (OtOpcUa 8ba289f, MxGw 9572045, SB 6ae6051)", "status": "completed", "blockedBy": [17]}, {"id": 24, "subject": "Task 1.7: Canonical roles native expansion (#8) [high-risk] — DONE all 3, full-value canonical (MxGw 04bce3ff, OtOpcUa c1619d9 +DB-mig, SB b104760+4118452 +DB-mig +SoD collapse)", "status": "completed", "blockedBy": [18]}, - {"id": 25, "subject": "Task 2.0: GATE confirm audit source refs", "status": "pending", "blockedBy": [8]}, + {"id": 25, "subject": "Task 2.0: GATE confirm audit source refs — DONE; found plan specs materially off → DEEP re-scope in -phase2-deep.md; PAUSED for user review before 2.1/2.2/2.3/2.5", "status": "completed", "blockedBy": [8]}, {"id": 26, "subject": "Task 2.1: OtOpcUa canonical record + IAuditWriter + Outcome (#1) [high-risk]", "status": "pending", "blockedBy": [25]}, {"id": 27, "subject": "Task 2.2: OtOpcUa Outcome migration + ClusterId fix (#1,#5) [high-risk]", "status": "pending", "blockedBy": [26]}, {"id": 28, "subject": "Task 2.3: MxGateway store→IAuditWriter adapter (#2,#6)", "status": "pending", "blockedBy": [25]},