From 98e957903f7daf01e7e44ca61c324e49b2193017 Mon Sep 17 00:00:00 2001 From: Joseph Doherty Date: Tue, 2 Jun 2026 10:36:00 -0400 Subject: [PATCH] plan(2.5): ScadaBridge audit full-rearch design + C1-C7 decomposition (sidecar forwarding, new-table-copy central migration, persisted computed cols, canonical record everywhere) --- .../2026-06-02-scadabridge-audit-rearch.md | 66 +++++++++++++++++++ 1 file changed, 66 insertions(+) create mode 100644 docs/plans/2026-06-02-scadabridge-audit-rearch.md diff --git a/docs/plans/2026-06-02-scadabridge-audit-rearch.md b/docs/plans/2026-06-02-scadabridge-audit-rearch.md new file mode 100644 index 0000000..34ea463 --- /dev/null +++ b/docs/plans/2026-06-02-scadabridge-audit-rearch.md @@ -0,0 +1,66 @@ +# ScadaBridge audit re-architecture (Task 2.5, DEEP full 9-col) — decomposition + +Companion to `2026-06-02-auth-audit-normalization-phase2-deep.md`. User chose **Full re-arch (pure 9-col storage)** +for ScadaBridge audit. Architect design pass (read-only, verified on `feat/adopt-zb-audit`) produced this. The full +audit record becomes the library 9-field `ZB.MOM.WW.Audit.AuditEvent`; ~15 domain fields relocate into `DetailsJson`; +ScadaBridge consumes the library `IAuditWriter`/`IAuditRedactor`/`AuditOutcome`. This is the program's largest task. + +## Key resolutions (from the design) + +- **Forwarding state machine (the crux) → resolved cleanly.** It lives **only in site SQLite**; the central MS SQL + `AuditLog` table is **append-only** (DENY UPDATE/DELETE; central rows leave `ForwardState` null; reconciliation is + pure idempotent-insert with in-memory cursors), and the gRPC `AuditEventDtoMapper` **already** drops + `ForwardState`/`IngestedAtUtc` on the wire. So **central needs NO forwarding columns** (pure 9-col). On the **site**, + add a **sidecar `audit_forward_state` table** keyed by `EventId` (`ForwardState`, `OccurredAtUtc`, precomputed + `IsCachedKind`, optional `AttemptCount`/`LastAttemptUtc`) — `MarkForwarded`/`MarkReconciled` UPDATE the sidecar; + `ReadPending*` JOIN it; the canonical `audit_event` table is write-once. Precomputing `IsCachedKind` keeps the drain + hot path off JSON parsing (strictly faster than today's `Kind NOT IN(...)`). +- **Central storage migration → new table + copy** (in-place collapse infeasible: partition-aligned indexes + + `SwitchOutPartitionAsync` hard-codes a byte-identical staging column list). New 10-col table on the SAME + `ps_AuditLog_Month(OccurredAtUtc)` scheme; per-partition data copy projecting old typed columns into `DetailsJson` + (`FOR JSON PATH`); rename + role re-grant (append-only preserved). Partitioning preserved (`OccurredAtUtc` stays). +- **Reporting queryability → persisted computed columns for hot filters.** `Category`(=Channel) + canonical + `Outcome`/`Target`/`Actor`/`SourceNode`/`CorrelationId` cover most filters directly. Add **PERSISTED computed columns** + `Kind`/`Status`/`SourceSiteId`/`ExecutionId`/`ParentExecutionId` (`JSON_VALUE(DetailsJson,'$.x')`) + partition-aligned + indexes so the existing index semantics + the `GetExecutionTreeAsync` recursive CTE survive without a JSON perf cliff. +- **Redactor → `ScadaBridgeAuditRedactor : IAuditRedactor`** on the canonical record: parse `DetailsJson` once, redact + + byte-safe-truncate `requestSummary`/`responseSummary`/`errorDetail`/`extra` in the JSON tree, cap on canonical + `Category`/`Outcome` (replacing the typed `Channel`/`Status` reads), set `payloadTruncated`, re-serialize. Add a + fast-path that skips JSON parse when nothing to redact. `SafeDefault` → `SafeDefaultAuditRedactor`. Re-baseline the + perf hot-path budgets (JSON parse/rewrite is ~2–4× the typed-field path). +- **Canonical field mapping:** `Action = "{Channel}.{Kind}"`; `Category = Channel`; `Target/SourceNode/CorrelationId/ + Actor/OccurredAtUtc` direct (DateTime→DateTimeOffset UTC). **`Outcome`:** `Kind==InboundAuthFailure`→`Denied` (checked + first); `Status==Delivered`→`Success`; `Status∈{Failed,Parked,Discarded}`→`Failure`; in-flight/`Skipped`→`Success`. +- **`DetailsJson` schema (camelCase, stable):** channel, kind, status, executionId, parentExecutionId, sourceSiteId, + sourceInstanceId, sourceScript, httpStatus, durationMs, errorMessage, errorDetail, requestSummary, responseSummary, + payloadTruncated, extra, ingestedAtUtc. **One shared `AuditDetailsCodec` (Commons) with deterministic options is + MANDATORY** — the canonical record uses value-equality + consumers dedup on it, so key-order/whitespace drift would + break dedup. (`forwardState` is NOT in DetailsJson — it's site-sidecar only.) +- **Commons takes the `ZB.MOM.WW.Audit` package ref** (the record lives in Commons; the package is a leaf canonical-types + pkg, only dep `Microsoft.Extensions.DependencyInjection.Abstractions`). Acceptable. +- **gRPC proto kept UNCHANGED** — the wire `AuditEventDto` stays 24-field internally; `AuditEventDtoMapper` projects + to/from `DetailsJson`. Avoids a proto/codegen rev + a site/central version-skew handshake. (A proto collapse is a + separate later task.) + +## Staged decomposition (C1–C7) + +| Stage | Scope | Green? | Class | Risk | +|---|---|---|---|---| +| **C1** | Commons: add `ZB.MOM.WW.Audit` ref; new pure types `AuditDetails` record + `AuditDetailsCodec` (deterministic) + `Status/Kind→AuditOutcome` projection + `Action`/`Category` builders. No existing type changes. | yes | small | trivial | +| **C2** | `ScadaBridgeAuditRedactor`/`SafeDefaultAuditRedactor : IAuditRedactor` (canonical record, parse/rewrite DetailsJson, fast-path) — additive, old `IAuditPayloadFilter` still wired; unit-tested in isolation. | yes | standard | low | +| **C3** | **ATOMIC CUT — swap the record everywhere.** `Commons.Entities.Audit.AuditEvent` → `ZB.MOM.WW.Audit.AuditEvent` across ~40 src files + tests: emitters build canonical (domain→DetailsJson via codec); seams (`IAuditWriter`/`ICentralAuditWriter`/`ISiteAuditQueue`/`IAuditLogRepository`/`AuditLogQueryFilter`) re-type; `AuditEventDtoMapper` DTO↔canonical (proto unchanged); switch redactor wiring `IAuditPayloadFilter`→`IAuditRedactor`. | **boundaries only** | **high-risk** | **HIGHEST** | +| **C4** | Site SQLite two-table forwarding: `SqliteAuditWriter` → `audit_event` + `audit_forward_state`; retarget `MarkForwarded/MarkReconciled/ReadPending*/GetBacklogStats/MapRow` to JOIN+sidecar; precompute `IsCachedKind`. Telemetry/Reconciliation actors unchanged (seam stable). Site SQLite is ephemeral (7-day) → in-place schema reset, no data migration. | yes | high-risk | HIGH | +| **C5** | **ATOMIC CUT — central migration.** EF `CollapseAuditLogToCanonical`: new 10-col table on the partition scheme + per-partition data copy (old cols→DetailsJson) + persisted computed cols/indexes + rename + role re-grant; update `AuditLogRepository.InsertIfNotExistsAsync` + `SwitchOutPartitionAsync` staging list; regen ModelSnapshot. Maintenance-window; verify row-count + JSON spot-check. | **boundaries only** | **high-risk** | **HIGHEST** | +| **C6** | Reporting/UI/export retarget: `QueryAsync`/`GetKpiSnapshotAsync`/`GetExecutionTreeAsync` predicates→canonical/computed cols; `AuditLogExportService`+`AuditEndpoints` CSV + CentralUI Audit components + CLI parse `DetailsJson` for display. | yes | standard | med | +| **C7** | Tests + perf re-baseline + cleanup: rewrite `PayloadFilterContractTests`/redaction/`HotPathLatencyTests` to canonical+JSON + new budget; delete dead `Commons.Entities.Audit.AuditEvent`, 4 audit enums (or relocate behind codec), `IAuditPayloadFilter`/`Default`/`SafeDefault`, obsolete `AddColumnIfMissing`. | yes | standard | low | + +**Atomic cuts:** only C3 (shared record type changes for all callers at once) and C5's data-copy half cannot stay green continuously. All other stages are green at completion. + +## Top risks (carry into execution) +1. **C5 partition + `SwitchOutPartitionAsync` + persisted computed columns** — staging table must carry identical computed defs for SWITCH; add a SWITCH round-trip integration test before C5 ships. **Documented fallback:** if too brittle, keep `Kind`/`Status` as 2 real non-canonical columns on the central table (pragmatic, not pure-9-col) — decide at C5 implementation if blocked. +2. **DetailsJson determinism** — single `AuditDetailsCodec` (C1) is load-bearing for value-equality/dedup, not cosmetic. +3. **Redactor perf** — budgets move; add the no-op fast-path + empirically re-baseline in C7. +4. **gRPC** — keep the proto unchanged (mapper-internal projection); do NOT couple a wire change to this storage cut. +5. **`Action=Channel.Kind`** lossiness — mitigated by `Category`(=channel) + persisted computed `Kind`; ScadaBridge-internal filtering uses those, not `Action` parsing. + +Delivery: `feat/adopt-zb-audit` (stacked on auth), local-only. Each stage = one implementer + classification review chain; full ScadaBridge suite at C3/C4/C5/C7.