Files
scadaproj/docs/plans/2026-06-02-scadabridge-audit-rearch.md
T

9.4 KiB
Raw Blame History

ScadaBridge audit re-architecture (Task 2.5, DEEP full 9-col) — decomposition

Companion to 2026-06-02-auth-audit-normalization-phase2-deep.md. User chose Full re-arch (pure 9-col storage) for ScadaBridge audit. Architect design pass (read-only, verified on feat/adopt-zb-audit) produced this. The full audit record becomes the library 9-field ZB.MOM.WW.Audit.AuditEvent; ~15 domain fields relocate into DetailsJson; ScadaBridge consumes the library IAuditWriter/IAuditRedactor/AuditOutcome. This is the program's largest task.

Key resolutions (from the design)

  • Forwarding state machine (the crux) → resolved cleanly. It lives only in site SQLite; the central MS SQL AuditLog table is append-only (DENY UPDATE/DELETE; central rows leave ForwardState null; reconciliation is pure idempotent-insert with in-memory cursors), and the gRPC AuditEventDtoMapper already drops ForwardState/IngestedAtUtc on the wire. So central needs NO forwarding columns (pure 9-col). On the site, add a sidecar audit_forward_state table keyed by EventId (ForwardState, OccurredAtUtc, precomputed IsCachedKind, optional AttemptCount/LastAttemptUtc) — MarkForwarded/MarkReconciled UPDATE the sidecar; ReadPending* JOIN it; the canonical audit_event table is write-once. Precomputing IsCachedKind keeps the drain hot path off JSON parsing (strictly faster than today's Kind NOT IN(...)).
  • Central storage migration → new table + copy (in-place collapse infeasible: partition-aligned indexes + SwitchOutPartitionAsync hard-codes a byte-identical staging column list). New 10-col table on the SAME ps_AuditLog_Month(OccurredAtUtc) scheme; per-partition data copy projecting old typed columns into DetailsJson (FOR JSON PATH); rename + role re-grant (append-only preserved). Partitioning preserved (OccurredAtUtc stays).
  • Reporting queryability → persisted computed columns for hot filters. Category(=Channel) + canonical Outcome/Target/Actor/SourceNode/CorrelationId cover most filters directly. Add PERSISTED computed columns Kind/Status/SourceSiteId/ExecutionId/ParentExecutionId (JSON_VALUE(DetailsJson,'$.x')) + partition-aligned indexes so the existing index semantics + the GetExecutionTreeAsync recursive CTE survive without a JSON perf cliff.
  • Redactor → ScadaBridgeAuditRedactor : IAuditRedactor on the canonical record: parse DetailsJson once, redact + byte-safe-truncate requestSummary/responseSummary/errorDetail/extra in the JSON tree, cap on canonical Category/Outcome (replacing the typed Channel/Status reads), set payloadTruncated, re-serialize. Add a fast-path that skips JSON parse when nothing to redact. SafeDefaultSafeDefaultAuditRedactor. Re-baseline the perf hot-path budgets (JSON parse/rewrite is ~24× the typed-field path).
  • Canonical field mapping: Action = "{Channel}.{Kind}"; Category = Channel; Target/SourceNode/CorrelationId/ Actor/OccurredAtUtc direct (DateTime→DateTimeOffset UTC). Outcome: Kind==InboundAuthFailureDenied (checked first); Status==DeliveredSuccess; Status∈{Failed,Parked,Discarded}Failure; in-flight/SkippedSuccess.
  • DetailsJson schema (camelCase, stable): channel, kind, status, executionId, parentExecutionId, sourceSiteId, sourceInstanceId, sourceScript, httpStatus, durationMs, errorMessage, errorDetail, requestSummary, responseSummary, payloadTruncated, extra, ingestedAtUtc. One shared AuditDetailsCodec (Commons) with deterministic options is MANDATORY — the canonical record uses value-equality + consumers dedup on it, so key-order/whitespace drift would break dedup. (forwardState is NOT in DetailsJson — it's site-sidecar only.)
  • Commons takes the ZB.MOM.WW.Audit package ref (the record lives in Commons; the package is a leaf canonical-types pkg, only dep Microsoft.Extensions.DependencyInjection.Abstractions). Acceptable.
  • gRPC proto kept UNCHANGED — the wire AuditEventDto stays 24-field internally; AuditEventDtoMapper projects to/from DetailsJson. Avoids a proto/codegen rev + a site/central version-skew handshake. (A proto collapse is a separate later task.)

Staged decomposition (C1C7)

Stage Scope Green? Class Risk
C1 Commons: add ZB.MOM.WW.Audit ref; new pure types AuditDetails record + AuditDetailsCodec (deterministic) + Status/Kind→AuditOutcome projection + Action/Category builders. No existing type changes. yes small trivial
C2 ScadaBridgeAuditRedactor/SafeDefaultAuditRedactor : IAuditRedactor (canonical record, parse/rewrite DetailsJson, fast-path) — additive, old IAuditPayloadFilter still wired; unit-tested in isolation. yes standard low
C3 ATOMIC CUT — swap the record everywhere. Commons.Entities.Audit.AuditEventZB.MOM.WW.Audit.AuditEvent across ~40 src files + tests: emitters build canonical (domain→DetailsJson via codec); seams (IAuditWriter/ICentralAuditWriter/ISiteAuditQueue/IAuditLogRepository/AuditLogQueryFilter) re-type; AuditEventDtoMapper DTO↔canonical (proto unchanged); switch redactor wiring IAuditPayloadFilterIAuditRedactor. boundaries only high-risk HIGHEST
C4 Site SQLite two-table forwarding: SqliteAuditWriteraudit_event + audit_forward_state; retarget MarkForwarded/MarkReconciled/ReadPending*/GetBacklogStats/MapRow to JOIN+sidecar; precompute IsCachedKind. Telemetry/Reconciliation actors unchanged (seam stable). Site SQLite is ephemeral (7-day) → in-place schema reset, no data migration. yes high-risk HIGH
C5 ATOMIC CUT — central migration. EF CollapseAuditLogToCanonical: new 10-col table on the partition scheme + per-partition data copy (old cols→DetailsJson) + persisted computed cols/indexes + rename + role re-grant; update AuditLogRepository.InsertIfNotExistsAsync + SwitchOutPartitionAsync staging list; regen ModelSnapshot. Maintenance-window; verify row-count + JSON spot-check. boundaries only high-risk HIGHEST
C6 Reporting/UI/export retarget: QueryAsync/GetKpiSnapshotAsync/GetExecutionTreeAsync predicates→canonical/computed cols; AuditLogExportService+AuditEndpoints CSV + CentralUI Audit components + CLI parse DetailsJson for display. yes standard med
C7 Tests + perf re-baseline + cleanup: rewrite PayloadFilterContractTests/redaction/HotPathLatencyTests to canonical+JSON + new budget; delete dead Commons.Entities.Audit.AuditEvent, 4 audit enums (or relocate behind codec), IAuditPayloadFilter/Default/SafeDefault, obsolete AddColumnIfMissing. yes standard low

Atomic cuts: only C3 (shared record type changes for all callers at once) and C5's data-copy half cannot stay green continuously. All other stages are green at completion.

Top risks (carry into execution)

  1. C5 partition + SwitchOutPartitionAsync + persisted computed columns — staging table must carry identical computed defs for SWITCH; add a SWITCH round-trip integration test before C5 ships. Documented fallback: if too brittle, keep Kind/Status as 2 real non-canonical columns on the central table (pragmatic, not pure-9-col) — decide at C5 implementation if blocked.
  2. DetailsJson determinism — single AuditDetailsCodec (C1) is load-bearing for value-equality/dedup, not cosmetic.
  3. Redactor perf — budgets move; add the no-op fast-path + empirically re-baseline in C7.
  4. gRPC — keep the proto unchanged (mapper-internal projection); do NOT couple a wire change to this storage cut.
  5. Action=Channel.Kind lossiness — mitigated by Category(=channel) + persisted computed Kind; ScadaBridge-internal filtering uses those, not Action parsing.

Delivery: feat/adopt-zb-audit (stacked on auth), local-only. Each stage = one implementer + classification review chain; full ScadaBridge suite at C3/C4/C5/C7.

Stage status (live)

  • C1 DONE 3d77dc0 (code ) — AuditDetails + deterministic AuditDetailsCodec (pinned byte-exact) + AuditOutcomeProjector + AuditFieldBuilders + Commons→ZB.MOM.WW.Audit ref; 56 tests.
  • C2 DONE adfb4d3 + fix 5aaf9e2 (spec , code after fix) — ScadaBridgeAuditRedactor/SafeDefaultAuditRedactor : IAuditRedactor on the canonical record; redaction primitives extracted into shared AuditRedactionPrimitives/AuditRegexCache (old filter delegates, behaviour-preserved); cap-selection reads d.Status (faithful to legacy IsErrorStatus); fast-path + never-throws; review-fix hardened OverRedact to scrub ALL free-text fields + marker alignment + outer-catch never-leak test. 61 redaction + 44 payload + 88 commons-audit green.
  • C3 DONE db707bb + fix c27b2c3 (spec , code ; independently re-verified build 0/0 + AuditLog 241/Communication 201). Atomic record swap across all seams/emitters/gRPC DTO/redactor-wiring (127 files); ScadaBridgeAuditEventFactory single emit point; AuditRowProjection Decompose/Recompose transitional 24-col shim (lossless round-trip verified); proto unchanged; old IAuditPayloadFilter classes deleted (C7 pulled forward). Fix: safe enum-parse fallback in MapRow+FromDto.
  • C4 IN PROGRESS — site SQLite → audit_event (canonical) + audit_forward_state sidecar; forwarding on the sidecar; IsCachedKind drain split.
  • C5/C6/C7 pending.