scadaproj/docs/plans/2026-06-02-scadabridge-audit-rearch.md

ScadaBridge audit re-architecture (Task 2.5, DEEP full 9-col) — decomposition

Companion to 2026-06-02-auth-audit-normalization-phase2-deep.md. The user chose the full re-architecture (pure 9-col storage) for the ScadaBridge audit. This decomposition comes from a read-only architect design pass, verified against feat/adopt-zb-audit. The full audit record becomes the library's 9-field ZB.MOM.WW.Audit.AuditEvent, and ~15 domain fields relocate into DetailsJson. ScadaBridge consumes the library's IAuditWriter, IAuditRedactor, and AuditOutcome. This is the program's largest task.

Key resolutions (from the design)

  • Forwarding state machine (the crux) → resolved cleanly. Forwarding state lives only in site SQLite. The central MS SQL AuditLog table is append-only: DENY UPDATE/DELETE, central rows leave ForwardState null, and reconciliation is a pure idempotent insert driven by in-memory cursors. The gRPC AuditEventDtoMapper also already drops ForwardState/IngestedAtUtc on the wire. So central needs NO forwarding columns (pure 9-col). On the site, add a sidecar audit_forward_state table keyed by EventId (ForwardState, OccurredAtUtc, precomputed IsCachedKind, optional AttemptCount/LastAttemptUtc). MarkForwarded/MarkReconciled UPDATE the sidecar, ReadPending* JOINs it, and the canonical audit_event table stays write-once. Precomputing IsCachedKind keeps the drain hot path free of JSON parsing. This is strictly faster than today's Kind NOT IN (...) filter.
  • Central storage migration → new table + copy. An in-place collapse is infeasible because the indexes are partition-aligned and SwitchOutPartitionAsync hard-codes a byte-identical staging column list. The steps are:
    1. Create a new 10-col table on the SAME ps_AuditLog_Month(OccurredAtUtc) scheme.
    2. Copy the data partition by partition, projecting the old typed columns into DetailsJson (FOR JSON PATH).
    3. Rename the tables and re-grant roles, so append-only is preserved.
    Partitioning is preserved (OccurredAtUtc stays).
  • Reporting queryability → persisted computed columns for hot filters. Category (= Channel) plus the canonical Outcome/Target/Actor/SourceNode/CorrelationId columns cover most filters directly. Add PERSISTED computed columns Kind/Status/SourceSiteId/ExecutionId/ParentExecutionId (JSON_VALUE(DetailsJson,'$.x')) plus partition-aligned indexes. That way the existing index semantics and the GetExecutionTreeAsync recursive CTE survive without a JSON performance cliff.
  • Redactor → ScadaBridgeAuditRedactor : IAuditRedactor, operating on the canonical record. It runs in this order:
    1. Parse DetailsJson once.
    2. Redact and byte-safe-truncate requestSummary/responseSummary/errorDetail/extra in the JSON tree, with the cap keyed on canonical Category/Outcome (replacing the typed Channel/Status reads).
    3. Set payloadTruncated.
    4. Re-serialize.
    Add a fast path that skips the JSON parse when there is nothing to redact. SafeDefault → SafeDefaultAuditRedactor. Re-baseline the perf hot-path budgets; the design estimated JSON parse/rewrite at ~24× the cost of the typed-field path.
  • Canonical field mapping:
    - Action = "{Channel}.{Kind}"; Category = Channel.
    - Target/SourceNode/CorrelationId/Actor/OccurredAtUtc map directly (DateTime → DateTimeOffset UTC).
    - Outcome: Kind == InboundAuthFailure → Denied (checked first); Status == Delivered → Success; Status ∈ {Failed, Parked, Discarded} → Failure; in-flight/Skipped → Success.
  • DetailsJson schema (camelCase, stable): channel, kind, status, executionId, parentExecutionId, sourceSiteId, sourceInstanceId, sourceScript, httpStatus, durationMs, errorMessage, errorDetail, requestSummary, responseSummary, payloadTruncated, extra, ingestedAtUtc. One shared AuditDetailsCodec (Commons) with deterministic options is MANDATORY. The canonical record uses value equality and consumers dedup on it, so key-order or whitespace drift would break dedup. (forwardState is NOT in DetailsJson; it is site-sidecar only.)
  • Commons takes the ZB.MOM.WW.Audit package ref, because the record lives in Commons. This is acceptable: the package is a leaf canonical-types package whose only dependency is Microsoft.Extensions.DependencyInjection.Abstractions.
  • gRPC proto kept UNCHANGED. The wire AuditEventDto keeps its 24 fields internally, and AuditEventDtoMapper projects to/from DetailsJson. This avoids a proto/codegen revision and a site/central version-skew handshake. (A proto collapse is a separate later task.)
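The canonical field mapping's Outcome projection and Action/Category builders can be sketched as follows. This is a minimal Python illustration, not the C# AuditOutcomeProjector/AuditFieldBuilders. The AuditOutcome enum is a hypothetical stand-in that models only the three members the mapping names, and the channel/kind strings in the checks are illustrative. Treating every unlisted status as Success is an assumption drawn from the "in-flight/Skipped → Success" rule.

```python
from enum import Enum
from typing import Optional


class AuditOutcome(Enum):
    # Hypothetical stand-in for the library AuditOutcome (only the members used here).
    SUCCESS = "Success"
    FAILURE = "Failure"
    DENIED = "Denied"


FAILURE_STATUSES = frozenset({"Failed", "Parked", "Discarded"})


def project_outcome(kind: str, status: Optional[str]) -> AuditOutcome:
    # Kind is checked first: an inbound auth failure is Denied whatever its Status.
    if kind == "InboundAuthFailure":
        return AuditOutcome.DENIED
    if status == "Delivered":
        return AuditOutcome.SUCCESS
    if status in FAILURE_STATUSES:
        return AuditOutcome.FAILURE
    # Assumption: every remaining status (in-flight states, Skipped) maps to Success.
    return AuditOutcome.SUCCESS


def build_action(channel: str, kind: str) -> str:
    # Action = "{Channel}.{Kind}"
    return f"{channel}.{kind}"


def build_category(channel: str) -> str:
    # Category = Channel
    return channel
```

Action concatenates two values, so parsing it back is lossy. That is why Category and the computed Kind column, not Action, carry ScadaBridge-internal filtering.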

Staged decomposition (C1–C7)

  • C1 (Green: yes · Class: small · Risk: trivial): Commons: add the ZB.MOM.WW.Audit ref. Add new pure types: the AuditDetails record, AuditDetailsCodec (deterministic), the Status/Kind → AuditOutcome projection, and the Action/Category builders. No existing type changes.
  • C2 (Green: yes · Class: standard · Risk: low): ScadaBridgeAuditRedactor/SafeDefaultAuditRedactor : IAuditRedactor (canonical record, parse/rewrite DetailsJson, fast path). Additive: the old IAuditPayloadFilter stays wired. Unit-tested in isolation.
  • C3 (Green: boundaries only · Class: high-risk · Risk: HIGHEST): ATOMIC CUT, swapping the record everywhere. Commons.Entities.Audit.AuditEvent → ZB.MOM.WW.Audit.AuditEvent across ~40 src files + tests:
    - Emitters build the canonical record (domain → DetailsJson via the codec).
    - The seams (IAuditWriter/ICentralAuditWriter/ISiteAuditQueue/IAuditLogRepository/AuditLogQueryFilter) are re-typed.
    - AuditEventDtoMapper maps DTO ↔ canonical (proto unchanged).
    - Redactor wiring switches IAuditPayloadFilter → IAuditRedactor.
  • C4 (Green: yes · Class: high-risk · Risk: HIGH): Site SQLite two-table forwarding. SqliteAuditWriter → audit_event + audit_forward_state. Retarget MarkForwarded/MarkReconciled/ReadPending*/GetBacklogStats/MapRow to JOIN + sidecar, and precompute IsCachedKind. The Telemetry/Reconciliation actors are unchanged (seam stable). Site SQLite is ephemeral (7-day), so the schema is reset in place with no data migration.
  • C5 (Green: boundaries only · Class: high-risk · Risk: HIGHEST): ATOMIC CUT, the central migration. The EF migration CollapseAuditLogToCanonical creates a new 10-col table on the partition scheme and copies data per partition (old cols → DetailsJson). It then adds the persisted computed cols/indexes, renames the tables, and re-grants roles. Also update AuditLogRepository.InsertIfNotExistsAsync and the SwitchOutPartitionAsync staging list, and regenerate ModelSnapshot. Runs in a maintenance window; verify row counts and spot-check the JSON.
  • C6 (Green: yes · Class: standard · Risk: med): Reporting/UI/export retarget. QueryAsync/GetKpiSnapshotAsync/GetExecutionTreeAsync predicates → canonical/computed cols. AuditLogExportService + AuditEndpoints CSV, the CentralUI Audit components, and the CLI parse DetailsJson for display.
  • C7 (Green: yes · Class: standard · Risk: low): Tests, perf re-baseline, and cleanup. Rewrite PayloadFilterContractTests/redaction/HotPathLatencyTests against canonical + JSON with the new budget. Delete the dead Commons.Entities.Audit.AuditEvent, the 4 audit enums (or relocate them behind the codec), IAuditPayloadFilter/Default/SafeDefault, and the obsolete AddColumnIfMissing.

Atomic cuts: only C3 (the shared record type changes for all callers at once) and the data-copy half of C5 cannot stay green continuously. All other stages are green at completion.
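The C4 two-table site layout can be sketched with Python's built-in sqlite3 module. This is a simplified illustration, not SqliteAuditWriter. The canonical audit_event table is cut down to a few columns, and the ForwardState values 'Pending' and 'Forwarded' are hypothetical names. The sketch shows three properties the plan relies on:
- audit_event is write-once.
- The drain filters on the precomputed IsCachedKind flag without touching DetailsJson.
- Forwarding marks only UPDATE the sidecar, guarded so a row is never demoted.

```python
import sqlite3
from typing import Iterable, List, Tuple

# Kinds the drain split treats as "cached" (from the C4 status entry).
CACHED_KINDS = frozenset({"CachedSubmit", "ApiCallCached", "DbWriteCached", "CachedResolve"})

SCHEMA = """
CREATE TABLE audit_event (            -- canonical row, written once (columns trimmed)
    EventId       TEXT PRIMARY KEY,
    OccurredAtUtc TEXT NOT NULL,
    Action        TEXT NOT NULL,
    Category      TEXT NOT NULL,
    DetailsJson   TEXT NOT NULL
);
CREATE TABLE audit_forward_state (    -- mutable forwarding sidecar
    EventId       TEXT PRIMARY KEY REFERENCES audit_event(EventId),
    ForwardState  TEXT NOT NULL,
    OccurredAtUtc TEXT NOT NULL,
    IsCachedKind  INTEGER NOT NULL
);
"""


def open_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys=ON")  # sidecar rows must reference a real event
    conn.executescript(SCHEMA)
    return conn


def write_event(conn, event_id, occurred_at, channel, kind, details_json) -> None:
    # Canonical row and sidecar row are inserted in one transaction.
    with conn:
        conn.execute(
            "INSERT INTO audit_event VALUES (?, ?, ?, ?, ?)",
            (event_id, occurred_at, f"{channel}.{kind}", channel, details_json),
        )
        conn.execute(
            "INSERT INTO audit_forward_state VALUES (?, 'Pending', ?, ?)",
            (event_id, occurred_at, int(kind in CACHED_KINDS)),
        )


def read_pending(conn, cached: bool, limit: int = 100) -> List[Tuple[str, str]]:
    # The drain filters on the precomputed flag; DetailsJson is never parsed.
    return conn.execute(
        """SELECT e.EventId, e.Action
           FROM audit_event e
           JOIN audit_forward_state s ON s.EventId = e.EventId
           WHERE s.ForwardState = 'Pending' AND s.IsCachedKind = ?
           ORDER BY s.OccurredAtUtc
           LIMIT ?""",
        (int(cached), limit),
    ).fetchall()


def mark_forwarded(conn, event_ids: Iterable[str]) -> None:
    # Only the sidecar changes; the WHERE guard never demotes a row that has
    # already moved past 'Pending'.
    with conn:
        conn.executemany(
            "UPDATE audit_forward_state SET ForwardState = 'Forwarded' "
            "WHERE EventId = ? AND ForwardState = 'Pending'",
            [(event_id,) for event_id in event_ids],
        )
```

Keeping the mutable state in a separate keyed table means the canonical row never needs an UPDATE. That matches the write-once contract the central side already enforces with DENY UPDATE/DELETE.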

Top risks (carry into execution)

  1. C5 partition + SwitchOutPartitionAsync + persisted computed columns. The staging table must carry identical computed-column definitions for SWITCH, so add a SWITCH round-trip integration test before C5 ships. Documented fallback: if this proves too brittle, keep Kind/Status as two real non-canonical columns on the central table (pragmatic, not pure 9-col). Decide during C5 implementation if blocked.
  2. DetailsJson determinism: the single AuditDetailsCodec (C1) is load-bearing for value equality/dedup, not cosmetic.
  3. Redactor perf: the budgets move. Add the no-op fast path and re-baseline empirically in C7.
  4. gRPC: keep the proto unchanged (mapper-internal projection). Do NOT couple a wire change to this storage cut.
  5. Action = Channel.Kind lossiness: mitigated by Category (= Channel) and the persisted computed Kind. ScadaBridge-internal filtering uses those, not Action parsing.
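Risk 2 rests on the codec emitting identical bytes for equal content. A minimal Python sketch of that idea follows; the real AuditDetailsCodec is C#, and its exact serializer options are not stated in this plan. Two choices here are assumptions, not confirmed behaviour: null/absent fields are omitted, and nested objects such as extra have their keys sorted. Top-level keys always follow the DetailsJson schema order, and compact separators remove whitespace variation.

```python
import json
from typing import Any, Dict

# Top-level key order, taken from the DetailsJson schema list.
DETAILS_FIELDS = (
    "channel", "kind", "status", "executionId", "parentExecutionId",
    "sourceSiteId", "sourceInstanceId", "sourceScript", "httpStatus",
    "durationMs", "errorMessage", "errorDetail", "requestSummary",
    "responseSummary", "payloadTruncated", "extra", "ingestedAtUtc",
)


def _canonical(value: Any) -> Any:
    # Nested objects (e.g. "extra") have no schema order, so sort their keys.
    if isinstance(value, dict):
        return {k: _canonical(value[k]) for k in sorted(value)}
    if isinstance(value, list):
        return [_canonical(v) for v in value]
    return value


def encode_details(details: Dict[str, Any]) -> str:
    unknown = set(details) - set(DETAILS_FIELDS)
    if unknown:
        raise ValueError(f"unknown DetailsJson keys: {sorted(unknown)}")
    # Assumption: null/absent fields are omitted rather than written as null.
    ordered = {
        k: _canonical(details[k]) for k in DETAILS_FIELDS if details.get(k) is not None
    }
    # Fixed separators and no ASCII escaping make equal inputs byte-identical.
    return json.dumps(ordered, separators=(",", ":"), ensure_ascii=False)


def decode_details(text: str) -> Dict[str, Any]:
    return json.loads(text)
```

With this codec, two records built from the same fields in a different insertion order serialize to the same string. Value equality on the canonical record, and therefore dedup, then behaves as intended.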

Delivery: feat/adopt-zb-audit (stacked on auth), local-only. Each stage gets one implementer plus a classification review chain, and the full ScadaBridge suite runs at C3/C4/C5/C7.

Stage status (live)

  • C1 DONE 3d77dc0 (code ✓). Added AuditDetails, the deterministic AuditDetailsCodec (pinned byte-exact), AuditOutcomeProjector, AuditFieldBuilders, and the Commons → ZB.MOM.WW.Audit ref; 56 tests.
  • C2 DONE adfb4d3 + fix 5aaf9e2 (spec ✓, code ✓ after fix).
    - ScadaBridgeAuditRedactor/SafeDefaultAuditRedactor : IAuditRedactor on the canonical record.
    - Redaction primitives extracted into the shared AuditRedactionPrimitives/AuditRegexCache; the old filter delegates to them, so behaviour is preserved.
    - Cap selection reads d.Status (faithful to the legacy IsErrorStatus); fast path + never-throws.
    - The review fix hardened OverRedact to scrub ALL free-text fields, aligned the markers, and added an outer-catch never-leak test.
    - 61 redaction + 44 payload + 88 commons-audit tests green.
  • C3 DONE db707bb + fix c27b2c3 (spec ✓, code ✓). Independently re-verified: build 0/0, AuditLog 241, Communication 201.
    - Atomic record swap across all seams/emitters/gRPC DTO/redactor wiring (127 files), with ScadaBridgeAuditEventFactory as the single emit point.
    - AuditRowProjection Decompose/Recompose serves as a transitional 24-col shim (lossless round-trip verified); proto unchanged.
    - The old IAuditPayloadFilter classes were deleted (C7 work pulled forward).
    - Fix: safe enum-parse fallback in MapRow and FromDto.
  • C4 DONE 946d3e2 + fix 1737d15 (spec ✓, code ✓). Independently re-verified: diff scope = writer + tests only, build 0/0, AuditLog 249 (1 pre-existing).
    - Site SQLite → audit_event (canonical) + the audit_forward_state sidecar; forwarding marks/reads go through the sidecar via JOIN.
    - The drain split uses the precomputed IsCachedKind = {CachedSubmit, ApiCallCached, DbWriteCached, CachedResolve}.
    - The old AuditLog table was dropped (ephemeral reset).
    - Fix: PRAGMA foreign_keys=ON and a no-demote guard in MarkForwarded.
  • C5 DONE 68a6bd1 (spec ✓, code ✓). A LIVE SQL Server was available, so the migration and SWITCH were fully exercised. Independently re-verified: build 0/0, ConfigurationDatabase 248/248.
    - Central dbo.AuditLog collapsed to 10 canonical cols + 6 computed cols (5 PERSISTED + IngestedAtUtc non-persisted) on the preserved ps_AuditLog_Month scheme.
    - CollapseAuditLogToCanonical is a new-table-and-copy migration (FOR JSON PATH projection, byte-verified round-trip); Down is documented as one-way.
    - The repo writes/reads canonical directly, and the SwitchOutPartition staging table matches the computed-col defs. Append-only roles re-granted; the C3 central shim is retired.
    - Forced deviations (all sound): IngestedAtUtc is non-persisted, the execution-id indexes are unfiltered, and a provider-aware OnModelCreating strips JSON_VALUE for SQLite.
    - Deferred to C7: a dedicated migration-projection test and renaming the stale CreatesFiveNamedIndexes test.
  • C6 SUBSUMED (no commit). The C3 record swap (the AuditEventView/AuditExportRow shims decode every domain field from DetailsJson) plus the C5 repo-query retarget had already completed the reporting/UI/export/CLI retarget.
    - Read-only explorer verdict: all consumer surfaces are canonical-complete.
    - The only flagged items are PRE-rearch omissions, not regressions: ExecutionId/ParentExecutionId are missing from the CSV, and SourceNodes are not parsed in the export ParseFilter.
    - CentralUI 595/595 and ManagementService 125/125 confirm.
  • C7 DONE 635461c + doc-fix bc0e5bf (review ✓). Independently re-verified: build 0/0, PerformanceTests 10/10, ConfigurationDatabase 251/251 including the 3 new migration-projection tests PASSING on live MSSQL, zero dead crefs.
    - Perf hot path re-baselined: the canonical JSON redactor measured ~14µs/2µs, faster than the old typed walk. Budgets are 200/30/5µs, plus a fast-path Assert.Same.
    - CollapseAuditLogToCanonicalMigrationTests seeds, migrates, and asserts Action/Category/Outcome/Actor-null, the DetailsJson round-trip, and the 5 persisted computed cols.
    - The index test was renamed to CreatesNineNamedIndexes, and 26 dead <see cref> references across 13 files were cleaned.
    - The doc-fix corrected the "six persisted" wording (5 persisted + IngestedAtUtc non-persisted).
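The redactor flow from C2, re-baselined in C7, can be sketched in Python to show its shape. The secret pattern, the "***" marker, and the single byte cap are hypothetical placeholders. The sketch omits three things the real ScadaBridgeAuditRedactor does:
- selecting its cap from d.Status;
- handling non-string extra values and the never-throws fallback;
- re-serializing through the deterministic codec rather than plain json.dumps.

A cheap pre-check returns the input object unchanged when there is nothing to do, which is the property C7's fast-path Assert.Same pins. Otherwise the redactor parses once, redacts and byte-safe-truncates the free-text fields, sets payloadTruncated, and re-serializes once.

```python
import json
import re
from typing import Tuple

FREE_TEXT_FIELDS = ("requestSummary", "responseSummary", "errorDetail", "extra")
# Hypothetical secret pattern and marker, for illustration only.
SECRET = re.compile(r'(?i)(password|token)=[^&\s"]+')
MARKER = "***"


def truncate_utf8(text: str, max_bytes: int) -> Tuple[str, bool]:
    data = text.encode("utf-8")
    if len(data) <= max_bytes:
        return text, False
    # Cutting the bytes may split a multi-byte character; errors="ignore"
    # drops that trailing fragment, so the result is always valid UTF-8.
    return data[:max_bytes].decode("utf-8", errors="ignore"), True


def _needs_work(details_json: str, max_bytes: int) -> bool:
    # Conservative pre-check on the raw text: any secret-looking match, or a
    # document long enough that some field could exceed the cap.
    # Assumes the serializer does not \u-escape "=" or the key characters.
    return bool(SECRET.search(details_json)) or len(details_json.encode("utf-8")) > max_bytes


def redact(details_json: str, max_bytes: int = 1024) -> str:
    if not _needs_work(details_json, max_bytes):
        return details_json  # fast path: same object, no JSON parse
    tree = json.loads(details_json)  # parse once
    truncated = False
    for field in FREE_TEXT_FIELDS:
        value = tree.get(field)
        if not isinstance(value, str):
            continue  # this sketch only handles string-valued fields
        value = SECRET.sub(lambda m: f"{m.group(1)}={MARKER}", value)
        tree[field], cut = truncate_utf8(value, max_bytes)
        truncated = truncated or cut
    if truncated:
        tree["payloadTruncated"] = True
    return json.dumps(tree, separators=(",", ":"), ensure_ascii=False)
```

The length check is safe as a skip condition. A field can only exceed the cap if the whole document does, because JSON escaping never makes a string shorter than its UTF-8 content.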

TASK 2.5 COMPLETE: ScadaBridge audit FULL re-architecture to pure 9-col canonical (2026-06-02)

All of C1–C7 are done, each with spec + code review, on feat/adopt-zb-audit (local-only, never pushed). ScadaBridge's audit subsystem is now in this state:
  • It uses the canonical ZB.MOM.WW.Audit.AuditEvent record everywhere, with domain fields in DetailsJson via the deterministic AuditDetailsCodec.
  • It consumes the library IAuditRedactor/AuditOutcome.
  • Site SQLite is audit_event (canonical) + the audit_forward_state sidecar (forwarding decoupled, IsCachedKind drain split).
  • Central dbo.AuditLog is collapsed to 10 canonical cols + persisted computed cols on the preserved partition scheme (CollapseAuditLogToCanonical migration, MSSQL-verified).
  • UI/export/CLI are canonical-complete via AuditEventView/AuditExportRow.
The gRPC proto was intentionally left unchanged (mapper-internal projection). This was the program's single largest task.