Files
scadaproj/docs/plans/2026-06-02-scadabridge-audit-rearch.md
T

74 lines
9.4 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# ScadaBridge audit re-architecture (Task 2.5, DEEP full 9-col) — decomposition
Companion to `2026-06-02-auth-audit-normalization-phase2-deep.md`. User chose **Full re-arch (pure 9-col storage)**
for ScadaBridge audit. Architect design pass (read-only, verified on `feat/adopt-zb-audit`) produced this. The full
audit record becomes the library 9-field `ZB.MOM.WW.Audit.AuditEvent`; ~15 domain fields relocate into `DetailsJson`;
ScadaBridge consumes the library `IAuditWriter`/`IAuditRedactor`/`AuditOutcome`. This is the program's largest task.
## Key resolutions (from the design)
- **Forwarding state machine (the crux) → resolved cleanly.** It lives **only in site SQLite**; the central MS SQL
`AuditLog` table is **append-only** (DENY UPDATE/DELETE; central rows leave `ForwardState` null; reconciliation is
pure idempotent-insert with in-memory cursors), and the gRPC `AuditEventDtoMapper` **already** drops
`ForwardState`/`IngestedAtUtc` on the wire. So **central needs NO forwarding columns** (pure 9-col). On the **site**,
add a **sidecar `audit_forward_state` table** keyed by `EventId` (`ForwardState`, `OccurredAtUtc`, precomputed
`IsCachedKind`, optional `AttemptCount`/`LastAttemptUtc`) — `MarkForwarded`/`MarkReconciled` UPDATE the sidecar;
`ReadPending*` JOIN it; the canonical `audit_event` table is write-once. Precomputing `IsCachedKind` keeps the drain
hot path off JSON parsing (strictly faster than today's `Kind NOT IN(...)`).
- **Central storage migration → new table + copy** (in-place collapse infeasible: partition-aligned indexes +
`SwitchOutPartitionAsync` hard-codes a byte-identical staging column list). New 10-col table on the SAME
`ps_AuditLog_Month(OccurredAtUtc)` scheme; per-partition data copy projecting old typed columns into `DetailsJson`
(`FOR JSON PATH`); rename + role re-grant (append-only preserved). Partitioning preserved (`OccurredAtUtc` stays).
- **Reporting queryability → persisted computed columns for hot filters.** `Category`(=Channel) + canonical
`Outcome`/`Target`/`Actor`/`SourceNode`/`CorrelationId` cover most filters directly. Add **PERSISTED computed columns**
`Kind`/`Status`/`SourceSiteId`/`ExecutionId`/`ParentExecutionId` (`JSON_VALUE(DetailsJson,'$.x')`) + partition-aligned
indexes so the existing index semantics + the `GetExecutionTreeAsync` recursive CTE survive without a JSON perf cliff.
- **Redactor → `ScadaBridgeAuditRedactor : IAuditRedactor`** on the canonical record: parse `DetailsJson` once, redact +
byte-safe-truncate `requestSummary`/`responseSummary`/`errorDetail`/`extra` in the JSON tree, cap on canonical
`Category`/`Outcome` (replacing the typed `Channel`/`Status` reads), set `payloadTruncated`, re-serialize. Add a
fast-path that skips JSON parse when nothing to redact. `SafeDefault``SafeDefaultAuditRedactor`. Re-baseline the
perf hot-path budgets (JSON parse/rewrite is ~24× the typed-field path).
- **Canonical field mapping:** `Action = "{Channel}.{Kind}"`; `Category = Channel`; `Target/SourceNode/CorrelationId/
Actor/OccurredAtUtc` direct (DateTime→DateTimeOffset UTC). **`Outcome`:** `Kind==InboundAuthFailure`→`Denied` (checked
first); `Status==Delivered`→`Success`; `Status∈{Failed,Parked,Discarded}`→`Failure`; in-flight/`Skipped`→`Success`.
- **`DetailsJson` schema (camelCase, stable):** channel, kind, status, executionId, parentExecutionId, sourceSiteId,
sourceInstanceId, sourceScript, httpStatus, durationMs, errorMessage, errorDetail, requestSummary, responseSummary,
payloadTruncated, extra, ingestedAtUtc. **One shared `AuditDetailsCodec` (Commons) with deterministic options is
MANDATORY** — the canonical record uses value-equality + consumers dedup on it, so key-order/whitespace drift would
break dedup. (`forwardState` is NOT in DetailsJson — it's site-sidecar only.)
- **Commons takes the `ZB.MOM.WW.Audit` package ref** (the record lives in Commons; the package is a leaf canonical-types
pkg, only dep `Microsoft.Extensions.DependencyInjection.Abstractions`). Acceptable.
- **gRPC proto kept UNCHANGED** — the wire `AuditEventDto` stays 24-field internally; `AuditEventDtoMapper` projects
to/from `DetailsJson`. Avoids a proto/codegen rev + a site/central version-skew handshake. (A proto collapse is a
separate later task.)
## Staged decomposition (C1C7)
| Stage | Scope | Green? | Class | Risk |
|---|---|---|---|---|
| **C1** | Commons: add `ZB.MOM.WW.Audit` ref; new pure types `AuditDetails` record + `AuditDetailsCodec` (deterministic) + `Status/Kind→AuditOutcome` projection + `Action`/`Category` builders. No existing type changes. | yes | small | trivial |
| **C2** | `ScadaBridgeAuditRedactor`/`SafeDefaultAuditRedactor : IAuditRedactor` (canonical record, parse/rewrite DetailsJson, fast-path) — additive, old `IAuditPayloadFilter` still wired; unit-tested in isolation. | yes | standard | low |
| **C3** | **ATOMIC CUT — swap the record everywhere.** `Commons.Entities.Audit.AuditEvent` → `ZB.MOM.WW.Audit.AuditEvent` across ~40 src files + tests: emitters build canonical (domain→DetailsJson via codec); seams (`IAuditWriter`/`ICentralAuditWriter`/`ISiteAuditQueue`/`IAuditLogRepository`/`AuditLogQueryFilter`) re-type; `AuditEventDtoMapper` DTO↔canonical (proto unchanged); switch redactor wiring `IAuditPayloadFilter`→`IAuditRedactor`. | **boundaries only** | **high-risk** | **HIGHEST** |
| **C4** | Site SQLite two-table forwarding: `SqliteAuditWriter` → `audit_event` + `audit_forward_state`; retarget `MarkForwarded/MarkReconciled/ReadPending*/GetBacklogStats/MapRow` to JOIN+sidecar; precompute `IsCachedKind`. Telemetry/Reconciliation actors unchanged (seam stable). Site SQLite is ephemeral (7-day) → in-place schema reset, no data migration. | yes | high-risk | HIGH |
| **C5** | **ATOMIC CUT — central migration.** EF `CollapseAuditLogToCanonical`: new 10-col table on the partition scheme + per-partition data copy (old cols→DetailsJson) + persisted computed cols/indexes + rename + role re-grant; update `AuditLogRepository.InsertIfNotExistsAsync` + `SwitchOutPartitionAsync` staging list; regen ModelSnapshot. Maintenance-window; verify row-count + JSON spot-check. | **boundaries only** | **high-risk** | **HIGHEST** |
| **C6** | Reporting/UI/export retarget: `QueryAsync`/`GetKpiSnapshotAsync`/`GetExecutionTreeAsync` predicates→canonical/computed cols; `AuditLogExportService`+`AuditEndpoints` CSV + CentralUI Audit components + CLI parse `DetailsJson` for display. | yes | standard | med |
| **C7** | Tests + perf re-baseline + cleanup: rewrite `PayloadFilterContractTests`/redaction/`HotPathLatencyTests` to canonical+JSON + new budget; delete dead `Commons.Entities.Audit.AuditEvent`, 4 audit enums (or relocate behind codec), `IAuditPayloadFilter`/`Default`/`SafeDefault`, obsolete `AddColumnIfMissing`. | yes | standard | low |
**Atomic cuts:** only C3 (shared record type changes for all callers at once) and C5's data-copy half cannot stay green continuously. All other stages are green at completion.
## Top risks (carry into execution)
1. **C5 partition + `SwitchOutPartitionAsync` + persisted computed columns** — staging table must carry identical computed defs for SWITCH; add a SWITCH round-trip integration test before C5 ships. **Documented fallback:** if too brittle, keep `Kind`/`Status` as 2 real non-canonical columns on the central table (pragmatic, not pure-9-col) — decide at C5 implementation if blocked.
2. **DetailsJson determinism** — single `AuditDetailsCodec` (C1) is load-bearing for value-equality/dedup, not cosmetic.
3. **Redactor perf** — budgets move; add the no-op fast-path + empirically re-baseline in C7.
4. **gRPC** — keep the proto unchanged (mapper-internal projection); do NOT couple a wire change to this storage cut.
5. **`Action=Channel.Kind`** lossiness — mitigated by `Category`(=channel) + persisted computed `Kind`; ScadaBridge-internal filtering uses those, not `Action` parsing.
Delivery: `feat/adopt-zb-audit` (stacked on auth), local-only. Each stage = one implementer + classification review chain; full ScadaBridge suite at C3/C4/C5/C7.
## Stage status (live)
- **✅ C1 DONE** `3d77dc0` (code ✅) — `AuditDetails` + deterministic `AuditDetailsCodec` (pinned byte-exact) + `AuditOutcomeProjector` + `AuditFieldBuilders` + Commons→`ZB.MOM.WW.Audit` ref; 56 tests.
- **✅ C2 DONE** `adfb4d3` + fix `5aaf9e2` (spec ✅, code ✅ after fix) — `ScadaBridgeAuditRedactor`/`SafeDefaultAuditRedactor : IAuditRedactor` on the canonical record; redaction primitives extracted into shared `AuditRedactionPrimitives`/`AuditRegexCache` (old filter delegates, behaviour-preserved); cap-selection reads `d.Status` (faithful to legacy `IsErrorStatus`); fast-path + never-throws; review-fix hardened `OverRedact` to scrub ALL free-text fields + marker alignment + outer-catch never-leak test. 61 redaction + 44 payload + 88 commons-audit green.
- **✅ C3 DONE** `db707bb` + fix `c27b2c3` (spec ✅, code ✅; independently re-verified build 0/0 + AuditLog 241/Communication 201). Atomic record swap across all seams/emitters/gRPC DTO/redactor-wiring (127 files); `ScadaBridgeAuditEventFactory` single emit point; `AuditRowProjection` Decompose/Recompose transitional 24-col shim (lossless round-trip verified); proto unchanged; old `IAuditPayloadFilter` classes deleted (C7 pulled forward). Fix: safe enum-parse fallback in `MapRow`+`FromDto`.
- **⏳ C4 IN PROGRESS** — site SQLite → `audit_event` (canonical) + `audit_forward_state` sidecar; forwarding on the sidecar; `IsCachedKind` drain split.
- ⏳ C5/C6/C7 pending.