plan(2.5): ScadaBridge audit full-rearch design + C1-C7 decomposition (sidecar forwarding, new-table-copy central migration, persisted computed cols, canonical record everywhere)
This commit is contained in:
@@ -0,0 +1,66 @@
|
||||
# ScadaBridge audit re-architecture (Task 2.5, DEEP full 9-col) — decomposition
|
||||
|
||||
Companion to `2026-06-02-auth-audit-normalization-phase2-deep.md`. User chose **Full re-arch (pure 9-col storage)**
|
||||
for ScadaBridge audit. Architect design pass (read-only, verified on `feat/adopt-zb-audit`) produced this. The full
|
||||
audit record becomes the library 9-field `ZB.MOM.WW.Audit.AuditEvent`; ~15 domain fields relocate into `DetailsJson`;
|
||||
ScadaBridge consumes the library `IAuditWriter`/`IAuditRedactor`/`AuditOutcome`. This is the program's largest task.
|
||||
|
||||
## Key resolutions (from the design)
|
||||
|
||||
- **Forwarding state machine (the crux) → resolved cleanly.** It lives **only in site SQLite**; the central MS SQL
|
||||
`AuditLog` table is **append-only** (DENY UPDATE/DELETE; central rows leave `ForwardState` null; reconciliation is
|
||||
pure idempotent-insert with in-memory cursors), and the gRPC `AuditEventDtoMapper` **already** drops
|
||||
`ForwardState`/`IngestedAtUtc` on the wire. So **central needs NO forwarding columns** (pure 9-col). On the **site**,
|
||||
add a **sidecar `audit_forward_state` table** keyed by `EventId` (`ForwardState`, `OccurredAtUtc`, precomputed
|
||||
`IsCachedKind`, optional `AttemptCount`/`LastAttemptUtc`) — `MarkForwarded`/`MarkReconciled` UPDATE the sidecar;
|
||||
`ReadPending*` JOIN it; the canonical `audit_event` table is write-once. Precomputing `IsCachedKind` keeps the drain
|
||||
hot path off JSON parsing (strictly faster than today's `Kind NOT IN(...)`).
|
||||
- **Central storage migration → new table + copy** (in-place collapse infeasible: partition-aligned indexes +
|
||||
`SwitchOutPartitionAsync` hard-codes a byte-identical staging column list). New 10-col table on the SAME
|
||||
`ps_AuditLog_Month(OccurredAtUtc)` scheme; per-partition data copy projecting old typed columns into `DetailsJson`
|
||||
(`FOR JSON PATH`); rename + role re-grant (append-only preserved). Partitioning preserved (`OccurredAtUtc` stays).
|
||||
- **Reporting queryability → persisted computed columns for hot filters.** `Category`(=Channel) + canonical
|
||||
`Outcome`/`Target`/`Actor`/`SourceNode`/`CorrelationId` cover most filters directly. Add **PERSISTED computed columns**
|
||||
`Kind`/`Status`/`SourceSiteId`/`ExecutionId`/`ParentExecutionId` (`JSON_VALUE(DetailsJson,'$.x')`) + partition-aligned
|
||||
indexes so the existing index semantics + the `GetExecutionTreeAsync` recursive CTE survive without a JSON perf cliff.
|
||||
- **Redactor → `ScadaBridgeAuditRedactor : IAuditRedactor`** on the canonical record: parse `DetailsJson` once, redact +
|
||||
byte-safe-truncate `requestSummary`/`responseSummary`/`errorDetail`/`extra` in the JSON tree, cap on canonical
|
||||
`Category`/`Outcome` (replacing the typed `Channel`/`Status` reads), set `payloadTruncated`, re-serialize. Add a
|
||||
fast-path that skips JSON parse when nothing to redact. `SafeDefault` → `SafeDefaultAuditRedactor`. Re-baseline the
|
||||
perf hot-path budgets (JSON parse/rewrite is ~2–4× the typed-field path).
|
||||
- **Canonical field mapping:** `Action = "{Channel}.{Kind}"`; `Category = Channel`; `Target/SourceNode/CorrelationId/
|
||||
Actor/OccurredAtUtc` direct (DateTime→DateTimeOffset UTC). **`Outcome`:** `Kind==InboundAuthFailure`→`Denied` (checked
|
||||
first); `Status==Delivered`→`Success`; `Status∈{Failed,Parked,Discarded}`→`Failure`; in-flight/`Skipped`→`Success`.
|
||||
- **`DetailsJson` schema (camelCase, stable):** channel, kind, status, executionId, parentExecutionId, sourceSiteId,
|
||||
sourceInstanceId, sourceScript, httpStatus, durationMs, errorMessage, errorDetail, requestSummary, responseSummary,
|
||||
payloadTruncated, extra, ingestedAtUtc. **One shared `AuditDetailsCodec` (Commons) with deterministic options is
|
||||
MANDATORY** — the canonical record uses value-equality + consumers dedup on it, so key-order/whitespace drift would
|
||||
break dedup. (`forwardState` is NOT in DetailsJson — it's site-sidecar only.)
|
||||
- **Commons takes the `ZB.MOM.WW.Audit` package ref** (the record lives in Commons; the package is a leaf canonical-types
|
||||
pkg, only dep `Microsoft.Extensions.DependencyInjection.Abstractions`). Acceptable.
|
||||
- **gRPC proto kept UNCHANGED** — the wire `AuditEventDto` stays 24-field internally; `AuditEventDtoMapper` projects
|
||||
to/from `DetailsJson`. Avoids a proto/codegen rev + a site/central version-skew handshake. (A proto collapse is a
|
||||
separate later task.)
|
||||
|
||||
## Staged decomposition (C1–C7)
|
||||
|
||||
| Stage | Scope | Green? | Class | Risk |
|
||||
|---|---|---|---|---|
|
||||
| **C1** | Commons: add `ZB.MOM.WW.Audit` ref; new pure types `AuditDetails` record + `AuditDetailsCodec` (deterministic) + `Status/Kind→AuditOutcome` projection + `Action`/`Category` builders. No existing type changes. | yes | small | trivial |
|
||||
| **C2** | `ScadaBridgeAuditRedactor`/`SafeDefaultAuditRedactor : IAuditRedactor` (canonical record, parse/rewrite DetailsJson, fast-path) — additive, old `IAuditPayloadFilter` still wired; unit-tested in isolation. | yes | standard | low |
|
||||
| **C3** | **ATOMIC CUT — swap the record everywhere.** `Commons.Entities.Audit.AuditEvent` → `ZB.MOM.WW.Audit.AuditEvent` across ~40 src files + tests: emitters build canonical (domain→DetailsJson via codec); seams (`IAuditWriter`/`ICentralAuditWriter`/`ISiteAuditQueue`/`IAuditLogRepository`/`AuditLogQueryFilter`) re-type; `AuditEventDtoMapper` DTO↔canonical (proto unchanged); switch redactor wiring `IAuditPayloadFilter`→`IAuditRedactor`. | **boundaries only** | **high-risk** | **HIGHEST** |
|
||||
| **C4** | Site SQLite two-table forwarding: `SqliteAuditWriter` → `audit_event` + `audit_forward_state`; retarget `MarkForwarded/MarkReconciled/ReadPending*/GetBacklogStats/MapRow` to JOIN+sidecar; precompute `IsCachedKind`. Telemetry/Reconciliation actors unchanged (seam stable). Site SQLite is ephemeral (7-day) → in-place schema reset, no data migration. | yes | high-risk | HIGH |
|
||||
| **C5** | **ATOMIC CUT — central migration.** EF `CollapseAuditLogToCanonical`: new 10-col table on the partition scheme + per-partition data copy (old cols→DetailsJson) + persisted computed cols/indexes + rename + role re-grant; update `AuditLogRepository.InsertIfNotExistsAsync` + `SwitchOutPartitionAsync` staging list; regen ModelSnapshot. Maintenance-window; verify row-count + JSON spot-check. | **boundaries only** | **high-risk** | **HIGHEST** |
|
||||
| **C6** | Reporting/UI/export retarget: `QueryAsync`/`GetKpiSnapshotAsync`/`GetExecutionTreeAsync` predicates→canonical/computed cols; `AuditLogExportService`+`AuditEndpoints` CSV + CentralUI Audit components + CLI parse `DetailsJson` for display. | yes | standard | med |
|
||||
| **C7** | Tests + perf re-baseline + cleanup: rewrite `PayloadFilterContractTests`/redaction/`HotPathLatencyTests` to canonical+JSON + new budget; delete dead `Commons.Entities.Audit.AuditEvent`, 4 audit enums (or relocate behind codec), `IAuditPayloadFilter`/`Default`/`SafeDefault`, obsolete `AddColumnIfMissing`. | yes | standard | low |
|
||||
|
||||
**Atomic cuts:** only C3 (shared record type changes for all callers at once) and C5's data-copy half cannot stay green continuously. All other stages are green at completion.
|
||||
|
||||
## Top risks (carry into execution)
|
||||
1. **C5 partition + `SwitchOutPartitionAsync` + persisted computed columns** — staging table must carry identical computed defs for SWITCH; add a SWITCH round-trip integration test before C5 ships. **Documented fallback:** if too brittle, keep `Kind`/`Status` as 2 real non-canonical columns on the central table (pragmatic, not pure-9-col) — decide at C5 implementation if blocked.
|
||||
2. **DetailsJson determinism** — single `AuditDetailsCodec` (C1) is load-bearing for value-equality/dedup, not cosmetic.
|
||||
3. **Redactor perf** — budgets move; add the no-op fast-path + empirically re-baseline in C7.
|
||||
4. **gRPC** — keep the proto unchanged (mapper-internal projection); do NOT couple a wire change to this storage cut.
|
||||
5. **`Action=Channel.Kind`** lossiness — mitigated by `Category`(=channel) + persisted computed `Kind`; ScadaBridge-internal filtering uses those, not `Action` parsing.
|
||||
|
||||
Delivery: `feat/adopt-zb-audit` (stacked on auth), local-only. Each stage = one implementer + classification review chain; full ScadaBridge suite at C3/C4/C5/C7.
|
||||
Reference in New Issue
Block a user