# ScadaBridge audit re-architecture (Task 2.5, DEEP full 9-col) — decomposition
Companion to `2026-06-02-auth-audit-normalization-phase2-deep.md`. User chose **Full re-arch (pure 9-col storage)**
for ScadaBridge audit. Architect design pass (read-only, verified on `feat/adopt-zb-audit`) produced this. The full
audit record becomes the library 9-field `ZB.MOM.WW.Audit.AuditEvent`; ~15 domain fields relocate into `DetailsJson`;
ScadaBridge consumes the library `IAuditWriter`/`IAuditRedactor`/`AuditOutcome`. This is the program's largest task.
## Key resolutions (from the design)
- **Forwarding state machine (the crux) → resolved cleanly.** It lives **only in site SQLite**; the central MS SQL
`AuditLog` table is **append-only** (DENY UPDATE/DELETE; central rows leave `ForwardState` null; reconciliation is
pure idempotent-insert with in-memory cursors), and the gRPC `AuditEventDtoMapper` **already** drops
`ForwardState`/`IngestedAtUtc` on the wire. So **central needs NO forwarding columns** (pure 9-col). On the **site**,
add a **sidecar `audit_forward_state` table** keyed by `EventId` (`ForwardState`, `OccurredAtUtc`, precomputed
`IsCachedKind`, optional `AttemptCount`/`LastAttemptUtc`) — `MarkForwarded`/`MarkReconciled` UPDATE the sidecar;
`ReadPending*` JOIN it; the canonical `audit_event` table is write-once. Precomputing `IsCachedKind` keeps the drain
hot path off JSON parsing (strictly faster than today's `Kind NOT IN(...)`).
- **Central storage migration → new table + copy** (in-place collapse infeasible: partition-aligned indexes +
`SwitchOutPartitionAsync` hard-codes a byte-identical staging column list). New 10-col table on the SAME
`ps_AuditLog_Month(OccurredAtUtc)` scheme; per-partition data copy projecting old typed columns into `DetailsJson`
(`FOR JSON PATH`); rename + role re-grant (append-only preserved). Partitioning preserved (`OccurredAtUtc` stays).
- **Reporting queryability → persisted computed columns for hot filters.** `Category`(=Channel) + canonical
`Outcome`/`Target`/`Actor`/`SourceNode`/`CorrelationId` cover most filters directly. Add **PERSISTED computed columns**
`Kind`/`Status`/`SourceSiteId`/`ExecutionId`/`ParentExecutionId` (`JSON_VALUE(DetailsJson,'$.x')`) + partition-aligned
indexes so the existing index semantics + the `GetExecutionTreeAsync` recursive CTE survive without a JSON perf cliff.
- **Redactor → `ScadaBridgeAuditRedactor : IAuditRedactor`** on the canonical record: parse `DetailsJson` once, redact +
byte-safe-truncate `requestSummary`/`responseSummary`/`errorDetail`/`extra` in the JSON tree, cap on canonical
`Category`/`Outcome` (replacing the typed `Channel`/`Status` reads), set `payloadTruncated`, re-serialize. Add a
fast-path that skips JSON parse when nothing to redact. `SafeDefault` → `SafeDefaultAuditRedactor`. Re-baseline the
perf hot-path budgets (JSON parse/rewrite is ~24× the typed-field path).
- **Canonical field mapping:** `Action = "{Channel}.{Kind}"`; `Category = Channel`; `Target/SourceNode/CorrelationId/
Actor/OccurredAtUtc` direct (DateTime→DateTimeOffset UTC). **`Outcome`:** `Kind==InboundAuthFailure`→`Denied` (checked
first); `Status==Delivered`→`Success`; `Status∈{Failed,Parked,Discarded}`→`Failure`; in-flight/`Skipped`→`Success`.
- **`DetailsJson` schema (camelCase, stable):** channel, kind, status, executionId, parentExecutionId, sourceSiteId,
sourceInstanceId, sourceScript, httpStatus, durationMs, errorMessage, errorDetail, requestSummary, responseSummary,
payloadTruncated, extra, ingestedAtUtc. **One shared `AuditDetailsCodec` (Commons) with deterministic options is
MANDATORY** — the canonical record uses value-equality + consumers dedup on it, so key-order/whitespace drift would
break dedup. (`forwardState` is NOT in DetailsJson — it's site-sidecar only.)
- **Commons takes the `ZB.MOM.WW.Audit` package ref** (the record lives in Commons; the package is a leaf canonical-types
pkg, only dep `Microsoft.Extensions.DependencyInjection.Abstractions`). Acceptable.
- **gRPC proto kept UNCHANGED** — the wire `AuditEventDto` stays 24-field internally; `AuditEventDtoMapper` projects
to/from `DetailsJson`. Avoids a proto/codegen rev + a site/central version-skew handshake. (A proto collapse is a
separate later task.)
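
The canonical field mapping and the deterministic `DetailsJson` requirement above can be sketched together. This is a minimal, language-neutral illustration in Python, not the C# `AuditDetailsCodec` or `AuditOutcomeProjector`. Assumptions: the names `encode_details`, `project_outcome`, `build_event` and `CanonicalEvent` are hypothetical; the channel value `"Api"` is a placeholder; alphabetical key order, compact separators and omission of null fields are one possible set of "deterministic options", since the plan does not state the real ones; only four of the nine canonical fields are modelled.

```python
import json
from dataclasses import dataclass

# DetailsJson keys as listed in this plan.
DETAIL_KEYS = frozenset({
    "channel", "kind", "status", "executionId", "parentExecutionId",
    "sourceSiteId", "sourceInstanceId", "sourceScript", "httpStatus",
    "durationMs", "errorMessage", "errorDetail", "requestSummary",
    "responseSummary", "payloadTruncated", "extra", "ingestedAtUtc",
})


def encode_details(details: dict) -> str:
    """Serialize details so that equal content always yields equal text.

    Assumed options: sorted keys, no whitespace, null fields omitted.
    """
    unknown = set(details) - DETAIL_KEYS
    if unknown:
        raise ValueError(f"unknown detail keys: {sorted(unknown)}")
    present = {k: v for k, v in details.items() if v is not None}
    return json.dumps(present, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False)


def project_outcome(kind: str, status: str) -> str:
    """Status/Kind -> Outcome, following the mapping in this plan."""
    if kind == "InboundAuthFailure":  # checked first, regardless of status
        return "Denied"
    if status == "Delivered":
        return "Success"
    if status in {"Failed", "Parked", "Discarded"}:
        return "Failure"
    return "Success"  # in-flight statuses and Skipped


@dataclass(frozen=True)
class CanonicalEvent:
    """Subset of the canonical record; value equality comes from the dataclass."""
    action: str
    category: str
    outcome: str
    details_json: str


def build_event(channel: str, kind: str, status: str, domain: dict) -> CanonicalEvent:
    details = {"channel": channel, "kind": kind, "status": status, **domain}
    return CanonicalEvent(
        action=f"{channel}.{kind}",
        category=channel,
        outcome=project_outcome(kind, status),
        details_json=encode_details(details),
    )
```

With a single encoder, two emitters that supply the same domain fields in a different order produce records that compare equal, which is what dedup on value equality relies on. With per-emitter serialization, the same content could produce different `DetailsJson` text and defeat dedup.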
## Staged decomposition (C1–C7)
| Stage | Scope | Green? | Class | Risk |
|---|---|---|---|---|
| **C1** | Commons: add `ZB.MOM.WW.Audit` ref; new pure types `AuditDetails` record + `AuditDetailsCodec` (deterministic) + `Status/Kind→AuditOutcome` projection + `Action`/`Category` builders. No existing type changes. | yes | small | trivial |
| **C2** | `ScadaBridgeAuditRedactor`/`SafeDefaultAuditRedactor : IAuditRedactor` (canonical record, parse/rewrite DetailsJson, fast-path) — additive, old `IAuditPayloadFilter` still wired; unit-tested in isolation. | yes | standard | low |
| **C3** | **ATOMIC CUT — swap the record everywhere.** `Commons.Entities.Audit.AuditEvent` → `ZB.MOM.WW.Audit.AuditEvent` across ~40 src files + tests: emitters build canonical (domain→DetailsJson via codec); seams (`IAuditWriter`/`ICentralAuditWriter`/`ISiteAuditQueue`/`IAuditLogRepository`/`AuditLogQueryFilter`) re-type; `AuditEventDtoMapper` DTO↔canonical (proto unchanged); switch redactor wiring `IAuditPayloadFilter`→`IAuditRedactor`. | **boundaries only** | **high-risk** | **HIGHEST** |
| **C4** | Site SQLite two-table forwarding: `SqliteAuditWriter` → `audit_event` + `audit_forward_state`; retarget `MarkForwarded/MarkReconciled/ReadPending*/GetBacklogStats/MapRow` to JOIN+sidecar; precompute `IsCachedKind`. Telemetry/Reconciliation actors unchanged (seam stable). Site SQLite is ephemeral (7-day) → in-place schema reset, no data migration. | yes | high-risk | HIGH |
| **C5** | **ATOMIC CUT — central migration.** EF `CollapseAuditLogToCanonical`: new 10-col table on the partition scheme + per-partition data copy (old cols→DetailsJson) + persisted computed cols/indexes + rename + role re-grant; update `AuditLogRepository.InsertIfNotExistsAsync` + `SwitchOutPartitionAsync` staging list; regen ModelSnapshot. Maintenance-window; verify row-count + JSON spot-check. | **boundaries only** | **high-risk** | **HIGHEST** |
| **C6** | Reporting/UI/export retarget: `QueryAsync`/`GetKpiSnapshotAsync`/`GetExecutionTreeAsync` predicates→canonical/computed cols; `AuditLogExportService`+`AuditEndpoints` CSV + CentralUI Audit components + CLI parse `DetailsJson` for display. | yes | standard | med |
| **C7** | Tests + perf re-baseline + cleanup: rewrite `PayloadFilterContractTests`/redaction/`HotPathLatencyTests` to canonical+JSON + new budget; delete dead `Commons.Entities.Audit.AuditEvent`, 4 audit enums (or relocate behind codec), `IAuditPayloadFilter`/`Default`/`SafeDefault`, obsolete `AddColumnIfMissing`. | yes | standard | low |

**Atomic cuts:** only C3 (shared record type changes for all callers at once) and C5's data-copy half cannot stay green continuously. All other stages are green at completion.
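
The C4 two-table layout can be illustrated with a small SQLite sketch: a write-once `audit_event` table plus an `audit_forward_state` sidecar that carries all mutable forwarding state. This is a Python approximation, not the C# `SqliteAuditWriter`. Assumptions: the function names are hypothetical; the state values `Pending`/`Forwarded`/`Reconciled` are guesses (the plan does not list them); the `audit_event` columns are reduced to a few; and the "no-demote" guard is interpreted here as "only a `Pending` row may become `Forwarded`".

```python
import sqlite3

# Cached kinds as listed in the C4 stage status.
CACHED_KINDS = {"CachedSubmit", "ApiCallCached", "DbWriteCached", "CachedResolve"}

SCHEMA = """
CREATE TABLE audit_event (
    EventId       TEXT PRIMARY KEY,
    OccurredAtUtc TEXT NOT NULL,
    Action        TEXT NOT NULL,
    DetailsJson   TEXT
);
CREATE TABLE audit_forward_state (
    EventId       TEXT PRIMARY KEY REFERENCES audit_event(EventId),
    ForwardState  TEXT NOT NULL,
    OccurredAtUtc TEXT NOT NULL,
    IsCachedKind  INTEGER NOT NULL
);
"""


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the sidecar reference
    conn.executescript(SCHEMA)
    return conn


def write_event(conn, event_id, occurred_at_utc, action, kind, details_json):
    """Insert the canonical row once and seed its sidecar row as Pending."""
    with conn:  # one transaction for both inserts
        conn.execute("INSERT INTO audit_event VALUES (?, ?, ?, ?)",
                     (event_id, occurred_at_utc, action, details_json))
        conn.execute("INSERT INTO audit_forward_state VALUES (?, ?, ?, ?)",
                     (event_id, "Pending", occurred_at_utc,
                      int(kind in CACHED_KINDS)))  # precomputed, no JSON parse later


def mark_forwarded(conn, event_ids):
    """Update the sidecar only; never demote a row that has moved past Pending."""
    with conn:
        conn.executemany(
            "UPDATE audit_forward_state SET ForwardState = 'Forwarded' "
            "WHERE EventId = ? AND ForwardState = 'Pending'",
            [(e,) for e in event_ids])


def mark_reconciled(conn, event_ids):
    with conn:
        conn.executemany(
            "UPDATE audit_forward_state SET ForwardState = 'Reconciled' "
            "WHERE EventId = ?",
            [(e,) for e in event_ids])


def read_pending(conn, cached, limit=100):
    """Drain query: JOIN the sidecar and filter on the precomputed flag."""
    return conn.execute(
        "SELECT e.EventId, e.Action, e.DetailsJson "
        "FROM audit_event e JOIN audit_forward_state s ON s.EventId = e.EventId "
        "WHERE s.ForwardState = 'Pending' AND s.IsCachedKind = ? "
        "ORDER BY s.OccurredAtUtc LIMIT ?",
        (int(cached), limit)).fetchall()
```

Because `IsCachedKind` is computed at write time, the drain query filters on an integer column and never reads `DetailsJson`, and no forwarding update touches `audit_event`.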
## Top risks (carry into execution)
1. **C5 partition + `SwitchOutPartitionAsync` + persisted computed columns** — staging table must carry identical computed defs for SWITCH; add a SWITCH round-trip integration test before C5 ships. **Documented fallback:** if too brittle, keep `Kind`/`Status` as 2 real non-canonical columns on the central table (pragmatic, not pure-9-col) — decide at C5 implementation if blocked.
2. **DetailsJson determinism** — single `AuditDetailsCodec` (C1) is load-bearing for value-equality/dedup, not cosmetic.
3. **Redactor perf** — budgets move; add the no-op fast-path + empirically re-baseline in C7.
4. **gRPC** — keep the proto unchanged (mapper-internal projection); do NOT couple a wire change to this storage cut.
5. **`Action=Channel.Kind`** lossiness — mitigated by `Category`(=channel) + persisted computed `Kind`; ScadaBridge-internal filtering uses those, not `Action` parsing.

Delivery: `feat/adopt-zb-audit` (stacked on auth), local-only. Each stage = one implementer + classification review chain; full ScadaBridge suite at C3/C4/C5/C7.
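
Risk 3 (redactor perf) depends on the no-op fast-path and on byte-safe truncation. The sketch below shows both ideas in Python; it is not the C# `ScadaBridgeAuditRedactor`. Assumptions: the `SECRET` pattern, the 64-byte cap and the function names are hypothetical (the plan does not give the real redaction rules or caps); only string-valued fields are handled; and the never-throws behaviour is omitted.

```python
import json
import re

FREE_TEXT_FIELDS = ("requestSummary", "responseSummary", "errorDetail", "extra")
SECRET = re.compile(r"(?i)(password|token)=\S+")  # hypothetical rule
MAX_BYTES = 64                                    # hypothetical cap


def truncate_utf8(text, max_bytes):
    """Cut to at most max_bytes of UTF-8 without splitting a character."""
    raw = text.encode("utf-8")
    if len(raw) <= max_bytes:
        return text
    # A partial trailing sequence is the only invalid part; drop it.
    return raw[:max_bytes].decode("utf-8", errors="ignore")


def redact(details_json):
    """Return redacted DetailsJson, or the same object when nothing changes."""
    if not details_json:
        return details_json
    # Fast path, no JSON parse. It is conservative: a backslash may hide
    # escaped content, so any escape forces the slow path.
    if ("\\" not in details_json
            and len(details_json.encode("utf-8")) <= MAX_BYTES
            and not SECRET.search(details_json)):
        return details_json

    details = json.loads(details_json)  # parse once
    changed = False
    for key in FREE_TEXT_FIELDS:
        value = details.get(key)
        if not isinstance(value, str):
            continue
        cleaned = SECRET.sub(lambda m: m.group(1) + "=***", value)
        capped = truncate_utf8(cleaned, MAX_BYTES)
        if capped != cleaned:
            details["payloadTruncated"] = True
            changed = True
        if capped != value:
            details[key] = capped
            changed = True
    if not changed:
        return details_json  # skip re-serialization as well
    return json.dumps(details, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False)
```

Returning the input object unchanged is what makes an identity check (like the `Assert.Same` mentioned under C7) a cheap test that the fast path was taken.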
## Stage status (live)
- **✅ C1 DONE** `3d77dc0` (code ✅) — `AuditDetails` + deterministic `AuditDetailsCodec` (pinned byte-exact) + `AuditOutcomeProjector` + `AuditFieldBuilders` + Commons→`ZB.MOM.WW.Audit` ref; 56 tests.
- **✅ C2 DONE** `adfb4d3` + fix `5aaf9e2` (spec ✅, code ✅ after fix) — `ScadaBridgeAuditRedactor`/`SafeDefaultAuditRedactor : IAuditRedactor` on the canonical record; redaction primitives extracted into shared `AuditRedactionPrimitives`/`AuditRegexCache` (old filter delegates, behaviour-preserved); cap-selection reads `d.Status` (faithful to legacy `IsErrorStatus`); fast-path + never-throws; review-fix hardened `OverRedact` to scrub ALL free-text fields + marker alignment + outer-catch never-leak test. 61 redaction + 44 payload + 88 commons-audit green.
- **✅ C3 DONE** `db707bb` + fix `c27b2c3` (spec ✅, code ✅; independently re-verified build 0/0 + AuditLog 241/Communication 201). Atomic record swap across all seams/emitters/gRPC DTO/redactor-wiring (127 files); `ScadaBridgeAuditEventFactory` single emit point; `AuditRowProjection` Decompose/Recompose transitional 24-col shim (lossless round-trip verified); proto unchanged; old `IAuditPayloadFilter` classes deleted (C7 pulled forward). Fix: safe enum-parse fallback in `MapRow`+`FromDto`.
- **✅ C4 DONE** `946d3e2` + fix `1737d15` (spec ✅, code ✅; independently re-verified diff scope = writer+tests only, build 0/0, AuditLog 249/1-preexisting). Site SQLite → `audit_event` (canonical) + `audit_forward_state` sidecar; forwarding marks/reads on the sidecar via JOIN; `IsCachedKind`={CachedSubmit,ApiCallCached,DbWriteCached,CachedResolve} precomputed drain split; old `AuditLog` table dropped (ephemeral reset). Fix: `PRAGMA foreign_keys=ON` + `MarkForwarded` no-demote guard.
- **✅ C5 DONE** `68a6bd1` (spec ✅, code ✅; a LIVE SQL Server was available so the migration + SWITCH were fully exercised — independently re-verified build 0/0 + ConfigurationDatabase 248/248). Central `dbo.AuditLog` collapsed to 10 canonical cols + 6 computed cols (5 PERSISTED + `IngestedAtUtc` non-persisted) on the preserved `ps_AuditLog_Month` scheme; `CollapseAuditLogToCanonical` new-table-and-copy migration (`FOR JSON PATH` projection, byte-verified round-trip; Down = documented one-way); repo writes/reads canonical directly; `SwitchOutPartition` staging matches the computed-col defs; append-only roles re-granted. C3 central shim retired. Forced deviations (all sound): IngestedAtUtc non-persisted, execution-id indexes unfiltered, provider-aware `OnModelCreating` strips JSON_VALUE for SQLite. Deferred to C7: a dedicated migration-projection test + the stale `CreatesFiveNamedIndexes` test name.
- **✅ C6 SUBSUMED** (no commit) — reporting/UI/export/CLI retarget was already completed by the C3 record-swap (`AuditEventView`/`AuditExportRow` shims decode every domain field from `DetailsJson`) + the C5 repo-query retarget. Read-only explorer verdict: all consumer surfaces canonical-complete; the only flagged items (ExecutionId/ParentExecutionId not in CSV; SourceNodes not parsed in export `ParseFilter`) are PRE-rearch omissions, not regressions. CentralUI 595/595, ManagementService 125/125 confirm.
- **✅ C7 DONE** `635461c` + doc-fix `bc0e5bf` (review ✅; independently re-verified build 0/0, PerformanceTests 10/10, ConfigurationDatabase 251/251 incl. the 3 new migration-projection tests PASSING on live MSSQL, zero dead crefs). Perf hot-path re-baselined (canonical JSON redactor measured ~14µs/2µs — faster than the old typed walk; budgets 200/30/5µs + fast-path `Assert.Same`); `CollapseAuditLogToCanonicalMigrationTests` (seed→migrate→assert Action/Category/Outcome/Actor-null/DetailsJson-round-trip + 5 persisted computed cols); index test → `CreatesNineNamedIndexes`; 26 dead-`<see cref>` across 13 files cleaned; doc-fix corrected the "six persisted" wording (5 persisted + IngestedAtUtc non-persisted).
## ✅ TASK 2.5 COMPLETE — ScadaBridge audit FULL re-architecture to pure 9-col canonical (2026-06-02)
All of C1–C7 done, each spec+code reviewed, on `feat/adopt-zb-audit` (local-only, never pushed). ScadaBridge's audit subsystem now: the canonical `ZB.MOM.WW.Audit.AuditEvent` record everywhere (domain fields in `DetailsJson` via the deterministic `AuditDetailsCodec`); the library `IAuditRedactor`/`AuditOutcome` consumed; site SQLite = `audit_event` (canonical) + `audit_forward_state` sidecar (forwarding decoupled, `IsCachedKind` drain split); central `dbo.AuditLog` collapsed to 10 canonical cols + persisted computed cols on the preserved partition scheme (`CollapseAuditLogToCanonical` migration, MSSQL-verified); UI/export/CLI canonical-complete via `AuditEventView`/`AuditExportRow`. The gRPC proto was intentionally left unchanged (mapper-internal projection). This was the program's single largest task.