# ScadaBridge audit re-architecture (Task 2.5, DEEP full 9-col): decomposition

Companion to `2026-06-02-auth-audit-normalization-phase2-deep.md`. The user chose **Full re-arch (pure 9-col storage)**
for ScadaBridge audit. The architect design pass (read-only, verified on `feat/adopt-zb-audit`) produced this. The full
audit record becomes the library 9-field `ZB.MOM.WW.Audit.AuditEvent`; ~15 domain fields relocate into `DetailsJson`;
ScadaBridge consumes the library `IAuditWriter`/`IAuditRedactor`/`AuditOutcome`. This is the program's largest task.

## Key resolutions (from the design)

- **Forwarding state machine (the crux) → resolved cleanly.** It lives **only in site SQLite**; the central MS SQL
  `AuditLog` table is **append-only** (DENY UPDATE/DELETE; central rows leave `ForwardState` null; reconciliation is
  pure idempotent-insert with in-memory cursors), and the gRPC `AuditEventDtoMapper` **already** drops
  `ForwardState`/`IngestedAtUtc` on the wire. So **central needs NO forwarding columns** (pure 9-col). On the **site**,
  add a **sidecar `audit_forward_state` table** keyed by `EventId` (`ForwardState`, `OccurredAtUtc`, precomputed
  `IsCachedKind`, optional `AttemptCount`/`LastAttemptUtc`): `MarkForwarded`/`MarkReconciled` UPDATE the sidecar,
  `ReadPending*` JOIN it, and the canonical `audit_event` table is write-once. Precomputing `IsCachedKind` keeps the drain
  hot path off JSON parsing (strictly faster than today's `Kind NOT IN(...)`).
- **Central storage migration → new table + copy.** An in-place collapse is infeasible: the indexes are partition-aligned and
  `SwitchOutPartitionAsync` hard-codes a byte-identical staging column list. Create a new 10-col table on the SAME
  `ps_AuditLog_Month(OccurredAtUtc)` scheme; copy data per partition, projecting the old typed columns into `DetailsJson`
  (`FOR JSON PATH`); then rename and re-grant roles (append-only preserved). Partitioning is preserved (`OccurredAtUtc` stays).
- **Reporting queryability → persisted computed columns for hot filters.** `Category` (= Channel) plus canonical
  `Outcome`/`Target`/`Actor`/`SourceNode`/`CorrelationId` cover most filters directly. Add **PERSISTED computed columns**
  `Kind`/`Status`/`SourceSiteId`/`ExecutionId`/`ParentExecutionId` (`JSON_VALUE(DetailsJson,'$.x')`) plus partition-aligned
  indexes so the existing index semantics and the `GetExecutionTreeAsync` recursive CTE survive without a JSON perf cliff.
- **Redactor → `ScadaBridgeAuditRedactor : IAuditRedactor`** on the canonical record: parse `DetailsJson` once, redact and
  byte-safe-truncate `requestSummary`/`responseSummary`/`errorDetail`/`extra` in the JSON tree, select the cap from canonical
  `Category`/`Outcome` (replacing the typed `Channel`/`Status` reads), set `payloadTruncated`, re-serialize. Add a
  fast-path that skips the JSON parse when there is nothing to redact. `SafeDefault` → `SafeDefaultAuditRedactor`. Re-baseline the
  perf hot-path budgets (JSON parse/rewrite is ~2–4× the typed-field path).
- **Canonical field mapping:** `Action = "{Channel}.{Kind}"`; `Category = Channel`;
  `Target/SourceNode/CorrelationId/Actor/OccurredAtUtc` direct (DateTime→DateTimeOffset UTC). **`Outcome`:** `Kind==InboundAuthFailure`→`Denied` (checked
  first); `Status==Delivered`→`Success`; `Status∈{Failed,Parked,Discarded}`→`Failure`; in-flight/`Skipped`→`Success`.
- **`DetailsJson` schema (camelCase, stable):** channel, kind, status, executionId, parentExecutionId, sourceSiteId,
  sourceInstanceId, sourceScript, httpStatus, durationMs, errorMessage, errorDetail, requestSummary, responseSummary,
  payloadTruncated, extra, ingestedAtUtc. **One shared `AuditDetailsCodec` (Commons) with deterministic options is
  MANDATORY**: the canonical record uses value-equality and consumers dedup on it, so key-order/whitespace drift would
  break dedup. (`forwardState` is NOT in `DetailsJson`; it is site-sidecar only.)
- **Commons takes the `ZB.MOM.WW.Audit` package ref** (the record lives in Commons; the package is a leaf canonical-types
  package whose only dependency is `Microsoft.Extensions.DependencyInjection.Abstractions`). Acceptable.
- **gRPC proto kept UNCHANGED.** The wire `AuditEventDto` stays 24-field internally; `AuditEventDtoMapper` projects
  to/from `DetailsJson`. This avoids a proto/codegen rev and a site/central version-skew handshake. (A proto collapse is a
  separate later task.)
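The canonical field mapping above can be illustrated with a small, language-neutral sketch. This is not the C# `AuditOutcomeProjector` or `AuditFieldBuilders` code; it is a minimal Python rendering of the stated rules. The function names, the three-value `AuditOutcome` enum, and the sample channel/kind strings `Api`/`ApiCall` are assumptions made for illustration; only the status and kind names quoted in the bullets come from the design.

```python
from enum import Enum


class AuditOutcome(Enum):
    # Assumed subset: only the three outcome values named in the mapping rules.
    SUCCESS = "Success"
    FAILURE = "Failure"
    DENIED = "Denied"


# Statuses that the design maps to Failure.
FAILURE_STATUSES = {"Failed", "Parked", "Discarded"}


def build_action(channel: str, kind: str) -> str:
    """Action = "{Channel}.{Kind}" (hypothetical helper name)."""
    return f"{channel}.{kind}"


def build_category(channel: str) -> str:
    """Category is the channel, unchanged."""
    return channel


def project_outcome(kind: str, status: str) -> AuditOutcome:
    """Project the domain Kind/Status pair onto the canonical outcome."""
    # The auth-failure check runs first, so it wins over any Status value.
    if kind == "InboundAuthFailure":
        return AuditOutcome.DENIED
    if status in FAILURE_STATUSES:
        return AuditOutcome.FAILURE
    # Delivered, Skipped, and every in-flight status map to Success.
    return AuditOutcome.SUCCESS
```

Checking `InboundAuthFailure` before the status test matters: an auth failure whose status is `Failed` must still project to `Denied`, which a status-first ordering would get wrong.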

## Staged decomposition (C1–C7)

| Stage | Scope | Green? | Class | Risk |
|---|---|---|---|---|
| **C1** | Commons: add `ZB.MOM.WW.Audit` ref; new pure types `AuditDetails` record + `AuditDetailsCodec` (deterministic) + `Status/Kind→AuditOutcome` projection + `Action`/`Category` builders. No existing type changes. | yes | small | trivial |
| **C2** | `ScadaBridgeAuditRedactor`/`SafeDefaultAuditRedactor : IAuditRedactor` (canonical record, parse/rewrite DetailsJson, fast-path) — additive, old `IAuditPayloadFilter` still wired; unit-tested in isolation. | yes | standard | low |
| **C3** | **ATOMIC CUT — swap the record everywhere.** `Commons.Entities.Audit.AuditEvent` → `ZB.MOM.WW.Audit.AuditEvent` across ~40 src files + tests: emitters build canonical (domain→DetailsJson via codec); seams (`IAuditWriter`/`ICentralAuditWriter`/`ISiteAuditQueue`/`IAuditLogRepository`/`AuditLogQueryFilter`) re-type; `AuditEventDtoMapper` DTO↔canonical (proto unchanged); switch redactor wiring `IAuditPayloadFilter`→`IAuditRedactor`. | **boundaries only** | **high-risk** | **HIGHEST** |
| **C4** | Site SQLite two-table forwarding: `SqliteAuditWriter` → `audit_event` + `audit_forward_state`; retarget `MarkForwarded/MarkReconciled/ReadPending*/GetBacklogStats/MapRow` to JOIN+sidecar; precompute `IsCachedKind`. Telemetry/Reconciliation actors unchanged (seam stable). Site SQLite is ephemeral (7-day) → in-place schema reset, no data migration. | yes | high-risk | HIGH |
| **C5** | **ATOMIC CUT — central migration.** EF `CollapseAuditLogToCanonical`: new 10-col table on the partition scheme + per-partition data copy (old cols→DetailsJson) + persisted computed cols/indexes + rename + role re-grant; update `AuditLogRepository.InsertIfNotExistsAsync` + `SwitchOutPartitionAsync` staging list; regen ModelSnapshot. Maintenance-window; verify row-count + JSON spot-check. | **boundaries only** | **high-risk** | **HIGHEST** |
| **C6** | Reporting/UI/export retarget: `QueryAsync`/`GetKpiSnapshotAsync`/`GetExecutionTreeAsync` predicates→canonical/computed cols; `AuditLogExportService`+`AuditEndpoints` CSV + CentralUI Audit components + CLI parse `DetailsJson` for display. | yes | standard | med |
| **C7** | Tests + perf re-baseline + cleanup: rewrite `PayloadFilterContractTests`/redaction/`HotPathLatencyTests` to canonical+JSON + new budget; delete dead `Commons.Entities.Audit.AuditEvent`, 4 audit enums (or relocate behind codec), `IAuditPayloadFilter`/`Default`/`SafeDefault`, obsolete `AddColumnIfMissing`. | yes | standard | low |

**Atomic cuts:** only C3 (the shared record type changes for all callers at once) and the data-copy half of C5 cannot stay green continuously. All other stages are green at completion.
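The two-table site layout that stage C4 targets can be sketched against an in-memory SQLite database. This is a hedged illustration, not the `SqliteAuditWriter` implementation: the column types, the `Pending`/`Forwarded`/`Reconciled` state names, the reduced `audit_event` column set, and every Python function name are assumptions. The table names, the `EventId` key, the `IsCachedKind` precompute, and the four cached kinds are taken from this document.

```python
import sqlite3

# The four kinds this document lists as "cached".
CACHED_KINDS = {"CachedSubmit", "ApiCallCached", "DbWriteCached", "CachedResolve"}

# Assumed, reduced schema: the real audit_event table carries all canonical fields.
SCHEMA = """
CREATE TABLE audit_event (
    EventId       TEXT PRIMARY KEY,
    OccurredAtUtc TEXT NOT NULL,
    Action        TEXT NOT NULL,
    DetailsJson   TEXT
);
CREATE TABLE audit_forward_state (
    EventId       TEXT PRIMARY KEY REFERENCES audit_event(EventId) ON DELETE CASCADE,
    ForwardState  TEXT NOT NULL DEFAULT 'Pending',
    OccurredAtUtc TEXT NOT NULL,
    IsCachedKind  INTEGER NOT NULL
);
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
    conn.executescript(SCHEMA)
    return conn


def write_event(conn, event_id, occurred_at_utc, action, kind, details_json):
    """Insert the write-once canonical row plus its mutable sidecar row."""
    with conn:  # one transaction for both rows
        conn.execute(
            "INSERT INTO audit_event VALUES (?, ?, ?, ?)",
            (event_id, occurred_at_utc, action, details_json),
        )
        conn.execute(
            "INSERT INTO audit_forward_state (EventId, OccurredAtUtc, IsCachedKind) "
            "VALUES (?, ?, ?)",
            (event_id, occurred_at_utc, int(kind in CACHED_KINDS)),
        )


def read_pending(conn, cached: bool, limit: int = 100):
    """Drain query: filters on sidecar columns only, no JSON parsing."""
    return conn.execute(
        "SELECT e.EventId, e.Action, e.DetailsJson FROM audit_event e "
        "JOIN audit_forward_state s ON s.EventId = e.EventId "
        "WHERE s.ForwardState = 'Pending' AND s.IsCachedKind = ? "
        "ORDER BY s.OccurredAtUtc LIMIT ?",
        (int(cached), limit),
    ).fetchall()


def mark_forwarded(conn, event_id):
    """No-demote guard: only a Pending row may become Forwarded."""
    with conn:
        conn.execute(
            "UPDATE audit_forward_state SET ForwardState = 'Forwarded' "
            "WHERE EventId = ? AND ForwardState = 'Pending'",
            (event_id,),
        )


def mark_reconciled(conn, event_id):
    with conn:
        conn.execute(
            "UPDATE audit_forward_state SET ForwardState = 'Reconciled' WHERE EventId = ?",
            (event_id,),
        )
```

All forwarding mutations touch only `audit_forward_state`, so `audit_event` stays write-once, and the drain split is a plain integer comparison rather than a `Kind NOT IN (...)` test.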

## Top risks (carry into execution)
1. **C5 partition + `SwitchOutPartitionAsync` + persisted computed columns:** the staging table must carry identical computed-column definitions for SWITCH; add a SWITCH round-trip integration test before C5 ships. **Documented fallback:** if this proves too brittle, keep `Kind`/`Status` as 2 real non-canonical columns on the central table (pragmatic, not pure-9-col); decide at C5 implementation if blocked.
2. **DetailsJson determinism:** the single `AuditDetailsCodec` (C1) is load-bearing for value-equality/dedup, not cosmetic.
3. **Redactor perf:** budgets move; add the no-op fast-path and empirically re-baseline in C7.
4. **gRPC:** keep the proto unchanged (mapper-internal projection); do NOT couple a wire change to this storage cut.
5. **`Action=Channel.Kind` lossiness:** mitigated by `Category` (= channel) plus the persisted computed `Kind`; ScadaBridge-internal filtering uses those, not `Action` parsing.

Delivery: `feat/adopt-zb-audit` (stacked on auth), local-only. Each stage = one implementer + classification review chain; the full ScadaBridge suite runs at C3/C4/C5/C7.
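Risk 2 above (DetailsJson determinism) is easiest to see with a tiny encoder. The sketch below is an assumption-laden Python illustration, not the C# `AuditDetailsCodec`: the fixed key order is simply the schema order listed in this document, and the omit-null policy, compact separators, nested-key sorting, and function names are choices made here for the example.

```python
import json
from typing import Any, Mapping

# Key order follows the DetailsJson schema list in this document.
FIELD_ORDER = (
    "channel", "kind", "status", "executionId", "parentExecutionId",
    "sourceSiteId", "sourceInstanceId", "sourceScript", "httpStatus",
    "durationMs", "errorMessage", "errorDetail", "requestSummary",
    "responseSummary", "payloadTruncated", "extra", "ingestedAtUtc",
)


def _normalize(value: Any) -> Any:
    """Sort keys of nested objects so free-form values are stable too."""
    if isinstance(value, dict):
        return {k: _normalize(value[k]) for k in sorted(value)}
    if isinstance(value, list):
        return [_normalize(v) for v in value]
    return value


def encode_details(details: Mapping[str, Any]) -> str:
    """Encode with a fixed key order and no whitespace (assumed policy)."""
    unknown = set(details) - set(FIELD_ORDER)
    if unknown:
        raise ValueError(f"unknown DetailsJson fields: {sorted(unknown)}")
    # Assumption: null/absent fields are omitted rather than written as null.
    ordered = {
        k: _normalize(details[k]) for k in FIELD_ORDER if details.get(k) is not None
    }
    return json.dumps(ordered, separators=(",", ":"), ensure_ascii=False)


def decode_details(text: str) -> dict:
    return json.loads(text)
```

Two callers that build the same details in a different field order get byte-identical output, which is the property value-equality and dedup on the canonical record depend on.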

## Stage status (live)
- **✅ C1 DONE** `3d77dc0` (code ✅) — `AuditDetails` + deterministic `AuditDetailsCodec` (pinned byte-exact) + `AuditOutcomeProjector` + `AuditFieldBuilders` + Commons→`ZB.MOM.WW.Audit` ref; 56 tests.
- **✅ C2 DONE** `adfb4d3` + fix `5aaf9e2` (spec ✅, code ✅ after fix) — `ScadaBridgeAuditRedactor`/`SafeDefaultAuditRedactor : IAuditRedactor` on the canonical record; redaction primitives extracted into shared `AuditRedactionPrimitives`/`AuditRegexCache` (old filter delegates, behaviour-preserved); cap-selection reads `d.Status` (faithful to legacy `IsErrorStatus`); fast-path + never-throws; review-fix hardened `OverRedact` to scrub ALL free-text fields + marker alignment + outer-catch never-leak test. 61 redaction + 44 payload + 88 commons-audit green.
- **✅ C3 DONE** `db707bb` + fix `c27b2c3` (spec ✅, code ✅; independently re-verified build 0/0 + AuditLog 241/Communication 201). Atomic record swap across all seams/emitters/gRPC DTO/redactor-wiring (127 files); `ScadaBridgeAuditEventFactory` single emit point; `AuditRowProjection` Decompose/Recompose transitional 24-col shim (lossless round-trip verified); proto unchanged; old `IAuditPayloadFilter` classes deleted (C7 pulled forward). Fix: safe enum-parse fallback in `MapRow`+`FromDto`.
- **✅ C4 DONE** `946d3e2` + fix `1737d15` (spec ✅, code ✅; independently re-verified diff scope = writer+tests only, build 0/0, AuditLog 249/1-preexisting). Site SQLite → `audit_event` (canonical) + `audit_forward_state` sidecar; forwarding marks/reads on the sidecar via JOIN; `IsCachedKind`={CachedSubmit,ApiCallCached,DbWriteCached,CachedResolve} precomputed drain split; old `AuditLog` table dropped (ephemeral reset). Fix: `PRAGMA foreign_keys=ON` + `MarkForwarded` no-demote guard.
- **✅ C5 DONE** `68a6bd1` (spec ✅, code ✅; a LIVE SQL Server was available so the migration + SWITCH were fully exercised — independently re-verified build 0/0 + ConfigurationDatabase 248/248). Central `dbo.AuditLog` collapsed to 10 canonical cols + 6 computed cols (5 PERSISTED + `IngestedAtUtc` non-persisted) on the preserved `ps_AuditLog_Month` scheme; `CollapseAuditLogToCanonical` new-table-and-copy migration (`FOR JSON PATH` projection, byte-verified round-trip; Down = documented one-way); repo writes/reads canonical directly; `SwitchOutPartition` staging matches the computed-col defs; append-only roles re-granted. C3 central shim retired. Forced deviations (all sound): IngestedAtUtc non-persisted, execution-id indexes unfiltered, provider-aware `OnModelCreating` strips JSON_VALUE for SQLite. Deferred to C7: a dedicated migration-projection test + the stale `CreatesFiveNamedIndexes` test name.
- **✅ C6 SUBSUMED** (no commit) — reporting/UI/export/CLI retarget was already completed by the C3 record-swap (`AuditEventView`/`AuditExportRow` shims decode every domain field from `DetailsJson`) + the C5 repo-query retarget. Read-only explorer verdict: all consumer surfaces canonical-complete; the only flagged items (ExecutionId/ParentExecutionId not in CSV; SourceNodes not parsed in export `ParseFilter`) are PRE-rearch omissions, not regressions. CentralUI 595/595, ManagementService 125/125 confirm.
- **✅ C7 DONE** `635461c` + doc-fix `bc0e5bf` (review ✅; independently re-verified build 0/0, PerformanceTests 10/10, ConfigurationDatabase 251/251 incl. the 3 new migration-projection tests PASSING on live MSSQL, zero dead crefs). Perf hot-path re-baselined (canonical JSON redactor measured ~14µs/2µs — faster than the old typed walk; budgets 200/30/5µs + fast-path `Assert.Same`); `CollapseAuditLogToCanonicalMigrationTests` (seed→migrate→assert Action/Category/Outcome/Actor-null/DetailsJson-round-trip + 5 persisted computed cols); index test → `CreatesNineNamedIndexes`; 26 dead-`<see cref>` across 13 files cleaned; doc-fix corrected the "six persisted" wording (5 persisted + IngestedAtUtc non-persisted).
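The fast path and byte-safe truncation mentioned for C2 and C7 can be sketched as follows. This is a simplified Python illustration under stated assumptions, not `ScadaBridgeAuditRedactor`: it covers only size-capping (pattern-based secret redaction and the never-throws/over-redact failure policy are out of scope), it assumes `DetailsJson` is a JSON object, and the two-field `AuditEvent` dataclass, the single `cap_bytes` parameter, and the function names are hypothetical.

```python
import json
from dataclasses import dataclass, replace
from typing import Optional

# Free-text fields the design names as truncation targets.
FREE_TEXT_FIELDS = ("requestSummary", "responseSummary", "errorDetail", "extra")


@dataclass(frozen=True)
class AuditEvent:
    # Hypothetical stand-in: only the fields this sketch needs.
    action: str
    details_json: Optional[str]


def truncate_utf8(text: str, max_bytes: int) -> str:
    """Cut to at most max_bytes of UTF-8 without splitting a character."""
    raw = text.encode("utf-8")
    if len(raw) <= max_bytes:
        return text
    # The input was valid UTF-8, so only a trailing partial sequence can be
    # invalid after slicing; errors="ignore" drops exactly that tail.
    return raw[:max_bytes].decode("utf-8", errors="ignore")


def redact(event: AuditEvent, cap_bytes: int) -> AuditEvent:
    text = event.details_json
    # Fast path: JSON escaping never shrinks a string, so if the whole
    # document fits under the cap, no field can exceed it. Skip the parse
    # and return the very same instance.
    if text is None or len(text.encode("utf-8")) <= cap_bytes:
        return event

    details = json.loads(text)  # parsed once
    truncated = False
    for name in FREE_TEXT_FIELDS:
        value = details.get(name)
        if isinstance(value, str):
            shortened = truncate_utf8(value, cap_bytes)
            if shortened != value:
                details[name] = shortened
                truncated = True
    if not truncated:
        return event  # nothing changed, so no re-serialization
    details["payloadTruncated"] = True
    return replace(
        event,
        details_json=json.dumps(details, separators=(",", ":"), ensure_ascii=False),
    )
```

Returning the same instance on the fast path is what makes an identity assertion (in the spirit of the `Assert.Same` check noted under C7) a cheap way to confirm that no parse or allocation happened.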

## ✅ TASK 2.5 COMPLETE: ScadaBridge audit FULL re-architecture to pure 9-col canonical (2026-06-02)
All of C1–C7 are done, each spec+code reviewed, on `feat/adopt-zb-audit` (local-only, never pushed). ScadaBridge's audit subsystem now uses: the canonical `ZB.MOM.WW.Audit.AuditEvent` record everywhere (domain fields in `DetailsJson` via the deterministic `AuditDetailsCodec`); the library `IAuditRedactor`/`AuditOutcome`; site SQLite as `audit_event` (canonical) + `audit_forward_state` sidecar (forwarding decoupled, `IsCachedKind` drain split); central `dbo.AuditLog` collapsed to 10 canonical cols + 6 computed cols (5 persisted + `IngestedAtUtc` non-persisted) on the preserved partition scheme (`CollapseAuditLogToCanonical` migration, MSSQL-verified); UI/export/CLI canonical-complete via `AuditEventView`/`AuditExportRow`. The gRPC proto was intentionally left unchanged (mapper-internal projection). This was the program's single largest task.