docs(audit): add M6 reconciliation+purge+partition+health plan (#23)

6 bundles: proto+site handler, reconciliation actor, purge actor with
drop-and-rebuild around UX index, partition maintenance, four health
metrics, integration tests. M5 realities baked in.
Joseph Doherty
2026-05-20 17:44:12 -04:00
parent db05af897e
commit b0584f7a08

@@ -0,0 +1,19 @@
# Audit Log #23 — M6 Reconciliation + Purge + Partition Maintenance + Health Metrics
> **For Claude:** subagent-driven-development with bundled cadence.
**Goal:** Self-healing telemetry (5-min reconciliation pull), monthly partition rollover, daily partition-switch purge with drop-and-rebuild around UX_AuditLog_EventId, all five health metrics live (SiteAuditBacklog, SiteAuditWriteFailures, SiteAuditTelemetryStalled, CentralAuditWriteFailures, AuditRedactionFailure).
**M5 realities baked in:** The AuditRedactionFailure counter is site-only; M6-T9 surfaces it centrally. SwitchOutPartitionAsync has shipped as a NotSupportedException stub since M1; M6-T4 replaces it with the DROP INDEX → SWITCH PARTITION → DROP staging → CREATE UNIQUE NONCLUSTERED INDEX sequence. The partition function is pre-seeded from Jan 2026 through Dec 2027; M6-T5 SPLITs new boundaries forward.
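As a rough illustration of the forward SPLIT in M6-T5, the sketch below computes which month-start boundaries are still missing. It assumes monthly boundaries on the first of each month, a last pre-seeded boundary of 2027-12-01, and a configurable number of months of headroom. The names `next_month`, `missing_boundaries`, and `lookahead_months`, and that boundary date, are hypothetical and not taken from the plan.

```python
from datetime import date


def next_month(d: date) -> date:
    """Return the first day of the month following d."""
    if d.month == 12:
        return date(d.year + 1, 1, 1)
    return date(d.year, d.month + 1, 1)


def missing_boundaries(last_boundary: date, today: date, lookahead_months: int) -> list[date]:
    """List the month-start boundaries that still need a SPLIT.

    last_boundary    -- highest boundary already in the partition function (assumed)
    today            -- the date the maintenance pass runs
    lookahead_months -- months of headroom to keep ahead of today (assumed setting)
    """
    # Walk forward from the start of the current month to find the target boundary.
    target = date(today.year, today.month, 1)
    for _ in range(lookahead_months):
        target = next_month(target)

    # Collect every month start after the last existing boundary, up to the target.
    boundaries = []
    candidate = next_month(last_boundary)
    while candidate <= target:
        boundaries.append(candidate)
        candidate = next_month(candidate)
    return boundaries
```

Each returned date would then feed one `ALTER PARTITION FUNCTION ... SPLIT RANGE` call. With the pre-seeded range, the function returns nothing until the headroom window reaches past Dec 2027.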
**Bundles:**
- Bundle A — Proto + site handler (T1, T2)
- Bundle B — Reconciliation actor (T3)
- Bundle C — Purge actor + drop-and-rebuild repository fix (T4)
- Bundle D — Partition maintenance hosted service (T5)
- Bundle E — Health metrics (T6, T7, T8, T9)
- Bundle F — Integration tests (T10, T11, T12)
Final cross-bundle review + merge.
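For Bundle C, the statement order that M6-T4 describes can be sketched as a small generator. This is only an illustration of the ordering, not the repository code. It assumes the staging table already exists with a schema matching the source partition. Only `UX_AuditLog_EventId` comes from the plan; `purge_statements`, the table names `dbo.AuditLog` and `dbo.AuditLog_Staging`, and the `EventId` key column are hypothetical.

```python
def purge_statements(table: str, staging: str, index: str,
                     key_column: str, partition_number: int) -> list[str]:
    """Return the T-SQL statements for one partition-switch purge, in order.

    All object names are supplied by the caller; the defaults used in the
    usage example below are assumptions, not names confirmed by the plan.
    """
    if partition_number < 1:
        raise ValueError("partition numbers start at 1")
    return [
        # 1. Drop the unique index so it cannot block the switch.
        f"DROP INDEX {index} ON {table};",
        # 2. Move the expired partition's rows into the staging table (metadata-only).
        f"ALTER TABLE {table} SWITCH PARTITION {partition_number} TO {staging};",
        # 3. Discard the switched-out rows by dropping the staging table.
        f"DROP TABLE {staging};",
        # 4. Recreate the unique index on the remaining rows.
        f"CREATE UNIQUE NONCLUSTERED INDEX {index} ON {table} ({key_column});",
    ]


statements = purge_statements(
    table="dbo.AuditLog",            # hypothetical table name
    staging="dbo.AuditLog_Staging",  # hypothetical staging table name
    index="UX_AuditLog_EventId",     # index name from the plan
    key_column="EventId",            # assumed from the index name
    partition_number=1,
)
```

The index is rebuilt last so that the rebuild only scans rows that survive the purge. Uniqueness on the key is not enforced between steps 1 and 4, so the real implementation would need to account for that window.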
**Note**: M2 noted NoOpSiteStreamAuditClient stays in production until "M6 wires the real client". M6-T1+T2 add the PULL RPC; the actual production PUSH client (real implementation of ISiteStreamAuditClient.IngestAuditEventsAsync + IngestCachedTelemetryAsync) is the bigger lift. M6 will add the real client IF feasible within scope OR defer to a follow-up. Decision: try in Bundle A (alongside the proto extension); if scope blows up, the NoOp stays.