feat(auditlog): AuditLogPurgeActor daily partition-switch purge (#23 M6)

Central singleton (M6-T4 Bundle C) that drives the daily AuditLog partition
purge. On a configurable timer (default 24 hours) the actor:
  1. Queries IAuditLogRepository.GetPartitionBoundariesOlderThanAsync for
     monthly boundaries whose latest OccurredAtUtc is older than
     DateTime.UtcNow - AuditLogOptions.RetentionDays.
  2. For each eligible boundary calls SwitchOutPartitionAsync, which runs
     the drop-and-rebuild dance around UX_AuditLog_EventId.
  3. Publishes AuditLogPurgedEvent(boundary, rowsDeleted, durationMs) on
     the actor-system EventStream so the Bundle E central health collector
     and ops surfaces can subscribe without coupling to this actor.
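
Step 1's eligibility rule can be sketched in isolation. This is a minimal,
in-memory illustration only: PurgeThresholdSketch, ComputeThreshold and
EligibleBoundaries are hypothetical names, and the real filtering happens
inside IAuditLogRepository.GetPartitionBoundariesOlderThanAsync, whose
implementation is not shown in this commit view.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PurgeThresholdSketch
{
    // Hypothetical helper: the cutoff is "now minus the retention window".
    public static DateTime ComputeThreshold(DateTime utcNow, int retentionDays)
        => utcNow.AddDays(-retentionDays);

    // Hypothetical in-memory stand-in for the repository query: keep each
    // monthly boundary whose latest OccurredAtUtc is older than the threshold.
    public static IReadOnlyList<DateTime> EligibleBoundaries(
        IEnumerable<(DateTime Boundary, DateTime LatestOccurredAtUtc)> partitions,
        DateTime threshold)
        => partitions
            .Where(p => p.LatestOccurredAtUtc < threshold)
            .Select(p => p.Boundary)
            .OrderBy(b => b)
            .ToList();
}
```

A partition is only eligible once its newest row falls outside the window,
so a month that still holds any in-retention row is left alone.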

Co-changes:
* SwitchOutPartitionAsync returns long (rows deleted) — sampled BEFORE the
  switch via COUNT_BIG over the per-partition $PARTITION filter so the count
  reflects what the switch removed, not a post-purge scan of a table that
  no longer exists. All stub implementations updated.
* AuditLogPurgeOptions: IntervalHours (default 24), IntervalOverride for
  tests, Interval property resolving either.
* AuditLogPurgedEvent: record with MonthBoundary, RowsDeleted, DurationMs.
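
One plausible shape for the Interval resolution described above, as a
sketch. It assumes the test override wins whenever it is set; the class
name AuditLogPurgeOptionsSketch and the nullable TimeSpan type for
IntervalOverride are assumptions, not taken from the actual options class.

```csharp
using System;

public sealed class AuditLogPurgeOptionsSketch
{
    // Production knob: whole hours between purge ticks (default 24).
    public int IntervalHours { get; set; } = 24;

    // Assumed test-only knob: lets tests use a sub-hour interval.
    public TimeSpan? IntervalOverride { get; set; }

    // Resolves to the override when present, otherwise to IntervalHours.
    public TimeSpan Interval => IntervalOverride ?? TimeSpan.FromHours(IntervalHours);
}
```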

Behavior:
* Continue-on-error per boundary — one partition that throws does NOT
  abandon the rest of the tick.
* DI scope opened per tick (IAuditLogRepository is a SCOPED EF Core
  service); mirrors SiteAuditReconciliationActor and AuditLogIngestActor.
* SupervisorStrategy Resume keeps the singleton alive across leaked
  exceptions.
* EventStream capture BEFORE the first await — Context is unsafe after
  await in async receive handlers (same pattern as Sender-capture in
  AuditLogIngestActor.OnIngestAsync).
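
The continue-on-error and capture-before-await behavior can be sketched
without Akka or EF Core. Everything here is hypothetical (PurgeTickSketch,
RunTickAsync, PurgedEventSketch); the switchOut delegate stands in for
SwitchOutPartitionAsync and the publish delegate stands in for an
EventStream reference that the actor would read from Context before its
first await.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical mirror of AuditLogPurgedEvent's three fields.
public sealed record PurgedEventSketch(DateTime MonthBoundary, long RowsDeleted, long DurationMs);

public static class PurgeTickSketch
{
    // Processes every boundary; a failing boundary is counted and skipped
    // rather than aborting the rest of the tick. Returns the failure count.
    public static async Task<int> RunTickAsync(
        IEnumerable<DateTime> boundaries,
        Func<DateTime, Task<long>> switchOut,
        Action<PurgedEventSketch> publish)
    {
        // 'publish' is passed in already bound, so nothing below needs to
        // touch actor Context after an await.
        var failures = 0;
        foreach (var boundary in boundaries)
        {
            var stopwatch = Stopwatch.StartNew();
            try
            {
                var rows = await switchOut(boundary).ConfigureAwait(false);
                publish(new PurgedEventSketch(boundary, rows, stopwatch.ElapsedMilliseconds));
            }
            catch (Exception)
            {
                // Continue-on-error: a real actor would log here.
                failures++;
            }
        }
        return failures;
    }
}
```

Passing the publisher in as a delegate keeps the loop testable with plain
lambdas and makes the "captured before await" requirement explicit at the
call site.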

Tests:
* Tick_Fires_OnDailyInterval — visible timer side effect.
* Tick_OldPartitions_SwitchedOut — both seeded boundaries purged.
* Tick_NewerPartitions_Untouched — empty enumerator → no switches.
* Tick_PublishesPurgedEvent_WithRowCount — AuditLogPurgedEvent carries
  RowsDeleted and DurationMs.
* Tick_SwitchThrows_OtherPartitionsStillProcessed — continue-on-error.
* Threshold_UsesAuditLogOptionsRetentionDays — non-default 30-day window
  computed from UtcNow - RetentionDays.
* EndToEnd_RealPartition_RowsRemoved_PurgedEventPublished — TestKit +
  MsSqlMigrationFixture: real partitioned table, Jan-2026 row purged,
  Apr-2026 row kept, AuditLogPurgedEvent observed via probe.
Joseph Doherty
2026-05-20 18:36:31 -04:00
parent 6069a20e0f
commit 660fdc4e93
8 changed files with 718 additions and 6 deletions


@@ -202,7 +202,7 @@ VALUES
/// index.
/// </para>
/// </remarks>
public async Task SwitchOutPartitionAsync(DateTime monthBoundary, CancellationToken ct = default)
public async Task<long> SwitchOutPartitionAsync(DateTime monthBoundary, CancellationToken ct = default)
{
// GUID-suffixed staging name: prevents collision with any concurrent
// purge attempt and avoids polluting the AuditLog object namespace with
@@ -214,6 +214,17 @@ VALUES
// settings.
var monthBoundaryStr = monthBoundary.ToUniversalTime().ToString("yyyy-MM-dd HH:mm:ss");
// Two statements: the first SELECT samples the per-partition row count
// BEFORE the dance so we can report it back to the purge actor; the
// second batch performs the drop-and-rebuild. Reading @@ROWCOUNT (or
// OUTPUT-style variables) after the SWITCH is not viable because SWITCH
// is a metadata-only operation that doesn't move rows in a way
// @@ROWCOUNT can observe.
var sampleSql = $@"
SELECT COUNT_BIG(*) FROM dbo.AuditLog
WHERE $PARTITION.pf_AuditLog_Month(OccurredAtUtc) =
$PARTITION.pf_AuditLog_Month('{monthBoundaryStr}');";
var sql = $@"
BEGIN TRY
BEGIN TRANSACTION;
@@ -292,7 +303,43 @@ VALUES
THROW;
END CATCH;";
// Sample the row count before the switch. The sample is best-effort
// (no transaction wraps the sample-then-switch pair) because the
// central singleton is the only caller of this method and a daily-purge
// tick doesn't compete with concurrent SwitchOut callers. A
// concurrent INSERT racing the sample under-reports by at most a
// few rows, which is acceptable for an "approximate" purged-row
// count surfaced via AuditLogPurgedEvent.
long rowsDeleted = 0;
var conn = _context.Database.GetDbConnection();
var openedHere = false;
if (conn.State != System.Data.ConnectionState.Open)
{
await conn.OpenAsync(ct).ConfigureAwait(false);
openedHere = true;
}
try
{
await using (var sampleCmd = conn.CreateCommand())
{
sampleCmd.CommandText = sampleSql;
var sampleResult = await sampleCmd.ExecuteScalarAsync(ct).ConfigureAwait(false);
if (sampleResult is not null and not DBNull)
{
rowsDeleted = Convert.ToInt64(sampleResult);
}
}
}
finally
{
if (openedHere)
{
await conn.CloseAsync().ConfigureAwait(false);
}
}
await _context.Database.ExecuteSqlRawAsync(sql, ct);
return rowsDeleted;
}
/// <summary>