feat(auditlog): AuditLogPurgeActor daily partition-switch purge (#23 M6)

Central singleton (M6-T4 Bundle C) that drives the daily AuditLog partition
purge. On a configurable timer (default 24 hours) the actor:
1. Queries IAuditLogRepository.GetPartitionBoundariesOlderThanAsync for
monthly boundaries whose latest OccurredAtUtc is older than
DateTime.UtcNow - AuditLogOptions.RetentionDays.
2. For each eligible boundary calls SwitchOutPartitionAsync, which runs
the drop-and-rebuild dance around UX_AuditLog_EventId.
3. Publishes AuditLogPurgedEvent(boundary, rowsDeleted, durationMs) on
the actor-system EventStream so the Bundle E central health collector
and ops surfaces can subscribe without coupling to this actor.

Co-changes:
* SwitchOutPartitionAsync returns long (rows deleted) — sampled BEFORE the
switch via COUNT_BIG over the per-partition filter so the count
reflects what the switch removed, not a post-purge scan of a table that
no longer exists. All stub implementations updated.
* AuditLogPurgeOptions: IntervalHours (default 24), IntervalOverride for
tests, Interval property resolving either.
* AuditLogPurgedEvent: record with MonthBoundary, RowsDeleted, DurationMs.
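
One way the two new types could look, as a hedged sketch. The property types (int, TimeSpan?, DateTime, long) and the "Sketch" type names are assumptions; the commit message only names the members and the 24-hour default:

```csharp
using System;

// Assumed shape: IntervalOverride wins when set (tests), otherwise IntervalHours applies.
public sealed class AuditLogPurgeOptionsSketch
{
    public int IntervalHours { get; set; } = 24;

    // Assumed to be a nullable TimeSpan so tests can use sub-hour intervals.
    public TimeSpan? IntervalOverride { get; set; }

    public TimeSpan Interval => IntervalOverride ?? TimeSpan.FromHours(IntervalHours);
}

// Assumed member types for the published event.
public sealed record AuditLogPurgedEventSketch(DateTime MonthBoundary, long RowsDeleted, long DurationMs);

public static class OptionsDemo
{
    public static void Main()
    {
        var production = new AuditLogPurgeOptionsSketch();
        var test = new AuditLogPurgeOptionsSketch { IntervalOverride = TimeSpan.FromMilliseconds(50) };

        Console.WriteLine(production.Interval.TotalHours);  // default interval, in hours
        Console.WriteLine(test.Interval.TotalMilliseconds); // override interval, in milliseconds
    }
}
```
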

Behavior:
* Continue-on-error per boundary — one partition that throws does NOT
abandon the rest of the tick.
* DI scope opened per tick (IAuditLogRepository is a SCOPED EF Core
service); mirrors SiteAuditReconciliationActor and AuditLogIngestActor.
* SupervisorStrategy Resume keeps the singleton alive across leaked
exceptions.
* EventStream capture BEFORE the first await — Context is unsafe after
await in async receive handlers (same pattern as Sender-capture in
AuditLogIngestActor.OnIngestAsync).
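
The continue-on-error behavior can be sketched without Akka as a plain loop with a per-boundary try/catch. Everything here is illustrative: RunTickAsync, the delegate parameters, and PurgedEventSketch are hypothetical, and letting cancellation escape the catch is an assumption rather than something this commit states:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public sealed record PurgedEventSketch(DateTime MonthBoundary, long RowsDeleted, long DurationMs);

public static class PurgeTickSketch
{
    // Processes every boundary; a failure on one is recorded and the loop moves on.
    public static async Task<int> RunTickAsync(
        IEnumerable<DateTime> boundaries,
        Func<DateTime, CancellationToken, Task<long>> switchOut,
        Action<PurgedEventSketch> publish,
        Action<DateTime, Exception> onError,
        CancellationToken ct = default)
    {
        var failures = 0;
        foreach (var boundary in boundaries)
        {
            var stopwatch = Stopwatch.StartNew();
            try
            {
                var rows = await switchOut(boundary, ct).ConfigureAwait(false);
                publish(new PurgedEventSketch(boundary, rows, stopwatch.ElapsedMilliseconds));
            }
            // Assumption: cancellation should still abort the tick, so it is not swallowed.
            catch (Exception ex) when (ex is not OperationCanceledException)
            {
                failures++;
                onError(boundary, ex);
            }
        }
        return failures;
    }

    public static async Task Main()
    {
        var boundaries = new[] { new DateTime(2026, 1, 1), new DateTime(2026, 2, 1), new DateTime(2026, 3, 1) };
        var published = new List<PurgedEventSketch>();

        var failures = await RunTickAsync(
            boundaries,
            (boundary, _) =>
            {
                // Simulate one partition whose switch-out fails.
                if (boundary.Month == 2) throw new InvalidOperationException("simulated switch failure");
                return Task.FromResult(10L);
            },
            published.Add,
            (boundary, ex) => Console.WriteLine($"{boundary:O} failed: {ex.Message}"));

        // One failure (February); January and March are still published.
        Console.WriteLine($"failures={failures}, published={published.Count}");
    }
}
```
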

Tests:
* Tick_Fires_OnDailyInterval — visible timer side effect.
* Tick_OldPartitions_SwitchedOut — both seeded boundaries purged.
* Tick_NewerPartitions_Untouched — empty enumerator → no switches.
* Tick_PublishesPurgedEvent_WithRowCount — AuditLogPurgedEvent carries
RowsDeleted and DurationMs.
* Tick_SwitchThrows_OtherPartitionsStillProcessed — continue-on-error.
* Threshold_UsesAuditLogOptionsRetentionDays — non-default 30-day window
computed from UtcNow - RetentionDays.
* EndToEnd_RealPartition_RowsRemoved_PurgedEventPublished — TestKit +
MsSqlMigrationFixture: real partitioned table, Jan-2026 row purged,
Apr-2026 row kept, AuditLogPurgedEvent observed via probe.

@@ -202,7 +202,7 @@ VALUES
     /// index.
     /// </para>
     /// </remarks>
-    public async Task SwitchOutPartitionAsync(DateTime monthBoundary, CancellationToken ct = default)
+    public async Task<long> SwitchOutPartitionAsync(DateTime monthBoundary, CancellationToken ct = default)
     {
         // GUID-suffixed staging name: prevents collision with any concurrent
         // purge attempt and avoids polluting the AuditLog object namespace with
@@ -214,6 +214,17 @@ VALUES
         // settings.
         var monthBoundaryStr = monthBoundary.ToUniversalTime().ToString("yyyy-MM-dd HH:mm:ss");
 
+        // Two commands: the first SELECT samples the per-partition row
+        // count BEFORE the dance so we can report it back to the purge actor;
+        // the second performs the drop-and-rebuild. Using OUTPUT-style
+        // variables wired through @@ROWCOUNT after the SWITCH is not viable
+        // because SWITCH is a metadata-only operation that doesn't move rows in
+        // a way @@ROWCOUNT can observe.
+        var sampleSql = $@"
+SELECT COUNT_BIG(*) FROM dbo.AuditLog
+WHERE $PARTITION.pf_AuditLog_Month(OccurredAtUtc) =
+      $partition.pf_AuditLog_Month('{monthBoundaryStr}');";
+
         var sql = $@"
 BEGIN TRY
     BEGIN TRANSACTION;
@@ -292,7 +303,43 @@ VALUES
     THROW;
 END CATCH;";
 
+        // Sample the row count before the switch. The sample is best-effort
+        // (no transaction wrapping the sample-then-switch pair) because the
+        // central singleton is the only caller of this method and a daily-purge
+        // tick doesn't compete with concurrent SwitchOut callers. A
+        // concurrent INSERT racing the sample under-reports by at most a
+        // few rows, which is acceptable for an "approximate" purged-row
+        // count surfaced via AuditLogPurgedEvent.
+        long rowsDeleted = 0;
+        var conn = _context.Database.GetDbConnection();
+        var openedHere = false;
+        if (conn.State != System.Data.ConnectionState.Open)
+        {
+            await conn.OpenAsync(ct).ConfigureAwait(false);
+            openedHere = true;
+        }
+        try
+        {
+            await using (var sampleCmd = conn.CreateCommand())
+            {
+                sampleCmd.CommandText = sampleSql;
+                var sampleResult = await sampleCmd.ExecuteScalarAsync(ct).ConfigureAwait(false);
+                if (sampleResult is not null && sampleResult is not DBNull)
+                {
+                    rowsDeleted = Convert.ToInt64(sampleResult);
+                }
+            }
+        }
+        finally
+        {
+            if (openedHere)
+            {
+                await conn.CloseAsync().ConfigureAwait(false);
+            }
+        }
+
         await _context.Database.ExecuteSqlRawAsync(sql, ct);
+        return rowsDeleted;
     }
 
     /// <summary>