ScadaBridge/src/ScadaLink.AuditLog/Central/AuditLogPurgedEvent.cs
Joseph Doherty 660fdc4e93 feat(auditlog): AuditLogPurgeActor daily partition-switch purge (#23 M6)
Central singleton (M6-T4 Bundle C) that drives the daily AuditLog partition
purge. On a configurable timer (default 24 hours) the actor:
  1. Queries IAuditLogRepository.GetPartitionBoundariesOlderThanAsync for
     monthly boundaries whose latest OccurredAtUtc is older than
     DateTime.UtcNow - AuditLogOptions.RetentionDays.
  2. For each eligible boundary calls SwitchOutPartitionAsync, which runs
     the drop-and-rebuild dance around UX_AuditLog_EventId.
  3. Publishes AuditLogPurgedEvent(boundary, rowsDeleted, durationMs) on
     the actor-system EventStream so the Bundle E central health collector
     and ops surfaces can subscribe without coupling to this actor.
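
The three steps above can be sketched without Akka as a plain async loop. This is
only an illustration under stated assumptions, not the actor's actual code: the
two delegates stand in for the IAuditLogRepository calls (their signatures are
guessed), `publish` stands in for EventStream.Publish, and PurgeTickSketch is a
hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

// Repeats the record from AuditLogPurgedEvent.cs so the sketch compiles on its own.
public sealed record AuditLogPurgedEvent(DateTime MonthBoundary, long RowsDeleted, long DurationMs);

public static class PurgeTickSketch
{
    // Runs one tick and returns the number of boundaries that failed.
    // getBoundariesOlderThan and switchOutPartition are ASSUMED stand-ins for
    // GetPartitionBoundariesOlderThanAsync and SwitchOutPartitionAsync.
    public static async Task<int> RunTickAsync(
        Func<DateTime, Task<IReadOnlyList<DateTime>>> getBoundariesOlderThan,
        Func<DateTime, Task<long>> switchOutPartition,
        int retentionDays,
        DateTime utcNow,
        Action<AuditLogPurgedEvent> publish)
    {
        // Step 1: boundaries older than UtcNow - RetentionDays.
        var threshold = utcNow - TimeSpan.FromDays(retentionDays);
        var boundaries = await getBoundariesOlderThan(threshold);
        var failures = 0;

        foreach (var boundary in boundaries)
        {
            var stopwatch = Stopwatch.StartNew();
            try
            {
                // Step 2: switch the partition out; the call reports rows deleted.
                var rowsDeleted = await switchOutPartition(boundary);
                stopwatch.Stop();

                // Step 3: announce the purge to whoever is listening.
                publish(new AuditLogPurgedEvent(boundary, rowsDeleted, stopwatch.ElapsedMilliseconds));
            }
            catch (Exception)
            {
                // Continue-on-error: one failing boundary does not abandon the rest.
                failures++;
            }
        }

        return failures;
    }
}
```

Passing `utcNow` in as a parameter, rather than reading DateTime.UtcNow inside the
loop, is a choice made for this sketch so the threshold can be pinned in a test.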

Co-changes:
* SwitchOutPartitionAsync returns long (rows deleted) — sampled BEFORE the
  switch via COUNT_BIG over the per-partition  filter so the count
  reflects what the switch removed, not a post-purge scan of a table that
  no longer exists. All stub implementations updated.
* AuditLogPurgeOptions: IntervalHours (default 24), IntervalOverride for
  tests, Interval property resolving either.
* AuditLogPurgedEvent: record with MonthBoundary, RowsDeleted, DurationMs.
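
One plausible shape for the options class described above is sketched here. The
property names and the 24-hour default come from this commit message; the types
(int, nullable TimeSpan) and the null-coalescing resolution are assumptions.

```csharp
using System;

// Hypothetical reconstruction of AuditLogPurgeOptions; only the member names
// and the 24-hour default are taken from the commit message.
public sealed class AuditLogPurgeOptions
{
    // Production knob: hours between purge ticks.
    public int IntervalHours { get; set; } = 24;

    // Test-only knob: lets a test shrink the interval below one hour.
    public TimeSpan? IntervalOverride { get; set; }

    // Resolves to the override when one is set, otherwise to IntervalHours.
    public TimeSpan Interval => IntervalOverride ?? TimeSpan.FromHours(IntervalHours);
}
```

Under these assumptions a test would set IntervalOverride to something like
TimeSpan.FromMilliseconds(50), while production configuration only touches
IntervalHours.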

Behavior:
* Continue-on-error per boundary — one partition that throws does NOT
  abandon the rest of the tick.
* DI scope opened per tick (IAuditLogRepository is a SCOPED EF Core
  service); mirrors SiteAuditReconciliationActor and AuditLogIngestActor.
* SupervisorStrategy Resume keeps the singleton alive across leaked
  exceptions.
* EventStream capture BEFORE the first await — Context is unsafe after
  await in async receive handlers (same pattern as Sender-capture in
  AuditLogIngestActor.OnIngestAsync).

Tests:
* Tick_Fires_OnDailyInterval — visible timer side effect.
* Tick_OldPartitions_SwitchedOut — both seeded boundaries purged.
* Tick_NewerPartitions_Untouched — empty enumerator → no switches.
* Tick_PublishesPurgedEvent_WithRowCount — AuditLogPurgedEvent carries
  RowsDeleted and DurationMs.
* Tick_SwitchThrows_OtherPartitionsStillProcessed — continue-on-error.
* Threshold_UsesAuditLogOptionsRetentionDays — non-default 30-day window
  computed from UtcNow - RetentionDays.
* EndToEnd_RealPartition_RowsRemoved_PurgedEventPublished — TestKit +
  MsSqlMigrationFixture: real partitioned table, Jan-2026 row purged,
  Apr-2026 row kept, AuditLogPurgedEvent observed via probe.
2026-05-20 18:36:31 -04:00

namespace ScadaLink.AuditLog.Central;

/// <summary>
/// Published on the actor-system EventStream by <see cref="AuditLogPurgeActor"/>
/// after each successful partition switch-out. Downstream consumers (Bundle E
/// central health collector, ops dashboards, audit trails) subscribe so a
/// purge action is observable without the actor needing to know about any
/// specific subscriber.
/// </summary>
/// <param name="MonthBoundary">
/// The pf_AuditLog_Month lower-bound boundary that was switched out — i.e.
/// the first instant of the purged month in UTC.
/// </param>
/// <param name="RowsDeleted">
/// Approximate row count purged from the partition, sampled BEFORE the
/// switch. Exact accounting would require a post-switch scan of the staging
/// table, which the dance drops immediately, so this is the closest
/// observable proxy. Zero is a valid value when the actor's enumerator
/// included a partition the operator subsequently emptied by hand.
/// </param>
/// <param name="DurationMs">
/// Wall-clock time spent inside <c>SwitchOutPartitionAsync</c> for this
/// boundary, in milliseconds. Useful for spotting the rare slow purge
/// without spinning up dedicated telemetry.
/// </param>
public sealed record AuditLogPurgedEvent(
    DateTime MonthBoundary,
    long RowsDeleted,
    long DurationMs);
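
As a rough illustration of how a downstream consumer might use the event, the
sketch below folds a batch of events into a total row count and the slowest
purge. PurgeSummary and Summarize are hypothetical names, not part of this
repository, and the record is repeated only so the sketch compiles standalone.

```csharp
using System;
using System.Collections.Generic;

// Repeated from the file above so this sketch is self-contained.
public sealed record AuditLogPurgedEvent(DateTime MonthBoundary, long RowsDeleted, long DurationMs);

// Hypothetical consumer-side helper; a real subscriber would receive events
// one at a time from the EventStream instead of as a batch.
public static class PurgeSummary
{
    public static (long TotalRows, long SlowestMs) Summarize(IEnumerable<AuditLogPurgedEvent> events)
    {
        long totalRows = 0;
        long slowestMs = 0;

        foreach (var purged in events)
        {
            // RowsDeleted may legitimately be zero (partition emptied by hand).
            totalRows += purged.RowsDeleted;
            slowestMs = Math.Max(slowestMs, purged.DurationMs);
        }

        return (totalRows, slowestMs);
    }
}
```

Because the event is a positional record, consumers also get value equality and
deconstruction for free, which keeps probe-based test assertions short.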