fix(notifications): close OAuth2 SMTP + dispatcher resilience gaps (5 findings)

NS-021/NO-001: thread FromAddress into XOAUTH2 so M365 stops rejecting
sends with 535 5.7.3. Add an additive oauth2UserName parameter to
ISmtpClientWrapper.AuthenticateAsync; both NotificationService and
NotificationOutbox now pass config.FromAddress.
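
For context, the SASL XOAUTH2 initial response carries a user name next to
the bearer token, which is why the sender identity has to reach the
authenticate call. A minimal sketch of that encoding, assuming a
hypothetical BuildXOAuth2Response helper (not the wrapper's actual code):

```csharp
using System;
using System.Text;

public static class XOAuth2Sketch
{
    // Hypothetical helper: builds the base64 SASL XOAUTH2 initial response.
    // Wire format: "user=" {user} ^A "auth=Bearer " {token} ^A ^A
    public static string BuildXOAuth2Response(string userName, string accessToken)
    {
        if (string.IsNullOrWhiteSpace(userName))
            throw new ArgumentException(
                "XOAUTH2 needs a user name (here: the From address).", nameof(userName));

        var raw = $"user={userName}\u0001auth=Bearer {accessToken}\u0001\u0001";
        return Convert.ToBase64String(Encoding.UTF8.GetBytes(raw));
    }
}
```

Passing config.FromAddress as userName corresponds to what the two call
sites now do, under the assumption that the mailbox and From address match.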

NO-002: clamp non-positive SmtpConfiguration.MaxRetries/RetryDelay to the
10-attempt / 1-min fallback with a Warning so a misconfigured row no
longer parks transient failures on the first attempt (MaxRetries <= 0) or
burn-loops (RetryDelay <= 0).
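
The clamp can be sketched roughly as follows. RetryPolicySketch.Clamp and
the warn callback are hypothetical names for illustration; only the
fallback values (10 attempts, 1 minute) come from this commit:

```csharp
using System;

public static class RetryPolicySketch
{
    // Fallback values as stated in the commit message.
    public const int FallbackMaxRetries = 10;
    public static readonly TimeSpan FallbackRetryDelay = TimeSpan.FromMinutes(1);

    // Hypothetical helper: replaces non-positive values with the fallback
    // and reports each substitution through the warn callback.
    public static (int MaxRetries, TimeSpan RetryDelay) Clamp(
        int maxRetries, TimeSpan retryDelay, Action<string> warn)
    {
        if (maxRetries <= 0)
        {
            warn($"MaxRetries {maxRetries} is non-positive; using {FallbackMaxRetries}.");
            maxRetries = FallbackMaxRetries;
        }

        if (retryDelay <= TimeSpan.Zero)
        {
            warn($"RetryDelay {retryDelay} is non-positive; using {FallbackRetryDelay}.");
            retryDelay = FallbackRetryDelay;
        }

        return (maxRetries, retryDelay);
    }
}
```

A strictly positive delay such as 1 ms passes through unchanged, which is
why the tests below can still force a near-immediate re-claim.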

NO-003: route a lifecycle-scoped CancellationToken from the
NotificationOutboxActor through the dispatch sweep into the adapter so
in-flight SMTP sends abort on PostStop instead of blocking
CoordinatedShutdown for the full SMTP timeout per row.
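
A minimal sketch of the lifecycle-scoped token pattern, written without
Akka so it stays self-contained. DispatcherLifetimeSketch, SweepAsync and
Stop are hypothetical stand-ins for the actor, its dispatch sweep and
PostStop:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class DispatcherLifetimeSketch : IDisposable
{
    // One source for the whole lifetime of the (hypothetical) dispatcher.
    private readonly CancellationTokenSource _lifetime = new();

    // Stand-in for the dispatch sweep: every send observes the same token.
    public async Task<int> SweepAsync(int rows, Func<CancellationToken, Task> sendAsync)
    {
        var sent = 0;
        for (var i = 0; i < rows; i++)
        {
            if (_lifetime.IsCancellationRequested) break;
            try
            {
                await sendAsync(_lifetime.Token);
                sent++;
            }
            catch (OperationCanceledException)
            {
                break; // shutting down: stop instead of waiting out each row
            }
        }
        return sent;
    }

    // Stand-in for PostStop: aborts any in-flight send.
    public void Stop() => _lifetime.Cancel();

    public void Dispose() => _lifetime.Dispose();
}
```

The point of the design is that shutdown cost no longer scales with the
number of rows times the SMTP timeout; one Cancel call unblocks the sweep.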

NO-004: await the central audit writer inside the existing try/catch
instead of fire-and-forget so the audit task can't outlive the per-sweep
DI scope and writer faults reach the operator log instead of being
silently dropped.
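
The difference between the two call styles, as a sketch. writeAuditAsync
and logError are hypothetical stand-ins for the central audit writer and
the operator log; swallowing the exception after logging is an assumption
about the existing try/catch:

```csharp
using System;
using System.Threading.Tasks;

public static class AuditAwaitSketch
{
    public static async Task DispatchRowAsync(
        Func<Task> writeAuditAsync, Action<Exception> logError)
    {
        try
        {
            // Fire-and-forget would be `_ = writeAuditAsync();`, which lets
            // the task outlive the caller and leaves any fault unobserved.
            await writeAuditAsync();
        }
        catch (Exception ex)
        {
            // Assumed behavior: log the audit fault and keep the sweep going.
            logError(ex);
        }
    }
}
```

Because the method only returns after the writer completes, anything the
writer resolved from the per-sweep scope is still alive while it runs.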

Two AuditLog integration tests seeded RetryDelay = TimeSpan.Zero to force
immediate re-claim on the second tick; updated them to 1 ms so they keep
the same intent without tripping the NO-002 clamp.
Joseph Doherty
2026-05-28 03:54:43 -04:00
parent e536178323
commit 291274ae76
13 changed files with 370 additions and 61 deletions
@@ -365,11 +365,13 @@ public class AuditWriteFailureSafetyTests : TestKit, IClassFixture<MsSqlMigratio
     private async Task SeedSmtpConfigAsync()
     {
         await using var ctx = CreateContext();
+        // NO-002: dispatcher clamps non-positive RetryDelay to the 1-minute fallback;
+        // use 1 ms so a transient outcome's NextAttemptAt is still effectively due.
         ctx.SmtpConfigurations.Add(new SmtpConfiguration(
             "smtp.example.com", "Basic", "noreply@example.com")
         {
             MaxRetries = 5,
-            RetryDelay = TimeSpan.Zero,
+            RetryDelay = TimeSpan.FromMilliseconds(1),
         });
         await ctx.SaveChangesAsync();
     }
@@ -130,9 +130,12 @@ public class NotifyDispatcherAuditTrailTests : TestKit, IClassFixture<MsSqlMigra
     /// <summary>
     /// Inserts a single SMTP configuration row so the dispatcher's
     /// <c>ResolveRetryPolicyAsync</c> sees a real (maxRetries, retryDelay)
-    /// pair rather than the conservative fallback. RetryDelay of 0 means a
-    /// transient outcome's <c>NextAttemptAt</c> is immediately due — useful so
-    /// the SECOND DispatchTick re-claims the row without waiting.
+    /// pair rather than the conservative fallback. A tiny positive RetryDelay
+    /// means a transient outcome's <c>NextAttemptAt</c> is immediately due —
+    /// useful so the SECOND DispatchTick re-claims the row without waiting.
+    /// NO-002: the dispatcher now clamps a non-positive RetryDelay to the
+    /// 1-minute fallback to avoid burn-looping on transient failures, so this
+    /// must be a strictly positive value (1 ms is fine for tests).
     /// </summary>
     private async Task SeedSmtpConfigAsync(int maxRetries = 5)
     {
@@ -141,7 +144,7 @@ public class NotifyDispatcherAuditTrailTests : TestKit, IClassFixture<MsSqlMigra
             "smtp.example.com", "Basic", "noreply@example.com")
         {
             MaxRetries = maxRetries,
-            RetryDelay = TimeSpan.Zero,
+            RetryDelay = TimeSpan.FromMilliseconds(1),
         });
         await ctx.SaveChangesAsync();
     }