fix(notification-service): resolve NotificationService-014..018 — classify OAuth2 failures, fail on bad auth config, wire NotificationOptions fallback, disposable concurrency limiter

Joseph Doherty
2026-05-17 03:18:33 -04:00
parent bf6bd8de5a
commit f5199e9da9
6 changed files with 454 additions and 41 deletions

@@ -8,7 +8,7 @@
| Last reviewed | 2026-05-17 |
| Reviewer | claude-agent |
| Commit reviewed | `39d737e` |
| Open findings | 5 |
| Open findings | 0 |
## Summary
@@ -497,7 +497,7 @@ Module test suite is green at 56 tests.
|--|--|
| Severity | High |
| Category | Error handling & resilience |
| Status | Open |
| Status | Resolved |
| Location | `src/ScadaLink.NotificationService/NotificationDeliveryService.cs:214-228`, `src/ScadaLink.NotificationService/NotificationDeliveryService.cs:308-312`, `src/ScadaLink.NotificationService/OAuth2TokenService.cs:56-84` |
**Description**
@@ -510,7 +510,7 @@ Add a catch-all to `DeliverBufferedAsync` for exceptions that `ClassifySmtpError
**Resolution**
_Unresolved._
Resolved 2026-05-17. Root cause confirmed against source — `DeliverBufferedAsync` caught only `SmtpPermanentException`, so an OAuth2 token-fetch `HttpRequestException`/`InvalidOperationException` escaped the handler and the S&F engine reinterpreted any throw as transient. Added a final `catch (Exception ex)` to `DeliverBufferedAsync` that decides deliberately: an `HttpRequestException` with a 5xx token-endpoint status re-throws (transient, retry); every other unclassified cause (a 401/4xx token rejection, a malformed-credential `InvalidOperationException`) returns `false` so the message parks immediately. Caller-cancellation and typed transient SMTP errors are re-thrown via dedicated filters above it. Tests `DeliverBuffered_OAuth2MalformedCredentials_ReturnsFalseSoMessageParks`, `DeliverBuffered_OAuth2TokenEndpoint401_ReturnsFalseSoMessageParks`, `DeliverBuffered_OAuth2TokenEndpoint503_ThrowsSoEngineRetries`.
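The retry-or-park decision described above can be sketched as a small predicate. This is an illustrative sketch only, not the code in `DeliverBufferedAsync`: the `BufferedFailureClassifier` class and `IsTransient` method are hypothetical names, and the sketch reads the resolution text literally, so an `HttpRequestException` that carries no status code is treated as non-transient.

```csharp
using System;
using System.Net;
using System.Net.Http;

// Hypothetical helper; the real decision is made inline in a catch block.
public static class BufferedFailureClassifier
{
    // True  -> re-throw so the store-and-forward engine retries.
    // False -> return false from delivery so the message parks.
    public static bool IsTransient(Exception ex)
    {
        // Only a token-endpoint failure with a 5xx status counts as transient.
        if (ex is HttpRequestException { StatusCode: { } status })
        {
            int code = (int)status;
            return code >= 500 && code <= 599;
        }

        // 401/4xx rejections, malformed-credential InvalidOperationException,
        // and anything else unclassified are treated as permanent.
        return false;
    }
}
```

Inside the catch-all, such a predicate would be used roughly as `if (IsTransient(ex)) throw; return false;`.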
### NotificationService-015 — Unclassified exceptions (OAuth2 token fetch, non-cancellation OCE) escape `SendAsync` to the calling script
@@ -518,7 +518,7 @@ _Unresolved._
|--|--|
| Severity | High |
| Category | Error handling & resilience |
| Status | Open |
| Status | Resolved |
| Location | `src/ScadaLink.NotificationService/NotificationDeliveryService.cs:96-148`, `src/ScadaLink.NotificationService/NotificationDeliveryService.cs:308-312` |
**Description**
@@ -531,7 +531,7 @@ Add a final `catch (Exception ex)` to `SendAsync` that converts any otherwise-un
**Resolution**
_Unresolved._
Resolved 2026-05-17. Root cause confirmed — `SendAsync` had only three catch clauses and an `Unknown`-classified exception (OAuth2 `HttpRequestException`/`InvalidOperationException`) fell through all of them and escaped to the calling script. Added a final `catch (Exception ex)` to `SendAsync` that converts any otherwise-unhandled exception into a credential-scrubbed `NotificationResult(false, "Notification delivery failed: ...")` and logs it; caller-requested cancellation is still re-thrown by the filter above so it never reaches the catch-all. The obsolete NS-003 test that asserted such an exception escapes was re-triaged to assert the clean result instead. Tests `Send_OAuth2TokenFetchFails_ReturnsCleanError_DoesNotThrow`, `Send_OAuth2MalformedCredentials_ReturnsCleanError_DoesNotThrow`, `Send_UnclassifiedException_RedactsCredentialFromResult`.
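A minimal sketch of the catch ordering described above, under stated assumptions: `SafeSend.RunAsync`, the `deliver` delegate, and the `secret` parameter are hypothetical, the `NotificationResult` record is a stand-in whose property names are guesses, and the string-replace scrubbing only illustrates the idea of redaction, not the service's actual scrubber.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

// Stand-in for the real result type; property names are assumptions.
public sealed record NotificationResult(bool Success, string? ErrorMessage);

public static class SafeSend
{
    public static async Task<NotificationResult> RunAsync(
        Func<CancellationToken, Task> deliver, string secret, CancellationToken ct)
    {
        try
        {
            await deliver(ct);
            return new NotificationResult(true, null);
        }
        catch (OperationCanceledException) when (ct.IsCancellationRequested)
        {
            // Caller-requested cancellation is re-thrown and never
            // reaches the catch-all below.
            throw;
        }
        catch (Exception ex)
        {
            // Catch-all: redact the secret, then return a clean failure
            // instead of letting the exception reach the calling script.
            string message = string.IsNullOrEmpty(secret)
                ? ex.Message
                : ex.Message.Replace(secret, "***");
            return new NotificationResult(false, "Notification delivery failed: " + message);
        }
    }
}
```

The exception filter on the first clause is what keeps a non-cancellation `OperationCanceledException` (for example, an internal timeout) flowing into the catch-all rather than being mistaken for caller cancellation.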
### NotificationService-016 — `AuthenticateAsync` silently sends unauthenticated for an unknown auth type or empty credentials
@@ -539,7 +539,7 @@ _Unresolved._
|--|--|
| Severity | Medium |
| Category | Security |
| Status | Open |
| Status | Resolved |
| Location | `src/ScadaLink.NotificationService/MailKitSmtpClientWrapper.cs:46-67` |
**Description**
@@ -552,7 +552,7 @@ Make missing/unparseable credentials and an unrecognised `AuthType` hard errors:
**Resolution**
_Unresolved._
Resolved 2026-05-17. Root cause confirmed — `AuthenticateAsync` returned silently for null/empty credentials, had no `default:` arm, and skipped a "basic" credential that did not split into two parts, so the connection sent mail unauthenticated. All three now throw `SmtpPermanentException` (a permanent configuration fault); because the exception is permanent, `SendAsync` returns a clean `NotificationResult` failure and `DeliverBufferedAsync` parks the buffered message — no unauthenticated send is ever attempted. Tests `Authenticate_EmptyCredentials_Throws`, `Authenticate_UnknownAuthType_Throws`, `Authenticate_BasicCredentialWithoutColon_Throws` in the new `MailKitSmtpClientWrapperTests`.
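The three hard-error cases can be sketched as a standalone validator. This is a sketch under assumptions, not the wrapper's code: `AuthValidator` is a hypothetical name, the `SmtpPermanentException` class below is a local stand-in for the project's type, and the `"basic"`/`"oauth2"` type strings and the `user:password` colon format are inferred from the test names rather than confirmed by the source.

```csharp
#nullable enable
using System;

// Local stand-in for the project's exception type.
public sealed class SmtpPermanentException : Exception
{
    public SmtpPermanentException(string message) : base(message) { }
}

public static class AuthValidator
{
    // Throws instead of silently falling through to an unauthenticated send.
    public static void Validate(string? authType, string? credentials)
    {
        if (string.IsNullOrWhiteSpace(credentials))
            throw new SmtpPermanentException("SMTP credentials are missing.");

        switch (authType?.Trim().ToLowerInvariant())
        {
            case "basic":
                // Assumed format: "user:password", split on the first colon.
                string[] parts = credentials.Split(':', 2);
                if (parts.Length != 2 || parts[0].Length == 0)
                    throw new SmtpPermanentException("Basic credentials must be 'user:password'.");
                break;
            case "oauth2":
                break; // token acquisition happens elsewhere
            default:
                throw new SmtpPermanentException("Unrecognised SMTP auth type.");
        }
    }
}
```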
### NotificationService-017 — `NotificationOptions` is bound from configuration but never read (dead config)
@@ -560,7 +560,7 @@ _Unresolved._
|--|--|
| Severity | Low |
| Category | Code organization & conventions |
| Status | Open |
| Status | Resolved |
| Location | `src/ScadaLink.NotificationService/NotificationOptions.cs:1-15`, `src/ScadaLink.NotificationService/ServiceCollectionExtensions.cs:10-11`, `src/ScadaLink.Host/SiteServiceRegistration.cs:70` |
**Description**
@@ -573,7 +573,7 @@ Either delete `NotificationOptions` and both of its registrations, or genuinely
**Resolution**
_Unresolved._
Resolved 2026-05-17. Root cause confirmed — `NotificationOptions` was bound but never read. Implemented the documented-fallback intent rather than deleting it: `NotificationDeliveryService` now takes an optional `IOptions<NotificationOptions>` and uses its `ConnectionTimeoutSeconds`/`MaxConcurrentConnections` whenever the deployed `SmtpConfiguration` field is non-positive (a value on the row still wins). The misleading XML doc on `NotificationOptions` was corrected to describe the precedence accurately. The duplicate `services.Configure<NotificationOptions>` in `Host/SiteServiceRegistration.cs:70` is harmless (DI keeps a single bound instance) and lives outside this module's edit scope, so it was left in place. Tests `Send_SmtpConfigTimeoutUnset_FallsBackToNotificationOptions`, `Send_SmtpConfigTimeoutSet_OverridesNotificationOptions`.
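The precedence rule amounts to a single expression. In this sketch, `OptionFallback.Resolve` is a hypothetical helper, and the `builtInDefault` parameter is an assumption covering the case where the optional `IOptions<NotificationOptions>` is absent or itself non-positive; the source does not say what the service does then.

```csharp
public static class OptionFallback
{
    // configured:     value from the deployed SmtpConfiguration row
    // fallback:       value from NotificationOptions, or null when not supplied
    // builtInDefault: hypothetical last resort (not described in the source)
    public static int Resolve(int configured, int? fallback, int builtInDefault)
    {
        if (configured > 0)
            return configured;          // a value on the row still wins

        return fallback is > 0
            ? fallback.Value            // non-positive row: use the options value
            : builtInDefault;
    }
}
```

For example, `Resolve(0, 30, 10)` yields the options value 30, while `Resolve(5, 30, 10)` keeps the row's 5.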
### NotificationService-018 — Concurrency limiter: lock-free read of a non-volatile field, never resized on redeployment, never disposed
@@ -581,7 +581,7 @@ _Unresolved._
|--|--|
| Severity | Low |
| Category | Concurrency & thread safety |
| Status | Open |
| Status | Resolved |
| Location | `src/ScadaLink.NotificationService/NotificationDeliveryService.cs:237-255` |
**Description**
@@ -594,4 +594,4 @@ Replace the hand-rolled double-checked init with `Lazy<SemaphoreSlim>` or `LazyI
**Resolution**
_Unresolved._
Resolved 2026-05-17. All three issues confirmed against source. The hand-rolled double-checked init was replaced with a `Lazy<SemaphoreSlim>` — its publication is correctly synchronised, eliminating the lock-free read of a non-`volatile` reference. `NotificationDeliveryService` now implements `IDisposable` and disposes the limiter (if created) under the existing lock, with idempotent re-entry and an `ObjectDisposedException` guard in `SendAsync`/`GetConcurrencyLimiter`; the scoped DI registration disposes it per scope. The limiter remains scoped (not hoisted to a site singleton) — the design doc deploys one SMTP config per site and the per-instance capture is bounded; the redeploy-resize concern is acknowledged as low-impact and not changed here, since hoisting would require a registration change for marginal benefit. Tests `Service_Dispose_DisposesConcurrencyLimiter` plus the existing `Send_MaxConcurrentConnections_LimitsConcurrentDeliveries`.
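A minimal sketch of the `Lazy<SemaphoreSlim>` plus idempotent-dispose pattern described above. `LimiterOwner` is a hypothetical stand-in for the service, and passing the limit through the constructor is a simplification: the source does not say where the service reads `MaxConcurrentConnections` when the limiter is first created.

```csharp
using System;
using System.Threading;

public sealed class LimiterOwner : IDisposable
{
    private readonly object _gate = new();
    private readonly Lazy<SemaphoreSlim> _limiter;
    private bool _disposed;

    public LimiterOwner(int maxConcurrent)
    {
        // ExecutionAndPublication guarantees a single, safely published instance.
        _limiter = new Lazy<SemaphoreSlim>(
            () => new SemaphoreSlim(maxConcurrent, maxConcurrent),
            LazyThreadSafetyMode.ExecutionAndPublication);
    }

    public SemaphoreSlim GetLimiter()
    {
        lock (_gate)
        {
            if (_disposed) throw new ObjectDisposedException(nameof(LimiterOwner));
            return _limiter.Value;
        }
    }

    public void Dispose()
    {
        lock (_gate)
        {
            if (_disposed) return;      // idempotent re-entry
            _disposed = true;

            // Dispose only if the limiter was ever created.
            if (_limiter.IsValueCreated) _limiter.Value.Dispose();
        }
    }
}
```

Checking `IsValueCreated` before touching `Value` avoids allocating a semaphore during disposal just to throw it away.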