# Code Review — NotificationService | Field | Value | |-------|-------| | Module | `src/ScadaLink.NotificationService` | | Design doc | `docs/requirements/Component-NotificationService.md` | | Status | Reviewed | | Last reviewed | 2026-05-17 | | Reviewer | claude-agent | | Commit reviewed | `39d737e` | | Open findings | 0 | ## Summary The NotificationService module is small (6 source files) and structurally clean: it abstracts the SMTP client behind an interface, isolates the OAuth2 token lifecycle, and integrates with the Store-and-Forward Engine for transient-failure buffering. However, the review surfaced several substantive defects. The most serious is that **no Store-and-Forward delivery handler is ever registered for the `Notification` category** — buffered notifications are persisted but never retried or delivered, silently losing every notification that hit a transient SMTP failure. Error classification is fragile (substring matching on exception messages) and is applied inconsistently between `SendAsync` and `DeliverAsync`. `DeliverAsync` also contains a resource-management bug that constructs and leaks two SMTP clients per call. Secondary themes: the `OAuth2TokenService` singleton caches a single token keyed to no credential identity (incorrect if multiple SMTP configs exist), several design-doc requirements are unimplemented (connection timeout, max concurrent connections, TLS `SSL`/`None` modes), and credentials are stored and passed as plaintext `string` values. Test coverage exercises the happy path and the main error branches but misses the OAuth2 delivery path, the permanent-classification fallback in `DeliverAsync`, and concurrency on the token cache. #### Re-review 2026-05-17 (commit `39d737e`) Re-reviewed at commit `39d737e` after the NS-001..013 fixes. All twelve prior findings remain closed (eleven Resolved, NS-013 Deferred) and the fixes hold up against the current source — error classification is now typed, the SMTP client is created once and always disconnected/disposed, the S&F `Notification` delivery handler is registered in `AkkaHostedService`, and the OAuth2 token cache is keyed per credential. The re-review surfaced **five new findings**. The recurring theme is **unclassified-exception handling**: the typed `ClassifySmtpError` helper introduced by NS-002/003 returns `Unknown` for anything that is not a recognised SMTP/socket failure, but neither `SendAsync` nor `DeliverBufferedAsync` has a catch-all for the `Unknown` bucket. As a result an OAuth2 token-fetch failure (`HttpRequestException` from `EnsureSuccessStatusCode`, or `InvalidOperationException` from a malformed credential string) escapes `SendAsync` as a raw exception to the calling script (NS-015) and escapes `DeliverBufferedAsync` out of the S&F handler, where the engine treats any thrown exception as transient and retries a permanently-broken config forever (NS-014) — the same defect class NS-008 fixed for address parsing, but the OAuth2 path was never covered. Secondary findings: `AuthenticateAsync` silently proceeds unauthenticated for an unknown `authType` or empty credentials (NS-016); `NotificationOptions` is bound from configuration in two places but never read by any code (NS-017, dead config — NS-007 sourced the timeout/limit from `SmtpConfiguration` instead); and the lazily-created concurrency limiter is read outside its lock, is sized once and never resized on redeployment, and is never disposed (NS-018). ## Checklist coverage | # | Category | Examined | Notes | |---|----------|----------|-------| | 1 | Correctness & logic bugs | ☑ | Double SMTP client construction; `Auto` socket option for non-TLS; `TimeoutException`/`OperationCanceledException` misclassified. | | 2 | Akka.NET conventions | ☑ | No actors in this module (`AddNotificationServiceActors` is a no-op); delivery is a plain DI service. No Akka-specific issues. | | 3 | Concurrency & thread safety | ☑ | `OAuth2TokenService` is a singleton with a shared mutable token cache; double-checked locking present but cache key is wrong (NS-006). | | 4 | Error handling & resilience | ☑ | Critical: no S&F delivery handler registered for `Notification` (NS-001). Fragile substring error classification (NS-002, NS-003). | | 5 | Security | ☑ | Credentials handled as plaintext strings; OAuth2 client secret in DB credential blob; no recipient address validation. | | 6 | Performance & resource management | ☑ | Two `ISmtpClientWrapper` instances created per send, one leaked; connection not pooled; `MaxConcurrentConnections` unenforced. | | 7 | Design-document adherence | ☑ | Connection timeout, max concurrent connections, and TLS `SSL`/`None` modes from the design doc are not implemented. | | 8 | Code organization & conventions | ☑ | `SmtpPermanentException` in the wrong file; `SmtpConfiguration` POCO has non-nullable strings with no initializer (compiler-warning risk). | | 9 | Testing coverage | ☑ | Happy path and main error branches covered; OAuth2 delivery path, `DeliverAsync` permanent fallback, and token-cache concurrency untested. | | 10 | Documentation & comments | ☑ | XML comment on `DeliverAsync` ("Throws on failure") and the misleading "OAuth2 token refresh if needed" comment do not match behaviour. | ## Findings ### NotificationService-001 — Buffered notifications are never retried (no S&F delivery handler) | | | |--|--| | Severity | Critical | | Category | Error handling & resilience | | Status | Resolved | | Location | `src/ScadaLink.NotificationService/NotificationDeliveryService.cs:96`, `src/ScadaLink.NotificationService/ServiceCollectionExtensions.cs:8` | **Description** On a transient SMTP failure the service calls `_storeAndForward.EnqueueAsync(StoreAndForwardCategory.Notification, ...)`. The Store-and-Forward Engine only delivers (immediately or on retry sweep) a category for which a delivery handler has been registered via `StoreAndForwardService.RegisterDeliveryHandler`. A repo-wide search shows the `Notification` category handler is never registered anywhere — `StoreAndForwardCategory.Notification` appears only in this module's `EnqueueAsync` call. As a result, every buffered notification falls into the `RetryMessageAsync` "No delivery handler for category" branch (`StoreAndForwardService.cs:201-204`), which logs a warning and returns without ever delivering or removing the message. Buffered notifications accumulate in SQLite forever and are never sent. This silently loses every notification that hit a transient failure, while `SendAsync` returns `Success=true, WasBuffered=true`, telling the caller the notification is safely queued. This directly violates the design doc's "integrates with the Store-and-Forward Engine for reliable delivery" guarantee. **Recommendation** Register a delivery handler for `StoreAndForwardCategory.Notification` during startup that deserializes the buffered payload (`ListName`, `Subject`, `Message`), re-resolves the list/recipients/SMTP config, and re-attempts `DeliverAsync`, returning `true` on success, `false` on permanent failure, and throwing on transient failure. Wire it in `AddNotificationService` or the host bootstrap. Add an integration test covering the buffer-then-retry-then-deliver round trip. **Resolution** Resolved 2026-05-16. A delivery handler for `StoreAndForwardCategory.Notification` is now registered at site startup in `AkkaHostedService`. The handler resolves `NotificationDeliveryService` in a fresh DI scope and calls the new `DeliverBufferedAsync`, which re-resolves the list, recipients and SMTP config and re-attempts delivery — returning `true` on success, `false` (park) on permanent failure or missing configuration, and throwing on transient failure so the engine retries. `SendAsync` now buffers with `attemptImmediateDelivery: false` so registering the handler does not send the notification twice. Regression tests cover the happy path and the list-removed park path. Fixed by the commit whose message references `NotificationService-001`. ### NotificationService-002 — `TimeoutException`/`OperationCanceledException` misclassified as transient | | | |--|--| | Severity | High | | Category | Error handling & resilience | | Status | Resolved | | Location | `src/ScadaLink.NotificationService/NotificationDeliveryService.cs:157-167` | **Description** `IsTransientSmtpError` treats `OperationCanceledException` (and its subtype `TaskCanceledException`) as a transient SMTP error. When the caller passes a `CancellationToken` that is cancelled — e.g. the Script Execution Actor is stopped, or the script times out — the resulting `OperationCanceledException` is caught by the `catch ... when (IsTransientSmtpError(ex))` clause and the notification is buffered as if SMTP had failed. A deliberate cancellation should propagate, not be silently buffered for retry. The same clause classifies any `IOException` as transient even though `IOException` covers unrelated failures (e.g. a serialization stream error). Additionally, `OperationCanceledException` raised by token cancellation in the OAuth2 path would be miscategorised the same way. **Recommendation** Re-throw `OperationCanceledException`/`TaskCanceledException` when `cancellationToken.IsCancellationRequested` is true rather than classifying it as transient. Narrow `IOException` handling to SMTP-specific I/O failures, or rely on MailKit's typed exceptions (`SmtpCommandException`, `SmtpProtocolException`, `ServiceNotConnectedException`) instead of broad base types. **Resolution** Resolved 2026-05-16 (commit ``). Classification was rewritten around a typed `ClassifySmtpError` helper: a caller-requested cancellation (`OperationCanceledException`/ `TaskCanceledException` while `cancellationToken.IsCancellationRequested`) now propagates out of both `SendAsync` and `DeliverAsync` via dedicated `catch` filters instead of being buffered. The broad `IOException` catch-all was dropped — only MailKit's typed exceptions plus `SocketException`/`TimeoutException` are treated as transient. Regression tests `Send_CancellationRequested_PropagatesAndDoesNotBuffer` and `Send_TaskCanceledException_WithCancellation_Propagates`. ### NotificationService-003 — Error classification by substring matching on exception messages is fragile | | | |--|--| | Severity | High | | Category | Error handling & resilience | | Status | Resolved | | Location | `src/ScadaLink.NotificationService/NotificationDeliveryService.cs:144-147`, `src/ScadaLink.NotificationService/NotificationDeliveryService.cs:163-166` | **Description** Transient/permanent classification depends on `ex.Message.Contains("5.")`, `Contains("4.")`, `Contains("550")`, `Contains("421")`, etc. This is unreliable: (a) `Message.Contains("5.")` matches any message containing the literal "5." anywhere — e.g. a host name `smtp5.example.com`, a version string, or a path — producing false permanent classification; (b) `Contains("4.")` likewise matches `"v4.0"` or an IP address octet; (c) MailKit exposes the actual SMTP status code on `SmtpCommandException.StatusCode`, which is the correct, locale-independent source of truth and is being ignored; (d) message text is culture/version-dependent and not part of any stable contract. Misclassification has real consequences: a permanent failure misread as transient floods the S&F buffer (which the design doc explicitly says must be prevented), and a transient failure misread as permanent loses the notification. **Recommendation** Classify on MailKit's typed exceptions and `SmtpCommandException.StatusCode` (4xx → transient, 5xx → permanent), and `SocketException`/`SmtpProtocolException`/connection-refused → transient. Remove all `Message.Contains` checks. **Resolution** Resolved 2026-05-16 (commit ``). All `ex.Message.Contains(...)` checks were removed. The new `ClassifySmtpError` helper inspects `SmtpCommandException.StatusCode` (numeric SMTP code: 4xx → transient, 5xx → permanent) and treats `SmtpProtocolException`, `ServiceNotConnectedException`, `SocketException` and `TimeoutException` as transient; anything else is `Unknown` and propagates unclassified rather than being guessed. The permanent-promotion `catch` block in `DeliverAsync` now keys off this classification. Regression tests `Send_Smtp5xxCommandException_ClassifiedPermanent`, `Send_Smtp4xxCommandException_ClassifiedTransientAndBuffered`, `Send_SmtpProtocolException_ClassifiedTransient`, and `Send_NonSmtpExceptionWith5xxLookalikeText_NotPromotedToPermanent`. ### NotificationService-004 — `DeliverAsync` constructs two SMTP clients and leaks the used one | | | |--|--| | Severity | High | | Category | Performance & resource management | | Status | Resolved | | Location | `src/ScadaLink.NotificationService/NotificationDeliveryService.cs:118-119` | **Description** ```csharp using var client = _smtpClientFactory() as IDisposable; var smtp = _smtpClientFactory(); ``` The factory is invoked twice, creating two separate `MailKitSmtpClientWrapper` instances (each owning a real `SmtpClient` with a socket). The first instance is assigned to `client` and disposed by the `using`, but it is never used. The second instance, `smtp`, is the one actually connected, authenticated, used to send, and `DisconnectAsync`'d — but it is never `Dispose`d. `MailKitSmtpClientWrapper` implements `IDisposable` and wraps an unmanaged socket; the connected client is leaked on every send. `DisconnectAsync` closes the connection but does not dispose the `SmtpClient`. Over time this leaks sockets/handles. **Recommendation** Create exactly one client and dispose the one that is actually used: `using var smtp = _smtpClientFactory();` then cast to `IDisposable` only if needed (the factory's `Func` should ideally return a type that the `using` can dispose directly — consider having `ISmtpClientWrapper` extend `IAsyncDisposable`/`IDisposable`). **Resolution** Resolved 2026-05-16 (commit ``). `DeliverAsync` now invokes `_smtpClientFactory()` exactly once and disposes the client actually used via `using var disposable = smtp as IDisposable;`. The previous code created two `MailKitSmtpClientWrapper` instances per send and disposed the unused one while leaking the connected one. Regression test `Send_CreatesExactlyOneSmtpClient_AndDisposesIt` verifies the factory is invoked once and the resulting client is disposed. ### NotificationService-005 — Non-TLS path uses `SecureSocketOptions.Auto`, contradicting the requested mode | | | |--|--| | Severity | Medium | | Category | Correctness & logic bugs | | Status | Resolved | | Location | `src/ScadaLink.NotificationService/MailKitSmtpClientWrapper.cs:18`, `src/ScadaLink.NotificationService/NotificationDeliveryService.cs:123` | **Description** `ConnectAsync` maps `useTls` to either `SecureSocketOptions.StartTls` or `SecureSocketOptions.Auto`. `useTls` is computed in `DeliverAsync` as `TlsMode == "starttls"`. So a configuration of `TlsMode = "none"` produces `useTls = false` → `SecureSocketOptions.Auto`, which lets MailKit opportunistically negotiate TLS — the opposite of "None". Worse, the design doc defines three TLS modes — `None`, `StartTLS`, `SSL` — but the code collapses them to a single boolean, so `SSL` (implicit TLS, typically port 465) is treated identically to `None`/`Auto` and the SSL mode is effectively unsupported. The `bool useTls` parameter cannot represent the three-state requirement. **Recommendation** Pass the `TlsMode` string (or a `TlsMode` enum) through to the wrapper and map explicitly: `None` → `SecureSocketOptions.None`, `StartTLS` → `SecureSocketOptions.StartTls`, `SSL` → `SecureSocketOptions.SslOnConnect`. Validate the configured value and reject unknown modes. **Resolution** Resolved 2026-05-16 (commit pending). Root cause confirmed against source: the `bool useTls` parameter cannot represent three states, and the non-StartTLS branch used `SecureSocketOptions.Auto`. A new `SmtpTlsMode` enum (`None`/`StartTls`/`Ssl`) and `SmtpTlsModeParser` were added; `ISmtpClientWrapper.ConnectAsync` now takes `SmtpTlsMode` and `MailKitSmtpClientWrapper` maps it explicitly to `SecureSocketOptions.None`/ `StartTls`/`SslOnConnect`. `SendAsync`/`DeliverBufferedAsync` validate the configured `TlsMode` up front — an unknown value returns a clean `NotificationResult` failure (or parks a buffered message) instead of silently negotiating TLS. Regression tests: `Send_TlsModeNone_DoesNotNegotiateTls`, `Send_TlsModeSsl_UsesImplicitSsl`, `Send_UnknownTlsMode_ReturnsErrorNotSilentFallback`, and the `SmtpTlsModeParserTests` set. ### NotificationService-006 — OAuth2 token cache is keyed to nothing; wrong token returned when multiple SMTP configs exist | | | |--|--| | Severity | Medium | | Category | Concurrency & thread safety | | Status | Resolved | | Location | `src/ScadaLink.NotificationService/OAuth2TokenService.cs:14-15`, `src/ScadaLink.NotificationService/OAuth2TokenService.cs:30-35` | **Description** `OAuth2TokenService` is registered as a singleton and stores a single `_cachedToken`/`_tokenExpiry` pair. `GetTokenAsync` ignores the `credentials` argument when deciding whether the cache is valid — it only checks expiry. If two SMTP configurations with different tenant/client credentials are ever used (the repository's `GetAllSmtpConfigurationsAsync` returns a list, implying multiple configs are possible), the second caller receives the first caller's token, which will fail authentication against the second tenant. Even with a single config today this is a latent correctness bug and makes the service's behaviour depend on call order. **Recommendation** Key the cache by the credential identity (e.g. a dictionary keyed by `tenantId:clientId`, or by a hash of the credential string), or document and enforce the single-SMTP-config invariant. Given the design doc says one SMTP config is deployed per site, enforcing the invariant is acceptable but should be explicit. **Resolution** Resolved 2026-05-16 (commit pending). Root cause confirmed: the singleton held a single `_cachedToken`/`_tokenExpiry` pair and `GetTokenAsync` ignored the `credentials` argument when validating the cache, so a second SMTP config got the first config's token. `OAuth2TokenService` now stores a `ConcurrentDictionary` keyed by the SHA-256 hash of the credential string; each distinct tenant/client/secret gets its own cached token, expiry, and per-credential `SemaphoreSlim` (double-checked locking preserved). Regression tests: `GetTokenAsync_DifferentCredentials_ReturnPerCredentialTokens` and `GetTokenAsync_SameCredentials_CachedPerCredential`. ### NotificationService-007 — Connection timeout and max-concurrent-connections from the design doc are not implemented | | | |--|--| | Severity | Medium | | Category | Design-document adherence | | Status | Resolved | | Location | `src/ScadaLink.NotificationService/NotificationOptions.cs:11-14`, `src/ScadaLink.NotificationService/MailKitSmtpClientWrapper.cs:16-20`, `src/ScadaLink.NotificationService/NotificationDeliveryService.cs:111-140` | **Description** The design doc specifies an SMTP "Connection timeout (default 30s)" and "Max concurrent connections (default 5)", and `NotificationOptions`/`SmtpConfiguration` both carry these fields. Neither is enforced: `MailKitSmtpClientWrapper.ConnectAsync` never sets `SmtpClient.Timeout`, so the connection relies on MailKit's default timeout rather than the configured value (only the caller's `CancellationToken` bounds it, and callers may pass `default`). There is no semaphore or other throttle limiting concurrent SMTP connections per site, so `MaxConcurrentConnections` has no effect. Both options exist but are dead configuration. **Recommendation** Set `SmtpClient.Timeout` from `ConnectionTimeoutSeconds` in `ConnectAsync` (and/or derive a linked `CancellationTokenSource`). Introduce a `SemaphoreSlim(MaxConcurrentConnections)` gating `DeliverAsync`. If these limits are intentionally deferred, mark the options `[Obsolete]`/document them as not-yet-enforced and note the gap in the design doc. **Resolution** Resolved 2026-05-16 (commit pending). Root cause confirmed: `ConnectAsync` never set `SmtpClient.Timeout` and no throttle gated `DeliverAsync`. `ISmtpClientWrapper.ConnectAsync` now takes a `connectionTimeoutSeconds` argument; `MailKitSmtpClientWrapper` sets `SmtpClient.Timeout` from `SmtpConfiguration.ConnectionTimeoutSeconds`. `DeliverAsync` acquires a lazily-created `SemaphoreSlim` sized to `SmtpConfiguration.MaxConcurrentConnections` (default 5 when non-positive) and releases it in a `finally`, so concurrent SMTP deliveries per site are bounded. The timeout is sourced from the deployed `SmtpConfiguration` rather than `NotificationOptions`; the `NotificationOptions` fields remain as operational fallback defaults. Regression tests: `Send_PassesConfiguredConnectionTimeoutToClient` and `Send_MaxConcurrentConnections_LimitsConcurrentDeliveries`. ### NotificationService-008 — Recipient email addresses are not validated before send | | | |--|--| | Severity | Medium | | Category | Correctness & logic bugs | | Status | Resolved | | Location | `src/ScadaLink.NotificationService/NotificationDeliveryService.cs:136-137`, `src/ScadaLink.NotificationService/MailKitSmtpClientWrapper.cs:50-53` | **Description** `SendAsync` builds `bccAddresses` directly from `recipient.EmailAddress` and passes them to `MailboxAddress.Parse`. If any recipient row has a malformed address, `MailboxAddress.Parse` throws `ParseException`. `ParseException` is not a `TimeoutException`/`SocketException`/`IOException` and its message will not generally contain "4." or "5.", so it falls through `DeliverAsync`'s outer `catch ... when (... && !IsTransientSmtpError(ex))` filter, which re-throws it (`:153`); it then escapes `SendAsync` entirely as an unhandled exception (the `SendAsync` catch blocks only cover `SmtpPermanentException` and transient errors). A single bad address in a list therefore crashes the send with an exception type the calling script is not told to expect, instead of producing a clean `NotificationResult` error. The same applies to a malformed `FromAddress`. **Recommendation** Validate addresses up front (e.g. `MailboxAddress.TryParse`) and return a `NotificationResult(false, ...)` listing invalid recipients, or wrap `DeliverAsync` so any non-classified exception becomes a permanent `NotificationResult` failure rather than escaping. Consider validating addresses at definition time in the Central UI as well. **Resolution** Resolved 2026-05-16 (commit pending). Root cause confirmed: `MailboxAddress.Parse` of a malformed `FromAddress`/recipient threw `ParseException`, which is unclassified and escaped `SendAsync` as an unhandled exception. A new `ValidateAddresses` helper uses `MailboxAddress.TryParse` for the sender and every recipient; `SendAsync` now returns a clean `NotificationResult(false, ...)` listing the invalid address(es) before any SMTP attempt, and `DeliverBufferedAsync` parks a buffered message with a bad address (a fault retrying cannot fix). Regression tests: `Send_MalformedRecipientAddress_ReturnsCleanError_DoesNotThrow` and `Send_MalformedFromAddress_ReturnsCleanError_DoesNotThrow`. Definition-time validation in the Central UI is a separate component and out of this module's scope. ### NotificationService-009 — Credentials handled as plaintext strings; OAuth2 client secret logged risk | | | |--|--| | Severity | Medium — re-triaged: split into an in-scope log-leak fix (resolved) and a Commons-scoped at-rest-encryption / structured-credential follow-up (NotificationService-013, Deferred). | | Category | Security | | Status | Resolved | | Location | `src/ScadaLink.NotificationService/NotificationDeliveryService.cs:127-134`, `src/ScadaLink.NotificationService/OAuth2TokenService.cs:30-65`, `src/ScadaLink.Commons/Entities/Notifications/SmtpConfiguration.cs:9` | **Description** SMTP credentials — Basic Auth `user:pass` and OAuth2 `tenantId:clientId:clientSecret` — are stored and passed as a single colon-delimited plaintext `string` (`SmtpConfiguration.Credentials`). There is no indication the value is encrypted at rest in SQLite or in the central config DB. The colon-delimited packing is also brittle: a password or client secret containing a `:` will be split incorrectly (`Split(':', 2)` / `Split(':', 3)`), silently corrupting the secret. Separately, while the current code does not log the secret directly, the substring-based error classification logs full exception messages (`_logger.LogWarning(ex, ...)`, `LogError(ex, ...)`) and MailKit exceptions can echo back server responses; an authentication failure message could surface credential fragments into logs. There is no defensive scrubbing. **Recommendation** Store credentials encrypted at rest (DPAPI/Data Protection or a secret store) and model them as structured fields rather than a colon-packed string, so secrets containing `:` are safe. Ensure credential values are never written to logs; consider a redaction step on exception messages before logging. **Resolution** Resolved 2026-05-16 (commit pending). Root cause re-triaged against source: the finding conflates two concerns with different ownership. 1. **Log-leak risk (in scope — fixed).** The original code logged whole exception objects (`_logger.LogWarning(ex, ...)` / `LogError(ex, ...)`); MailKit auth exceptions can echo server responses quoting the supplied credentials. A new internal `CredentialRedactor` masks every colon-delimited credential component out of any text. `SendAsync` and `DeliverBufferedAsync` now log a scrubbed message string (not the raw exception) and the permanent-failure `NotificationResult` is scrubbed before it returns to the caller. `OAuth2TokenService` logs the tenant id only — never the client secret or access token. Regression tests: `CredentialRedactorTests` and `Send_PermanentError_RedactsCredentialFromResultMessage`. 2. **At-rest encryption + structured-credential modelling (out of scope — Deferred).** Encrypting `SmtpConfiguration.Credentials` at rest and replacing the brittle colon-packed `string` with structured fields requires editing `src/ScadaLink.Commons/Entities/Notifications/SmtpConfiguration.cs` and the ConfigurationDatabase EF layer — both outside this module. Tracked separately as **NotificationService-013** (Deferred) so it is not lost. ### NotificationService-013 — Encrypt SMTP credentials at rest; replace colon-packed string with structured fields | | | |--|--| | Severity | Medium | | Category | Security | | Status | Deferred | | Location | `src/ScadaLink.Commons/Entities/Notifications/SmtpConfiguration.cs:9`, ConfigurationDatabase EF mapping | **Description** Split out of NotificationService-009. `SmtpConfiguration.Credentials` packs Basic Auth `user:pass` and OAuth2 `tenantId:clientId:clientSecret` into a single plaintext colon-delimited `string`: (a) there is no encryption at rest in SQLite or the central config DB; (b) a password or client secret containing a `:` is split incorrectly by `Split(':', 2)` / `Split(':', 3)`, silently corrupting the secret. **Recommendation** Model credentials as structured fields (or an encrypted blob) on the Commons entity and encrypt at rest via Data Protection / a secret store. The colon-delimited parsing in `MailKitSmtpClientWrapper` and `OAuth2TokenService` would then consume the structured fields directly. **Resolution** Deferred — requires changes to `src/ScadaLink.Commons` and the ConfigurationDatabase component, which are outside the NotificationService module. To be addressed in a Commons/ConfigurationDatabase-scoped change. The associated log-leak risk is resolved under NotificationService-009. ### NotificationService-010 — `DeliverAsync` does not disconnect the SMTP client on failure | | | |--|--| | Severity | Low | | Category | Performance & resource management | | Status | Resolved | | Location | `src/ScadaLink.NotificationService/NotificationDeliveryService.cs:121-154` | **Description** `DisconnectAsync` is only called at `:139`, on the success path inside the `try` block. If `AuthenticateAsync` or `SendAsync` throws, control jumps to the `catch` filter at `:141` and the method exits (re-throwing or wrapping) without ever calling `DisconnectAsync`. Combined with NS-004 (the client is never disposed either), a failed send leaves an open, authenticated SMTP connection until the socket is eventually reclaimed by finalization. Under sustained transient failures this can exhaust the SMTP server's connection slots. **Recommendation** Move disconnect/dispose into a `finally` block (or use `await using` once `ISmtpClientWrapper` supports `IAsyncDisposable`) so the connection is always torn down regardless of outcome. **Resolution** Resolved 2026-05-16 (commit pending). Root cause confirmed against the current source: `DeliverAsync` invoked `smtp.DisconnectAsync` only on the success path inside the `try` block, so a failed `Connect`/`Authenticate`/`Send` left the connection open until finalization. `DisconnectAsync` was moved into the existing `finally` block (which also releases the NS-007 concurrency slot), so the SMTP connection is always torn down regardless of outcome. The disconnect is best-effort — wrapped in its own `try/catch` that logs at debug level — so a disconnect failure (e.g. the connection is already dead) cannot mask the original delivery exception. Regression tests `Send_TransientFailureDuringSend_StillDisconnectsClient` and `Send_FailureDuringAuthenticate_StillDisconnectsClient`. ### NotificationService-011 — `SmtpPermanentException` declared in the wrong file; module conventions | | | |--|--| | Severity | Low | | Category | Code organization & conventions | | Status | Resolved | | Location | `src/ScadaLink.NotificationService/NotificationDeliveryService.cs:173-177`, `src/ScadaLink.Commons/Entities/Notifications/SmtpConfiguration.cs:5-15` | **Description** Two minor convention issues. (1) `SmtpPermanentException` is a public exception type declared at the bottom of `NotificationDeliveryService.cs` rather than in its own file (`SmtpPermanentException.cs`), which is inconsistent with the one-type-per-file layout used elsewhere and makes it harder to locate. (2) `SmtpConfiguration` (a Commons POCO) declares non-nullable `string` properties (`Host`, `AuthType`, `FromAddress`) that are only guaranteed by the constructor; EF Core materialization or object-initializer use can leave them null while the type system says otherwise. These are persistence-ignorant POCO concerns but worth flagging because the delivery service dereferences `config.Host`, `config.AuthType`, `config.FromAddress` without null checks. **Recommendation** Move `SmtpPermanentException` to its own file. For `SmtpConfiguration`, either keep the constructor as the only path and document it, or use `required` members so the compiler enforces initialization. **Resolution** Resolved 2026-05-16 (commit pending). Both issues confirmed against source. 1. **`SmtpPermanentException` placement (in scope — fixed).** The public exception type was extracted from the bottom of `NotificationDeliveryService.cs` into its own file, `src/ScadaLink.NotificationService/SmtpPermanentException.cs`, restoring the one-type-per-file layout. No behaviour change, so no dedicated regression test — the move is verified by the module test suite still compiling and passing (56 tests green). 2. **`SmtpConfiguration` non-nullable strings (out of scope — re-triaged).** `SmtpConfiguration` lives in `src/ScadaLink.Commons`, which is outside the NotificationService module's edit scope; it cannot be changed here. This is the same Commons entity already flagged for follow-up under **NotificationService-013** (Deferred). The `required`-members / documented-constructor change should be folded into that Commons/ConfigurationDatabase-scoped work. Note the delivery service's actual risk is bounded: `SendAsync`/`DeliverBufferedAsync` already validate `TlsMode` and addresses up front (NS-005/NS-008), and a null `Host`/`AuthType` from a malformed config row surfaces as a classified/clean failure rather than silent corruption. ### NotificationService-012 — Test coverage gaps: OAuth2 delivery path, permanent-classification fallback, token-cache concurrency | | | |--|--| | Severity | Low | | Category | Testing coverage | | Status | Resolved | | Location | `tests/ScadaLink.NotificationService.Tests/NotificationDeliveryServiceTests.cs`, `tests/ScadaLink.NotificationService.Tests/OAuth2TokenServiceTests.cs` | **Description** The tests cover the happy path, list-not-found, no-recipients, no-SMTP-config, permanent failure, transient-without-S&F, and transient-with-S&F buffering. Notable untested paths: (1) the OAuth2 delivery branch in `DeliverAsync:128-132` — every test uses `tokenService: null` and Basic Auth, so OAuth2 token resolution during a send is never exercised; (2) `DeliverAsync`'s permanent-classification fallback (`:144-149`) that promotes a generic exception whose message contains "550"/"553"/"554" to `SmtpPermanentException` is never tested; (3) `OAuth2TokenServiceTests` never tests concurrent `GetTokenAsync` calls (the double-checked-locking path) or token expiry/refresh — the cache test uses a 3600s token so refresh never triggers; (4) no test covers the transient-with-S&F path actually delivering after retry (which would also have caught NS-001). Given NS-001 is a critical defect, the absence of an end-to-end buffer-and-retry test is significant. **Recommendation** Add tests for: OAuth2-authenticated send with a mocked `OAuth2TokenService`; the `DeliverAsync` 5xx-message permanent fallback; token expiry/refresh (short `expires_in`); concurrent token acquisition; and an end-to-end buffered-notification retry once a `Notification` S&F handler is registered. **Resolution** Resolved 2026-05-16 (commit pending). Each listed gap was re-triaged against the current source (the module was substantially rewritten by NS-001..009 since this finding was filed) and the genuinely-missing tests were added: 1. **OAuth2 delivery path (gap real — covered).** New test `Send_OAuth2Config_AuthenticatesWithResolvedAccessToken` drives `DeliverAsync` with an `oauth2` `SmtpConfiguration` and a real `OAuth2TokenService` (mocked `IHttpClientFactory`), asserting the SMTP client is authenticated with the *resolved access token*, not the raw `tenant:client:secret` triple. 2. **5xx-message permanent fallback (gap obsolete — re-triaged).** The substring-based permanent fallback this item describes no longer exists: NS-003 replaced message-text matching with typed-exception/`SmtpStatusCode` classification. The equivalent path is already covered by `Send_Smtp5xxCommandException_ClassifiedPermanent`, and `Send_NonSmtpExceptionWith5xxLookalikeText_NotPromotedToPermanent` proves a "5xx lookalike" message is no longer promoted. No new test needed for a removed branch. 3. **Token expiry/refresh and concurrent acquisition (gaps real — covered).** New tests `GetTokenAsync_ExpiredToken_RefreshesOnNextCall` (uses `expires_in: 60` so the token is stale immediately given the 60s refresh margin) and `GetTokenAsync_ConcurrentCalls_MakeExactlyOneHttpRequest` (20 racing callers collapse to a single token fetch, exercising the per-credential double-checked lock). 4. **End-to-end buffer-and-retry (gap already closed — re-triaged).** The buffer-then-retry-then-deliver round trip was added when NS-001 was resolved (`DeliverBuffered_HappyPath_ReturnsTrue`, `DeliverBuffered_ListNoLongerExists_…`, plus `AkkaHostedService` handler registration), so no further test is required here. Module test suite is green at 56 tests. ### NotificationService-014 — OAuth2 token-fetch failure escapes `DeliverBufferedAsync`; a permanently-broken config is retried forever | | | |--|--| | Severity | High | | Category | Error handling & resilience | | Status | Resolved | | Location | `src/ScadaLink.NotificationService/NotificationDeliveryService.cs:214-228`, `src/ScadaLink.NotificationService/NotificationDeliveryService.cs:308-312`, `src/ScadaLink.NotificationService/OAuth2TokenService.cs:56-84` | **Description** `DeliverBufferedAsync` is the registered Store-and-Forward delivery handler for `StoreAndForwardCategory.Notification`. Per the S&F handler contract (`StoreAndForwardService.cs:40-53`), a handler must return `true` for success, `false` for permanent failure (park, no further retries), and **throw only for a transient failure** (retry). `DeliverBufferedAsync`'s `try` block (`:214-228`) catches only `SmtpPermanentException`; the comment at `:227` asserts "Transient SMTP errors propagate out of `DeliverAsync`". But `DeliverAsync` can also throw exceptions that `ClassifySmtpError` buckets as `Unknown` — most importantly the OAuth2 path at `:308-312`, where `OAuth2TokenService.GetTokenAsync` throws `HttpRequestException` (from `EnsureSuccessStatusCode` at `OAuth2TokenService.cs:78`) or `InvalidOperationException` (malformed credential triple, or missing `access_token`). None of these is an `SmtpCommandException`/`SmtpProtocolException`/`SocketException`/`TimeoutException`, so they are neither caught nor classified — they propagate straight out of `DeliverBufferedAsync`. The S&F engine treats *any* thrown exception as a transient failure, so a notification whose SMTP config has an invalid client secret (a 401 → `HttpRequestException`, a genuinely **permanent** condition) is retried on every sweep until `MaxRetries` is reached — burning OAuth2 token-endpoint calls and never parking promptly. A malformed credential string (`InvalidOperationException`, never fixable) is likewise retried instead of parked immediately. **Recommendation** Add a catch-all to `DeliverBufferedAsync` for exceptions that `ClassifySmtpError` classifies as `Unknown`: log and return `false` (park) for non-retryable causes such as `InvalidOperationException` from credential parsing, and decide deliberately whether an OAuth2 `HttpRequestException` should park or retry (a 5xx token endpoint is transient; a 401 is permanent — inspect `HttpRequestException.StatusCode`). Do not let an unclassified exception leave the handler and be silently reinterpreted as transient. **Resolution** Resolved 2026-05-17. Root cause confirmed against source — `DeliverBufferedAsync` caught only `SmtpPermanentException`, so an OAuth2 token-fetch `HttpRequestException`/`InvalidOperationException` escaped the handler and the S&F engine reinterpreted any throw as transient. Added a final `catch (Exception ex)` to `DeliverBufferedAsync` that decides deliberately: an `HttpRequestException` with a 5xx token-endpoint status re-throws (transient, retry); every other unclassified cause (a 401/4xx token rejection, a malformed-credential `InvalidOperationException`) returns `false` so the message parks immediately. Caller-cancellation and typed transient SMTP errors are re-thrown via dedicated filters above it. Tests `DeliverBuffered_OAuth2MalformedCredentials_ReturnsFalseSoMessageParks`, `DeliverBuffered_OAuth2TokenEndpoint401_ReturnsFalseSoMessageParks`, `DeliverBuffered_OAuth2TokenEndpoint503_ThrowsSoEngineRetries`. ### NotificationService-015 — Unclassified exceptions (OAuth2 token fetch, non-cancellation OCE) escape `SendAsync` to the calling script | | | |--|--| | Severity | High | | Category | Error handling & resilience | | Status | Resolved | | Location | `src/ScadaLink.NotificationService/NotificationDeliveryService.cs:96-148`, `src/ScadaLink.NotificationService/NotificationDeliveryService.cs:308-312` | **Description** `SendAsync`'s `try`/`catch` around `DeliverAsync` has exactly three catch clauses: `SmtpPermanentException` (`:101`), `OperationCanceledException` when cancellation was requested (`:111`), and `Exception` filtered by `IsTransientSmtpError` (`:116`). There is no catch-all for an exception that is none of those. `ClassifySmtpError` returns `SmtpErrorClass.Unknown` for anything it does not recognise, and `IsTransientSmtpError` only matches `Transient`, so an `Unknown`-classified exception falls through all three clauses and escapes `SendAsync` unhandled. Concretely, the OAuth2 branch at `:308-312` calls `OAuth2TokenService.GetTokenAsync`, which throws `HttpRequestException` (token endpoint unreachable / 4xx / 5xx) or `InvalidOperationException` (malformed `tenant:client:secret` triple, or no `access_token` in the response). Both reach the calling script (an instance/alarm/shared script) as a raw exception of a type the `INotificationDeliveryService` contract never advertises — the design doc says delivery returns either success, a buffered transient result, or a permanent `NotificationResult` error. `Notify.To().Send()` against an OAuth2 config with a bad secret therefore crashes the script instead of returning a clean `NotificationResult(false, ...)`. This is the same defect class NS-008 resolved for malformed addresses, but NS-008's `ValidateAddresses` fix never covered the OAuth2 token path. (A non-cancellation `OperationCanceledException` — e.g. an internal MailKit timeout surfaced as OCE while the caller's token is *not* cancelled — would escape the same way.) **Recommendation** Add a final `catch (Exception ex)` to `SendAsync` that converts any otherwise-unhandled exception into a permanent `NotificationResult(false, ...)` (credential-scrubbed, consistent with NS-009), and log it. Treating an unclassified failure as permanent is the safe default — it returns control to the script rather than crashing it; an OAuth2 transient (token endpoint 5xx) being reported as permanent is an acceptable trade against an unhandled crash, and can be refined by inspecting `HttpRequestException.StatusCode` if transient buffering of token failures is wanted. **Resolution** Resolved 2026-05-17. Root cause confirmed — `SendAsync` had only three catch clauses and an `Unknown`-classified exception (OAuth2 `HttpRequestException`/`InvalidOperationException`) fell through all of them and escaped to the calling script. Added a final `catch (Exception ex)` to `SendAsync` that converts any otherwise-unhandled exception into a credential-scrubbed `NotificationResult(false, "Notification delivery failed: ...")` and logs it; caller-requested cancellation is still re-thrown by the filter above so it never reaches the catch-all. The obsolete NS-003 test that asserted such an exception escapes was re-triaged to assert the clean result instead. Tests `Send_OAuth2TokenFetchFails_ReturnsCleanError_DoesNotThrow`, `Send_OAuth2MalformedCredentials_ReturnsCleanError_DoesNotThrow`, `Send_UnclassifiedException_RedactsCredentialFromResult`. ### NotificationService-016 — `AuthenticateAsync` silently sends unauthenticated for an unknown auth type or empty credentials | | | |--|--| | Severity | Medium | | Category | Security | | Status | Resolved | | Location | `src/ScadaLink.NotificationService/MailKitSmtpClientWrapper.cs:46-67` | **Description** `MailKitSmtpClientWrapper.AuthenticateAsync` returns without authenticating in two silent paths. (1) `if (string.IsNullOrEmpty(credentials)) return;` (`:48-49`) — if an SMTP config has `AuthType` "basic" or "oauth2" but a null/empty `Credentials` value (a misconfigured or partially-deployed row), the method returns and `DeliverAsync` proceeds straight to `SendAsync` on an *unauthenticated* connection. (2) The `switch (authType.ToLowerInvariant())` (`:51-66`) has cases only for `"basic"` and `"oauth2"` and **no `default`** — any other `AuthType` value (a typo, a future "ntlm", an empty string) falls through with no authentication attempted. Additionally, in the `"basic"` case, `credentials.Split(':', 2)` that yields fewer than 2 parts (a credential string with no colon) is silently skipped (`:55-58`). In every case the connection then attempts to send mail unauthenticated: against a relay that does not require auth this masks a misconfiguration; against a server that does require auth it fails later with a less obvious error, and at worst an unauthenticated send succeeds where it should not. Authentication being skipped should never be silent. **Recommendation** Make missing/unparseable credentials and an unrecognised `AuthType` hard errors: throw a typed exception (e.g. a permanent configuration exception so `SendAsync`/`DeliverBufferedAsync` surface it as a clean failure / park) rather than returning. Add a `default:` arm to the `switch` that throws `ArgumentException` naming the unsupported auth type, and reject a "basic" credential string that does not split into exactly two parts. **Resolution** Resolved 2026-05-17. Root cause confirmed — `AuthenticateAsync` returned silently for null/empty credentials, had no `default:` arm, and skipped a "basic" credential that did not split into two parts, so the connection sent mail unauthenticated. All three now throw `SmtpPermanentException` (a permanent configuration fault); because the exception is permanent, `SendAsync` returns a clean `NotificationResult` failure and `DeliverBufferedAsync` parks the buffered message — no unauthenticated send is ever attempted. Tests `Authenticate_EmptyCredentials_Throws`, `Authenticate_UnknownAuthType_Throws`, `Authenticate_BasicCredentialWithoutColon_Throws` in the new `MailKitSmtpClientWrapperTests`. ### NotificationService-017 — `NotificationOptions` is bound from configuration but never read (dead config) | | | |--|--| | Severity | Low | | Category | Code organization & conventions | | Status | Resolved | | Location | `src/ScadaLink.NotificationService/NotificationOptions.cs:1-15`, `src/ScadaLink.NotificationService/ServiceCollectionExtensions.cs:10-11`, `src/ScadaLink.Host/SiteServiceRegistration.cs:70` | **Description** `NotificationOptions` (with `ConnectionTimeoutSeconds` and `MaxConcurrentConnections`) is bound from the `ScadaLink:Notification` configuration section in two places — `ServiceCollectionExtensions.AddNotificationService` (`AddOptions().BindConfiguration(...)`) and again in `Host/SiteServiceRegistration.cs:70` (`services.Configure(...)`). However, a repo-wide search shows no code ever injects `IOptions` or otherwise reads either property. When NS-007 enforced the connection timeout and concurrency limit, it sourced both values from the per-site `SmtpConfiguration` entity, not from `NotificationOptions` — so this options class is now entirely dead configuration. Its XML doc still claims it "provides fallback defaults and operational limits", which is misleading: nothing falls back to it. The double binding is also redundant. Dead, falsely-documented configuration invites an operator to set `ScadaLink:Notification:ConnectionTimeoutSeconds` and expect it to take effect, when it has no effect at all. **Recommendation** Either delete `NotificationOptions` and both of its registrations, or genuinely wire it in as the documented fallback (e.g. `DeliverAsync` uses `NotificationOptions` values when the `SmtpConfiguration` field is non-positive). If kept, remove the duplicate `Configure` call and correct the XML doc. The NS-007 resolution note already states the `NotificationOptions` fields "remain as operational fallback defaults" — that intent should be implemented or the class removed. **Resolution** Resolved 2026-05-17. Root cause confirmed — `NotificationOptions` was bound but never read. Implemented the documented-fallback intent rather than deleting it: `NotificationDeliveryService` now takes an optional `IOptions` and uses its `ConnectionTimeoutSeconds`/`MaxConcurrentConnections` whenever the deployed `SmtpConfiguration` field is non-positive (a value on the row still wins). The misleading XML doc on `NotificationOptions` was corrected to describe the precedence accurately. The duplicate `services.Configure` in `Host/SiteServiceRegistration.cs:70` is harmless (DI keeps a single bound instance) and lives outside this module's edit scope, so it was left in place. Tests `Send_SmtpConfigTimeoutUnset_FallsBackToNotificationOptions`, `Send_SmtpConfigTimeoutSet_OverridesNotificationOptions`. ### NotificationService-018 — Concurrency limiter: lock-free read of a non-volatile field, never resized on redeployment, never disposed | | | |--|--| | Severity | Low | | Category | Concurrency & thread safety | | Status | Resolved | | Location | `src/ScadaLink.NotificationService/NotificationDeliveryService.cs:237-255` | **Description** Three minor issues with the NS-007 concurrency limiter. (1) `GetConcurrencyLimiter` reads the plain (non-`volatile`) field `_concurrencyLimiter` at `:242` *outside* `_limiterLock` as a fast path. The field is only written inside the lock (`:252`), so under the .NET memory model a concurrent thread is not guaranteed to observe the fully-constructed `SemaphoreSlim` via the lock-free read. In practice on x86/x64 this is benign and the double-checked write inside the lock is correct, but the unsynchronized read of a reference set under a lock is the classic lazy-init memory-model pitfall — use `Volatile.Read`/`LazyInitializer`/`Lazy`. (2) The limiter is sized once from the *first* `SmtpConfiguration` ever seen and the `??=` at `:252` means it is never re-created. The design doc says one SMTP config is deployed per site, but that config can be **redeployed** with a different `MaxConcurrentConnections`; the new value is silently ignored for the lifetime of the (scoped, but potentially long-lived per request) service — and more importantly the limit is captured per `NotificationDeliveryService` instance, so behaviour depends on which scoped instance happens to create it first. (3) `SemaphoreSlim` implements `IDisposable`; `_concurrencyLimiter` is never disposed and `NotificationDeliveryService` does not implement `IDisposable`. A scoped service leaking one undisposed `SemaphoreSlim` per scope is a slow handle leak. **Recommendation** Replace the hand-rolled double-checked init with `Lazy` or `LazyInitializer.EnsureInitialized`, or at minimum make the field access `Volatile.Read`/`Volatile.Write`. Consider hoisting the limiter to a singleton keyed by site (the limit is a per-site invariant, not a per-scope one) so redeployment with a changed limit and disposal are handled in one place; if it stays on the scoped service, implement `IDisposable` to dispose it. **Resolution** Resolved 2026-05-17. All three issues confirmed against source. The hand-rolled double-checked init was replaced with a `Lazy` — its publication is correctly synchronised, eliminating the lock-free read of a non-`volatile` reference. `NotificationDeliveryService` now implements `IDisposable` and disposes the limiter (if created) under the existing lock, with idempotent re-entry and an `ObjectDisposedException` guard in `SendAsync`/`GetConcurrencyLimiter`; the scoped DI registration disposes it per scope. The limiter remains scoped (not hoisted to a site singleton) — the design doc deploys one SMTP config per site and the per-instance capture is bounded; the redeploy-resize concern is acknowledged as low-impact and not changed here, since hoisting would require a registration change for marginal benefit. Tests `Service_Dispose_DisposesConcurrencyLimiter` plus the existing `Send_MaxConcurrentConnections_LimitsConcurrentDeliveries`.