Files
scadalink-design/code-reviews/NotificationService/findings.md
Joseph Doherty 977d7369a7 docs: add code review process and baseline review of all 19 modules
Establishes a per-module code review workflow under code-reviews/ and
records the 2026-05-16 baseline review (commit 9c60592): 241 findings
across all src/ modules (6 Critical, 46 High, 100 Medium, 89 Low).
This is the clean starting point for remediation work.
2026-05-16 18:09:09 -04:00

21 KiB

Code Review — NotificationService

Field Value
Module src/ScadaLink.NotificationService
Design doc docs/requirements/Component-NotificationService.md
Status Reviewed
Last reviewed 2026-05-16
Reviewer claude-agent
Commit reviewed 9c60592
Open findings 12

Summary

The NotificationService module is small (6 source files) and structurally clean: it abstracts the SMTP client behind an interface, isolates the OAuth2 token lifecycle, and integrates with the Store-and-Forward Engine for transient-failure buffering. However, the review surfaced several substantive defects. The most serious is that no Store-and-Forward delivery handler is ever registered for the Notification category — buffered notifications are persisted but never retried or delivered, silently losing every notification that hit a transient SMTP failure. Error classification is fragile (substring matching on exception messages) and is applied inconsistently between SendAsync and DeliverAsync. DeliverAsync also contains a resource-management bug that constructs and leaks two SMTP clients per call. Secondary themes: the OAuth2TokenService singleton caches a single token keyed to no credential identity (incorrect if multiple SMTP configs exist), several design-doc requirements are unimplemented (connection timeout, max concurrent connections, TLS SSL/None modes), and credentials are stored and passed as plaintext string values. Test coverage exercises the happy path and the main error branches but misses the OAuth2 delivery path, the permanent-classification fallback in DeliverAsync, and concurrency on the token cache.

Checklist coverage

# Category Examined Notes
1 Correctness & logic bugs Double SMTP client construction; Auto socket option for non-TLS; TimeoutException/OperationCanceledException misclassified.
2 Akka.NET conventions No actors in this module (AddNotificationServiceActors is a no-op); delivery is a plain DI service. No Akka-specific issues.
3 Concurrency & thread safety OAuth2TokenService is a singleton with a shared mutable token cache; double-checked locking present but cache key is wrong (NS-006).
4 Error handling & resilience Critical: no S&F delivery handler registered for Notification (NS-001). Fragile substring error classification (NS-002, NS-003).
5 Security Credentials handled as plaintext strings; OAuth2 client secret in DB credential blob; no recipient address validation.
6 Performance & resource management Two ISmtpClientWrapper instances created per send, one leaked; connection not pooled; MaxConcurrentConnections unenforced.
7 Design-document adherence Connection timeout, max concurrent connections, and TLS SSL/None modes from the design doc are not implemented.
8 Code organization & conventions SmtpPermanentException in the wrong file; SmtpConfiguration POCO has non-nullable strings with no initializer (compiler-warning risk).
9 Testing coverage Happy path and main error branches covered; OAuth2 delivery path, DeliverAsync permanent fallback, and token-cache concurrency untested.
10 Documentation & comments XML comment on DeliverAsync ("Throws on failure") and the misleading "OAuth2 token refresh if needed" comment do not match behaviour.

Findings

NotificationService-001 — Buffered notifications are never retried (no S&F delivery handler)

Severity Critical
Category Error handling & resilience
Status Open
Location src/ScadaLink.NotificationService/NotificationDeliveryService.cs:96, src/ScadaLink.NotificationService/ServiceCollectionExtensions.cs:8

Description

On a transient SMTP failure the service calls _storeAndForward.EnqueueAsync(StoreAndForwardCategory.Notification, ...). The Store-and-Forward Engine only delivers (immediately or on retry sweep) a category for which a delivery handler has been registered via StoreAndForwardService.RegisterDeliveryHandler. A repo-wide search shows the Notification category handler is never registered anywhere — StoreAndForwardCategory.Notification appears only in this module's EnqueueAsync call. As a result, every buffered notification falls into the RetryMessageAsync "No delivery handler for category" branch (StoreAndForwardService.cs:201-204), which logs a warning and returns without ever delivering or removing the message. Buffered notifications accumulate in SQLite forever and are never sent. This silently loses every notification that hit a transient failure, while SendAsync returns Success=true, WasBuffered=true, telling the caller the notification is safely queued. This directly violates the design doc's "integrates with the Store-and-Forward Engine for reliable delivery" guarantee.

Recommendation

Register a delivery handler for StoreAndForwardCategory.Notification during startup that deserializes the buffered payload (ListName, Subject, Message), re-resolves the list/recipients/SMTP config, and re-attempts DeliverAsync, returning true on success, false on permanent failure, and throwing on transient failure. Wire it in AddNotificationService or the host bootstrap. Add an integration test covering the buffer-then-retry-then-deliver round trip.

Resolution

Unresolved.

NotificationService-002 — TimeoutException/OperationCanceledException misclassified as transient

Severity High
Category Error handling & resilience
Status Open
Location src/ScadaLink.NotificationService/NotificationDeliveryService.cs:157-167

Description

IsTransientSmtpError treats OperationCanceledException (and its subtype TaskCanceledException) as a transient SMTP error. When the caller passes a CancellationToken that is cancelled — e.g. the Script Execution Actor is stopped, or the script times out — the resulting OperationCanceledException is caught by the catch ... when (IsTransientSmtpError(ex)) clause and the notification is buffered as if SMTP had failed. A deliberate cancellation should propagate, not be silently buffered for retry. The same clause classifies any IOException as transient even though IOException covers unrelated failures (e.g. a serialization stream error). Additionally, OperationCanceledException raised by token cancellation in the OAuth2 path would be miscategorised the same way.

Recommendation

Re-throw OperationCanceledException/TaskCanceledException when cancellationToken.IsCancellationRequested is true rather than classifying it as transient. Narrow IOException handling to SMTP-specific I/O failures, or rely on MailKit's typed exceptions (SmtpCommandException, SmtpProtocolException, ServiceNotConnectedException) instead of broad base types.

Resolution

Unresolved.

NotificationService-003 — Error classification by substring matching on exception messages is fragile

Severity High
Category Error handling & resilience
Status Open
Location src/ScadaLink.NotificationService/NotificationDeliveryService.cs:144-147, src/ScadaLink.NotificationService/NotificationDeliveryService.cs:163-166

Description

Transient/permanent classification depends on ex.Message.Contains("5."), Contains("4."), Contains("550"), Contains("421"), etc. This is unreliable: (a) Message.Contains("5.") matches any message containing the literal "5." anywhere — e.g. a host name smtp5.example.com, a version string, or a path — producing false permanent classification; (b) Contains("4.") likewise matches "v4.0" or an IP address octet; (c) MailKit exposes the actual SMTP status code on SmtpCommandException.StatusCode, which is the correct, locale-independent source of truth and is being ignored; (d) message text is culture/version-dependent and not part of any stable contract. Misclassification has real consequences: a permanent failure misread as transient floods the S&F buffer (which the design doc explicitly says must be prevented), and a transient failure misread as permanent loses the notification.

Recommendation

Classify on MailKit's typed exceptions and SmtpCommandException.StatusCode (4xx → transient, 5xx → permanent), and SocketException/SmtpProtocolException/connection-refused → transient. Remove all Message.Contains checks.

Resolution

Unresolved.

NotificationService-004 — DeliverAsync constructs two SMTP clients and leaks the used one

Severity High
Category Performance & resource management
Status Open
Location src/ScadaLink.NotificationService/NotificationDeliveryService.cs:118-119

Description

using var client = _smtpClientFactory() as IDisposable;
var smtp = _smtpClientFactory();

The factory is invoked twice, creating two separate MailKitSmtpClientWrapper instances (each owning a real SmtpClient with a socket). The first instance is assigned to client and disposed by the using, but it is never used. The second instance, smtp, is the one actually connected, authenticated, used to send, and DisconnectAsync'd — but it is never Disposed. MailKitSmtpClientWrapper implements IDisposable and wraps an unmanaged socket; the connected client is leaked on every send. DisconnectAsync closes the connection but does not dispose the SmtpClient. Over time this leaks sockets/handles.

Recommendation

Create exactly one client and dispose the one that is actually used: using var smtp = _smtpClientFactory(); then cast to IDisposable only if needed (the factory's Func<ISmtpClientWrapper> should ideally return a type that the using can dispose directly — consider having ISmtpClientWrapper extend IAsyncDisposable/IDisposable).

Resolution

Unresolved.

NotificationService-005 — Non-TLS path uses SecureSocketOptions.Auto, contradicting the requested mode

Severity Medium
Category Correctness & logic bugs
Status Open
Location src/ScadaLink.NotificationService/MailKitSmtpClientWrapper.cs:18, src/ScadaLink.NotificationService/NotificationDeliveryService.cs:123

Description

ConnectAsync maps useTls to either SecureSocketOptions.StartTls or SecureSocketOptions.Auto. useTls is computed in DeliverAsync as TlsMode == "starttls". So a configuration of TlsMode = "none" produces useTls = falseSecureSocketOptions.Auto, which lets MailKit opportunistically negotiate TLS — the opposite of "None". Worse, the design doc defines three TLS modes — None, StartTLS, SSL — but the code collapses them to a single boolean, so SSL (implicit TLS, typically port 465) is treated identically to None/Auto and the SSL mode is effectively unsupported. The bool useTls parameter cannot represent the three-state requirement.

Recommendation

Pass the TlsMode string (or a TlsMode enum) through to the wrapper and map explicitly: NoneSecureSocketOptions.None, StartTLSSecureSocketOptions.StartTls, SSLSecureSocketOptions.SslOnConnect. Validate the configured value and reject unknown modes.

Resolution

Unresolved.

NotificationService-006 — OAuth2 token cache is keyed to nothing; wrong token returned when multiple SMTP configs exist

Severity Medium
Category Concurrency & thread safety
Status Open
Location src/ScadaLink.NotificationService/OAuth2TokenService.cs:14-15, src/ScadaLink.NotificationService/OAuth2TokenService.cs:30-35

Description

OAuth2TokenService is registered as a singleton and stores a single _cachedToken/_tokenExpiry pair. GetTokenAsync ignores the credentials argument when deciding whether the cache is valid — it only checks expiry. If two SMTP configurations with different tenant/client credentials are ever used (the repository's GetAllSmtpConfigurationsAsync returns a list, implying multiple configs are possible), the second caller receives the first caller's token, which will fail authentication against the second tenant. Even with a single config today this is a latent correctness bug and makes the service's behaviour depend on call order.

Recommendation

Key the cache by the credential identity (e.g. a dictionary keyed by tenantId:clientId, or by a hash of the credential string), or document and enforce the single-SMTP-config invariant. Given the design doc says one SMTP config is deployed per site, enforcing the invariant is acceptable but should be explicit.

Resolution

Unresolved.

NotificationService-007 — Connection timeout and max-concurrent-connections from the design doc are not implemented

Severity Medium
Category Design-document adherence
Status Open
Location src/ScadaLink.NotificationService/NotificationOptions.cs:11-14, src/ScadaLink.NotificationService/MailKitSmtpClientWrapper.cs:16-20, src/ScadaLink.NotificationService/NotificationDeliveryService.cs:111-140

Description

The design doc specifies an SMTP "Connection timeout (default 30s)" and "Max concurrent connections (default 5)", and NotificationOptions/SmtpConfiguration both carry these fields. Neither is enforced: MailKitSmtpClientWrapper.ConnectAsync never sets SmtpClient.Timeout, so the connection relies on MailKit's default timeout rather than the configured value (only the caller's CancellationToken bounds it, and callers may pass default). There is no semaphore or other throttle limiting concurrent SMTP connections per site, so MaxConcurrentConnections has no effect. Both options exist but are dead configuration.

Recommendation

Set SmtpClient.Timeout from ConnectionTimeoutSeconds in ConnectAsync (and/or derive a linked CancellationTokenSource). Introduce a SemaphoreSlim(MaxConcurrentConnections) gating DeliverAsync. If these limits are intentionally deferred, mark the options [Obsolete]/document them as not-yet-enforced and note the gap in the design doc.

Resolution

Unresolved.

NotificationService-008 — Recipient email addresses are not validated before send

Severity Medium
Category Correctness & logic bugs
Status Open
Location src/ScadaLink.NotificationService/NotificationDeliveryService.cs:136-137, src/ScadaLink.NotificationService/MailKitSmtpClientWrapper.cs:50-53

Description

SendAsync builds bccAddresses directly from recipient.EmailAddress and passes them to MailboxAddress.Parse. If any recipient row has a malformed address, MailboxAddress.Parse throws ParseException. ParseException is not a TimeoutException/SocketException/IOException and its message will not generally contain "4." or "5.", so it falls through DeliverAsync's outer catch ... when (... && !IsTransientSmtpError(ex)) filter, which re-throws it (:153); it then escapes SendAsync entirely as an unhandled exception (the SendAsync catch blocks only cover SmtpPermanentException and transient errors). A single bad address in a list therefore crashes the send with an exception type the calling script is not told to expect, instead of producing a clean NotificationResult error. The same applies to a malformed FromAddress.

Recommendation

Validate addresses up front (e.g. MailboxAddress.TryParse) and return a NotificationResult(false, ...) listing invalid recipients, or wrap DeliverAsync so any non-classified exception becomes a permanent NotificationResult failure rather than escaping. Consider validating addresses at definition time in the Central UI as well.

Resolution

Unresolved.

NotificationService-009 — Credentials handled as plaintext strings; OAuth2 client secret logged risk

Severity Medium
Category Security
Status Open
Location src/ScadaLink.NotificationService/NotificationDeliveryService.cs:127-134, src/ScadaLink.NotificationService/OAuth2TokenService.cs:30-65, src/ScadaLink.Commons/Entities/Notifications/SmtpConfiguration.cs:9

Description

SMTP credentials — Basic Auth user:pass and OAuth2 tenantId:clientId:clientSecret — are stored and passed as a single colon-delimited plaintext string (SmtpConfiguration.Credentials). There is no indication the value is encrypted at rest in SQLite or in the central config DB. The colon-delimited packing is also brittle: a password or client secret containing a : will be split incorrectly (Split(':', 2) / Split(':', 3)), silently corrupting the secret. Separately, while the current code does not log the secret directly, the substring-based error classification logs full exception messages (_logger.LogWarning(ex, ...), LogError(ex, ...)) and MailKit exceptions can echo back server responses; an authentication failure message could surface credential fragments into logs. There is no defensive scrubbing.

Recommendation

Store credentials encrypted at rest (DPAPI/Data Protection or a secret store) and model them as structured fields rather than a colon-packed string, so secrets containing : are safe. Ensure credential values are never written to logs; consider a redaction step on exception messages before logging.

Resolution

Unresolved.

NotificationService-010 — DeliverAsync does not disconnect the SMTP client on failure

Severity Low
Category Performance & resource management
Status Open
Location src/ScadaLink.NotificationService/NotificationDeliveryService.cs:121-154

Description

DisconnectAsync is only called at :139, on the success path inside the try block. If AuthenticateAsync or SendAsync throws, control jumps to the catch filter at :141 and the method exits (re-throwing or wrapping) without ever calling DisconnectAsync. Combined with NS-004 (the client is never disposed either), a failed send leaves an open, authenticated SMTP connection until the socket is eventually reclaimed by finalization. Under sustained transient failures this can exhaust the SMTP server's connection slots.

Recommendation

Move disconnect/dispose into a finally block (or use await using once ISmtpClientWrapper supports IAsyncDisposable) so the connection is always torn down regardless of outcome.

Resolution

Unresolved.

NotificationService-011 — SmtpPermanentException declared in the wrong file; module conventions

Severity Low
Category Code organization & conventions
Status Open
Location src/ScadaLink.NotificationService/NotificationDeliveryService.cs:173-177, src/ScadaLink.Commons/Entities/Notifications/SmtpConfiguration.cs:5-15

Description

Two minor convention issues. (1) SmtpPermanentException is a public exception type declared at the bottom of NotificationDeliveryService.cs rather than in its own file (SmtpPermanentException.cs), which is inconsistent with the one-type-per-file layout used elsewhere and makes it harder to locate. (2) SmtpConfiguration (a Commons POCO) declares non-nullable string properties (Host, AuthType, FromAddress) that are only guaranteed by the constructor; EF Core materialization or object-initializer use can leave them null while the type system says otherwise. These are persistence-ignorant POCO concerns but worth flagging because the delivery service dereferences config.Host, config.AuthType, config.FromAddress without null checks.

Recommendation

Move SmtpPermanentException to its own file. For SmtpConfiguration, either keep the constructor as the only path and document it, or use required members so the compiler enforces initialization.

Resolution

Unresolved.

NotificationService-012 — Test coverage gaps: OAuth2 delivery path, permanent-classification fallback, token-cache concurrency

Severity Low
Category Testing coverage
Status Open
Location tests/ScadaLink.NotificationService.Tests/NotificationDeliveryServiceTests.cs, tests/ScadaLink.NotificationService.Tests/OAuth2TokenServiceTests.cs

Description

The tests cover the happy path, list-not-found, no-recipients, no-SMTP-config, permanent failure, transient-without-S&F, and transient-with-S&F buffering. Notable untested paths: (1) the OAuth2 delivery branch in DeliverAsync:128-132 — every test uses tokenService: null and Basic Auth, so OAuth2 token resolution during a send is never exercised; (2) DeliverAsync's permanent-classification fallback (:144-149) that promotes a generic exception whose message contains "550"/"553"/"554" to SmtpPermanentException is never tested; (3) OAuth2TokenServiceTests never tests concurrent GetTokenAsync calls (the double-checked-locking path) or token expiry/refresh — the cache test uses a 3600s token so refresh never triggers; (4) no test covers the transient-with-S&F path actually delivering after retry (which would also have caught NS-001). Given NS-001 is a critical defect, the absence of an end-to-end buffer-and-retry test is significant.

Recommendation

Add tests for: OAuth2-authenticated send with a mocked OAuth2TokenService; the DeliverAsync 5xx-message permanent fallback; token expiry/refresh (short expires_in); concurrent token acquisition; and an end-to-end buffered-notification retry once a Notification S&F handler is registered.

Resolution

Unresolved.