Files
scadalink-design/code-reviews/NotificationService/findings.md
Joseph Doherty 977d7369a7 docs: add code review process and baseline review of all 19 modules
Establishes a per-module code review workflow under code-reviews/ and
records the 2026-05-16 baseline review (commit 9c60592): 241 findings
across all src/ modules (6 Critical, 46 High, 100 Medium, 89 Low).
This is the clean starting point for remediation work.
2026-05-16 18:09:09 -04:00

307 lines
21 KiB
Markdown

# Code Review — NotificationService
| Field | Value |
|-------|-------|
| Module | `src/ScadaLink.NotificationService` |
| Design doc | `docs/requirements/Component-NotificationService.md` |
| Status | Reviewed |
| Last reviewed | 2026-05-16 |
| Reviewer | claude-agent |
| Commit reviewed | `9c60592` |
| Open findings | 12 |
## Summary
The NotificationService module is small (6 source files) and structurally clean: it
abstracts the SMTP client behind an interface, isolates the OAuth2 token lifecycle,
and integrates with the Store-and-Forward Engine for transient-failure buffering.
However, the review surfaced several substantive defects. The most serious is that
**no Store-and-Forward delivery handler is ever registered for the `Notification`
category** — buffered notifications are persisted but never retried or delivered,
silently losing every notification that hit a transient SMTP failure. Error
classification is fragile (substring matching on exception messages) and is
applied inconsistently between `SendAsync` and `DeliverAsync`. `DeliverAsync` also
contains a resource-management bug that constructs and leaks two SMTP clients per
call. Secondary themes: the `OAuth2TokenService` singleton caches a single token
keyed to no credential identity (incorrect if multiple SMTP configs exist), several
design-doc requirements are unimplemented (connection timeout, max concurrent
connections, TLS `SSL`/`None` modes), and credentials are stored and passed as
plaintext `string` values. Test coverage exercises the happy path and the main
error branches but misses the OAuth2 delivery path, the permanent-classification
fallback in `DeliverAsync`, and concurrency on the token cache.
## Checklist coverage
| # | Category | Examined | Notes |
|---|----------|----------|-------|
| 1 | Correctness & logic bugs | ☑ | Double SMTP client construction; `Auto` socket option for non-TLS; `TimeoutException`/`OperationCanceledException` misclassified. |
| 2 | Akka.NET conventions | ☑ | No actors in this module (`AddNotificationServiceActors` is a no-op); delivery is a plain DI service. No Akka-specific issues. |
| 3 | Concurrency & thread safety | ☑ | `OAuth2TokenService` is a singleton with a shared mutable token cache; double-checked locking present but cache key is wrong (NS-006). |
| 4 | Error handling & resilience | ☑ | Critical: no S&F delivery handler registered for `Notification` (NS-001). Fragile substring error classification (NS-002, NS-003). |
| 5 | Security | ☑ | Credentials handled as plaintext strings; OAuth2 client secret in DB credential blob; no recipient address validation. |
| 6 | Performance & resource management | ☑ | Two `ISmtpClientWrapper` instances created per send, one leaked; connection not pooled; `MaxConcurrentConnections` unenforced. |
| 7 | Design-document adherence | ☑ | Connection timeout, max concurrent connections, and TLS `SSL`/`None` modes from the design doc are not implemented. |
| 8 | Code organization & conventions | ☑ | `SmtpPermanentException` in the wrong file; `SmtpConfiguration` POCO has non-nullable strings with no initializer (compiler-warning risk). |
| 9 | Testing coverage | ☑ | Happy path and main error branches covered; OAuth2 delivery path, `DeliverAsync` permanent fallback, and token-cache concurrency untested. |
| 10 | Documentation & comments | ☑ | XML comment on `DeliverAsync` ("Throws on failure") and the misleading "OAuth2 token refresh if needed" comment do not match behaviour. |
## Findings
### NotificationService-001 — Buffered notifications are never retried (no S&F delivery handler)
| | |
|--|--|
| Severity | Critical |
| Category | Error handling & resilience |
| Status | Open |
| Location | `src/ScadaLink.NotificationService/NotificationDeliveryService.cs:96`, `src/ScadaLink.NotificationService/ServiceCollectionExtensions.cs:8` |
**Description**
On a transient SMTP failure the service calls `_storeAndForward.EnqueueAsync(StoreAndForwardCategory.Notification, ...)`. The Store-and-Forward Engine only delivers (immediately or on retry sweep) a category for which a delivery handler has been registered via `StoreAndForwardService.RegisterDeliveryHandler`. A repo-wide search shows the `Notification` category handler is never registered anywhere — `StoreAndForwardCategory.Notification` appears only in this module's `EnqueueAsync` call. As a result, every buffered notification falls into the `RetryMessageAsync` "No delivery handler for category" branch (`StoreAndForwardService.cs:201-204`), which logs a warning and returns without ever delivering or removing the message. Buffered notifications accumulate in SQLite forever and are never sent. This silently loses every notification that hit a transient failure, while `SendAsync` returns `Success=true, WasBuffered=true`, telling the caller the notification is safely queued. This directly violates the design doc's "integrates with the Store-and-Forward Engine for reliable delivery" guarantee.
**Recommendation**
Register a delivery handler for `StoreAndForwardCategory.Notification` during startup that deserializes the buffered payload (`ListName`, `Subject`, `Message`), re-resolves the list/recipients/SMTP config, and re-attempts `DeliverAsync`, returning `true` on success, `false` on permanent failure, and throwing on transient failure. Wire it in `AddNotificationService` or the host bootstrap. Add an integration test covering the buffer-then-retry-then-deliver round trip.
**Resolution**
_Unresolved._
### NotificationService-002 — `TimeoutException`/`OperationCanceledException` misclassified as transient
| | |
|--|--|
| Severity | High |
| Category | Error handling & resilience |
| Status | Open |
| Location | `src/ScadaLink.NotificationService/NotificationDeliveryService.cs:157-167` |
**Description**
`IsTransientSmtpError` treats `OperationCanceledException` (and its subtype `TaskCanceledException`) as a transient SMTP error. When the caller passes a `CancellationToken` that is cancelled — e.g. the Script Execution Actor is stopped, or the script times out — the resulting `OperationCanceledException` is caught by the `catch ... when (IsTransientSmtpError(ex))` clause and the notification is buffered as if SMTP had failed. A deliberate cancellation should propagate, not be silently buffered for retry. The same clause classifies any `IOException` as transient even though `IOException` covers unrelated failures (e.g. a serialization stream error). Additionally, `OperationCanceledException` raised by token cancellation in the OAuth2 path would be miscategorised the same way.
**Recommendation**
Re-throw `OperationCanceledException`/`TaskCanceledException` when `cancellationToken.IsCancellationRequested` is true rather than classifying it as transient. Narrow `IOException` handling to SMTP-specific I/O failures, or rely on MailKit's typed exceptions (`SmtpCommandException`, `SmtpProtocolException`, `ServiceNotConnectedException`) instead of broad base types.
**Resolution**
_Unresolved._
### NotificationService-003 — Error classification by substring matching on exception messages is fragile
| | |
|--|--|
| Severity | High |
| Category | Error handling & resilience |
| Status | Open |
| Location | `src/ScadaLink.NotificationService/NotificationDeliveryService.cs:144-147`, `src/ScadaLink.NotificationService/NotificationDeliveryService.cs:163-166` |
**Description**
Transient/permanent classification depends on `ex.Message.Contains("5.")`, `Contains("4.")`, `Contains("550")`, `Contains("421")`, etc. This is unreliable: (a) `Message.Contains("5.")` matches any message containing the literal "5." anywhere — e.g. a host name `smtp5.example.com`, a version string, or a path — producing false permanent classification; (b) `Contains("4.")` likewise matches `"v4.0"` or an IP address octet; (c) MailKit exposes the actual SMTP status code on `SmtpCommandException.StatusCode`, which is the correct, locale-independent source of truth and is being ignored; (d) message text is culture/version-dependent and not part of any stable contract. Misclassification has real consequences: a permanent failure misread as transient floods the S&F buffer (which the design doc explicitly says must be prevented), and a transient failure misread as permanent loses the notification.
**Recommendation**
Classify on MailKit's typed exceptions and `SmtpCommandException.StatusCode` (4xx → transient, 5xx → permanent), and `SocketException`/`SmtpProtocolException`/connection-refused → transient. Remove all `Message.Contains` checks.
**Resolution**
_Unresolved._
### NotificationService-004 — `DeliverAsync` constructs two SMTP clients and leaks the used one
| | |
|--|--|
| Severity | High |
| Category | Performance & resource management |
| Status | Open |
| Location | `src/ScadaLink.NotificationService/NotificationDeliveryService.cs:118-119` |
**Description**
```csharp
using var client = _smtpClientFactory() as IDisposable;
var smtp = _smtpClientFactory();
```
The factory is invoked twice, creating two separate `MailKitSmtpClientWrapper` instances (each owning a real `SmtpClient` with a socket). The first instance is assigned to `client` and disposed by the `using`, but it is never used. The second instance, `smtp`, is the one actually connected, authenticated, used to send, and `DisconnectAsync`'d — but it is never `Dispose`d. `MailKitSmtpClientWrapper` implements `IDisposable` and wraps an unmanaged socket; the connected client is leaked on every send. `DisconnectAsync` closes the connection but does not dispose the `SmtpClient`. Over time this leaks sockets/handles.
**Recommendation**
Create exactly one client and dispose the one that is actually used:
`using var smtp = _smtpClientFactory();` then cast to `IDisposable` only if needed (the factory's `Func<ISmtpClientWrapper>` should ideally return a type that the `using` can dispose directly — consider having `ISmtpClientWrapper` extend `IAsyncDisposable`/`IDisposable`).
**Resolution**
_Unresolved._
### NotificationService-005 — Non-TLS path uses `SecureSocketOptions.Auto`, contradicting the requested mode
| | |
|--|--|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Status | Open |
| Location | `src/ScadaLink.NotificationService/MailKitSmtpClientWrapper.cs:18`, `src/ScadaLink.NotificationService/NotificationDeliveryService.cs:123` |
**Description**
`ConnectAsync` maps `useTls` to either `SecureSocketOptions.StartTls` or `SecureSocketOptions.Auto`. `useTls` is computed in `DeliverAsync` as `TlsMode == "starttls"`. So a configuration of `TlsMode = "none"` produces `useTls = false``SecureSocketOptions.Auto`, which lets MailKit opportunistically negotiate TLS — the opposite of "None". Worse, the design doc defines three TLS modes — `None`, `StartTLS`, `SSL` — but the code collapses them to a single boolean, so `SSL` (implicit TLS, typically port 465) is treated identically to `None`/`Auto` and the SSL mode is effectively unsupported. The `bool useTls` parameter cannot represent the three-state requirement.
**Recommendation**
Pass the `TlsMode` string (or a `TlsMode` enum) through to the wrapper and map explicitly: `None``SecureSocketOptions.None`, `StartTLS``SecureSocketOptions.StartTls`, `SSL``SecureSocketOptions.SslOnConnect`. Validate the configured value and reject unknown modes.
**Resolution**
_Unresolved._
### NotificationService-006 — OAuth2 token cache is keyed to nothing; wrong token returned when multiple SMTP configs exist
| | |
|--|--|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Status | Open |
| Location | `src/ScadaLink.NotificationService/OAuth2TokenService.cs:14-15`, `src/ScadaLink.NotificationService/OAuth2TokenService.cs:30-35` |
**Description**
`OAuth2TokenService` is registered as a singleton and stores a single `_cachedToken`/`_tokenExpiry` pair. `GetTokenAsync` ignores the `credentials` argument when deciding whether the cache is valid — it only checks expiry. If two SMTP configurations with different tenant/client credentials are ever used (the repository's `GetAllSmtpConfigurationsAsync` returns a list, implying multiple configs are possible), the second caller receives the first caller's token, which will fail authentication against the second tenant. Even with a single config today this is a latent correctness bug and makes the service's behaviour depend on call order.
**Recommendation**
Key the cache by the credential identity (e.g. a dictionary keyed by `tenantId:clientId`, or by a hash of the credential string), or document and enforce the single-SMTP-config invariant. Given the design doc says one SMTP config is deployed per site, enforcing the invariant is acceptable but should be explicit.
**Resolution**
_Unresolved._
### NotificationService-007 — Connection timeout and max-concurrent-connections from the design doc are not implemented
| | |
|--|--|
| Severity | Medium |
| Category | Design-document adherence |
| Status | Open |
| Location | `src/ScadaLink.NotificationService/NotificationOptions.cs:11-14`, `src/ScadaLink.NotificationService/MailKitSmtpClientWrapper.cs:16-20`, `src/ScadaLink.NotificationService/NotificationDeliveryService.cs:111-140` |
**Description**
The design doc specifies an SMTP "Connection timeout (default 30s)" and "Max concurrent connections (default 5)", and `NotificationOptions`/`SmtpConfiguration` both carry these fields. Neither is enforced: `MailKitSmtpClientWrapper.ConnectAsync` never sets `SmtpClient.Timeout`, so the connection relies on MailKit's default timeout rather than the configured value (only the caller's `CancellationToken` bounds it, and callers may pass `default`). There is no semaphore or other throttle limiting concurrent SMTP connections per site, so `MaxConcurrentConnections` has no effect. Both options exist but are dead configuration.
**Recommendation**
Set `SmtpClient.Timeout` from `ConnectionTimeoutSeconds` in `ConnectAsync` (and/or derive a linked `CancellationTokenSource`). Introduce a `SemaphoreSlim(MaxConcurrentConnections)` gating `DeliverAsync`. If these limits are intentionally deferred, mark the options `[Obsolete]`/document them as not-yet-enforced and note the gap in the design doc.
**Resolution**
_Unresolved._
### NotificationService-008 — Recipient email addresses are not validated before send
| | |
|--|--|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Status | Open |
| Location | `src/ScadaLink.NotificationService/NotificationDeliveryService.cs:136-137`, `src/ScadaLink.NotificationService/MailKitSmtpClientWrapper.cs:50-53` |
**Description**
`SendAsync` builds `bccAddresses` directly from `recipient.EmailAddress` and passes them to `MailboxAddress.Parse`. If any recipient row has a malformed address, `MailboxAddress.Parse` throws `ParseException`. `ParseException` is not a `TimeoutException`/`SocketException`/`IOException` and its message will not generally contain "4." or "5.", so it falls through `DeliverAsync`'s outer `catch ... when (... && !IsTransientSmtpError(ex))` filter, which re-throws it (`:153`); it then escapes `SendAsync` entirely as an unhandled exception (the `SendAsync` catch blocks only cover `SmtpPermanentException` and transient errors). A single bad address in a list therefore crashes the send with an exception type the calling script is not told to expect, instead of producing a clean `NotificationResult` error. The same applies to a malformed `FromAddress`.
**Recommendation**
Validate addresses up front (e.g. `MailboxAddress.TryParse`) and return a `NotificationResult(false, ...)` listing invalid recipients, or wrap `DeliverAsync` so any non-classified exception becomes a permanent `NotificationResult` failure rather than escaping. Consider validating addresses at definition time in the Central UI as well.
**Resolution**
_Unresolved._
### NotificationService-009 — Credentials handled as plaintext strings; OAuth2 client secret logged risk
| | |
|--|--|
| Severity | Medium |
| Category | Security |
| Status | Open |
| Location | `src/ScadaLink.NotificationService/NotificationDeliveryService.cs:127-134`, `src/ScadaLink.NotificationService/OAuth2TokenService.cs:30-65`, `src/ScadaLink.Commons/Entities/Notifications/SmtpConfiguration.cs:9` |
**Description**
SMTP credentials — Basic Auth `user:pass` and OAuth2 `tenantId:clientId:clientSecret` — are stored and passed as a single colon-delimited plaintext `string` (`SmtpConfiguration.Credentials`). There is no indication the value is encrypted at rest in SQLite or in the central config DB. The colon-delimited packing is also brittle: a password or client secret containing a `:` will be split incorrectly (`Split(':', 2)` / `Split(':', 3)`), silently corrupting the secret. Separately, while the current code does not log the secret directly, the substring-based error classification logs full exception messages (`_logger.LogWarning(ex, ...)`, `LogError(ex, ...)`) and MailKit exceptions can echo back server responses; an authentication failure message could surface credential fragments into logs. There is no defensive scrubbing.
**Recommendation**
Store credentials encrypted at rest (DPAPI/Data Protection or a secret store) and model them as structured fields rather than a colon-packed string, so secrets containing `:` are safe. Ensure credential values are never written to logs; consider a redaction step on exception messages before logging.
**Resolution**
_Unresolved._
### NotificationService-010 — `DeliverAsync` does not disconnect the SMTP client on failure
| | |
|--|--|
| Severity | Low |
| Category | Performance & resource management |
| Status | Open |
| Location | `src/ScadaLink.NotificationService/NotificationDeliveryService.cs:121-154` |
**Description**
`DisconnectAsync` is only called at `:139`, on the success path inside the `try` block. If `AuthenticateAsync` or `SendAsync` throws, control jumps to the `catch` filter at `:141` and the method exits (re-throwing or wrapping) without ever calling `DisconnectAsync`. Combined with NS-004 (the client is never disposed either), a failed send leaves an open, authenticated SMTP connection until the socket is eventually reclaimed by finalization. Under sustained transient failures this can exhaust the SMTP server's connection slots.
**Recommendation**
Move disconnect/dispose into a `finally` block (or use `await using` once `ISmtpClientWrapper` supports `IAsyncDisposable`) so the connection is always torn down regardless of outcome.
**Resolution**
_Unresolved._
### NotificationService-011 — `SmtpPermanentException` declared in the wrong file; module conventions
| | |
|--|--|
| Severity | Low |
| Category | Code organization & conventions |
| Status | Open |
| Location | `src/ScadaLink.NotificationService/NotificationDeliveryService.cs:173-177`, `src/ScadaLink.Commons/Entities/Notifications/SmtpConfiguration.cs:5-15` |
**Description**
Two minor convention issues. (1) `SmtpPermanentException` is a public exception type declared at the bottom of `NotificationDeliveryService.cs` rather than in its own file (`SmtpPermanentException.cs`), which is inconsistent with the one-type-per-file layout used elsewhere and makes it harder to locate. (2) `SmtpConfiguration` (a Commons POCO) declares non-nullable `string` properties (`Host`, `AuthType`, `FromAddress`) that are only guaranteed by the constructor; EF Core materialization or object-initializer use can leave them null while the type system says otherwise. These are persistence-ignorant POCO concerns but worth flagging because the delivery service dereferences `config.Host`, `config.AuthType`, `config.FromAddress` without null checks.
**Recommendation**
Move `SmtpPermanentException` to its own file. For `SmtpConfiguration`, either keep the constructor as the only path and document it, or use `required` members so the compiler enforces initialization.
**Resolution**
_Unresolved._
### NotificationService-012 — Test coverage gaps: OAuth2 delivery path, permanent-classification fallback, token-cache concurrency
| | |
|--|--|
| Severity | Low |
| Category | Testing coverage |
| Status | Open |
| Location | `tests/ScadaLink.NotificationService.Tests/NotificationDeliveryServiceTests.cs`, `tests/ScadaLink.NotificationService.Tests/OAuth2TokenServiceTests.cs` |
**Description**
The tests cover the happy path, list-not-found, no-recipients, no-SMTP-config, permanent failure, transient-without-S&F, and transient-with-S&F buffering. Notable untested paths: (1) the OAuth2 delivery branch in `DeliverAsync:128-132` — every test uses `tokenService: null` and Basic Auth, so OAuth2 token resolution during a send is never exercised; (2) `DeliverAsync`'s permanent-classification fallback (`:144-149`) that promotes a generic exception whose message contains "550"/"553"/"554" to `SmtpPermanentException` is never tested; (3) `OAuth2TokenServiceTests` never tests concurrent `GetTokenAsync` calls (the double-checked-locking path) or token expiry/refresh — the cache test uses a 3600s token so refresh never triggers; (4) no test covers the transient-with-S&F path actually delivering after retry (which would also have caught NS-001). Given NS-001 is a critical defect, the absence of an end-to-end buffer-and-retry test is significant.
**Recommendation**
Add tests for: OAuth2-authenticated send with a mocked `OAuth2TokenService`; the `DeliverAsync` 5xx-message permanent fallback; token expiry/refresh (short `expires_in`); concurrent token acquisition; and an end-to-end buffered-notification retry once a `Notification` S&F handler is registered.
**Resolution**
_Unresolved._