Files
ScadaBridge/code-reviews/NotificationService/findings.md
T
Joseph Doherty f93b7b99bb code-review: 2026-05-28 baseline re-review of all 23 modules at 1eb6e97
Re-applies the full 10-category checklist to every src/ project — including
first-time reviews of the four newer components (AuditLog, NotificationOutbox,
SiteCallAudit, Transport) — so the code-reviews/ index reflects today's
codebase rather than the 2026-05-16 baseline. 172 new Open findings (0
Critical, 18 High, 62 Medium, 92 Low); 481 findings total across 23 modules.

regen-readme.py now derives each module's Last reviewed + Commit from its
findings.md header instead of hard-coding 2026-05-16 / 9c60592, so future
single-module re-reviews show their own date in the Module Status table.
2026-05-28 02:55:47 -04:00

839 lines
76 KiB
Markdown

# Code Review — NotificationService
| Field | Value |
|-------|-------|
| Module | `src/ScadaLink.NotificationService` |
| Design doc | `docs/requirements/Component-NotificationService.md` |
| Status | Reviewed |
| Last reviewed | 2026-05-28 |
| Reviewer | claude-agent |
| Commit reviewed | `1eb6e97` |
| Open findings | 7 |
## Summary
The NotificationService module is small (6 source files) and structurally clean: it
abstracts the SMTP client behind an interface, isolates the OAuth2 token lifecycle,
and integrates with the Store-and-Forward Engine for transient-failure buffering.
However, the review surfaced several substantive defects. The most serious is that
**no Store-and-Forward delivery handler is ever registered for the `Notification`
category** — buffered notifications are persisted but never retried or delivered,
silently losing every notification that hit a transient SMTP failure. Error
classification is fragile (substring matching on exception messages) and is
applied inconsistently between `SendAsync` and `DeliverAsync`. `DeliverAsync` also
contains a resource-management bug that constructs and leaks two SMTP clients per
call. Secondary themes: the `OAuth2TokenService` singleton caches a single token
keyed to no credential identity (incorrect if multiple SMTP configs exist), several
design-doc requirements are unimplemented (connection timeout, max concurrent
connections, TLS `SSL`/`None` modes), and credentials are stored and passed as
plaintext `string` values. Test coverage exercises the happy path and the main
error branches but misses the OAuth2 delivery path, the permanent-classification
fallback in `DeliverAsync`, and concurrency on the token cache.
#### Re-review 2026-05-17 (commit `39d737e`)
Re-reviewed at commit `39d737e` after the NS-001..013 fixes. All twelve prior
findings remain closed (eleven Resolved, NS-013 Deferred) and the fixes hold up
against the current source — error classification is now typed, the SMTP client is
created once and always disconnected/disposed, the S&F `Notification` delivery
handler is registered in `AkkaHostedService`, and the OAuth2 token cache is keyed
per credential. The re-review surfaced **five new findings**. The recurring theme is
**unclassified-exception handling**: the typed `ClassifySmtpError` helper introduced
by NS-002/003 returns `Unknown` for anything that is not a recognised SMTP/socket
failure, but neither `SendAsync` nor `DeliverBufferedAsync` has a catch-all for the
`Unknown` bucket. As a result an OAuth2 token-fetch failure (`HttpRequestException`
from `EnsureSuccessStatusCode`, or `InvalidOperationException` from a malformed
credential string) escapes `SendAsync` as a raw exception to the calling script
(NS-015) and escapes `DeliverBufferedAsync` out of the S&F handler, where the engine
treats any thrown exception as transient and retries a permanently-broken config
forever (NS-014) — the same defect class NS-008 fixed for address parsing, but the
OAuth2 path was never covered. Secondary findings: `AuthenticateAsync` silently
proceeds unauthenticated for an unknown `authType` or empty credentials (NS-016);
`NotificationOptions` is bound from configuration in two places but never read by
any code (NS-017, dead config — NS-007 sourced the timeout/limit from
`SmtpConfiguration` instead); and the lazily-created concurrency limiter is read
outside its lock, is sized once and never resized on redeployment, and is never
disposed (NS-018).
#### Re-review 2026-05-28 (commit `1eb6e97`)
Re-reviewed at commit `1eb6e97` against the **materially-changed design**: per the
updated `Component-NotificationService.md` and CLAUDE.md, the Notification Service
is now **central-only**. Sites no longer deliver notifications over SMTP — a
script's `Notify.Send` enqueues to the site Store-and-Forward Engine and
`NotificationForwarder.DeliverAsync` (S&F handler in StoreAndForward) forwards
the payload to the central Notification Outbox, which dispatches via the
`INotificationDeliveryAdapter` registered for the list's `Type`. Email delivery
on central is performed by `EmailNotificationDeliveryAdapter` in the
NotificationOutbox project — it reuses this module's SMTP machinery
(`ISmtpClientWrapper`, `OAuth2TokenService`, `SmtpErrorClassifier`,
`SmtpTlsModeParser`, `EmailAddressValidator`, `CredentialRedactor`,
`SmtpPermanentException`, `NotificationOptions`) but is the actual production
caller. The intended residual responsibility of this module is to **supply that
shared SMTP machinery** plus list/SMTP-config definition management on central.
The re-review surfaced **seven new findings**. The dominant theme is **dead
code that contradicts the design doc**: `NotificationDeliveryService`, the
`INotificationDeliveryService` interface in Commons, the `NotificationResult`
record, the entire `DeliverBufferedAsync` S&F handler, and the prior NS-001…
NS-018 test fixtures that exercise them are now orphaned — no production code
path resolves `INotificationDeliveryService` on a site (sites no longer register
this module per `SiteServiceRegistration.cs:33-38`) and on central the
NotificationOutbox uses its own `EmailNotificationDeliveryAdapter` (which
duplicates the connect/auth/send/disconnect sequence rather than delegating to
`NotificationDeliveryService`). The class is still registered by
`AddNotificationService` on central (`Program.cs:77`) but no consumer resolves
it (NS-019). The `S&F handler must be registered` workaround that NS-001 added
to `AkkaHostedService` is itself superseded by the `NotificationForwarder`
registered for the same category at `AkkaHostedService.cs:654-660` (NS-020).
Secondary findings: a real-world correctness gap (the OAuth2
`SaslMechanismOAuth2` is constructed with an **empty user id** so server-side
account binding fails for any provider that requires it — NS-021); the SMTP
client wrapper holds a single `MailKit.SmtpClient` for the lifetime of the
wrapper but the factory delegate creates a new wrapper per send, so successive
sends through the same factory share NO connection but DO share a wrapper that
mutates `_client.Timeout` on every connect (benign because every wrapper has its
own client, but the design comment about pooling is now contradicted — NS-022);
the design-doc retention/maintenance language has no implementation in this
module and there is no test affirming the module is central-only (NS-023, NS-024);
and `CredentialRedactor` masks any component of the credential string that is
≥ 4 characters long — a 4-character user name like `root` or a 4-char tenant
prefix could be aggressively scrubbed out of unrelated log text (NS-025).
## Checklist coverage
| # | Category | Examined | Notes |
|---|----------|----------|-------|
| 1 | Correctness & logic bugs | ☑ | Re-review: OAuth2 SASL constructed with empty user id (NS-021); `CredentialRedactor` over-masks short components (NS-025). Earlier NS-005/NS-008 fixes hold. |
| 2 | Akka.NET conventions | ☑ | No actors in this module. `AddNotificationServiceActors` remains a documented no-op. |
| 3 | Concurrency & thread safety | ☑ | `OAuth2TokenService` per-credential locks now correct (NS-006 hold). No new issues. |
| 4 | Error handling & resilience | ☑ | NS-014/NS-015 classification fixes hold but the entire `DeliverBufferedAsync` / `SendAsync` error path is dead (NS-019/NS-020). |
| 5 | Security | ☑ | OAuth2 `SaslMechanismOAuth2` empty user id (NS-021); `CredentialRedactor` aggressiveness (NS-025); at-rest encryption still deferred (NS-013). |
| 6 | Performance & resource management | ☑ | `MailKitSmtpClientWrapper` keeps a single `SmtpClient` for the wrapper lifetime; combined with per-send factory this means no pooling — re-document or fix (NS-022). |
| 7 | Design-document adherence | ☑ | Critical drift: module still exposes site-style S&F sending; the design doc inverted delivery to central months ago (NS-019). Site registration removed but central still wires the dead service. |
| 8 | Code organization & conventions | ☑ | `INotificationDeliveryService` lives in Commons and is now unused — should be retired or relocated to a NotificationService-internal namespace (NS-019). Module-vs-NotificationOutbox boundary unclear. |
| 9 | Testing coverage | ☑ | 56 tests pass but ~40 of them assert behaviour of a code path no production caller exercises (NS-024). No test affirms the central-only design — i.e. that `AddNotificationService` registers no notification-sending service on a site. |
| 10 | Documentation & comments | ☑ | `NotificationDeliveryService` XML doc still claims "WP-11/12: Notification delivery via SMTP" with no warning that the class is orphaned; `INotificationDeliveryService` Commons doc claims "Implemented by NotificationService, consumed by ScriptRuntimeContext" — both consumers are wrong now (NS-023). |
## Findings
### NotificationService-001 — Buffered notifications are never retried (no S&F delivery handler)
| | |
|--|--|
| Severity | Critical |
| Category | Error handling & resilience |
| Status | Resolved |
| Location | `src/ScadaLink.NotificationService/NotificationDeliveryService.cs:96`, `src/ScadaLink.NotificationService/ServiceCollectionExtensions.cs:8` |
**Description**
On a transient SMTP failure the service calls `_storeAndForward.EnqueueAsync(StoreAndForwardCategory.Notification, ...)`. The Store-and-Forward Engine only delivers (immediately or on retry sweep) a category for which a delivery handler has been registered via `StoreAndForwardService.RegisterDeliveryHandler`. A repo-wide search shows the `Notification` category handler is never registered anywhere — `StoreAndForwardCategory.Notification` appears only in this module's `EnqueueAsync` call. As a result, every buffered notification falls into the `RetryMessageAsync` "No delivery handler for category" branch (`StoreAndForwardService.cs:201-204`), which logs a warning and returns without ever delivering or removing the message. Buffered notifications accumulate in SQLite forever and are never sent. This silently loses every notification that hit a transient failure, while `SendAsync` returns `Success=true, WasBuffered=true`, telling the caller the notification is safely queued. This directly violates the design doc's "integrates with the Store-and-Forward Engine for reliable delivery" guarantee.
**Recommendation**
Register a delivery handler for `StoreAndForwardCategory.Notification` during startup that deserializes the buffered payload (`ListName`, `Subject`, `Message`), re-resolves the list/recipients/SMTP config, and re-attempts `DeliverAsync`, returning `true` on success, `false` on permanent failure, and throwing on transient failure. Wire it in `AddNotificationService` or the host bootstrap. Add an integration test covering the buffer-then-retry-then-deliver round trip.
**Resolution**
Resolved 2026-05-16. A delivery handler for `StoreAndForwardCategory.Notification` is now
registered at site startup in `AkkaHostedService`. The handler resolves
`NotificationDeliveryService` in a fresh DI scope and calls the new `DeliverBufferedAsync`,
which re-resolves the list, recipients and SMTP config and re-attempts delivery —
returning `true` on success, `false` (park) on permanent failure or missing
configuration, and throwing on transient failure so the engine retries. `SendAsync` now
buffers with `attemptImmediateDelivery: false` so registering the handler does not send
the notification twice. Regression tests cover the happy path and the list-removed park
path. Fixed by the commit whose message references `NotificationService-001`.
### NotificationService-002 — `TimeoutException`/`OperationCanceledException` misclassified as transient
| | |
|--|--|
| Severity | High |
| Category | Error handling & resilience |
| Status | Resolved |
| Location | `src/ScadaLink.NotificationService/NotificationDeliveryService.cs:157-167` |
**Description**
`IsTransientSmtpError` treats `OperationCanceledException` (and its subtype `TaskCanceledException`) as a transient SMTP error. When the caller passes a `CancellationToken` that is cancelled — e.g. the Script Execution Actor is stopped, or the script times out — the resulting `OperationCanceledException` is caught by the `catch ... when (IsTransientSmtpError(ex))` clause and the notification is buffered as if SMTP had failed. A deliberate cancellation should propagate, not be silently buffered for retry. The same clause classifies any `IOException` as transient even though `IOException` covers unrelated failures (e.g. a serialization stream error). Additionally, `OperationCanceledException` raised by token cancellation in the OAuth2 path would be miscategorised the same way.
**Recommendation**
Re-throw `OperationCanceledException`/`TaskCanceledException` when `cancellationToken.IsCancellationRequested` is true rather than classifying it as transient. Narrow `IOException` handling to SMTP-specific I/O failures, or rely on MailKit's typed exceptions (`SmtpCommandException`, `SmtpProtocolException`, `ServiceNotConnectedException`) instead of broad base types.
**Resolution**
Resolved 2026-05-16 (commit `<pending>`). Classification was rewritten around a typed
`ClassifySmtpError` helper: a caller-requested cancellation (`OperationCanceledException`/
`TaskCanceledException` while `cancellationToken.IsCancellationRequested`) now propagates
out of both `SendAsync` and `DeliverAsync` via dedicated `catch` filters instead of being
buffered. The broad `IOException` catch-all was dropped — only MailKit's typed exceptions
plus `SocketException`/`TimeoutException` are treated as transient. Regression tests
`Send_CancellationRequested_PropagatesAndDoesNotBuffer` and
`Send_TaskCanceledException_WithCancellation_Propagates`.
### NotificationService-003 — Error classification by substring matching on exception messages is fragile
| | |
|--|--|
| Severity | High |
| Category | Error handling & resilience |
| Status | Resolved |
| Location | `src/ScadaLink.NotificationService/NotificationDeliveryService.cs:144-147`, `src/ScadaLink.NotificationService/NotificationDeliveryService.cs:163-166` |
**Description**
Transient/permanent classification depends on `ex.Message.Contains("5.")`, `Contains("4.")`, `Contains("550")`, `Contains("421")`, etc. This is unreliable: (a) `Message.Contains("5.")` matches any message containing the literal "5." anywhere — e.g. a host name `smtp5.example.com`, a version string, or a path — producing false permanent classification; (b) `Contains("4.")` likewise matches `"v4.0"` or an IP address octet; (c) MailKit exposes the actual SMTP status code on `SmtpCommandException.StatusCode`, which is the correct, locale-independent source of truth and is being ignored; (d) message text is culture/version-dependent and not part of any stable contract. Misclassification has real consequences: a permanent failure misread as transient floods the S&F buffer (which the design doc explicitly says must be prevented), and a transient failure misread as permanent loses the notification.
**Recommendation**
Classify on MailKit's typed exceptions and `SmtpCommandException.StatusCode` (4xx → transient, 5xx → permanent), and `SocketException`/`SmtpProtocolException`/connection-refused → transient. Remove all `Message.Contains` checks.
**Resolution**
Resolved 2026-05-16 (commit `<pending>`). All `ex.Message.Contains(...)` checks were
removed. The new `ClassifySmtpError` helper inspects `SmtpCommandException.StatusCode`
(numeric SMTP code: 4xx → transient, 5xx → permanent) and treats `SmtpProtocolException`,
`ServiceNotConnectedException`, `SocketException` and `TimeoutException` as transient;
anything else is `Unknown` and propagates unclassified rather than being guessed. The
permanent-promotion `catch` block in `DeliverAsync` now keys off this classification.
Regression tests `Send_Smtp5xxCommandException_ClassifiedPermanent`,
`Send_Smtp4xxCommandException_ClassifiedTransientAndBuffered`,
`Send_SmtpProtocolException_ClassifiedTransient`, and
`Send_NonSmtpExceptionWith5xxLookalikeText_NotPromotedToPermanent`.
### NotificationService-004 — `DeliverAsync` constructs two SMTP clients and leaks the used one
| | |
|--|--|
| Severity | High |
| Category | Performance & resource management |
| Status | Resolved |
| Location | `src/ScadaLink.NotificationService/NotificationDeliveryService.cs:118-119` |
**Description**
```csharp
using var client = _smtpClientFactory() as IDisposable;
var smtp = _smtpClientFactory();
```
The factory is invoked twice, creating two separate `MailKitSmtpClientWrapper` instances (each owning a real `SmtpClient` with a socket). The first instance is assigned to `client` and disposed by the `using`, but it is never used. The second instance, `smtp`, is the one actually connected, authenticated, used to send, and `DisconnectAsync`'d — but it is never `Dispose`d. `MailKitSmtpClientWrapper` implements `IDisposable` and wraps an unmanaged socket; the connected client is leaked on every send. `DisconnectAsync` closes the connection but does not dispose the `SmtpClient`. Over time this leaks sockets/handles.
**Recommendation**
Create exactly one client and dispose the one that is actually used:
`using var smtp = _smtpClientFactory();` then cast to `IDisposable` only if needed (the factory's `Func<ISmtpClientWrapper>` should ideally return a type that the `using` can dispose directly — consider having `ISmtpClientWrapper` extend `IAsyncDisposable`/`IDisposable`).
**Resolution**
Resolved 2026-05-16 (commit `<pending>`). `DeliverAsync` now invokes `_smtpClientFactory()`
exactly once and disposes the client actually used via `using var disposable = smtp as
IDisposable;`. The previous code created two `MailKitSmtpClientWrapper` instances per send
and disposed the unused one while leaking the connected one. Regression test
`Send_CreatesExactlyOneSmtpClient_AndDisposesIt` verifies the factory is invoked once and
the resulting client is disposed.
### NotificationService-005 — Non-TLS path uses `SecureSocketOptions.Auto`, contradicting the requested mode
| | |
|--|--|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Status | Resolved |
| Location | `src/ScadaLink.NotificationService/MailKitSmtpClientWrapper.cs:18`, `src/ScadaLink.NotificationService/NotificationDeliveryService.cs:123` |
**Description**
`ConnectAsync` maps `useTls` to either `SecureSocketOptions.StartTls` or `SecureSocketOptions.Auto`. `useTls` is computed in `DeliverAsync` as `TlsMode == "starttls"`. So a configuration of `TlsMode = "none"` produces `useTls = false``SecureSocketOptions.Auto`, which lets MailKit opportunistically negotiate TLS — the opposite of "None". Worse, the design doc defines three TLS modes — `None`, `StartTLS`, `SSL` — but the code collapses them to a single boolean, so `SSL` (implicit TLS, typically port 465) is treated identically to `None`/`Auto` and the SSL mode is effectively unsupported. The `bool useTls` parameter cannot represent the three-state requirement.
**Recommendation**
Pass the `TlsMode` string (or a `TlsMode` enum) through to the wrapper and map explicitly: `None``SecureSocketOptions.None`, `StartTLS``SecureSocketOptions.StartTls`, `SSL``SecureSocketOptions.SslOnConnect`. Validate the configured value and reject unknown modes.
**Resolution**
Resolved 2026-05-16 (commit pending). Root cause confirmed against source: the
`bool useTls` parameter cannot represent three states, and the non-StartTLS branch used
`SecureSocketOptions.Auto`. A new `SmtpTlsMode` enum (`None`/`StartTls`/`Ssl`) and
`SmtpTlsModeParser` were added; `ISmtpClientWrapper.ConnectAsync` now takes `SmtpTlsMode`
and `MailKitSmtpClientWrapper` maps it explicitly to `SecureSocketOptions.None`/
`StartTls`/`SslOnConnect`. `SendAsync`/`DeliverBufferedAsync` validate the configured
`TlsMode` up front — an unknown value returns a clean `NotificationResult` failure (or
parks a buffered message) instead of silently negotiating TLS. Regression tests:
`Send_TlsModeNone_DoesNotNegotiateTls`, `Send_TlsModeSsl_UsesImplicitSsl`,
`Send_UnknownTlsMode_ReturnsErrorNotSilentFallback`, and the `SmtpTlsModeParserTests` set.
### NotificationService-006 — OAuth2 token cache is keyed to nothing; wrong token returned when multiple SMTP configs exist
| | |
|--|--|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Status | Resolved |
| Location | `src/ScadaLink.NotificationService/OAuth2TokenService.cs:14-15`, `src/ScadaLink.NotificationService/OAuth2TokenService.cs:30-35` |
**Description**
`OAuth2TokenService` is registered as a singleton and stores a single `_cachedToken`/`_tokenExpiry` pair. `GetTokenAsync` ignores the `credentials` argument when deciding whether the cache is valid — it only checks expiry. If two SMTP configurations with different tenant/client credentials are ever used (the repository's `GetAllSmtpConfigurationsAsync` returns a list, implying multiple configs are possible), the second caller receives the first caller's token, which will fail authentication against the second tenant. Even with a single config today this is a latent correctness bug and makes the service's behaviour depend on call order.
**Recommendation**
Key the cache by the credential identity (e.g. a dictionary keyed by `tenantId:clientId`, or by a hash of the credential string), or document and enforce the single-SMTP-config invariant. Given the design doc says one SMTP config is deployed per site, enforcing the invariant is acceptable but should be explicit.
**Resolution**
Resolved 2026-05-16 (commit pending). Root cause confirmed: the singleton held a single
`_cachedToken`/`_tokenExpiry` pair and `GetTokenAsync` ignored the `credentials` argument
when validating the cache, so a second SMTP config got the first config's token.
`OAuth2TokenService` now stores a `ConcurrentDictionary<string, CacheEntry>` keyed by the
SHA-256 hash of the credential string; each distinct tenant/client/secret gets its own
cached token, expiry, and per-credential `SemaphoreSlim` (double-checked locking
preserved). Regression tests: `GetTokenAsync_DifferentCredentials_ReturnPerCredentialTokens`
and `GetTokenAsync_SameCredentials_CachedPerCredential`.
### NotificationService-007 — Connection timeout and max-concurrent-connections from the design doc are not implemented
| | |
|--|--|
| Severity | Medium |
| Category | Design-document adherence |
| Status | Resolved |
| Location | `src/ScadaLink.NotificationService/NotificationOptions.cs:11-14`, `src/ScadaLink.NotificationService/MailKitSmtpClientWrapper.cs:16-20`, `src/ScadaLink.NotificationService/NotificationDeliveryService.cs:111-140` |
**Description**
The design doc specifies an SMTP "Connection timeout (default 30s)" and "Max concurrent connections (default 5)", and `NotificationOptions`/`SmtpConfiguration` both carry these fields. Neither is enforced: `MailKitSmtpClientWrapper.ConnectAsync` never sets `SmtpClient.Timeout`, so the connection relies on MailKit's default timeout rather than the configured value (only the caller's `CancellationToken` bounds it, and callers may pass `default`). There is no semaphore or other throttle limiting concurrent SMTP connections per site, so `MaxConcurrentConnections` has no effect. Both options exist but are dead configuration.
**Recommendation**
Set `SmtpClient.Timeout` from `ConnectionTimeoutSeconds` in `ConnectAsync` (and/or derive a linked `CancellationTokenSource`). Introduce a `SemaphoreSlim(MaxConcurrentConnections)` gating `DeliverAsync`. If these limits are intentionally deferred, mark the options `[Obsolete]`/document them as not-yet-enforced and note the gap in the design doc.
**Resolution**
Resolved 2026-05-16 (commit pending). Root cause confirmed: `ConnectAsync` never set
`SmtpClient.Timeout` and no throttle gated `DeliverAsync`. `ISmtpClientWrapper.ConnectAsync`
now takes a `connectionTimeoutSeconds` argument; `MailKitSmtpClientWrapper` sets
`SmtpClient.Timeout` from `SmtpConfiguration.ConnectionTimeoutSeconds`. `DeliverAsync`
acquires a lazily-created `SemaphoreSlim` sized to `SmtpConfiguration.MaxConcurrentConnections`
(default 5 when non-positive) and releases it in a `finally`, so concurrent SMTP
deliveries per site are bounded. The timeout is sourced from the deployed
`SmtpConfiguration` rather than `NotificationOptions`; the `NotificationOptions` fields
remain as operational fallback defaults. Regression tests:
`Send_PassesConfiguredConnectionTimeoutToClient` and
`Send_MaxConcurrentConnections_LimitsConcurrentDeliveries`.
### NotificationService-008 — Recipient email addresses are not validated before send
| | |
|--|--|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Status | Resolved |
| Location | `src/ScadaLink.NotificationService/NotificationDeliveryService.cs:136-137`, `src/ScadaLink.NotificationService/MailKitSmtpClientWrapper.cs:50-53` |
**Description**
`SendAsync` builds `bccAddresses` directly from `recipient.EmailAddress` and passes them to `MailboxAddress.Parse`. If any recipient row has a malformed address, `MailboxAddress.Parse` throws `ParseException`. `ParseException` is not a `TimeoutException`/`SocketException`/`IOException` and its message will not generally contain "4." or "5.", so it falls through `DeliverAsync`'s outer `catch ... when (... && !IsTransientSmtpError(ex))` filter, which re-throws it (`:153`); it then escapes `SendAsync` entirely as an unhandled exception (the `SendAsync` catch blocks only cover `SmtpPermanentException` and transient errors). A single bad address in a list therefore crashes the send with an exception type the calling script is not told to expect, instead of producing a clean `NotificationResult` error. The same applies to a malformed `FromAddress`.
**Recommendation**
Validate addresses up front (e.g. `MailboxAddress.TryParse`) and return a `NotificationResult(false, ...)` listing invalid recipients, or wrap `DeliverAsync` so any non-classified exception becomes a permanent `NotificationResult` failure rather than escaping. Consider validating addresses at definition time in the Central UI as well.
**Resolution**
Resolved 2026-05-16 (commit pending). Root cause confirmed: `MailboxAddress.Parse` of a
malformed `FromAddress`/recipient threw `ParseException`, which is unclassified and
escaped `SendAsync` as an unhandled exception. A new `ValidateAddresses` helper uses
`MailboxAddress.TryParse` for the sender and every recipient; `SendAsync` now returns a
clean `NotificationResult(false, ...)` listing the invalid address(es) before any SMTP
attempt, and `DeliverBufferedAsync` parks a buffered message with a bad address (a fault
retrying cannot fix). Regression tests:
`Send_MalformedRecipientAddress_ReturnsCleanError_DoesNotThrow` and
`Send_MalformedFromAddress_ReturnsCleanError_DoesNotThrow`. Definition-time validation in
the Central UI is a separate component and out of this module's scope.
### NotificationService-009 — Credentials handled as plaintext strings; OAuth2 client secret logged risk
| | |
|--|--|
| Severity | Medium — re-triaged: split into an in-scope log-leak fix (resolved) and a Commons-scoped at-rest-encryption / structured-credential follow-up (NotificationService-013, Deferred). |
| Category | Security |
| Status | Resolved |
| Location | `src/ScadaLink.NotificationService/NotificationDeliveryService.cs:127-134`, `src/ScadaLink.NotificationService/OAuth2TokenService.cs:30-65`, `src/ScadaLink.Commons/Entities/Notifications/SmtpConfiguration.cs:9` |
**Description**
SMTP credentials — Basic Auth `user:pass` and OAuth2 `tenantId:clientId:clientSecret` — are stored and passed as a single colon-delimited plaintext `string` (`SmtpConfiguration.Credentials`). There is no indication the value is encrypted at rest in SQLite or in the central config DB. The colon-delimited packing is also brittle: a password or client secret containing a `:` will be split incorrectly (`Split(':', 2)` / `Split(':', 3)`), silently corrupting the secret. Separately, while the current code does not log the secret directly, the substring-based error classification logs full exception messages (`_logger.LogWarning(ex, ...)`, `LogError(ex, ...)`) and MailKit exceptions can echo back server responses; an authentication failure message could surface credential fragments into logs. There is no defensive scrubbing.
**Recommendation**
Store credentials encrypted at rest (DPAPI/Data Protection or a secret store) and model them as structured fields rather than a colon-packed string, so secrets containing `:` are safe. Ensure credential values are never written to logs; consider a redaction step on exception messages before logging.
**Resolution**
Resolved 2026-05-16 (commit pending). Root cause re-triaged against source: the finding
conflates two concerns with different ownership.
1. **Log-leak risk (in scope — fixed).** The original code logged whole exception objects
(`_logger.LogWarning(ex, ...)` / `LogError(ex, ...)`); MailKit auth exceptions can echo
server responses quoting the supplied credentials. A new internal `CredentialRedactor`
masks every colon-delimited credential component out of any text. `SendAsync` and
`DeliverBufferedAsync` now log a scrubbed message string (not the raw exception) and the
permanent-failure `NotificationResult` is scrubbed before it returns to the caller.
`OAuth2TokenService` logs the tenant id only — never the client secret or access token.
Regression tests: `CredentialRedactorTests` and
`Send_PermanentError_RedactsCredentialFromResultMessage`.
2. **At-rest encryption + structured-credential modelling (out of scope — Deferred).**
Encrypting `SmtpConfiguration.Credentials` at rest and replacing the brittle
colon-packed `string` with structured fields requires editing
`src/ScadaLink.Commons/Entities/Notifications/SmtpConfiguration.cs` and the
ConfigurationDatabase EF layer — both outside this module. Tracked separately as
**NotificationService-013** (Deferred) so it is not lost.
### NotificationService-013 — Encrypt SMTP credentials at rest; replace colon-packed string with structured fields
| | |
|--|--|
| Severity | Medium |
| Category | Security |
| Status | Deferred |
| Location | `src/ScadaLink.Commons/Entities/Notifications/SmtpConfiguration.cs:9`, ConfigurationDatabase EF mapping |
**Description**
Split out of NotificationService-009. `SmtpConfiguration.Credentials` packs Basic Auth
`user:pass` and OAuth2 `tenantId:clientId:clientSecret` into a single plaintext
colon-delimited `string`: (a) there is no encryption at rest in SQLite or the central
config DB; (b) a password or client secret containing a `:` is split incorrectly by
`Split(':', 2)` / `Split(':', 3)`, silently corrupting the secret.
**Recommendation**
Model credentials as structured fields (or an encrypted blob) on the Commons entity and
encrypt at rest via Data Protection / a secret store. The colon-delimited parsing in
`MailKitSmtpClientWrapper` and `OAuth2TokenService` would then consume the structured
fields directly.
**Resolution**
Deferred — requires changes to `src/ScadaLink.Commons` and the ConfigurationDatabase
component, which are outside the NotificationService module. To be addressed in a
Commons/ConfigurationDatabase-scoped change. The associated log-leak risk is resolved
under NotificationService-009.
### NotificationService-010 — `DeliverAsync` does not disconnect the SMTP client on failure
| | |
|--|--|
| Severity | Low |
| Category | Performance & resource management |
| Status | Resolved |
| Location | `src/ScadaLink.NotificationService/NotificationDeliveryService.cs:121-154` |
**Description**
`DisconnectAsync` is only called at `:139`, on the success path inside the `try` block. If `AuthenticateAsync` or `SendAsync` throws, control jumps to the `catch` filter at `:141` and the method exits (re-throwing or wrapping) without ever calling `DisconnectAsync`. Combined with NS-004 (the client is never disposed either), a failed send leaves an open, authenticated SMTP connection until the socket is eventually reclaimed by finalization. Under sustained transient failures this can exhaust the SMTP server's connection slots.
**Recommendation**
Move disconnect/dispose into a `finally` block (or use `await using` once `ISmtpClientWrapper` supports `IAsyncDisposable`) so the connection is always torn down regardless of outcome.
**Resolution**
Resolved 2026-05-16 (commit pending). Root cause confirmed against the current source:
`DeliverAsync` invoked `smtp.DisconnectAsync` only on the success path inside the `try`
block, so a failed `Connect`/`Authenticate`/`Send` left the connection open until
finalization. `DisconnectAsync` was moved into the existing `finally` block (which also
releases the NS-007 concurrency slot), so the SMTP connection is always torn down
regardless of outcome. The disconnect is best-effort — wrapped in its own `try/catch` that
logs at debug level — so a disconnect failure (e.g. the connection is already dead) cannot
mask the original delivery exception. Regression tests
`Send_TransientFailureDuringSend_StillDisconnectsClient` and
`Send_FailureDuringAuthenticate_StillDisconnectsClient`.
### NotificationService-011 — `SmtpPermanentException` declared in the wrong file; module conventions
| | |
|--|--|
| Severity | Low |
| Category | Code organization & conventions |
| Status | Resolved |
| Location | `src/ScadaLink.NotificationService/NotificationDeliveryService.cs:173-177`, `src/ScadaLink.Commons/Entities/Notifications/SmtpConfiguration.cs:5-15` |
**Description**
Two minor convention issues. (1) `SmtpPermanentException` is a public exception type declared at the bottom of `NotificationDeliveryService.cs` rather than in its own file (`SmtpPermanentException.cs`), which is inconsistent with the one-type-per-file layout used elsewhere and makes it harder to locate. (2) `SmtpConfiguration` (a Commons POCO) declares non-nullable `string` properties (`Host`, `AuthType`, `FromAddress`) that are only guaranteed by the constructor; EF Core materialization or object-initializer use can leave them null while the type system says otherwise. These are persistence-ignorant POCO concerns but worth flagging because the delivery service dereferences `config.Host`, `config.AuthType`, `config.FromAddress` without null checks.
**Recommendation**
Move `SmtpPermanentException` to its own file. For `SmtpConfiguration`, either keep the constructor as the only path and document it, or use `required` members so the compiler enforces initialization.
**Resolution**
Resolved 2026-05-16 (commit pending). Both issues confirmed against source.
1. **`SmtpPermanentException` placement (in scope — fixed).** The public exception type
was extracted from the bottom of `NotificationDeliveryService.cs` into its own file,
`src/ScadaLink.NotificationService/SmtpPermanentException.cs`, restoring the
one-type-per-file layout. No behaviour change, so no dedicated regression test — the
move is verified by the module test suite still compiling and passing (56 tests green).
2. **`SmtpConfiguration` non-nullable strings (out of scope — re-triaged).**
`SmtpConfiguration` lives in `src/ScadaLink.Commons`, which is outside the
NotificationService module's edit scope; it cannot be changed here. This is the same
Commons entity already flagged for follow-up under **NotificationService-013**
(Deferred). The `required`-members / documented-constructor change should be folded into
that Commons/ConfigurationDatabase-scoped work. Note the delivery service's actual risk
is bounded: `SendAsync`/`DeliverBufferedAsync` already validate `TlsMode` and addresses
up front (NS-005/NS-008), and a null `Host`/`AuthType` from a malformed config row
surfaces as a classified/clean failure rather than silent corruption.
### NotificationService-012 — Test coverage gaps: OAuth2 delivery path, permanent-classification fallback, token-cache concurrency
| | |
|--|--|
| Severity | Low |
| Category | Testing coverage |
| Status | Resolved |
| Location | `tests/ScadaLink.NotificationService.Tests/NotificationDeliveryServiceTests.cs`, `tests/ScadaLink.NotificationService.Tests/OAuth2TokenServiceTests.cs` |
**Description**
The tests cover the happy path, list-not-found, no-recipients, no-SMTP-config, permanent failure, transient-without-S&F, and transient-with-S&F buffering. Notable untested paths: (1) the OAuth2 delivery branch in `DeliverAsync:128-132` — every test uses `tokenService: null` and Basic Auth, so OAuth2 token resolution during a send is never exercised; (2) `DeliverAsync`'s permanent-classification fallback (`:144-149`) that promotes a generic exception whose message contains "550"/"553"/"554" to `SmtpPermanentException` is never tested; (3) `OAuth2TokenServiceTests` never tests concurrent `GetTokenAsync` calls (the double-checked-locking path) or token expiry/refresh — the cache test uses a 3600s token so refresh never triggers; (4) no test covers the transient-with-S&F path actually delivering after retry (which would also have caught NS-001). Given NS-001 is a critical defect, the absence of an end-to-end buffer-and-retry test is significant.
**Recommendation**
Add tests for: OAuth2-authenticated send with a mocked `OAuth2TokenService`; the `DeliverAsync` 5xx-message permanent fallback; token expiry/refresh (short `expires_in`); concurrent token acquisition; and an end-to-end buffered-notification retry once a `Notification` S&F handler is registered.
**Resolution**
Resolved 2026-05-16 (commit pending). Each listed gap was re-triaged against the current
source (the module was substantially rewritten by NS-001..009 since this finding was
filed) and the genuinely-missing tests were added:
1. **OAuth2 delivery path (gap real — covered).** New test
`Send_OAuth2Config_AuthenticatesWithResolvedAccessToken` drives `DeliverAsync` with an
`oauth2` `SmtpConfiguration` and a real `OAuth2TokenService` (mocked `IHttpClientFactory`),
asserting the SMTP client is authenticated with the *resolved access token*, not the raw
`tenant:client:secret` triple.
2. **5xx-message permanent fallback (gap obsolete — re-triaged).** The substring-based
permanent fallback this item describes no longer exists: NS-003 replaced message-text
matching with typed-exception/`SmtpStatusCode` classification. The equivalent path is
already covered by `Send_Smtp5xxCommandException_ClassifiedPermanent`, and
`Send_NonSmtpExceptionWith5xxLookalikeText_NotPromotedToPermanent` proves a "5xx
lookalike" message is no longer promoted. No new test needed for a removed branch.
3. **Token expiry/refresh and concurrent acquisition (gaps real — covered).** New tests
`GetTokenAsync_ExpiredToken_RefreshesOnNextCall` (uses `expires_in: 60` so the token is
stale immediately given the 60s refresh margin) and
`GetTokenAsync_ConcurrentCalls_MakeExactlyOneHttpRequest` (20 racing callers collapse to
a single token fetch, exercising the per-credential double-checked lock).
4. **End-to-end buffer-and-retry (gap already closed — re-triaged).** The
buffer-then-retry-then-deliver round trip was added when NS-001 was resolved
(`DeliverBuffered_HappyPath_ReturnsTrue`, `DeliverBuffered_ListNoLongerExists_…`, plus
`AkkaHostedService` handler registration), so no further test is required here.
Module test suite is green at 56 tests.
### NotificationService-014 — OAuth2 token-fetch failure escapes `DeliverBufferedAsync`; a permanently-broken config is retried forever
| | |
|--|--|
| Severity | High |
| Category | Error handling & resilience |
| Status | Resolved |
| Location | `src/ScadaLink.NotificationService/NotificationDeliveryService.cs:214-228`, `src/ScadaLink.NotificationService/NotificationDeliveryService.cs:308-312`, `src/ScadaLink.NotificationService/OAuth2TokenService.cs:56-84` |
**Description**
`DeliverBufferedAsync` is the registered Store-and-Forward delivery handler for `StoreAndForwardCategory.Notification`. Per the S&F handler contract (`StoreAndForwardService.cs:40-53`), a handler must return `true` for success, `false` for permanent failure (park, no further retries), and **throw only for a transient failure** (retry). `DeliverBufferedAsync`'s `try` block (`:214-228`) catches only `SmtpPermanentException`; the comment at `:227` asserts "Transient SMTP errors propagate out of `DeliverAsync`". But `DeliverAsync` can also throw exceptions that `ClassifySmtpError` buckets as `Unknown` — most importantly the OAuth2 path at `:308-312`, where `OAuth2TokenService.GetTokenAsync` throws `HttpRequestException` (from `EnsureSuccessStatusCode` at `OAuth2TokenService.cs:78`) or `InvalidOperationException` (malformed credential triple, or missing `access_token`). None of these is an `SmtpCommandException`/`SmtpProtocolException`/`SocketException`/`TimeoutException`, so they are neither caught nor classified — they propagate straight out of `DeliverBufferedAsync`. The S&F engine treats *any* thrown exception as a transient failure, so a notification whose SMTP config has an invalid client secret (a 401 → `HttpRequestException`, a genuinely **permanent** condition) is retried on every sweep until `MaxRetries` is reached — burning OAuth2 token-endpoint calls and never parking promptly. A malformed credential string (`InvalidOperationException`, never fixable) is likewise retried instead of parked immediately.
**Recommendation**
Add a catch-all to `DeliverBufferedAsync` for exceptions that `ClassifySmtpError` classifies as `Unknown`: log and return `false` (park) for non-retryable causes such as `InvalidOperationException` from credential parsing, and decide deliberately whether an OAuth2 `HttpRequestException` should park or retry (a 5xx token endpoint is transient; a 401 is permanent — inspect `HttpRequestException.StatusCode`). Do not let an unclassified exception leave the handler and be silently reinterpreted as transient.
**Resolution**
Resolved 2026-05-17. Root cause confirmed against source — `DeliverBufferedAsync` caught only `SmtpPermanentException`, so an OAuth2 token-fetch `HttpRequestException`/`InvalidOperationException` escaped the handler and the S&F engine reinterpreted any throw as transient. Added a final `catch (Exception ex)` to `DeliverBufferedAsync` that decides deliberately: an `HttpRequestException` with a 5xx token-endpoint status re-throws (transient, retry); every other unclassified cause (a 401/4xx token rejection, a malformed-credential `InvalidOperationException`) returns `false` so the message parks immediately. Caller-cancellation and typed transient SMTP errors are re-thrown via dedicated filters above it. Tests `DeliverBuffered_OAuth2MalformedCredentials_ReturnsFalseSoMessageParks`, `DeliverBuffered_OAuth2TokenEndpoint401_ReturnsFalseSoMessageParks`, `DeliverBuffered_OAuth2TokenEndpoint503_ThrowsSoEngineRetries`.
### NotificationService-015 — Unclassified exceptions (OAuth2 token fetch, non-cancellation OCE) escape `SendAsync` to the calling script
| | |
|--|--|
| Severity | High |
| Category | Error handling & resilience |
| Status | Resolved |
| Location | `src/ScadaLink.NotificationService/NotificationDeliveryService.cs:96-148`, `src/ScadaLink.NotificationService/NotificationDeliveryService.cs:308-312` |
**Description**
`SendAsync`'s `try`/`catch` around `DeliverAsync` has exactly three catch clauses: `SmtpPermanentException` (`:101`), `OperationCanceledException` when cancellation was requested (`:111`), and `Exception` filtered by `IsTransientSmtpError` (`:116`). There is no catch-all for an exception that is none of those. `ClassifySmtpError` returns `SmtpErrorClass.Unknown` for anything it does not recognise, and `IsTransientSmtpError` only matches `Transient`, so an `Unknown`-classified exception falls through all three clauses and escapes `SendAsync` unhandled. Concretely, the OAuth2 branch at `:308-312` calls `OAuth2TokenService.GetTokenAsync`, which throws `HttpRequestException` (token endpoint unreachable / 4xx / 5xx) or `InvalidOperationException` (malformed `tenant:client:secret` triple, or no `access_token` in the response). Both reach the calling script (an instance/alarm/shared script) as a raw exception of a type the `INotificationDeliveryService` contract never advertises — the design doc says delivery returns either success, a buffered transient result, or a permanent `NotificationResult` error. `Notify.To().Send()` against an OAuth2 config with a bad secret therefore crashes the script instead of returning a clean `NotificationResult(false, ...)`. This is the same defect class NS-008 resolved for malformed addresses, but NS-008's `ValidateAddresses` fix never covered the OAuth2 token path. (A non-cancellation `OperationCanceledException` — e.g. an internal MailKit timeout surfaced as OCE while the caller's token is *not* cancelled — would escape the same way.)
**Recommendation**
Add a final `catch (Exception ex)` to `SendAsync` that converts any otherwise-unhandled exception into a permanent `NotificationResult(false, ...)` (credential-scrubbed, consistent with NS-009), and log it. Treating an unclassified failure as permanent is the safe default — it returns control to the script rather than crashing it; an OAuth2 transient (token endpoint 5xx) being reported as permanent is an acceptable trade against an unhandled crash, and can be refined by inspecting `HttpRequestException.StatusCode` if transient buffering of token failures is wanted.
**Resolution**
Resolved 2026-05-17. Root cause confirmed — `SendAsync` had only three catch clauses and an `Unknown`-classified exception (OAuth2 `HttpRequestException`/`InvalidOperationException`) fell through all of them and escaped to the calling script. Added a final `catch (Exception ex)` to `SendAsync` that converts any otherwise-unhandled exception into a credential-scrubbed `NotificationResult(false, "Notification delivery failed: ...")` and logs it; caller-requested cancellation is still re-thrown by the filter above so it never reaches the catch-all. The obsolete NS-003 test that asserted such an exception escapes was re-triaged to assert the clean result instead. Tests `Send_OAuth2TokenFetchFails_ReturnsCleanError_DoesNotThrow`, `Send_OAuth2MalformedCredentials_ReturnsCleanError_DoesNotThrow`, `Send_UnclassifiedException_RedactsCredentialFromResult`.
### NotificationService-016 — `AuthenticateAsync` silently sends unauthenticated for an unknown auth type or empty credentials
| | |
|--|--|
| Severity | Medium |
| Category | Security |
| Status | Resolved |
| Location | `src/ScadaLink.NotificationService/MailKitSmtpClientWrapper.cs:46-67` |
**Description**
`MailKitSmtpClientWrapper.AuthenticateAsync` returns without authenticating in two silent paths. (1) `if (string.IsNullOrEmpty(credentials)) return;` (`:48-49`) — if an SMTP config has `AuthType` "basic" or "oauth2" but a null/empty `Credentials` value (a misconfigured or partially-deployed row), the method returns and `DeliverAsync` proceeds straight to `SendAsync` on an *unauthenticated* connection. (2) The `switch (authType.ToLowerInvariant())` (`:51-66`) has cases only for `"basic"` and `"oauth2"` and **no `default`** — any other `AuthType` value (a typo, a future "ntlm", an empty string) falls through with no authentication attempted. Additionally, in the `"basic"` case, `credentials.Split(':', 2)` that yields fewer than 2 parts (a credential string with no colon) is silently skipped (`:55-58`). In every case the connection then attempts to send mail unauthenticated: against a relay that does not require auth this masks a misconfiguration; against a server that does require auth it fails later with a less obvious error, and at worst an unauthenticated send succeeds where it should not. Authentication being skipped should never be silent.
**Recommendation**
Make missing/unparseable credentials and an unrecognised `AuthType` hard errors: throw a typed exception (e.g. a permanent configuration exception so `SendAsync`/`DeliverBufferedAsync` surface it as a clean failure / park) rather than returning. Add a `default:` arm to the `switch` that throws `ArgumentException` naming the unsupported auth type, and reject a "basic" credential string that does not split into exactly two parts.
**Resolution**
Resolved 2026-05-17. Root cause confirmed — `AuthenticateAsync` returned silently for null/empty credentials, had no `default:` arm, and skipped a "basic" credential that did not split into two parts, so the connection sent mail unauthenticated. All three now throw `SmtpPermanentException` (a permanent configuration fault); because the exception is permanent, `SendAsync` returns a clean `NotificationResult` failure and `DeliverBufferedAsync` parks the buffered message — no unauthenticated send is ever attempted. Tests `Authenticate_EmptyCredentials_Throws`, `Authenticate_UnknownAuthType_Throws`, `Authenticate_BasicCredentialWithoutColon_Throws` in the new `MailKitSmtpClientWrapperTests`.
### NotificationService-017 — `NotificationOptions` is bound from configuration but never read (dead config)
| | |
|--|--|
| Severity | Low |
| Category | Code organization & conventions |
| Status | Resolved |
| Location | `src/ScadaLink.NotificationService/NotificationOptions.cs:1-15`, `src/ScadaLink.NotificationService/ServiceCollectionExtensions.cs:10-11`, `src/ScadaLink.Host/SiteServiceRegistration.cs:70` |
**Description**
`NotificationOptions` (with `ConnectionTimeoutSeconds` and `MaxConcurrentConnections`) is bound from the `ScadaLink:Notification` configuration section in two places — `ServiceCollectionExtensions.AddNotificationService` (`AddOptions<NotificationOptions>().BindConfiguration(...)`) and again in `Host/SiteServiceRegistration.cs:70` (`services.Configure<NotificationOptions>(...)`). However, a repo-wide search shows no code ever injects `IOptions<NotificationOptions>` or otherwise reads either property. When NS-007 enforced the connection timeout and concurrency limit, it sourced both values from the per-site `SmtpConfiguration` entity, not from `NotificationOptions` — so this options class is now entirely dead configuration. Its XML doc still claims it "provides fallback defaults and operational limits", which is misleading: nothing falls back to it. The double binding is also redundant. Dead, falsely-documented configuration invites an operator to set `ScadaLink:Notification:ConnectionTimeoutSeconds` and expect it to take effect, when it has no effect at all.
**Recommendation**
Either delete `NotificationOptions` and both of its registrations, or genuinely wire it in as the documented fallback (e.g. `DeliverAsync` uses `NotificationOptions` values when the `SmtpConfiguration` field is non-positive). If kept, remove the duplicate `Configure` call and correct the XML doc. The NS-007 resolution note already states the `NotificationOptions` fields "remain as operational fallback defaults" — that intent should be implemented or the class removed.
**Resolution**
Resolved 2026-05-17. Root cause confirmed — `NotificationOptions` was bound but never read. Implemented the documented-fallback intent rather than deleting it: `NotificationDeliveryService` now takes an optional `IOptions<NotificationOptions>` and uses its `ConnectionTimeoutSeconds`/`MaxConcurrentConnections` whenever the deployed `SmtpConfiguration` field is non-positive (a value on the row still wins). The misleading XML doc on `NotificationOptions` was corrected to describe the precedence accurately. The duplicate `services.Configure<NotificationOptions>` in `Host/SiteServiceRegistration.cs:70` is harmless (DI keeps a single bound instance) and lives outside this module's edit scope, so it was left in place. Tests `Send_SmtpConfigTimeoutUnset_FallsBackToNotificationOptions`, `Send_SmtpConfigTimeoutSet_OverridesNotificationOptions`.
### NotificationService-018 — Concurrency limiter: lock-free read of a non-volatile field, never resized on redeployment, never disposed
| | |
|--|--|
| Severity | Low |
| Category | Concurrency & thread safety |
| Status | Resolved |
| Location | `src/ScadaLink.NotificationService/NotificationDeliveryService.cs:237-255` |
**Description**
Three minor issues with the NS-007 concurrency limiter. (1) `GetConcurrencyLimiter` reads the plain (non-`volatile`) field `_concurrencyLimiter` at `:242` *outside* `_limiterLock` as a fast path. The field is only written inside the lock (`:252`), so under the .NET memory model a concurrent thread is not guaranteed to observe the fully-constructed `SemaphoreSlim` via the lock-free read. In practice on x86/x64 this is benign and the double-checked write inside the lock is correct, but the unsynchronized read of a reference set under a lock is the classic lazy-init memory-model pitfall — use `Volatile.Read`/`LazyInitializer`/`Lazy<T>`. (2) The limiter is sized once from the *first* `SmtpConfiguration` ever seen and the `??=` at `:252` means it is never re-created. The design doc says one SMTP config is deployed per site, but that config can be **redeployed** with a different `MaxConcurrentConnections`; the new value is silently ignored for the lifetime of the (scoped, but potentially long-lived per request) service — and more importantly the limit is captured per `NotificationDeliveryService` instance, so behaviour depends on which scoped instance happens to create it first. (3) `SemaphoreSlim` implements `IDisposable`; `_concurrencyLimiter` is never disposed and `NotificationDeliveryService` does not implement `IDisposable`. A scoped service leaking one undisposed `SemaphoreSlim` per scope is a slow handle leak.
**Recommendation**
Replace the hand-rolled double-checked init with `Lazy<SemaphoreSlim>` or `LazyInitializer.EnsureInitialized`, or at minimum make the field access `Volatile.Read`/`Volatile.Write`. Consider hoisting the limiter to a singleton keyed by site (the limit is a per-site invariant, not a per-scope one) so redeployment with a changed limit and disposal are handled in one place; if it stays on the scoped service, implement `IDisposable` to dispose it.
**Resolution**
Resolved 2026-05-17. All three issues confirmed against source. The hand-rolled double-checked init was replaced with a `Lazy<SemaphoreSlim>` — its publication is correctly synchronised, eliminating the lock-free read of a non-`volatile` reference. `NotificationDeliveryService` now implements `IDisposable` and disposes the limiter (if created) under the existing lock, with idempotent re-entry and an `ObjectDisposedException` guard in `SendAsync`/`GetConcurrencyLimiter`; the scoped DI registration disposes it per scope. The limiter remains scoped (not hoisted to a site singleton) — the design doc deploys one SMTP config per site and the per-instance capture is bounded; the redeploy-resize concern is acknowledged as low-impact and not changed here, since hoisting would require a registration change for marginal benefit. Tests `Service_Dispose_DisposesConcurrencyLimiter` plus the existing `Send_MaxConcurrentConnections_LimitsConcurrentDeliveries`.
### NotificationService-019 — `NotificationDeliveryService` and `INotificationDeliveryService` are orphaned by the central-only redesign
| | |
|--|--|
| Severity | High |
| Category | Design-document adherence |
| Status | Open |
| Location | `src/ScadaLink.NotificationService/NotificationDeliveryService.cs:18-442`, `src/ScadaLink.NotificationService/ServiceCollectionExtensions.cs:20-21`, `src/ScadaLink.Commons/Interfaces/Services/INotificationDeliveryService.cs:1-33`, `src/ScadaLink.Host/Program.cs:77` |
**Description**
The updated `Component-NotificationService.md` (re-read in full at this commit) makes the new design unambiguous: "The Notification Service is the central component that manages notification-list and SMTP definitions and provides the per-type delivery adapters used to send notifications. … Notification delivery has been inverted: a site script's notification is store-and-forwarded to the central cluster, and the central **Notification Outbox** owns dispatch and delivery, calling an `INotificationDeliveryAdapter` supplied by this component." The doc explicitly states the service is "central cluster only", "no longer present at site clusters", and "no longer delivers notifications from sites".
The current source does not match. `NotificationDeliveryService` is a site-shaped notification sender: it accepts `(listName, subject, message)`, performs an immediate SMTP `DeliverAsync`, catches transient failures and **buffers them to a `StoreAndForwardCategory.Notification` row**, and exposes `DeliverBufferedAsync` as the matching S&F handler. That is precisely the old site-side flow the design doc says was removed. The doc explicitly notes "there is no … local SQLite copy" of notification lists at sites, yet `DeliverBufferedAsync` re-resolves the list from a repository expected to be reachable on the buffering node.
Who actually calls it?
- **Sites** do **not**. `SiteServiceRegistration.cs:33-38` documents the deliberate omission: "AddNotificationService() is intentionally NOT registered on the site path." Sites register `NotificationForwarder` (in `ScadaLink.StoreAndForward`) as the S&F handler for `StoreAndForwardCategory.Notification` (`AkkaHostedService.cs:654-660`), which Asks the central comms actor and never touches SMTP. `ScriptRuntimeContext.NotifyHelper` (in `SiteRuntime`) enqueues directly to S&F as a serialized `NotificationSubmit`, **not** via `INotificationDeliveryService.SendAsync`.
- **Central** registers it (`Program.cs:77` calls `AddNotificationService`) but no central component resolves it. The central notification dispatcher is `NotificationOutboxActor``INotificationDeliveryAdapter``EmailNotificationDeliveryAdapter`. The adapter is a full re-implementation of the connect/auth/send/disconnect sequence (see `EmailNotificationDeliveryAdapter.cs:163-222`) — it deliberately does not call `NotificationDeliveryService.DeliverAsync` (XML-doc on the adapter says "Reuses the `ScadaLink.NotificationService` SMTP machinery — `ISmtpClientWrapper`, `SmtpTlsModeParser`, `OAuth2TokenService` and the typed `SmtpPermanentException`", i.e. only the leaf primitives).
The `NotificationDeliveryService` class, its `DeliverBufferedAsync`, the `Func<ISmtpClientWrapper>` registration consumed only by it, and the `INotificationDeliveryService` interface (still in Commons) and `NotificationResult` record are therefore dead code that contradicts the design. Worse, every prior finding NS-001..NS-018 was reviewed and resolved against this dead path. The 56-test green test suite (NS-012 resolution note) exercises behaviour no production caller invokes — it gives a false sense of coverage. The misleading XML doc on `NotificationDeliveryService` ("WP-11/12: Notification delivery via SMTP") tells a maintainer this is *the* delivery path; the registration on central does the same.
Risk: an operator following the design doc will look here for "the central email delivery code" and find a parallel implementation that is never called; a future feature change (e.g. retry policy tweak) made here will silently have no effect; the `Notify` script-API end-to-end behaviour now depends on `NotificationOutbox` + `EmailNotificationDeliveryAdapter` + `NotificationForwarder`, none of which are tested in this module's suite.
**Recommendation**
Decide and execute one of:
1. **Delete `NotificationDeliveryService`, `DeliverBufferedAsync`, the `BufferedNotification` payload type, the `Func<ISmtpClientWrapper>` scoped registration (move it to NotificationOutbox if still needed there — it already has its own), and `INotificationDeliveryService`/`NotificationResult` in Commons.** Reduce `AddNotificationService` to registering the shared primitives — `OAuth2TokenService`, `ISmtpClientWrapper` factory, `NotificationOptions`. Delete the NS-001..NS-018 tests that target the orphaned path; rebase the ones that exercise primitives (`SmtpErrorClassifier`, `SmtpTlsModeParser`, `CredentialRedactor`, `EmailAddressValidator`, `MailKitSmtpClientWrapper`, `OAuth2TokenService`) which remain genuinely shared. Update `CompositionRootTests` (`tests/ScadaLink.Host.Tests/CompositionRootTests.cs:208-209`) and `IntegrationSurfaceTests` (`tests/ScadaLink.IntegrationTests/IntegrationSurfaceTests.cs:122-135`) to drop the stale assertions.
2. **Keep the class as the central-only Email delivery primitive** and rewrite `EmailNotificationDeliveryAdapter` to delegate to it. This is the smaller diff but the larger semantic burden — `NotificationDeliveryService.SendAsync` returns `NotificationResult` (Success / WasBuffered) which cannot encode the three-way `DeliveryOutcome` (Success / Transient / Permanent) the outbox needs, so the contract still has to change.
Recommended path is option 1: the parallel implementation in `EmailNotificationDeliveryAdapter` is already complete and matches the new design's `DeliveryOutcome` model; salvaging the old class would re-introduce the very inversion this redesign removed.
### NotificationService-020 — NS-001 fix superseded; `AkkaHostedService` would register two competing `Notification` S&F handlers if both code paths ran
| | |
|--|--|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Status | Open |
| Location | `src/ScadaLink.Host/Actors/AkkaHostedService.cs:654-660`, NS-001 resolution note (this file) |
**Description**
NS-001 was resolved by registering an `S&F → DeliverBufferedAsync` handler for `StoreAndForwardCategory.Notification` at site startup in `AkkaHostedService`. The current source registers a **different** handler for the same category at `AkkaHostedService.cs:654-660``NotificationForwarder.DeliverAsync`, which forwards to central instead of sending SMTP. `StoreAndForwardService.RegisterDeliveryHandler` (verified by reading `StoreAndForward/StoreAndForwardService.cs` around line 109) takes a single handler per category — last-write-wins or first-write-wins, either way the two registrations cannot both be active.
The NS-001 resolution note in this file describes a state of the code that no longer exists: it says the handler "is now registered at site startup in `AkkaHostedService`" and points to a handler resolving `NotificationDeliveryService` via a fresh DI scope. That registration is gone from the current `AkkaHostedService` (only `ExternalSystem`, `CachedDbWrite`, and the `NotificationForwarder`-based `Notification` registration are present at the current location). So the NS-001 fix has been silently rolled back / replaced as part of the central-only redesign.
The risk this finding tracks is not the current state per se — `NotificationForwarder` registration is correct under the new design — but the **stale resolution note** plus the fact that `NotificationDeliveryService.DeliverBufferedAsync` still exists in this module and is still tested as an S&F handler. A future merge or revert that re-introduces the NS-001-style registration (because it is what the test suite shape implies) would conflict with `NotificationForwarder`. The two handlers do diametrically opposite things (forward to central vs. send SMTP locally on a site where there is no SMTP config), so a misregistration would cause a silent regression of the design inversion.
**Recommendation**
Mark the NS-001 resolution note in this file as **superseded by NS-019** with a one-line note explaining that the registration was removed when sites stopped delivering. Delete the orphan `DeliverBufferedAsync` and its tests as part of the NS-019 work. Add a comment on `NotificationForwarder` registration in `AkkaHostedService` cross-referencing NS-019/NS-020 so a maintainer searching for the `Notification` S&F handler finds the one canonical registration.
### NotificationService-021 — OAuth2 SASL constructed with empty user identifier; M365 SMTP will reject the auth handshake
| | |
|--|--|
| Severity | High |
| Category | Correctness & logic bugs |
| Status | Open |
| Location | `src/ScadaLink.NotificationService/MailKitSmtpClientWrapper.cs:76-79` |
**Description**
```csharp
case "oauth2":
// OAuth2 token is passed directly as credentials (pre-fetched by token service)
var oauth2 = new SaslMechanismOAuth2("", credentials);
await _client.AuthenticateAsync(oauth2, cancellationToken);
break;
```
`SaslMechanismOAuth2(string userName, string token)` — MailKit's XOAUTH2 mechanism — sends the SASL initial response as `user=<userName>\x01auth=Bearer <token>\x01\x01`. Microsoft 365 (and most OAuth2-enabled SMTP relays) **require the `userName` field to be the From mailbox identity the token was issued for**; an empty string is rejected with a server response like `535 5.7.3 Authentication unsuccessful` ("Either the user identity does not match the principal in the token, or the user is empty"). Office 365's documentation for SMTP AUTH XOAUTH2 calls this out explicitly.
The token-fetch path supports this: `OAuth2TokenService.GetTokenAsync` issues a Client Credentials grant against `login.microsoftonline.com/{tenantId}/oauth2/v2.0/token` with `scope=https://outlook.office365.com/.default`, which is the Microsoft 365 SMTP send scope — meaning the intended target is M365 SMTP, which is precisely the server that rejects an empty user. The `SmtpConfiguration.FromAddress` field is exactly the user identity that should be passed.
This bug is not caught by tests because every existing test uses a fake `ISmtpClientWrapper` (`Substitute.For<ISmtpClientWrapper>()`, `RecordingAuthClient`, etc.) — `MailKitSmtpClientWrapper.AuthenticateAsync` is never exercised against a real `SaslMechanismOAuth2`. The OAuth2 delivery test (NS-012, `Send_OAuth2Config_AuthenticatesWithResolvedAccessToken`) only asserts the wrapper's `AuthenticateAsync` is invoked with `("oauth2", "<access-token>")`; the wrapper itself is mocked out. The same defect is present in `EmailNotificationDeliveryAdapter` only because it routes through this same `AuthenticateAsync` method.
**Recommendation**
Pass the sender mailbox into the wrapper's `AuthenticateAsync` path. The cleanest fix is to thread `config.FromAddress` (or a dedicated `oauth2UserName` parameter) through `ISmtpClientWrapper.AuthenticateAsync` so the OAuth2 branch can construct `new SaslMechanismOAuth2(config.FromAddress, credentials)`. Add an integration-style test that runs `MailKitSmtpClientWrapper.AuthenticateAsync` against a stub `SmtpClient` and asserts the XOAUTH2 initial-response bytes contain the expected `user=<from>` field, so this regression is caught next time.
### NotificationService-022 — `MailKitSmtpClientWrapper` holds a long-lived `SmtpClient`; combined with per-send factory, the design comment about pooling is contradicted
| | |
|--|--|
| Severity | Low |
| Category | Performance & resource management |
| Status | Open |
| Location | `src/ScadaLink.NotificationService/MailKitSmtpClientWrapper.cs:14`, `src/ScadaLink.NotificationService/ServiceCollectionExtensions.cs:19` |
**Description**
`MailKitSmtpClientWrapper` declares `private readonly SmtpClient _client = new();` — a single `SmtpClient` is constructed when the wrapper is constructed and lives for the wrapper's lifetime. The DI registration is `services.AddSingleton<Func<ISmtpClientWrapper>>(_ => () => new MailKitSmtpClientWrapper());` (`ServiceCollectionExtensions.cs:19`) — every invocation of the factory creates a **new** wrapper and therefore a **new** `SmtpClient`. `NotificationDeliveryService.DeliverAsync` (the orphan, per NS-019) and `EmailNotificationDeliveryAdapter.SendAsync` both invoke the factory per send and dispose the wrapper at end of send. So in practice there is no connection pooling — every send pays a full TCP+TLS handshake.
This is internally consistent (and matches MailKit guidance — `SmtpClient` is not thread-safe and reusing across deliveries needs careful guarding). However:
1. The XML on the wrapper class says nothing about lifetime; the field-initializer `new SmtpClient()` *implies* a reusable connection. A maintainer might "fix" the factory to reuse a single wrapper (singleton) believing they are enabling pooling, and immediately introduce a concurrency bug: `MailKit.SmtpClient` rejects concurrent send calls and the wrapper carries no synchronization.
2. `ConnectAsync` mutates `_client.Timeout` (`MailKitSmtpClientWrapper.cs:39-42`) every time it runs. If a wrapper is ever reused across deliveries with different `SmtpConfiguration.ConnectionTimeoutSeconds` values, the timeout is silently overwritten — not a current bug, but a latent footgun.
3. The design doc requirement "Max concurrent connections (default 5)" is currently honoured by the NS-007 `SemaphoreSlim` on `NotificationDeliveryService`, but `EmailNotificationDeliveryAdapter` has **no equivalent throttle** — see `EmailNotificationDeliveryAdapter.cs:163-222`, no semaphore. So on central, where the actual delivery now happens, the design-doc concurrency limit is no longer enforced. This is a regression introduced by the redesign — the outbox does not carry NS-007's limiter forward.
**Recommendation**
Document the per-send lifecycle on `MailKitSmtpClientWrapper` (XML on the class: "one wrapper per delivery; the wrapper owns a single `SmtpClient` that is connected/authenticated/sent/disconnected/disposed once"). Either move the NS-007 `SemaphoreSlim` into a shared per-site holder consumed by `EmailNotificationDeliveryAdapter`, or accept the loss and update the design doc. Add `[Obsolete]` or `internal` to discourage re-using a wrapper across sends.
### NotificationService-023 — XML docs on the orphaned classes still describe the removed site-delivery flow; misleading to maintainers
| | |
|--|--|
| Severity | Low |
| Category | Documentation & comments |
| Status | Open |
| Location | `src/ScadaLink.NotificationService/NotificationDeliveryService.cs:12-17`, `src/ScadaLink.Commons/Interfaces/Services/INotificationDeliveryService.cs:3-12`, `src/ScadaLink.NotificationService/ServiceCollectionExtensions.cs:8-9` |
**Description**
XML comments still claim the dead path is the live path:
- `NotificationDeliveryService` class summary: "WP-11: Notification delivery via SMTP. WP-12: Error classification and S&F integration. Transient: connection refused, timeout, SMTP 4xx → hand to S&F. Permanent: SMTP 5xx → returned to script." This is the pre-redesign behaviour. The site-S&F branch in particular is dead (see NS-019), and "returned to script" is no longer accurate — `Notify.Send` is async and never returns a permanent error to the script per the design doc.
- `INotificationDeliveryService` (Commons): "Interface for sending notifications. Implemented by NotificationService, consumed by ScriptRuntimeContext." Verified against source: `ScriptRuntimeContext` does **not** consume this interface — it enqueues directly to `StoreAndForwardService` (see `SiteRuntime/Scripts/ScriptRuntimeContext.cs:1770-1774`). The Commons-level claim therefore documents an interaction that no longer exists.
- `NotificationResult` is a record returned only by the orphaned `SendAsync`. The Notification Outbox uses `DeliveryOutcome` instead, which encodes the Success/Transient/Permanent three-way that `NotificationResult(Success, ErrorMessage, WasBuffered)` cannot.
- `ServiceCollectionExtensions.AddNotificationService` XML doc says "Registers the notification delivery services (SMTP, OAuth2 token, delivery adapter)" — no mention that the central-only redesign means most of what it registers is unused.
A reader following the XML docs from any entry point ends up at a path that does not run. The CLAUDE.md "External Integrations" section and `Component-NotificationService.md` describe the new design; the in-source docs contradict them.
**Recommendation**
Tied to NS-019: if the orphan classes are deleted, this finding closes itself. If they are kept temporarily, prepend each summary with "**Obsolete — superseded by NotificationOutbox's `EmailNotificationDeliveryAdapter`. Retained for transitional compatibility; do not add new callers.**" and update `INotificationDeliveryService`'s summary to reflect the inverted flow or remove the interface.
### NotificationService-024 — No test affirms the central-only invariant; the orphaned-path tests give a false coverage signal
| | |
|--|--|
| Severity | Medium |
| Category | Testing coverage |
| Status | Open |
| Location | `tests/ScadaLink.NotificationService.Tests/NotificationDeliveryServiceTests.cs`, `tests/ScadaLink.IntegrationTests/IntegrationSurfaceTests.cs:118-136`, `tests/ScadaLink.Host.Tests/CompositionRootTests.cs:207-209` |
**Description**
The module test suite has 56 tests; counting `NotificationDeliveryServiceTests.cs`, ~40 of them exercise `NotificationDeliveryService.SendAsync`/`DeliverBufferedAsync` — code paths that, per NS-019, no production caller resolves. They pass against the orphaned class and so the suite stays green, but the green is a false signal: changing the dead implementation (or deleting it) does not flag any regression in the live notification-delivery flow, which now lives in `EmailNotificationDeliveryAdapter` (covered by NotificationOutbox's own tests) and `NotificationForwarder` (covered, if at all, by StoreAndForward's tests).
In particular there is **no test in this module** that affirms the central-only invariant the design doc requires:
- No test that `AddNotificationService()` registered on a *site* role would be inert / no-op'd, or that `SiteServiceRegistration.Configure` does **not** call `AddNotificationService` (an obvious regression vector — re-adding it would silently restore the orphaned site-delivery path).
- No test that confirms `INotificationDeliveryService` has no production consumer (i.e. an architecture test that fails if anyone re-introduces a constructor parameter or `GetRequiredService<INotificationDeliveryService>()` call).
- The cross-module `CompositionRootTests` (`tests/ScadaLink.Host.Tests/CompositionRootTests.cs:208-209`) still asserts `NotificationDeliveryService` and `INotificationDeliveryService` are registered, locking in the orphan rather than catching it.
- `IntegrationSurfaceTests.cs:122-125` constructs `NotificationDeliveryService` directly to validate "the integration surface" — testing a surface that no script actually crosses.
**Recommendation**
After NS-019 is decided:
1. If the orphan is deleted, remove the orphaned-path tests (NS-001/004/005/007/008/009/010/014/015/016/017/018-style tests targeting `SendAsync`/`DeliverBufferedAsync`). Retain `SmtpErrorClassifierTests`, `SmtpTlsModeParserTests`, `CredentialRedactorTests`, `OAuth2TokenServiceTests`, and `MailKitSmtpClientWrapperTests` (primitives genuinely shared). Update `CompositionRootTests` to drop the stale rows and `IntegrationSurfaceTests` to call the live path via `INotificationDeliveryAdapter`/`EmailNotificationDeliveryAdapter`.
2. Add a one-shot architecture test in `tests/ScadaLink.Architecture.Tests` (if it exists, else this module) that scans for direct references to `INotificationDeliveryService` outside this project and the obsolete-interface declaration in Commons, failing if any new consumer reappears.
### NotificationService-025 — `CredentialRedactor` over-masks: any 4-character credential component is masked anywhere it appears, including unrelated log text
| | |
|--|--|
| Severity | Low |
| Category | Correctness & logic bugs |
| Status | Open |
| Location | `src/ScadaLink.NotificationService/CredentialRedactor.cs:34-48` |
**Description**
```csharp
var parts = credentials.Split(':')
.Where(p => p.Length >= 4)
.Append(credentials)
.Distinct()
.OrderByDescending(p => p.Length);
foreach (var part in parts)
{
result = result.Replace(part, Mask, StringComparison.Ordinal);
}
```
The threshold `p.Length >= 4` is permissive enough that common short identifiers used by operators become aggressive global redaction tokens:
- A Basic-Auth credential of `root:hunter2` produces components `["root", "hunter2", "root:hunter2"]`. Every literal `root` anywhere in the exception/log text is masked — including unrelated mentions like file paths (`/root/.config`) or default-account names in the server's reply. This obscures legitimate diagnostic information without protecting any additional secret.
- An OAuth2 tenant id is a GUID (long, safe). The client id is typically a GUID. The client secret is the high-entropy part. The full `tenant:client:secret` is the actual sensitive triple. A tenant GUID embedded in unrelated text (a tenant-bound error code, a partial URL) will be masked even when the appearance is non-sensitive.
- The user name in Basic Auth is sometimes the From address (`scada-notifications@company.com`) — masking *the company's notification mailbox* in every log line that mentions it has real operational cost.
The function also uses `String.Replace` ordinarily, not word-boundary aware — a 4-char prefix that happens to be a substring of a longer benign token gets eaten.
The threshold is a defence-in-depth choice; the existing tests assert that `Hunter2pw!` and `Sup3rSecretValue` are masked (good) and that `null` text/credentials are handled (good), but nothing pins the negative behaviour: e.g. a test that a 4-char user name `root` is **not** also masked when it appears in an unrelated path.
**Recommendation**
Tighten the redaction policy: mask only the obviously-secret components — the password (Basic), the client secret (OAuth2), and the whole packed string — not the user name / tenant / client id. The simplest implementation is to redact only the **last** colon-separated component (the secret) plus the full packed string. Bump the per-component minimum length to something high enough that a typical short user name does not match (≥ 12 chars is the usual heuristic for a password). Add a test asserting `Scrub("/root/.config", "root:hunter2")` does not mask `/root/.config`'s `root`.