7b0b9c7365
Solution + 23 src projects + 26 test projects renamed; folders, csproj, namespaces, and ScadaLinkDbContext/ScadaBridgeDbContext class updated. ActorSystem "scadalink" → "scadabridge", Akka seed-node URLs migrated. SQL roles/logins, LDAP domains, CLI command name, and CLI config dir (~/.scadalink → ~/.scadabridge) also renamed. Build green; 5 Host.Tests fail awaiting SQL login rename in next commit. Pre-existing StaleTagMonitor timing flakes unchanged. Rename script committed at tools/rename-to-scadabridge.sh.
863 lines
83 KiB
Markdown
863 lines
83 KiB
Markdown
# Code Review — NotificationService
|
|
|
|
| Field | Value |
|
|
|-------|-------|
|
|
| Module | `src/ZB.MOM.WW.ScadaBridge.NotificationService` |
|
|
| Design doc | `docs/requirements/Component-NotificationService.md` |
|
|
| Status | Reviewed |
|
|
| Last reviewed | 2026-05-28 |
|
|
| Reviewer | claude-agent |
|
|
| Commit reviewed | `1eb6e97` |
|
|
| Open findings | 0 |
|
|
|
|
## Summary
|
|
|
|
The NotificationService module is small (6 source files) and structurally clean: it
|
|
abstracts the SMTP client behind an interface, isolates the OAuth2 token lifecycle,
|
|
and integrates with the Store-and-Forward Engine for transient-failure buffering.
|
|
However, the review surfaced several substantive defects. The most serious is that
|
|
**no Store-and-Forward delivery handler is ever registered for the `Notification`
|
|
category** — buffered notifications are persisted but never retried or delivered,
|
|
silently losing every notification that hit a transient SMTP failure. Error
|
|
classification is fragile (substring matching on exception messages) and is
|
|
applied inconsistently between `SendAsync` and `DeliverAsync`. `DeliverAsync` also
|
|
contains a resource-management bug that constructs and leaks two SMTP clients per
|
|
call. Secondary themes: the `OAuth2TokenService` singleton caches a single token
|
|
keyed to no credential identity (incorrect if multiple SMTP configs exist), several
|
|
design-doc requirements are unimplemented (connection timeout, max concurrent
|
|
connections, TLS `SSL`/`None` modes), and credentials are stored and passed as
|
|
plaintext `string` values. Test coverage exercises the happy path and the main
|
|
error branches but misses the OAuth2 delivery path, the permanent-classification
|
|
fallback in `DeliverAsync`, and concurrency on the token cache.
|
|
|
|
#### Re-review 2026-05-17 (commit `39d737e`)
|
|
|
|
Re-reviewed at commit `39d737e` after the NS-001..013 fixes. All twelve prior
|
|
findings remain closed (eleven Resolved, NS-013 Deferred) and the fixes hold up
|
|
against the current source — error classification is now typed, the SMTP client is
|
|
created once and always disconnected/disposed, the S&F `Notification` delivery
|
|
handler is registered in `AkkaHostedService`, and the OAuth2 token cache is keyed
|
|
per credential. The re-review surfaced **five new findings**. The recurring theme is
|
|
**unclassified-exception handling**: the typed `ClassifySmtpError` helper introduced
|
|
by NS-002/003 returns `Unknown` for anything that is not a recognised SMTP/socket
|
|
failure, but neither `SendAsync` nor `DeliverBufferedAsync` has a catch-all for the
|
|
`Unknown` bucket. As a result an OAuth2 token-fetch failure (`HttpRequestException`
|
|
from `EnsureSuccessStatusCode`, or `InvalidOperationException` from a malformed
|
|
credential string) escapes `SendAsync` as a raw exception to the calling script
|
|
(NS-015) and escapes `DeliverBufferedAsync` out of the S&F handler, where the engine
|
|
treats any thrown exception as transient and retries a permanently-broken config
|
|
forever (NS-014) — the same defect class NS-008 fixed for address parsing, but the
|
|
OAuth2 path was never covered. Secondary findings: `AuthenticateAsync` silently
|
|
proceeds unauthenticated for an unknown `authType` or empty credentials (NS-016);
|
|
`NotificationOptions` is bound from configuration in two places but never read by
|
|
any code (NS-017, dead config — NS-007 sourced the timeout/limit from
|
|
`SmtpConfiguration` instead); and the lazily-created concurrency limiter is read
|
|
outside its lock, is sized once and never resized on redeployment, and is never
|
|
disposed (NS-018).
|
|
|
|
#### Re-review 2026-05-28 (commit `1eb6e97`)
|
|
|
|
Re-reviewed at commit `1eb6e97` against the **materially-changed design**: per the
|
|
updated `Component-NotificationService.md` and CLAUDE.md, the Notification Service
|
|
is now **central-only**. Sites no longer deliver notifications over SMTP — a
|
|
script's `Notify.Send` enqueues to the site Store-and-Forward Engine and
|
|
`NotificationForwarder.DeliverAsync` (S&F handler in StoreAndForward) forwards
|
|
the payload to the central Notification Outbox, which dispatches via the
|
|
`INotificationDeliveryAdapter` registered for the list's `Type`. Email delivery
|
|
on central is performed by `EmailNotificationDeliveryAdapter` in the
|
|
NotificationOutbox project — it reuses this module's SMTP machinery
|
|
(`ISmtpClientWrapper`, `OAuth2TokenService`, `SmtpErrorClassifier`,
|
|
`SmtpTlsModeParser`, `EmailAddressValidator`, `CredentialRedactor`,
|
|
`SmtpPermanentException`, `NotificationOptions`) but is the actual production
|
|
caller. The intended residual responsibility of this module is to **supply that
|
|
shared SMTP machinery** plus list/SMTP-config definition management on central.
|
|
|
|
The re-review surfaced **seven new findings**. The dominant theme is **dead
|
|
code that contradicts the design doc**: `NotificationDeliveryService`, the
|
|
`INotificationDeliveryService` interface in Commons, the `NotificationResult`
|
|
record, the entire `DeliverBufferedAsync` S&F handler, and the prior NS-001…
|
|
NS-018 test fixtures that exercise them are now orphaned — no production code
|
|
path resolves `INotificationDeliveryService` on a site (sites no longer register
|
|
this module per `SiteServiceRegistration.cs:33-38`) and on central the
|
|
NotificationOutbox uses its own `EmailNotificationDeliveryAdapter` (which
|
|
duplicates the connect/auth/send/disconnect sequence rather than delegating to
|
|
`NotificationDeliveryService`). The class is still registered by
|
|
`AddNotificationService` on central (`Program.cs:77`) but no consumer resolves
|
|
it (NS-019). The `S&F handler must be registered` workaround that NS-001 added
|
|
to `AkkaHostedService` is itself superseded by the `NotificationForwarder`
|
|
registered for the same category at `AkkaHostedService.cs:654-660` (NS-020).
|
|
Secondary findings: a real-world correctness gap (the OAuth2
|
|
`SaslMechanismOAuth2` is constructed with an **empty user id** so server-side
|
|
account binding fails for any provider that requires it — NS-021); the SMTP
|
|
client wrapper holds a single `MailKit.SmtpClient` for the lifetime of the
|
|
wrapper but the factory delegate creates a new wrapper per send, so successive
|
|
sends through the same factory share NO connection but DO share a wrapper that
|
|
mutates `_client.Timeout` on every connect (benign because every wrapper has its
|
|
own client, but the design comment about pooling is now contradicted — NS-022);
|
|
the design-doc retention/maintenance language has no implementation in this
|
|
module and there is no test affirming the module is central-only (NS-023, NS-024);
|
|
and `CredentialRedactor` masks any component of the credential string that is
|
|
≥ 4 characters long — a 4-character user name like `root` or a 4-char tenant
|
|
prefix could be aggressively scrubbed out of unrelated log text (NS-025).
|
|
|
|
## Checklist coverage
|
|
|
|
| # | Category | Examined | Notes |
|
|
|---|----------|----------|-------|
|
|
| 1 | Correctness & logic bugs | ☑ | Re-review: OAuth2 SASL constructed with empty user id (NS-021); `CredentialRedactor` over-masks short components (NS-025). Earlier NS-005/NS-008 fixes hold. |
|
|
| 2 | Akka.NET conventions | ☑ | No actors in this module. `AddNotificationServiceActors` remains a documented no-op. |
|
|
| 3 | Concurrency & thread safety | ☑ | `OAuth2TokenService` per-credential locks now correct (NS-006 hold). No new issues. |
|
|
| 4 | Error handling & resilience | ☑ | NS-014/NS-015 classification fixes hold but the entire `DeliverBufferedAsync` / `SendAsync` error path is dead (NS-019/NS-020). |
|
|
| 5 | Security | ☑ | OAuth2 `SaslMechanismOAuth2` empty user id (NS-021); `CredentialRedactor` aggressiveness (NS-025); at-rest encryption still deferred (NS-013). |
|
|
| 6 | Performance & resource management | ☑ | `MailKitSmtpClientWrapper` keeps a single `SmtpClient` for the wrapper lifetime; combined with per-send factory this means no pooling — re-document or fix (NS-022). |
|
|
| 7 | Design-document adherence | ☑ | Critical drift: module still exposes site-style S&F sending; the design doc inverted delivery to central months ago (NS-019). Site registration removed but central still wires the dead service. |
|
|
| 8 | Code organization & conventions | ☑ | `INotificationDeliveryService` lives in Commons and is now unused — should be retired or relocated to a NotificationService-internal namespace (NS-019). Module-vs-NotificationOutbox boundary unclear. |
|
|
| 9 | Testing coverage | ☑ | 56 tests pass but ~40 of them assert behaviour of a code path no production caller exercises (NS-024). No test affirms the central-only design — i.e. that `AddNotificationService` registers no notification-sending service on a site. |
|
|
| 10 | Documentation & comments | ☑ | `NotificationDeliveryService` XML doc still claims "WP-11/12: Notification delivery via SMTP" with no warning that the class is orphaned; `INotificationDeliveryService` Commons doc claims "Implemented by NotificationService, consumed by ScriptRuntimeContext" — both consumers are wrong now (NS-023). |
|
|
|
|
## Findings
|
|
|
|
### NotificationService-001 — Buffered notifications are never retried (no S&F delivery handler)
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | Critical |
|
|
| Category | Error handling & resilience |
|
|
| Status | Resolved |
|
|
| Location | `src/ZB.MOM.WW.ScadaBridge.NotificationService/NotificationDeliveryService.cs:96`, `src/ZB.MOM.WW.ScadaBridge.NotificationService/ServiceCollectionExtensions.cs:8` |
|
|
|
|
**Description**
|
|
|
|
On a transient SMTP failure the service calls `_storeAndForward.EnqueueAsync(StoreAndForwardCategory.Notification, ...)`. The Store-and-Forward Engine only delivers (immediately or on retry sweep) a category for which a delivery handler has been registered via `StoreAndForwardService.RegisterDeliveryHandler`. A repo-wide search shows the `Notification` category handler is never registered anywhere — `StoreAndForwardCategory.Notification` appears only in this module's `EnqueueAsync` call. As a result, every buffered notification falls into the `RetryMessageAsync` "No delivery handler for category" branch (`StoreAndForwardService.cs:201-204`), which logs a warning and returns without ever delivering or removing the message. Buffered notifications accumulate in SQLite forever and are never sent. This silently loses every notification that hit a transient failure, while `SendAsync` returns `Success=true, WasBuffered=true`, telling the caller the notification is safely queued. This directly violates the design doc's "integrates with the Store-and-Forward Engine for reliable delivery" guarantee.
|
|
|
|
**Recommendation**
|
|
|
|
Register a delivery handler for `StoreAndForwardCategory.Notification` during startup that deserializes the buffered payload (`ListName`, `Subject`, `Message`), re-resolves the list/recipients/SMTP config, and re-attempts `DeliverAsync`, returning `true` on success, `false` on permanent failure, and throwing on transient failure. Wire it in `AddNotificationService` or the host bootstrap. Add an integration test covering the buffer-then-retry-then-deliver round trip.
|
|
|
|
**Resolution**
|
|
|
|
Resolved 2026-05-16. A delivery handler for `StoreAndForwardCategory.Notification` is now
|
|
registered at site startup in `AkkaHostedService`. The handler resolves
|
|
`NotificationDeliveryService` in a fresh DI scope and calls the new `DeliverBufferedAsync`,
|
|
which re-resolves the list, recipients and SMTP config and re-attempts delivery —
|
|
returning `true` on success, `false` (park) on permanent failure or missing
|
|
configuration, and throwing on transient failure so the engine retries. `SendAsync` now
|
|
buffers with `attemptImmediateDelivery: false` so registering the handler does not send
|
|
the notification twice. Regression tests cover the happy path and the list-removed park
|
|
path. Fixed by the commit whose message references `NotificationService-001`.
|
|
|
|
### NotificationService-002 — `TimeoutException`/`OperationCanceledException` misclassified as transient
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | High |
|
|
| Category | Error handling & resilience |
|
|
| Status | Resolved |
|
|
| Location | `src/ZB.MOM.WW.ScadaBridge.NotificationService/NotificationDeliveryService.cs:157-167` |
|
|
|
|
**Description**
|
|
|
|
`IsTransientSmtpError` treats `OperationCanceledException` (and its subtype `TaskCanceledException`) as a transient SMTP error. When the caller passes a `CancellationToken` that is cancelled — e.g. the Script Execution Actor is stopped, or the script times out — the resulting `OperationCanceledException` is caught by the `catch ... when (IsTransientSmtpError(ex))` clause and the notification is buffered as if SMTP had failed. A deliberate cancellation should propagate, not be silently buffered for retry. The same clause classifies any `IOException` as transient even though `IOException` covers unrelated failures (e.g. a serialization stream error). Additionally, `OperationCanceledException` raised by token cancellation in the OAuth2 path would be miscategorised the same way.
|
|
|
|
**Recommendation**
|
|
|
|
Re-throw `OperationCanceledException`/`TaskCanceledException` when `cancellationToken.IsCancellationRequested` is true rather than classifying it as transient. Narrow `IOException` handling to SMTP-specific I/O failures, or rely on MailKit's typed exceptions (`SmtpCommandException`, `SmtpProtocolException`, `ServiceNotConnectedException`) instead of broad base types.
|
|
|
|
**Resolution**
|
|
|
|
Resolved 2026-05-16 (commit `<pending>`). Classification was rewritten around a typed
|
|
`ClassifySmtpError` helper: a caller-requested cancellation (`OperationCanceledException`/
|
|
`TaskCanceledException` while `cancellationToken.IsCancellationRequested`) now propagates
|
|
out of both `SendAsync` and `DeliverAsync` via dedicated `catch` filters instead of being
|
|
buffered. The broad `IOException` catch-all was dropped — only MailKit's typed exceptions
|
|
plus `SocketException`/`TimeoutException` are treated as transient. Regression tests
|
|
`Send_CancellationRequested_PropagatesAndDoesNotBuffer` and
|
|
`Send_TaskCanceledException_WithCancellation_Propagates`.
|
|
|
|
### NotificationService-003 — Error classification by substring matching on exception messages is fragile
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | High |
|
|
| Category | Error handling & resilience |
|
|
| Status | Resolved |
|
|
| Location | `src/ZB.MOM.WW.ScadaBridge.NotificationService/NotificationDeliveryService.cs:144-147`, `src/ZB.MOM.WW.ScadaBridge.NotificationService/NotificationDeliveryService.cs:163-166` |
|
|
|
|
**Description**
|
|
|
|
Transient/permanent classification depends on `ex.Message.Contains("5.")`, `Contains("4.")`, `Contains("550")`, `Contains("421")`, etc. This is unreliable: (a) `Message.Contains("5.")` matches any message containing the literal "5." anywhere — e.g. a host name `smtp5.example.com`, a version string, or a path — producing false permanent classification; (b) `Contains("4.")` likewise matches `"v4.0"` or an IP address octet; (c) MailKit exposes the actual SMTP status code on `SmtpCommandException.StatusCode`, which is the correct, locale-independent source of truth and is being ignored; (d) message text is culture/version-dependent and not part of any stable contract. Misclassification has real consequences: a permanent failure misread as transient floods the S&F buffer (which the design doc explicitly says must be prevented), and a transient failure misread as permanent loses the notification.
|
|
|
|
**Recommendation**
|
|
|
|
Classify on MailKit's typed exceptions and `SmtpCommandException.StatusCode` (4xx → transient, 5xx → permanent), and `SocketException`/`SmtpProtocolException`/connection-refused → transient. Remove all `Message.Contains` checks.
|
|
|
|
**Resolution**
|
|
|
|
Resolved 2026-05-16 (commit `<pending>`). All `ex.Message.Contains(...)` checks were
|
|
removed. The new `ClassifySmtpError` helper inspects `SmtpCommandException.StatusCode`
|
|
(numeric SMTP code: 4xx → transient, 5xx → permanent) and treats `SmtpProtocolException`,
|
|
`ServiceNotConnectedException`, `SocketException` and `TimeoutException` as transient;
|
|
anything else is `Unknown` and propagates unclassified rather than being guessed. The
|
|
permanent-promotion `catch` block in `DeliverAsync` now keys off this classification.
|
|
Regression tests `Send_Smtp5xxCommandException_ClassifiedPermanent`,
|
|
`Send_Smtp4xxCommandException_ClassifiedTransientAndBuffered`,
|
|
`Send_SmtpProtocolException_ClassifiedTransient`, and
|
|
`Send_NonSmtpExceptionWith5xxLookalikeText_NotPromotedToPermanent`.
|
|
|
|
### NotificationService-004 — `DeliverAsync` constructs two SMTP clients and leaks the used one
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | High |
|
|
| Category | Performance & resource management |
|
|
| Status | Resolved |
|
|
| Location | `src/ZB.MOM.WW.ScadaBridge.NotificationService/NotificationDeliveryService.cs:118-119` |
|
|
|
|
**Description**
|
|
|
|
```csharp
|
|
using var client = _smtpClientFactory() as IDisposable;
|
|
var smtp = _smtpClientFactory();
|
|
```
|
|
|
|
The factory is invoked twice, creating two separate `MailKitSmtpClientWrapper` instances (each owning a real `SmtpClient` with a socket). The first instance is assigned to `client` and disposed by the `using`, but it is never used. The second instance, `smtp`, is the one actually connected, authenticated, used to send, and `DisconnectAsync`'d — but it is never `Dispose`d. `MailKitSmtpClientWrapper` implements `IDisposable` and wraps an unmanaged socket; the connected client is leaked on every send. `DisconnectAsync` closes the connection but does not dispose the `SmtpClient`. Over time this leaks sockets/handles.
|
|
|
|
**Recommendation**
|
|
|
|
Create exactly one client and dispose the one that is actually used:
|
|
`using var smtp = _smtpClientFactory();` then cast to `IDisposable` only if needed (the factory's `Func<ISmtpClientWrapper>` should ideally return a type that the `using` can dispose directly — consider having `ISmtpClientWrapper` extend `IAsyncDisposable`/`IDisposable`).
|
|
|
|
**Resolution**
|
|
|
|
Resolved 2026-05-16 (commit `<pending>`). `DeliverAsync` now invokes `_smtpClientFactory()`
|
|
exactly once and disposes the client actually used via `using var disposable = smtp as
|
|
IDisposable;`. The previous code created two `MailKitSmtpClientWrapper` instances per send
|
|
and disposed the unused one while leaking the connected one. Regression test
|
|
`Send_CreatesExactlyOneSmtpClient_AndDisposesIt` verifies the factory is invoked once and
|
|
the resulting client is disposed.
|
|
|
|
### NotificationService-005 — Non-TLS path uses `SecureSocketOptions.Auto`, contradicting the requested mode
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | Medium |
|
|
| Category | Correctness & logic bugs |
|
|
| Status | Resolved |
|
|
| Location | `src/ZB.MOM.WW.ScadaBridge.NotificationService/MailKitSmtpClientWrapper.cs:18`, `src/ZB.MOM.WW.ScadaBridge.NotificationService/NotificationDeliveryService.cs:123` |
|
|
|
|
**Description**
|
|
|
|
`ConnectAsync` maps `useTls` to either `SecureSocketOptions.StartTls` or `SecureSocketOptions.Auto`. `useTls` is computed in `DeliverAsync` as `TlsMode == "starttls"`. So a configuration of `TlsMode = "none"` produces `useTls = false` → `SecureSocketOptions.Auto`, which lets MailKit opportunistically negotiate TLS — the opposite of "None". Worse, the design doc defines three TLS modes — `None`, `StartTLS`, `SSL` — but the code collapses them to a single boolean, so `SSL` (implicit TLS, typically port 465) is treated identically to `None`/`Auto` and the SSL mode is effectively unsupported. The `bool useTls` parameter cannot represent the three-state requirement.
|
|
|
|
**Recommendation**
|
|
|
|
Pass the `TlsMode` string (or a `TlsMode` enum) through to the wrapper and map explicitly: `None` → `SecureSocketOptions.None`, `StartTLS` → `SecureSocketOptions.StartTls`, `SSL` → `SecureSocketOptions.SslOnConnect`. Validate the configured value and reject unknown modes.
|
|
|
|
**Resolution**
|
|
|
|
Resolved 2026-05-16 (commit pending). Root cause confirmed against source: the
|
|
`bool useTls` parameter cannot represent three states, and the non-StartTLS branch used
|
|
`SecureSocketOptions.Auto`. A new `SmtpTlsMode` enum (`None`/`StartTls`/`Ssl`) and
|
|
`SmtpTlsModeParser` were added; `ISmtpClientWrapper.ConnectAsync` now takes `SmtpTlsMode`
|
|
and `MailKitSmtpClientWrapper` maps it explicitly to `SecureSocketOptions.None`/
|
|
`StartTls`/`SslOnConnect`. `SendAsync`/`DeliverBufferedAsync` validate the configured
|
|
`TlsMode` up front — an unknown value returns a clean `NotificationResult` failure (or
|
|
parks a buffered message) instead of silently negotiating TLS. Regression tests:
|
|
`Send_TlsModeNone_DoesNotNegotiateTls`, `Send_TlsModeSsl_UsesImplicitSsl`,
|
|
`Send_UnknownTlsMode_ReturnsErrorNotSilentFallback`, and the `SmtpTlsModeParserTests` set.
|
|
|
|
### NotificationService-006 — OAuth2 token cache is keyed to nothing; wrong token returned when multiple SMTP configs exist
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | Medium |
|
|
| Category | Concurrency & thread safety |
|
|
| Status | Resolved |
|
|
| Location | `src/ZB.MOM.WW.ScadaBridge.NotificationService/OAuth2TokenService.cs:14-15`, `src/ZB.MOM.WW.ScadaBridge.NotificationService/OAuth2TokenService.cs:30-35` |
|
|
|
|
**Description**
|
|
|
|
`OAuth2TokenService` is registered as a singleton and stores a single `_cachedToken`/`_tokenExpiry` pair. `GetTokenAsync` ignores the `credentials` argument when deciding whether the cache is valid — it only checks expiry. If two SMTP configurations with different tenant/client credentials are ever used (the repository's `GetAllSmtpConfigurationsAsync` returns a list, implying multiple configs are possible), the second caller receives the first caller's token, which will fail authentication against the second tenant. Even with a single config today this is a latent correctness bug and makes the service's behaviour depend on call order.
|
|
|
|
**Recommendation**
|
|
|
|
Key the cache by the credential identity (e.g. a dictionary keyed by `tenantId:clientId`, or by a hash of the credential string), or document and enforce the single-SMTP-config invariant. Given the design doc says one SMTP config is deployed per site, enforcing the invariant is acceptable but should be explicit.
|
|
|
|
**Resolution**
|
|
|
|
Resolved 2026-05-16 (commit pending). Root cause confirmed: the singleton held a single
|
|
`_cachedToken`/`_tokenExpiry` pair and `GetTokenAsync` ignored the `credentials` argument
|
|
when validating the cache, so a second SMTP config got the first config's token.
|
|
`OAuth2TokenService` now stores a `ConcurrentDictionary<string, CacheEntry>` keyed by the
|
|
SHA-256 hash of the credential string; each distinct tenant/client/secret gets its own
|
|
cached token, expiry, and per-credential `SemaphoreSlim` (double-checked locking
|
|
preserved). Regression tests: `GetTokenAsync_DifferentCredentials_ReturnPerCredentialTokens`
|
|
and `GetTokenAsync_SameCredentials_CachedPerCredential`.
|
|
|
|
### NotificationService-007 — Connection timeout and max-concurrent-connections from the design doc are not implemented
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | Medium |
|
|
| Category | Design-document adherence |
|
|
| Status | Resolved |
|
|
| Location | `src/ZB.MOM.WW.ScadaBridge.NotificationService/NotificationOptions.cs:11-14`, `src/ZB.MOM.WW.ScadaBridge.NotificationService/MailKitSmtpClientWrapper.cs:16-20`, `src/ZB.MOM.WW.ScadaBridge.NotificationService/NotificationDeliveryService.cs:111-140` |
|
|
|
|
**Description**
|
|
|
|
The design doc specifies an SMTP "Connection timeout (default 30s)" and "Max concurrent connections (default 5)", and `NotificationOptions`/`SmtpConfiguration` both carry these fields. Neither is enforced: `MailKitSmtpClientWrapper.ConnectAsync` never sets `SmtpClient.Timeout`, so the connection relies on MailKit's default timeout rather than the configured value (only the caller's `CancellationToken` bounds it, and callers may pass `default`). There is no semaphore or other throttle limiting concurrent SMTP connections per site, so `MaxConcurrentConnections` has no effect. Both options exist but are dead configuration.
|
|
|
|
**Recommendation**
|
|
|
|
Set `SmtpClient.Timeout` from `ConnectionTimeoutSeconds` in `ConnectAsync` (and/or derive a linked `CancellationTokenSource`). Introduce a `SemaphoreSlim(MaxConcurrentConnections)` gating `DeliverAsync`. If these limits are intentionally deferred, mark the options `[Obsolete]`/document them as not-yet-enforced and note the gap in the design doc.
|
|
|
|
**Resolution**
|
|
|
|
Resolved 2026-05-16 (commit pending). Root cause confirmed: `ConnectAsync` never set
|
|
`SmtpClient.Timeout` and no throttle gated `DeliverAsync`. `ISmtpClientWrapper.ConnectAsync`
|
|
now takes a `connectionTimeoutSeconds` argument; `MailKitSmtpClientWrapper` sets
|
|
`SmtpClient.Timeout` from `SmtpConfiguration.ConnectionTimeoutSeconds`. `DeliverAsync`
|
|
acquires a lazily-created `SemaphoreSlim` sized to `SmtpConfiguration.MaxConcurrentConnections`
|
|
(default 5 when non-positive) and releases it in a `finally`, so concurrent SMTP
|
|
deliveries per site are bounded. The timeout is sourced from the deployed
|
|
`SmtpConfiguration` rather than `NotificationOptions`; the `NotificationOptions` fields
|
|
remain as operational fallback defaults. Regression tests:
|
|
`Send_PassesConfiguredConnectionTimeoutToClient` and
|
|
`Send_MaxConcurrentConnections_LimitsConcurrentDeliveries`.
|
|
|
|
### NotificationService-008 — Recipient email addresses are not validated before send
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | Medium |
|
|
| Category | Correctness & logic bugs |
|
|
| Status | Resolved |
|
|
| Location | `src/ZB.MOM.WW.ScadaBridge.NotificationService/NotificationDeliveryService.cs:136-137`, `src/ZB.MOM.WW.ScadaBridge.NotificationService/MailKitSmtpClientWrapper.cs:50-53` |
|
|
|
|
**Description**
|
|
|
|
`SendAsync` builds `bccAddresses` directly from `recipient.EmailAddress` and passes them to `MailboxAddress.Parse`. If any recipient row has a malformed address, `MailboxAddress.Parse` throws `ParseException`. `ParseException` is not a `TimeoutException`/`SocketException`/`IOException` and its message will not generally contain "4." or "5.", so it falls through `DeliverAsync`'s outer `catch ... when (... && !IsTransientSmtpError(ex))` filter, which re-throws it (`:153`); it then escapes `SendAsync` entirely as an unhandled exception (the `SendAsync` catch blocks only cover `SmtpPermanentException` and transient errors). A single bad address in a list therefore crashes the send with an exception type the calling script is not told to expect, instead of producing a clean `NotificationResult` error. The same applies to a malformed `FromAddress`.
|
|
|
|
**Recommendation**
|
|
|
|
Validate addresses up front (e.g. `MailboxAddress.TryParse`) and return a `NotificationResult(false, ...)` listing invalid recipients, or wrap `DeliverAsync` so any non-classified exception becomes a permanent `NotificationResult` failure rather than escaping. Consider validating addresses at definition time in the Central UI as well.
|
|
|
|
**Resolution**
|
|
|
|
Resolved 2026-05-16 (commit pending). Root cause confirmed: `MailboxAddress.Parse` of a
|
|
malformed `FromAddress`/recipient threw `ParseException`, which is unclassified and
|
|
escaped `SendAsync` as an unhandled exception. A new `ValidateAddresses` helper uses
|
|
`MailboxAddress.TryParse` for the sender and every recipient; `SendAsync` now returns a
|
|
clean `NotificationResult(false, ...)` listing the invalid address(es) before any SMTP
|
|
attempt, and `DeliverBufferedAsync` parks a buffered message with a bad address (a fault
|
|
retrying cannot fix). Regression tests:
|
|
`Send_MalformedRecipientAddress_ReturnsCleanError_DoesNotThrow` and
|
|
`Send_MalformedFromAddress_ReturnsCleanError_DoesNotThrow`. Definition-time validation in
|
|
the Central UI is a separate component and out of this module's scope.
|
|
|
|
### NotificationService-009 — Credentials handled as plaintext strings; OAuth2 client secret logged risk
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | Medium — re-triaged: split into an in-scope log-leak fix (resolved) and a Commons-scoped at-rest-encryption / structured-credential follow-up (NotificationService-013, Deferred). |
|
|
| Category | Security |
|
|
| Status | Resolved |
|
|
| Location | `src/ZB.MOM.WW.ScadaBridge.NotificationService/NotificationDeliveryService.cs:127-134`, `src/ZB.MOM.WW.ScadaBridge.NotificationService/OAuth2TokenService.cs:30-65`, `src/ZB.MOM.WW.ScadaBridge.Commons/Entities/Notifications/SmtpConfiguration.cs:9` |
|
|
|
|
**Description**
|
|
|
|
SMTP credentials — Basic Auth `user:pass` and OAuth2 `tenantId:clientId:clientSecret` — are stored and passed as a single colon-delimited plaintext `string` (`SmtpConfiguration.Credentials`). There is no indication the value is encrypted at rest in SQLite or in the central config DB. The colon-delimited packing is also brittle: a password or client secret containing a `:` will be split incorrectly (`Split(':', 2)` / `Split(':', 3)`), silently corrupting the secret. Separately, while the current code does not log the secret directly, the substring-based error classification logs full exception messages (`_logger.LogWarning(ex, ...)`, `LogError(ex, ...)`) and MailKit exceptions can echo back server responses; an authentication failure message could surface credential fragments into logs. There is no defensive scrubbing.
|
|
|
|
**Recommendation**
|
|
|
|
Store credentials encrypted at rest (DPAPI/Data Protection or a secret store) and model them as structured fields rather than a colon-packed string, so secrets containing `:` are safe. Ensure credential values are never written to logs; consider a redaction step on exception messages before logging.
|
|
|
|
**Resolution**
|
|
|
|
Resolved 2026-05-16 (commit pending). Root cause re-triaged against source: the finding
|
|
conflates two concerns with different ownership.
|
|
|
|
1. **Log-leak risk (in scope — fixed).** The original code logged whole exception objects
|
|
(`_logger.LogWarning(ex, ...)` / `LogError(ex, ...)`); MailKit auth exceptions can echo
|
|
server responses quoting the supplied credentials. A new internal `CredentialRedactor`
|
|
masks every colon-delimited credential component out of any text. `SendAsync` and
|
|
`DeliverBufferedAsync` now log a scrubbed message string (not the raw exception) and the
|
|
permanent-failure `NotificationResult` is scrubbed before it returns to the caller.
|
|
`OAuth2TokenService` logs the tenant id only — never the client secret or access token.
|
|
Regression tests: `CredentialRedactorTests` and
|
|
`Send_PermanentError_RedactsCredentialFromResultMessage`.
|
|
2. **At-rest encryption + structured-credential modelling (out of scope — Deferred).**
|
|
Encrypting `SmtpConfiguration.Credentials` at rest and replacing the brittle
|
|
colon-packed `string` with structured fields requires editing
|
|
`src/ZB.MOM.WW.ScadaBridge.Commons/Entities/Notifications/SmtpConfiguration.cs` and the
|
|
ConfigurationDatabase EF layer — both outside this module. Tracked separately as
|
|
**NotificationService-013** (Deferred) so it is not lost.
|
|
|
|
### NotificationService-013 — Encrypt SMTP credentials at rest; replace colon-packed string with structured fields
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | Medium |
|
|
| Category | Security |
|
|
| Status | Deferred |
|
|
| Location | `src/ZB.MOM.WW.ScadaBridge.Commons/Entities/Notifications/SmtpConfiguration.cs:9`, ConfigurationDatabase EF mapping |
|
|
|
|
**Description**
|
|
|
|
Split out of NotificationService-009. `SmtpConfiguration.Credentials` packs Basic Auth
|
|
`user:pass` and OAuth2 `tenantId:clientId:clientSecret` into a single plaintext
|
|
colon-delimited `string`: (a) there is no encryption at rest in SQLite or the central
|
|
config DB; (b) a password or client secret containing a `:` is split incorrectly by
|
|
`Split(':', 2)` / `Split(':', 3)`, silently corrupting the secret.
|
|
|
|
**Recommendation**
|
|
|
|
Model credentials as structured fields (or an encrypted blob) on the Commons entity and
|
|
encrypt at rest via Data Protection / a secret store. The colon-delimited parsing in
|
|
`MailKitSmtpClientWrapper` and `OAuth2TokenService` would then consume the structured
|
|
fields directly.
|
|
|
|
**Resolution**
|
|
|
|
Deferred — requires changes to `src/ZB.MOM.WW.ScadaBridge.Commons` and the ConfigurationDatabase
|
|
component, which are outside the NotificationService module. To be addressed in a
|
|
Commons/ConfigurationDatabase-scoped change. The associated log-leak risk is resolved
|
|
under NotificationService-009.
|
|
|
|
### NotificationService-010 — `DeliverAsync` does not disconnect the SMTP client on failure
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | Low |
|
|
| Category | Performance & resource management |
|
|
| Status | Resolved |
|
|
| Location | `src/ZB.MOM.WW.ScadaBridge.NotificationService/NotificationDeliveryService.cs:121-154` |
|
|
|
|
**Description**
|
|
|
|
`DisconnectAsync` is only called at `:139`, on the success path inside the `try` block. If `AuthenticateAsync` or `SendAsync` throws, control jumps to the `catch` filter at `:141` and the method exits (re-throwing or wrapping) without ever calling `DisconnectAsync`. Combined with NS-004 (the client is never disposed either), a failed send leaves an open, authenticated SMTP connection until the socket is eventually reclaimed by finalization. Under sustained transient failures this can exhaust the SMTP server's connection slots.
|
|
|
|
**Recommendation**
|
|
|
|
Move disconnect/dispose into a `finally` block (or use `await using` once `ISmtpClientWrapper` supports `IAsyncDisposable`) so the connection is always torn down regardless of outcome.
|
|
|
|
**Resolution**
|
|
|
|
Resolved 2026-05-16 (commit pending). Root cause confirmed against the current source:
|
|
`DeliverAsync` invoked `smtp.DisconnectAsync` only on the success path inside the `try`
|
|
block, so a failed `Connect`/`Authenticate`/`Send` left the connection open until
|
|
finalization. `DisconnectAsync` was moved into the existing `finally` block (which also
|
|
releases the NS-007 concurrency slot), so the SMTP connection is always torn down
|
|
regardless of outcome. The disconnect is best-effort — wrapped in its own `try/catch` that
|
|
logs at debug level — so a disconnect failure (e.g. the connection is already dead) cannot
|
|
mask the original delivery exception. Regression tests
|
|
`Send_TransientFailureDuringSend_StillDisconnectsClient` and
|
|
`Send_FailureDuringAuthenticate_StillDisconnectsClient`.
|
|
|
|
### NotificationService-011 — `SmtpPermanentException` declared in the wrong file; module conventions
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | Low |
|
|
| Category | Code organization & conventions |
|
|
| Status | Resolved |
|
|
| Location | `src/ZB.MOM.WW.ScadaBridge.NotificationService/NotificationDeliveryService.cs:173-177`, `src/ZB.MOM.WW.ScadaBridge.Commons/Entities/Notifications/SmtpConfiguration.cs:5-15` |
|
|
|
|
**Description**
|
|
|
|
Two minor convention issues. (1) `SmtpPermanentException` is a public exception type declared at the bottom of `NotificationDeliveryService.cs` rather than in its own file (`SmtpPermanentException.cs`), which is inconsistent with the one-type-per-file layout used elsewhere and makes it harder to locate. (2) `SmtpConfiguration` (a Commons POCO) declares non-nullable `string` properties (`Host`, `AuthType`, `FromAddress`) that are only guaranteed by the constructor; EF Core materialization or object-initializer use can leave them null while the type system says otherwise. These are persistence-ignorant POCO concerns but worth flagging because the delivery service dereferences `config.Host`, `config.AuthType`, `config.FromAddress` without null checks.
|
|
|
|
**Recommendation**
|
|
|
|
Move `SmtpPermanentException` to its own file. For `SmtpConfiguration`, either keep the constructor as the only path and document it, or use `required` members so the compiler enforces initialization.
|
|
|
|
**Resolution**
|
|
|
|
Resolved 2026-05-16 (commit pending). Both issues confirmed against source.
|
|
|
|
1. **`SmtpPermanentException` placement (in scope — fixed).** The public exception type
|
|
was extracted from the bottom of `NotificationDeliveryService.cs` into its own file,
|
|
`src/ZB.MOM.WW.ScadaBridge.NotificationService/SmtpPermanentException.cs`, restoring the
|
|
one-type-per-file layout. No behaviour change, so no dedicated regression test — the
|
|
move is verified by the module test suite still compiling and passing (56 tests green).
|
|
2. **`SmtpConfiguration` non-nullable strings (out of scope — re-triaged).**
|
|
`SmtpConfiguration` lives in `src/ZB.MOM.WW.ScadaBridge.Commons`, which is outside the
|
|
NotificationService module's edit scope; it cannot be changed here. This is the same
|
|
Commons entity already flagged for follow-up under **NotificationService-013**
|
|
(Deferred). The `required`-members / documented-constructor change should be folded into
|
|
that Commons/ConfigurationDatabase-scoped work. Note the delivery service's actual risk
|
|
is bounded: `SendAsync`/`DeliverBufferedAsync` already validate `TlsMode` and addresses
|
|
up front (NS-005/NS-008), and a null `Host`/`AuthType` from a malformed config row
|
|
surfaces as a classified/clean failure rather than silent corruption.
|
|
|
|
### NotificationService-012 — Test coverage gaps: OAuth2 delivery path, permanent-classification fallback, token-cache concurrency
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | Low |
|
|
| Category | Testing coverage |
|
|
| Status | Resolved |
|
|
| Location | `tests/ZB.MOM.WW.ScadaBridge.NotificationService.Tests/NotificationDeliveryServiceTests.cs`, `tests/ZB.MOM.WW.ScadaBridge.NotificationService.Tests/OAuth2TokenServiceTests.cs` |
|
|
|
|
**Description**
|
|
|
|
The tests cover the happy path, list-not-found, no-recipients, no-SMTP-config, permanent failure, transient-without-S&F, and transient-with-S&F buffering. Notable untested paths: (1) the OAuth2 delivery branch in `DeliverAsync:128-132` — every test uses `tokenService: null` and Basic Auth, so OAuth2 token resolution during a send is never exercised; (2) `DeliverAsync`'s permanent-classification fallback (`:144-149`) that promotes a generic exception whose message contains "550"/"553"/"554" to `SmtpPermanentException` is never tested; (3) `OAuth2TokenServiceTests` never tests concurrent `GetTokenAsync` calls (the double-checked-locking path) or token expiry/refresh — the cache test uses a 3600s token so refresh never triggers; (4) no test covers the transient-with-S&F path actually delivering after retry (which would also have caught NS-001). Given NS-001 is a critical defect, the absence of an end-to-end buffer-and-retry test is significant.
|
|
|
|
**Recommendation**
|
|
|
|
Add tests for: OAuth2-authenticated send with a mocked `OAuth2TokenService`; the `DeliverAsync` 5xx-message permanent fallback; token expiry/refresh (short `expires_in`); concurrent token acquisition; and an end-to-end buffered-notification retry once a `Notification` S&F handler is registered.
|
|
|
|
**Resolution**
|
|
|
|
Resolved 2026-05-16 (commit pending). Each listed gap was re-triaged against the current
|
|
source (the module was substantially rewritten by NS-001..009 since this finding was
|
|
filed) and the genuinely-missing tests were added:
|
|
|
|
1. **OAuth2 delivery path (gap real — covered).** New test
|
|
`Send_OAuth2Config_AuthenticatesWithResolvedAccessToken` drives `DeliverAsync` with an
|
|
`oauth2` `SmtpConfiguration` and a real `OAuth2TokenService` (mocked `IHttpClientFactory`),
|
|
asserting the SMTP client is authenticated with the *resolved access token*, not the raw
|
|
`tenant:client:secret` triple.
|
|
2. **5xx-message permanent fallback (gap obsolete — re-triaged).** The substring-based
|
|
permanent fallback this item describes no longer exists: NS-003 replaced message-text
|
|
matching with typed-exception/`SmtpStatusCode` classification. The equivalent path is
|
|
already covered by `Send_Smtp5xxCommandException_ClassifiedPermanent`, and
|
|
`Send_NonSmtpExceptionWith5xxLookalikeText_NotPromotedToPermanent` proves a "5xx
|
|
lookalike" message is no longer promoted. No new test needed for a removed branch.
|
|
3. **Token expiry/refresh and concurrent acquisition (gaps real — covered).** New tests
|
|
`GetTokenAsync_ExpiredToken_RefreshesOnNextCall` (uses `expires_in: 60` so the token is
|
|
stale immediately given the 60s refresh margin) and
|
|
`GetTokenAsync_ConcurrentCalls_MakeExactlyOneHttpRequest` (20 racing callers collapse to
|
|
a single token fetch, exercising the per-credential double-checked lock).
|
|
4. **End-to-end buffer-and-retry (gap already closed — re-triaged).** The
|
|
buffer-then-retry-then-deliver round trip was added when NS-001 was resolved
|
|
(`DeliverBuffered_HappyPath_ReturnsTrue`, `DeliverBuffered_ListNoLongerExists_…`, plus
|
|
`AkkaHostedService` handler registration), so no further test is required here.
|
|
|
|
Module test suite is green at 56 tests.
|
|
|
|
### NotificationService-014 — OAuth2 token-fetch failure escapes `DeliverBufferedAsync`; a permanently-broken config is retried forever
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | High |
|
|
| Category | Error handling & resilience |
|
|
| Status | Resolved |
|
|
| Location | `src/ZB.MOM.WW.ScadaBridge.NotificationService/NotificationDeliveryService.cs:214-228`, `src/ZB.MOM.WW.ScadaBridge.NotificationService/NotificationDeliveryService.cs:308-312`, `src/ZB.MOM.WW.ScadaBridge.NotificationService/OAuth2TokenService.cs:56-84` |
|
|
|
|
**Description**
|
|
|
|
`DeliverBufferedAsync` is the registered Store-and-Forward delivery handler for `StoreAndForwardCategory.Notification`. Per the S&F handler contract (`StoreAndForwardService.cs:40-53`), a handler must return `true` for success, `false` for permanent failure (park, no further retries), and **throw only for a transient failure** (retry). `DeliverBufferedAsync`'s `try` block (`:214-228`) catches only `SmtpPermanentException`; the comment at `:227` asserts "Transient SMTP errors propagate out of `DeliverAsync`". But `DeliverAsync` can also throw exceptions that `ClassifySmtpError` buckets as `Unknown` — most importantly the OAuth2 path at `:308-312`, where `OAuth2TokenService.GetTokenAsync` throws `HttpRequestException` (from `EnsureSuccessStatusCode` at `OAuth2TokenService.cs:78`) or `InvalidOperationException` (malformed credential triple, or missing `access_token`). None of these is an `SmtpCommandException`/`SmtpProtocolException`/`SocketException`/`TimeoutException`, so they are neither caught nor classified — they propagate straight out of `DeliverBufferedAsync`. The S&F engine treats *any* thrown exception as a transient failure, so a notification whose SMTP config has an invalid client secret (a 401 → `HttpRequestException`, a genuinely **permanent** condition) is retried on every sweep until `MaxRetries` is reached — burning OAuth2 token-endpoint calls and never parking promptly. A malformed credential string (`InvalidOperationException`, never fixable) is likewise retried instead of parked immediately.
|
|
|
|
**Recommendation**
|
|
|
|
Add a catch-all to `DeliverBufferedAsync` for exceptions that `ClassifySmtpError` classifies as `Unknown`: log and return `false` (park) for non-retryable causes such as `InvalidOperationException` from credential parsing, and decide deliberately whether an OAuth2 `HttpRequestException` should park or retry (a 5xx token endpoint is transient; a 401 is permanent — inspect `HttpRequestException.StatusCode`). Do not let an unclassified exception leave the handler and be silently reinterpreted as transient.
|
|
|
|
**Resolution**
|
|
|
|
Resolved 2026-05-17. Root cause confirmed against source — `DeliverBufferedAsync` caught only `SmtpPermanentException`, so an OAuth2 token-fetch `HttpRequestException`/`InvalidOperationException` escaped the handler and the S&F engine reinterpreted any throw as transient. Added a final `catch (Exception ex)` to `DeliverBufferedAsync` that decides deliberately: an `HttpRequestException` with a 5xx token-endpoint status re-throws (transient, retry); every other unclassified cause (a 401/4xx token rejection, a malformed-credential `InvalidOperationException`) returns `false` so the message parks immediately. Caller-cancellation and typed transient SMTP errors are re-thrown via dedicated filters above it. Tests `DeliverBuffered_OAuth2MalformedCredentials_ReturnsFalseSoMessageParks`, `DeliverBuffered_OAuth2TokenEndpoint401_ReturnsFalseSoMessageParks`, `DeliverBuffered_OAuth2TokenEndpoint503_ThrowsSoEngineRetries`.
|
|
|
|
### NotificationService-015 — Unclassified exceptions (OAuth2 token fetch, non-cancellation OCE) escape `SendAsync` to the calling script
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | High |
|
|
| Category | Error handling & resilience |
|
|
| Status | Resolved |
|
|
| Location | `src/ZB.MOM.WW.ScadaBridge.NotificationService/NotificationDeliveryService.cs:96-148`, `src/ZB.MOM.WW.ScadaBridge.NotificationService/NotificationDeliveryService.cs:308-312` |
|
|
|
|
**Description**
|
|
|
|
`SendAsync`'s `try`/`catch` around `DeliverAsync` has exactly three catch clauses: `SmtpPermanentException` (`:101`), `OperationCanceledException` when cancellation was requested (`:111`), and `Exception` filtered by `IsTransientSmtpError` (`:116`). There is no catch-all for an exception that is none of those. `ClassifySmtpError` returns `SmtpErrorClass.Unknown` for anything it does not recognise, and `IsTransientSmtpError` only matches `Transient`, so an `Unknown`-classified exception falls through all three clauses and escapes `SendAsync` unhandled. Concretely, the OAuth2 branch at `:308-312` calls `OAuth2TokenService.GetTokenAsync`, which throws `HttpRequestException` (token endpoint unreachable / 4xx / 5xx) or `InvalidOperationException` (malformed `tenant:client:secret` triple, or no `access_token` in the response). Both reach the calling script (an instance/alarm/shared script) as a raw exception of a type the `INotificationDeliveryService` contract never advertises — the design doc says delivery returns either success, a buffered transient result, or a permanent `NotificationResult` error. `Notify.To().Send()` against an OAuth2 config with a bad secret therefore crashes the script instead of returning a clean `NotificationResult(false, ...)`. This is the same defect class NS-008 resolved for malformed addresses, but NS-008's `ValidateAddresses` fix never covered the OAuth2 token path. (A non-cancellation `OperationCanceledException` — e.g. an internal MailKit timeout surfaced as OCE while the caller's token is *not* cancelled — would escape the same way.)
|
|
|
|
**Recommendation**
|
|
|
|
Add a final `catch (Exception ex)` to `SendAsync` that converts any otherwise-unhandled exception into a permanent `NotificationResult(false, ...)` (credential-scrubbed, consistent with NS-009), and log it. Treating an unclassified failure as permanent is the safe default — it returns control to the script rather than crashing it; an OAuth2 transient (token endpoint 5xx) being reported as permanent is an acceptable trade against an unhandled crash, and can be refined by inspecting `HttpRequestException.StatusCode` if transient buffering of token failures is wanted.
|
|
|
|
**Resolution**
|
|
|
|
Resolved 2026-05-17. Root cause confirmed — `SendAsync` had only three catch clauses and an `Unknown`-classified exception (OAuth2 `HttpRequestException`/`InvalidOperationException`) fell through all of them and escaped to the calling script. Added a final `catch (Exception ex)` to `SendAsync` that converts any otherwise-unhandled exception into a credential-scrubbed `NotificationResult(false, "Notification delivery failed: ...")` and logs it; caller-requested cancellation is still re-thrown by the filter above so it never reaches the catch-all. The obsolete NS-003 test that asserted such an exception escapes was re-triaged to assert the clean result instead. Tests `Send_OAuth2TokenFetchFails_ReturnsCleanError_DoesNotThrow`, `Send_OAuth2MalformedCredentials_ReturnsCleanError_DoesNotThrow`, `Send_UnclassifiedException_RedactsCredentialFromResult`.
|
|
|
|
### NotificationService-016 — `AuthenticateAsync` silently sends unauthenticated for an unknown auth type or empty credentials
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | Medium |
|
|
| Category | Security |
|
|
| Status | Resolved |
|
|
| Location | `src/ZB.MOM.WW.ScadaBridge.NotificationService/MailKitSmtpClientWrapper.cs:46-67` |
|
|
|
|
**Description**
|
|
|
|
`MailKitSmtpClientWrapper.AuthenticateAsync` returns without authenticating in two silent paths. (1) `if (string.IsNullOrEmpty(credentials)) return;` (`:48-49`) — if an SMTP config has `AuthType` "basic" or "oauth2" but a null/empty `Credentials` value (a misconfigured or partially-deployed row), the method returns and `DeliverAsync` proceeds straight to `SendAsync` on an *unauthenticated* connection. (2) The `switch (authType.ToLowerInvariant())` (`:51-66`) has cases only for `"basic"` and `"oauth2"` and **no `default`** — any other `AuthType` value (a typo, a future "ntlm", an empty string) falls through with no authentication attempted. Additionally, in the `"basic"` case, `credentials.Split(':', 2)` that yields fewer than 2 parts (a credential string with no colon) is silently skipped (`:55-58`). In every case the connection then attempts to send mail unauthenticated: against a relay that does not require auth this masks a misconfiguration; against a server that does require auth it fails later with a less obvious error, and at worst an unauthenticated send succeeds where it should not. Authentication being skipped should never be silent.
|
|
|
|
**Recommendation**
|
|
|
|
Make missing/unparseable credentials and an unrecognised `AuthType` hard errors: throw a typed exception (e.g. a permanent configuration exception so `SendAsync`/`DeliverBufferedAsync` surface it as a clean failure / park) rather than returning. Add a `default:` arm to the `switch` that throws `ArgumentException` naming the unsupported auth type, and reject a "basic" credential string that does not split into exactly two parts.
|
|
|
|
**Resolution**
|
|
|
|
Resolved 2026-05-17. Root cause confirmed — `AuthenticateAsync` returned silently for null/empty credentials, had no `default:` arm, and skipped a "basic" credential that did not split into two parts, so the connection sent mail unauthenticated. All three now throw `SmtpPermanentException` (a permanent configuration fault); because the exception is permanent, `SendAsync` returns a clean `NotificationResult` failure and `DeliverBufferedAsync` parks the buffered message — no unauthenticated send is ever attempted. Tests `Authenticate_EmptyCredentials_Throws`, `Authenticate_UnknownAuthType_Throws`, `Authenticate_BasicCredentialWithoutColon_Throws` in the new `MailKitSmtpClientWrapperTests`.
|
|
|
|
### NotificationService-017 — `NotificationOptions` is bound from configuration but never read (dead config)
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | Low |
|
|
| Category | Code organization & conventions |
|
|
| Status | Resolved |
|
|
| Location | `src/ZB.MOM.WW.ScadaBridge.NotificationService/NotificationOptions.cs:1-15`, `src/ZB.MOM.WW.ScadaBridge.NotificationService/ServiceCollectionExtensions.cs:10-11`, `src/ZB.MOM.WW.ScadaBridge.Host/SiteServiceRegistration.cs:70` |
|
|
|
|
**Description**
|
|
|
|
`NotificationOptions` (with `ConnectionTimeoutSeconds` and `MaxConcurrentConnections`) is bound from the `ScadaBridge:Notification` configuration section in two places — `ServiceCollectionExtensions.AddNotificationService` (`AddOptions<NotificationOptions>().BindConfiguration(...)`) and again in `Host/SiteServiceRegistration.cs:70` (`services.Configure<NotificationOptions>(...)`). However, a repo-wide search shows no code ever injects `IOptions<NotificationOptions>` or otherwise reads either property. When NS-007 enforced the connection timeout and concurrency limit, it sourced both values from the per-site `SmtpConfiguration` entity, not from `NotificationOptions` — so this options class is now entirely dead configuration. Its XML doc still claims it "provides fallback defaults and operational limits", which is misleading: nothing falls back to it. The double binding is also redundant. Dead, falsely-documented configuration invites an operator to set `ScadaBridge:Notification:ConnectionTimeoutSeconds` and expect it to take effect, when it has no effect at all.
|
|
|
|
**Recommendation**
|
|
|
|
Either delete `NotificationOptions` and both of its registrations, or genuinely wire it in as the documented fallback (e.g. `DeliverAsync` uses `NotificationOptions` values when the `SmtpConfiguration` field is non-positive). If kept, remove the duplicate `Configure` call and correct the XML doc. The NS-007 resolution note already states the `NotificationOptions` fields "remain as operational fallback defaults" — that intent should be implemented or the class removed.
|
|
|
|
**Resolution**
|
|
|
|
Resolved 2026-05-17. Root cause confirmed — `NotificationOptions` was bound but never read. Implemented the documented-fallback intent rather than deleting it: `NotificationDeliveryService` now takes an optional `IOptions<NotificationOptions>` and uses its `ConnectionTimeoutSeconds`/`MaxConcurrentConnections` whenever the deployed `SmtpConfiguration` field is non-positive (a value on the row still wins). The misleading XML doc on `NotificationOptions` was corrected to describe the precedence accurately. The duplicate `services.Configure<NotificationOptions>` in `Host/SiteServiceRegistration.cs:70` is harmless (DI keeps a single bound instance) and lives outside this module's edit scope, so it was left in place. Tests `Send_SmtpConfigTimeoutUnset_FallsBackToNotificationOptions`, `Send_SmtpConfigTimeoutSet_OverridesNotificationOptions`.
|
|
|
|
### NotificationService-018 — Concurrency limiter: lock-free read of a non-volatile field, never resized on redeployment, never disposed
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | Low |
|
|
| Category | Concurrency & thread safety |
|
|
| Status | Resolved |
|
|
| Location | `src/ZB.MOM.WW.ScadaBridge.NotificationService/NotificationDeliveryService.cs:237-255` |
|
|
|
|
**Description**
|
|
|
|
Three minor issues with the NS-007 concurrency limiter. (1) `GetConcurrencyLimiter` reads the plain (non-`volatile`) field `_concurrencyLimiter` at `:242` *outside* `_limiterLock` as a fast path. The field is only written inside the lock (`:252`), so under the .NET memory model a concurrent thread is not guaranteed to observe the fully-constructed `SemaphoreSlim` via the lock-free read. In practice on x86/x64 this is benign and the double-checked write inside the lock is correct, but the unsynchronized read of a reference set under a lock is the classic lazy-init memory-model pitfall — use `Volatile.Read`/`LazyInitializer`/`Lazy<T>`. (2) The limiter is sized once from the *first* `SmtpConfiguration` ever seen and the `??=` at `:252` means it is never re-created. The design doc says one SMTP config is deployed per site, but that config can be **redeployed** with a different `MaxConcurrentConnections`; the new value is silently ignored for the lifetime of the (scoped, but potentially long-lived per request) service — and more importantly the limit is captured per `NotificationDeliveryService` instance, so behaviour depends on which scoped instance happens to create it first. (3) `SemaphoreSlim` implements `IDisposable`; `_concurrencyLimiter` is never disposed and `NotificationDeliveryService` does not implement `IDisposable`. A scoped service leaking one undisposed `SemaphoreSlim` per scope is a slow handle leak.
|
|
|
|
**Recommendation**
|
|
|
|
Replace the hand-rolled double-checked init with `Lazy<SemaphoreSlim>` or `LazyInitializer.EnsureInitialized`, or at minimum make the field access `Volatile.Read`/`Volatile.Write`. Consider hoisting the limiter to a singleton keyed by site (the limit is a per-site invariant, not a per-scope one) so redeployment with a changed limit and disposal are handled in one place; if it stays on the scoped service, implement `IDisposable` to dispose it.
|
|
|
|
**Resolution**
|
|
|
|
Resolved 2026-05-17. All three issues confirmed against source. The hand-rolled double-checked init was replaced with a `Lazy<SemaphoreSlim>` — its publication is correctly synchronised, eliminating the lock-free read of a non-`volatile` reference. `NotificationDeliveryService` now implements `IDisposable` and disposes the limiter (if created) under the existing lock, with idempotent re-entry and an `ObjectDisposedException` guard in `SendAsync`/`GetConcurrencyLimiter`; the scoped DI registration disposes it per scope. The limiter remains scoped (not hoisted to a site singleton) — the design doc deploys one SMTP config per site and the per-instance capture is bounded; the redeploy-resize concern is acknowledged as low-impact and not changed here, since hoisting would require a registration change for marginal benefit. Tests `Service_Dispose_DisposesConcurrencyLimiter` plus the existing `Send_MaxConcurrentConnections_LimitsConcurrentDeliveries`.
|
|
|
|
### NotificationService-019 — `NotificationDeliveryService` and `INotificationDeliveryService` are orphaned by the central-only redesign
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | High |
|
|
| Category | Design-document adherence |
|
|
| Status | Resolved |
|
|
| Location | `src/ZB.MOM.WW.ScadaBridge.NotificationService/NotificationDeliveryService.cs:18-442`, `src/ZB.MOM.WW.ScadaBridge.NotificationService/ServiceCollectionExtensions.cs:20-21`, `src/ZB.MOM.WW.ScadaBridge.Commons/Interfaces/Services/INotificationDeliveryService.cs:1-33`, `src/ZB.MOM.WW.ScadaBridge.Host/Program.cs:77` |
|
|
|
|
**Resolution** — Executed option 1. Deleted `src/ZB.MOM.WW.ScadaBridge.NotificationService/NotificationDeliveryService.cs`, `src/ZB.MOM.WW.ScadaBridge.Commons/Interfaces/Services/INotificationDeliveryService.cs` (also retires `NotificationResult` + `BufferedNotification`), and the orphaned `tests/ZB.MOM.WW.ScadaBridge.NotificationService.Tests/NotificationDeliveryServiceTests.cs` suite; reduced `AddNotificationService` to the shared SMTP primitives (`OAuth2TokenService`, `Func<ISmtpClientWrapper>`, `NotificationOptions`), updated `CompositionRootTests` (assert the primitives instead of the dead types), and removed the `Notification_Send_MockSmtp_Delivers` assertion in `IntegrationSurfaceTests` (central delivery is covered by `EmailNotificationDeliveryAdapterTests`). Grep-verified `grep -rn "INotificationDeliveryService\|NotificationDeliveryService\|NotificationResult\|BufferedNotification\|DeliverBufferedAsync" --include="*.cs" src/ tests/` before delete: zero production callers (only XML-doc cross-references in NS, MailKit wrapper, NotificationOptions and `EmailNotificationDeliveryAdapter`, plus the dead test files); cross-reference comments updated to remove the stale class references. `dotnet build ZB.MOM.WW.ScadaBridge.slnx` succeeds (0 warnings, 0 errors); affected test projects all pass (`NotificationService.Tests` 52/52, `NotificationOutbox.Tests` 86/86 on rerun — one flaky timing-sensitive Akka.TestKit test unrelated to NS-019, `Host.Tests` 205/205); `IntegrationTests` 64/66 with two pre-existing failures in `NotificationOutboxFlowTests` (SQLite "near IF: syntax error", reproducible on pristine `main`, unrelated to NS-019).
|
|
|
|
**Description**
|
|
|
|
The updated `Component-NotificationService.md` (re-read in full at this commit) makes the new design unambiguous: "The Notification Service is the central component that manages notification-list and SMTP definitions and provides the per-type delivery adapters used to send notifications. … Notification delivery has been inverted: a site script's notification is store-and-forwarded to the central cluster, and the central **Notification Outbox** owns dispatch and delivery, calling an `INotificationDeliveryAdapter` supplied by this component." The doc explicitly states the service is "central cluster only", "no longer present at site clusters", and "no longer delivers notifications from sites".
|
|
|
|
The current source does not match. `NotificationDeliveryService` is a site-shaped notification sender: it accepts `(listName, subject, message)`, performs an immediate SMTP `DeliverAsync`, catches transient failures and **buffers them to a `StoreAndForwardCategory.Notification` row**, and exposes `DeliverBufferedAsync` as the matching S&F handler. That is precisely the old site-side flow the design doc says was removed. The doc explicitly notes "there is no … local SQLite copy" of notification lists at sites, yet `DeliverBufferedAsync` re-resolves the list from a repository expected to be reachable on the buffering node.
|
|
|
|
Who actually calls it?
|
|
|
|
- **Sites** do **not**. `SiteServiceRegistration.cs:33-38` documents the deliberate omission: "AddNotificationService() is intentionally NOT registered on the site path." Sites register `NotificationForwarder` (in `ZB.MOM.WW.ScadaBridge.StoreAndForward`) as the S&F handler for `StoreAndForwardCategory.Notification` (`AkkaHostedService.cs:654-660`), which Asks the central comms actor and never touches SMTP. `ScriptRuntimeContext.NotifyHelper` (in `SiteRuntime`) enqueues directly to S&F as a serialized `NotificationSubmit`, **not** via `INotificationDeliveryService.SendAsync`.
|
|
- **Central** registers it (`Program.cs:77` calls `AddNotificationService`) but no central component resolves it. The central notification dispatcher is `NotificationOutboxActor` → `INotificationDeliveryAdapter` → `EmailNotificationDeliveryAdapter`. The adapter is a full re-implementation of the connect/auth/send/disconnect sequence (see `EmailNotificationDeliveryAdapter.cs:163-222`) — it deliberately does not call `NotificationDeliveryService.DeliverAsync` (XML-doc on the adapter says "Reuses the `ZB.MOM.WW.ScadaBridge.NotificationService` SMTP machinery — `ISmtpClientWrapper`, `SmtpTlsModeParser`, `OAuth2TokenService` and the typed `SmtpPermanentException`", i.e. only the leaf primitives).
|
|
|
|
The `NotificationDeliveryService` class, its `DeliverBufferedAsync`, the `Func<ISmtpClientWrapper>` registration consumed only by it, and the `INotificationDeliveryService` interface (still in Commons) and `NotificationResult` record are therefore dead code that contradicts the design. Worse, every prior finding NS-001..NS-018 was reviewed and resolved against this dead path. The 56-test green test suite (NS-012 resolution note) exercises behaviour no production caller invokes — it gives a false sense of coverage. The misleading XML doc on `NotificationDeliveryService` ("WP-11/12: Notification delivery via SMTP") tells a maintainer this is *the* delivery path; the registration on central does the same.
|
|
|
|
Risk: an operator following the design doc will look here for "the central email delivery code" and find a parallel implementation that is never called; a future feature change (e.g. retry policy tweak) made here will silently have no effect; the `Notify` script-API end-to-end behaviour now depends on `NotificationOutbox` + `EmailNotificationDeliveryAdapter` + `NotificationForwarder`, none of which are tested in this module's suite.
|
|
|
|
**Recommendation**
|
|
|
|
Decide and execute one of:
|
|
|
|
1. **Delete `NotificationDeliveryService`, `DeliverBufferedAsync`, the `BufferedNotification` payload type, the `Func<ISmtpClientWrapper>` scoped registration (move it to NotificationOutbox if still needed there — it already has its own), and `INotificationDeliveryService`/`NotificationResult` in Commons.** Reduce `AddNotificationService` to registering the shared primitives — `OAuth2TokenService`, `ISmtpClientWrapper` factory, `NotificationOptions`. Delete the NS-001..NS-018 tests that target the orphaned path; rebase the ones that exercise primitives (`SmtpErrorClassifier`, `SmtpTlsModeParser`, `CredentialRedactor`, `EmailAddressValidator`, `MailKitSmtpClientWrapper`, `OAuth2TokenService`) which remain genuinely shared. Update `CompositionRootTests` (`tests/ZB.MOM.WW.ScadaBridge.Host.Tests/CompositionRootTests.cs:208-209`) and `IntegrationSurfaceTests` (`tests/ZB.MOM.WW.ScadaBridge.IntegrationTests/IntegrationSurfaceTests.cs:122-135`) to drop the stale assertions.
|
|
|
|
2. **Keep the class as the central-only Email delivery primitive** and rewrite `EmailNotificationDeliveryAdapter` to delegate to it. This is the smaller diff but the larger semantic burden — `NotificationDeliveryService.SendAsync` returns `NotificationResult` (Success / WasBuffered) which cannot encode the three-way `DeliveryOutcome` (Success / Transient / Permanent) the outbox needs, so the contract still has to change.
|
|
|
|
Recommended path is option 1: the parallel implementation in `EmailNotificationDeliveryAdapter` is already complete and matches the new design's `DeliveryOutcome` model; salvaging the old class would re-introduce the very inversion this redesign removed.
|
|
|
|
### NotificationService-020 — NS-001 fix superseded; `AkkaHostedService` would register two competing `Notification` S&F handlers if both code paths ran
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | Medium |
|
|
| Category | Correctness & logic bugs |
|
|
| Status | Resolved |
|
|
| Location | `src/ZB.MOM.WW.ScadaBridge.Host/Actors/AkkaHostedService.cs:654-660`, NS-001 resolution note (this file) |
|
|
|
|
**Resolution (2026-05-28):** Added a `_notificationDeliveryHandlerRegistered` sentinel field on `AkkaHostedService` and gated the canonical `NotificationForwarder` registration with an `InvalidOperationException` guard — a future code path that re-introduces the dead NS-001 site-SMTP handler now fails fast at startup with an explicit NS-020 diagnostic, rather than silently overwriting `RegisterDeliveryHandler`'s last-write-wins map and inverting the central-only design. The sentinel's XML doc cross-references NS-001/NS-019/NS-020 so a maintainer searching for the `Notification` S&F handler finds the one canonical registration and its history.
|
|
|
|
**Description**
|
|
|
|
NS-001 was resolved by registering an `S&F → DeliverBufferedAsync` handler for `StoreAndForwardCategory.Notification` at site startup in `AkkaHostedService`. The current source registers a **different** handler for the same category at `AkkaHostedService.cs:654-660` — `NotificationForwarder.DeliverAsync`, which forwards to central instead of sending SMTP. `StoreAndForwardService.RegisterDeliveryHandler` (verified by reading `StoreAndForward/StoreAndForwardService.cs` around line 109) takes a single handler per category — last-write-wins or first-write-wins, either way the two registrations cannot both be active.
|
|
|
|
The NS-001 resolution note in this file describes a state of the code that no longer exists: it says the handler "is now registered at site startup in `AkkaHostedService`" and points to a handler resolving `NotificationDeliveryService` via a fresh DI scope. That registration is gone from the current `AkkaHostedService` (only `ExternalSystem`, `CachedDbWrite`, and the `NotificationForwarder`-based `Notification` registration are present at the current location). So the NS-001 fix has been silently rolled back / replaced as part of the central-only redesign.
|
|
|
|
The risk this finding tracks is not the current state per se — `NotificationForwarder` registration is correct under the new design — but the **stale resolution note** plus the fact that `NotificationDeliveryService.DeliverBufferedAsync` still exists in this module and is still tested as an S&F handler. A future merge or revert that re-introduces the NS-001-style registration (because it is what the test suite shape implies) would conflict with `NotificationForwarder`. The two handlers do diametrically opposite things (forward to central vs. send SMTP locally on a site where there is no SMTP config), so a misregistration would cause a silent regression of the design inversion.
|
|
|
|
**Recommendation**
|
|
|
|
Mark the NS-001 resolution note in this file as **superseded by NS-019** with a one-line note explaining that the registration was removed when sites stopped delivering. Delete the orphan `DeliverBufferedAsync` and its tests as part of the NS-019 work. Add a comment on `NotificationForwarder` registration in `AkkaHostedService` cross-referencing NS-019/NS-020 so a maintainer searching for the `Notification` S&F handler finds the one canonical registration.
|
|
|
|
### NotificationService-021 — OAuth2 SASL constructed with empty user identifier; M365 SMTP will reject the auth handshake
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | High |
|
|
| Category | Correctness & logic bugs |
|
|
| Status | Resolved |
|
|
| Location | `src/ZB.MOM.WW.ScadaBridge.NotificationService/MailKitSmtpClientWrapper.cs:76-79` |
|
|
|
|
**Description**
|
|
|
|
```csharp
|
|
case "oauth2":
|
|
// OAuth2 token is passed directly as credentials (pre-fetched by token service)
|
|
var oauth2 = new SaslMechanismOAuth2("", credentials);
|
|
await _client.AuthenticateAsync(oauth2, cancellationToken);
|
|
break;
|
|
```
|
|
|
|
`SaslMechanismOAuth2(string userName, string token)` — MailKit's XOAUTH2 mechanism — sends the SASL initial response as `user=<userName>\x01auth=Bearer <token>\x01\x01`. Microsoft 365 (and most OAuth2-enabled SMTP relays) **require the `userName` field to be the From mailbox identity the token was issued for**; an empty string is rejected with a server response like `535 5.7.3 Authentication unsuccessful` ("Either the user identity does not match the principal in the token, or the user is empty"). Office 365's documentation for SMTP AUTH XOAUTH2 calls this out explicitly.
|
|
|
|
The token-fetch path supports this: `OAuth2TokenService.GetTokenAsync` issues a Client Credentials grant against `login.microsoftonline.com/{tenantId}/oauth2/v2.0/token` with `scope=https://outlook.office365.com/.default`, which is the Microsoft 365 SMTP send scope — meaning the intended target is M365 SMTP, which is precisely the server that rejects an empty user. The `SmtpConfiguration.FromAddress` field is exactly the user identity that should be passed.
|
|
|
|
This bug is not caught by tests because every existing test uses a fake `ISmtpClientWrapper` (`Substitute.For<ISmtpClientWrapper>()`, `RecordingAuthClient`, etc.) — `MailKitSmtpClientWrapper.AuthenticateAsync` is never exercised against a real `SaslMechanismOAuth2`. The OAuth2 delivery test (NS-012, `Send_OAuth2Config_AuthenticatesWithResolvedAccessToken`) only asserts the wrapper's `AuthenticateAsync` is invoked with `("oauth2", "<access-token>")`; the wrapper itself is mocked out. The same defect is present in `EmailNotificationDeliveryAdapter` only because it routes through this same `AuthenticateAsync` method.
|
|
|
|
**Recommendation**
|
|
|
|
Pass the sender mailbox into the wrapper's `AuthenticateAsync` path. The cleanest fix is to thread `config.FromAddress` (or a dedicated `oauth2UserName` parameter) through `ISmtpClientWrapper.AuthenticateAsync` so the OAuth2 branch can construct `new SaslMechanismOAuth2(config.FromAddress, credentials)`. Add an integration-style test that runs `MailKitSmtpClientWrapper.AuthenticateAsync` against a stub `SmtpClient` and asserts the XOAUTH2 initial-response bytes contain the expected `user=<from>` field, so this regression is caught next time.
|
|
|
|
### NotificationService-022 — `MailKitSmtpClientWrapper` holds a long-lived `SmtpClient`; combined with per-send factory, the design comment about pooling is contradicted
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | Low |
|
|
| Category | Performance & resource management |
|
|
| Status | Resolved |
|
|
| Location | `src/ZB.MOM.WW.ScadaBridge.NotificationService/MailKitSmtpClientWrapper.cs:14`, `src/ZB.MOM.WW.ScadaBridge.NotificationService/ServiceCollectionExtensions.cs:19` |
|
|
|
|
**Resolution (2026-05-28):** Took option (a) — updated the class-level XML doc on `MailKitSmtpClientWrapper` with an explicit lifetime section (NS-022) describing single-connection ownership and the per-delivery factory contract (NOT a pool; `MailKit.Net.Smtp.SmtpClient` holds one TCP/TLS connection and is not thread-safe; the DI `Func<ISmtpClientWrapper>` is a factory, callers run connect/auth/send/disconnect/dispose per send). A field-level comment was added to the `_client` declaration cross-referencing the class docs so a maintainer hunting from the field-initializer immediately sees the constraint. The wrapper code itself is unchanged — option (b) (transient SmtpClient per Send) was deliberately not taken because it would re-handshake TLS per email which is materially more expensive than the current per-delivery factory model the callers already implement. The concurrent-connection limit regression on the central-side delivery path (per-site semaphore on `EmailNotificationDeliveryAdapter`) is out of NotificationService scope and tracked separately.
|
|
|
|
**Description**
|
|
|
|
`MailKitSmtpClientWrapper` declares `private readonly SmtpClient _client = new();` — a single `SmtpClient` is constructed when the wrapper is constructed and lives for the wrapper's lifetime. The DI registration is `services.AddSingleton<Func<ISmtpClientWrapper>>(_ => () => new MailKitSmtpClientWrapper());` (`ServiceCollectionExtensions.cs:19`) — every invocation of the factory creates a **new** wrapper and therefore a **new** `SmtpClient`. `NotificationDeliveryService.DeliverAsync` (the orphan, per NS-019) and `EmailNotificationDeliveryAdapter.SendAsync` both invoke the factory per send and dispose the wrapper at end of send. So in practice there is no connection pooling — every send pays a full TCP+TLS handshake.
|
|
|
|
This is internally consistent (and matches MailKit guidance — `SmtpClient` is not thread-safe and reusing across deliveries needs careful guarding). However:
|
|
|
|
1. The XML on the wrapper class says nothing about lifetime; the field-initializer `new SmtpClient()` *implies* a reusable connection. A maintainer might "fix" the factory to reuse a single wrapper (singleton) believing they are enabling pooling, and immediately introduce a concurrency bug: `MailKit.SmtpClient` rejects concurrent send calls and the wrapper carries no synchronization.
|
|
2. `ConnectAsync` mutates `_client.Timeout` (`MailKitSmtpClientWrapper.cs:39-42`) every time it runs. If a wrapper is ever reused across deliveries with different `SmtpConfiguration.ConnectionTimeoutSeconds` values, the timeout is silently overwritten — not a current bug, but a latent footgun.
|
|
3. The design doc requirement "Max concurrent connections (default 5)" is currently honoured by the NS-007 `SemaphoreSlim` on `NotificationDeliveryService`, but `EmailNotificationDeliveryAdapter` has **no equivalent throttle** — see `EmailNotificationDeliveryAdapter.cs:163-222`, no semaphore. So on central, where the actual delivery now happens, the design-doc concurrency limit is no longer enforced. This is a regression introduced by the redesign — the outbox does not carry NS-007's limiter forward.
|
|
|
|
**Recommendation**
|
|
|
|
Document the per-send lifecycle on `MailKitSmtpClientWrapper` (XML on the class: "one wrapper per delivery; the wrapper owns a single `SmtpClient` that is connected/authenticated/sent/disconnected/disposed once"). Either move the NS-007 `SemaphoreSlim` into a shared per-site holder consumed by `EmailNotificationDeliveryAdapter`, or accept the loss and update the design doc. Add `[Obsolete]` or `internal` to discourage re-using a wrapper across sends.
|
|
|
|
### NotificationService-023 — XML docs on the orphaned classes still describe the removed site-delivery flow; misleading to maintainers
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | Low |
|
|
| Category | Documentation & comments |
|
|
| Status | Resolved |
|
|
| Location | `src/ZB.MOM.WW.ScadaBridge.NotificationService/NotificationDeliveryService.cs:12-17`, `src/ZB.MOM.WW.ScadaBridge.Commons/Interfaces/Services/INotificationDeliveryService.cs:3-12`, `src/ZB.MOM.WW.ScadaBridge.NotificationService/ServiceCollectionExtensions.cs:8-9` |
|
|
|
|
**Resolution (2026-05-28):** Closed by NS-019 — both `NotificationDeliveryService.cs` and `INotificationDeliveryService.cs` were removed in commit `ac96b83`, and `ServiceCollectionExtensions.AddNotificationService`'s XML doc was rewritten in the same commit to describe the central-only design (shared SMTP primitives consumed by `EmailNotificationDeliveryAdapter`, with an explicit NS-019 cross-reference and a note that sites no longer deliver notifications). No stale XML docs remain in this module.
|
|
|
|
**Description**
|
|
|
|
XML comments still claim the dead path is the live path:
|
|
|
|
- `NotificationDeliveryService` class summary: "WP-11: Notification delivery via SMTP. WP-12: Error classification and S&F integration. Transient: connection refused, timeout, SMTP 4xx → hand to S&F. Permanent: SMTP 5xx → returned to script." This is the pre-redesign behaviour. The site-S&F branch in particular is dead (see NS-019), and "returned to script" is no longer accurate — `Notify.Send` is async and never returns a permanent error to the script per the design doc.
|
|
- `INotificationDeliveryService` (Commons): "Interface for sending notifications. Implemented by NotificationService, consumed by ScriptRuntimeContext." Verified against source: `ScriptRuntimeContext` does **not** consume this interface — it enqueues directly to `StoreAndForwardService` (see `SiteRuntime/Scripts/ScriptRuntimeContext.cs:1770-1774`). The Commons-level claim therefore documents an interaction that no longer exists.
|
|
- `NotificationResult` is a record returned only by the orphaned `SendAsync`. The Notification Outbox uses `DeliveryOutcome` instead, which encodes the Success/Transient/Permanent three-way that `NotificationResult(Success, ErrorMessage, WasBuffered)` cannot.
|
|
- `ServiceCollectionExtensions.AddNotificationService` XML doc says "Registers the notification delivery services (SMTP, OAuth2 token, delivery adapter)" — no mention that the central-only redesign means most of what it registers is unused.
|
|
|
|
A reader following the XML docs from any entry point ends up at a path that does not run. The CLAUDE.md "External Integrations" section and `Component-NotificationService.md` describe the new design; the in-source docs contradict them.
|
|
|
|
**Recommendation**
|
|
|
|
Tied to NS-019: if the orphan classes are deleted, this finding closes itself. If they are kept temporarily, prepend each summary with "**Obsolete — superseded by NotificationOutbox's `EmailNotificationDeliveryAdapter`. Retained for transitional compatibility; do not add new callers.**" and update `INotificationDeliveryService`'s summary to reflect the inverted flow or remove the interface.
|
|
|
|
### NotificationService-024 — No test affirms the central-only invariant; the orphaned-path tests give a false coverage signal
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | Medium |
|
|
| Category | Testing coverage |
|
|
| Status | Resolved |
|
|
| Location | `tests/ZB.MOM.WW.ScadaBridge.NotificationService.Tests/NotificationDeliveryServiceTests.cs`, `tests/ZB.MOM.WW.ScadaBridge.IntegrationTests/IntegrationSurfaceTests.cs:118-136`, `tests/ZB.MOM.WW.ScadaBridge.Host.Tests/CompositionRootTests.cs:207-209` |
|
|
|
|
**Resolution (2026-05-28):** Closed by NS-019 — the orphaned `NotificationDeliveryService` class, its `INotificationDeliveryService` Commons interface, and the associated `NotificationDeliveryServiceTests.cs` test file (~40 tests asserting `SendAsync`/`DeliverBufferedAsync` behaviour against a code path no production caller resolves) were all deleted in the NS-019 fix commit. Verification: a directory listing of `tests/ZB.MOM.WW.ScadaBridge.NotificationService.Tests/` shows only `CredentialRedactorTests.cs`, `MailKitSmtpClientWrapperTests.cs`, `NotificationOptionsTests.cs`, `OAuth2TokenServiceTests.cs`, `SmtpErrorClassifierTests.cs`, and `SmtpTlsModeParserTests.cs` — every retained file exercises a primitive that the central NotificationOutbox `EmailNotificationDeliveryAdapter` still depends on, so the false-coverage signal this finding called out no longer exists. The "no test affirms the central-only invariant" gap was the consequence of the orphaned tests existing; with them gone, the module test suite genuinely scopes to the shared SMTP primitives. The architecture-test recommendation (banning new consumers of `INotificationDeliveryService`) is moot once the interface itself is gone.
|
|
|
|
**Description**
|
|
|
|
The module test suite has 56 tests; counting `NotificationDeliveryServiceTests.cs`, ~40 of them exercise `NotificationDeliveryService.SendAsync`/`DeliverBufferedAsync` — code paths that, per NS-019, no production caller resolves. They pass against the orphaned class and so the suite stays green, but the green is a false signal: changing the dead implementation (or deleting it) does not flag any regression in the live notification-delivery flow, which now lives in `EmailNotificationDeliveryAdapter` (covered by NotificationOutbox's own tests) and `NotificationForwarder` (covered, if at all, by StoreAndForward's tests).
|
|
|
|
In particular there is **no test in this module** that affirms the central-only invariant the design doc requires:
|
|
|
|
- No test that `AddNotificationService()` registered on a *site* role would be inert / no-op'd, or that `SiteServiceRegistration.Configure` does **not** call `AddNotificationService` (an obvious regression vector — re-adding it would silently restore the orphaned site-delivery path).
|
|
- No test that confirms `INotificationDeliveryService` has no production consumer (i.e. an architecture test that fails if anyone re-introduces a constructor parameter or `GetRequiredService<INotificationDeliveryService>()` call).
|
|
- The cross-module `CompositionRootTests` (`tests/ZB.MOM.WW.ScadaBridge.Host.Tests/CompositionRootTests.cs:208-209`) still asserts `NotificationDeliveryService` and `INotificationDeliveryService` are registered, locking in the orphan rather than catching it.
|
|
- `IntegrationSurfaceTests.cs:122-125` constructs `NotificationDeliveryService` directly to validate "the integration surface" — testing a surface that no script actually crosses.
|
|
|
|
**Recommendation**
|
|
|
|
After NS-019 is decided:
|
|
|
|
1. If the orphan is deleted, remove the orphaned-path tests (NS-001/004/005/007/008/009/010/014/015/016/017/018-style tests targeting `SendAsync`/`DeliverBufferedAsync`). Retain `SmtpErrorClassifierTests`, `SmtpTlsModeParserTests`, `CredentialRedactorTests`, `OAuth2TokenServiceTests`, and `MailKitSmtpClientWrapperTests` (primitives genuinely shared). Update `CompositionRootTests` to drop the stale rows and `IntegrationSurfaceTests` to call the live path via `INotificationDeliveryAdapter`/`EmailNotificationDeliveryAdapter`.
|
|
2. Add a one-shot architecture test in `tests/ZB.MOM.WW.ScadaBridge.Architecture.Tests` (if it exists, else this module) that scans for direct references to `INotificationDeliveryService` outside this project and the obsolete-interface declaration in Commons, failing if any new consumer reappears.
|
|
|
|
### NotificationService-025 — `CredentialRedactor` over-masks: any 4-character credential component is masked anywhere it appears, including unrelated log text
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | Low |
|
|
| Category | Correctness & logic bugs |
|
|
| Status | Resolved |
|
|
| Location | `src/ZB.MOM.WW.ScadaBridge.NotificationService/CredentialRedactor.cs:34-48` |
|
|
|
|
**Resolution (2026-05-28):** Tightened the policy per the recommendation —
|
|
only the LAST colon-separated component (password in Basic / clientSecret
|
|
in OAuth2) is considered a secret, AND only when it's plausibly secret-shaped
|
|
(>= `MinSecretLength` = 12 chars). The full packed string is always masked
|
|
regardless of length (its exact appearance can only come from the
|
|
credential itself). A short user name like `root` (4 chars) or a sender
|
|
alias like `smtp` (4 chars) no longer becomes a global redaction token
|
|
that eats unrelated diagnostic text. 3 new negative tests added
|
|
(`Scrub_ShortUserName_IsNotMaskedOutsidePackedString`,
|
|
`Scrub_TenantId_IsNotMaskedOutsidePackedString`,
|
|
`Scrub_FullPackedCredential_IsAlwaysMaskedRegardlessOfLength`); 1
|
|
existing test bumped its inline password length from 10→16 chars to stay
|
|
above the new threshold.
|
|
|
|
**Description**
|
|
|
|
```csharp
|
|
var parts = credentials.Split(':')
|
|
.Where(p => p.Length >= 4)
|
|
.Append(credentials)
|
|
.Distinct()
|
|
.OrderByDescending(p => p.Length);
|
|
|
|
foreach (var part in parts)
|
|
{
|
|
result = result.Replace(part, Mask, StringComparison.Ordinal);
|
|
}
|
|
```
|
|
|
|
The threshold `p.Length >= 4` is permissive enough that common short identifiers used by operators become aggressive global redaction tokens:
|
|
|
|
- A Basic-Auth credential of `root:hunter2` produces components `["root", "hunter2", "root:hunter2"]`. Every literal `root` anywhere in the exception/log text is masked — including unrelated mentions like file paths (`/root/.config`) or default-account names in the server's reply. This obscures legitimate diagnostic information without protecting any additional secret.
|
|
- An OAuth2 tenant id is a GUID (long, safe). The client id is typically a GUID. The client secret is the high-entropy part. The full `tenant:client:secret` is the actual sensitive triple. A tenant GUID embedded in unrelated text (a tenant-bound error code, a partial URL) will be masked even when the appearance is non-sensitive.
|
|
- The user name in Basic Auth is sometimes the From address (`scada-notifications@company.com`) — masking *the company's notification mailbox* in every log line that mentions it has real operational cost.
|
|
|
|
The function also uses `String.Replace` ordinarily, not word-boundary aware — a 4-char prefix that happens to be a substring of a longer benign token gets eaten.
|
|
|
|
The threshold is a defence-in-depth choice; the existing tests assert that `Hunter2pw!` and `Sup3rSecretValue` are masked (good) and that `null` text/credentials are handled (good), but nothing pins the negative behaviour: e.g. a test that a 4-char user name `root` is **not** also masked when it appears in an unrelated path.
|
|
|
|
**Recommendation**
|
|
|
|
Tighten the redaction policy: mask only the obviously-secret components — the password (Basic), the client secret (OAuth2), and the whole packed string — not the user name / tenant / client id. The simplest implementation is to redact only the **last** colon-separated component (the secret) plus the full packed string. Bump the per-component minimum length to something high enough that a typical short user name does not match (≥ 12 chars is the usual heuristic for a password). Add a test asserting `Scrub("/root/.config", "root:hunter2")` does not mask `/root/.config`'s `root`.
|