Files
Joseph Doherty 7b0b9c7365 refactor: rename ScadaLink → ZB.MOM.WW.ScadaBridge (code + projects + namespaces)
Solution + 23 src projects + 26 test projects renamed; folders, csproj,
namespaces, and ScadaLinkDbContext/ScadaBridgeDbContext class updated.
ActorSystem "scadalink" → "scadabridge", Akka seed-node URLs migrated.
SQL roles/logins, LDAP domains, CLI command name, and CLI config dir
(~/.scadalink → ~/.scadabridge) also renamed.

Build green; 5 Host.Tests fail awaiting SQL login rename in next commit.
Pre-existing StaleTagMonitor timing flakes unchanged.

Rename script committed at tools/rename-to-scadabridge.sh.
2026-05-28 09:37:45 -04:00

83 KiB

Code Review — NotificationService

Field Value
Module src/ZB.MOM.WW.ScadaBridge.NotificationService
Design doc docs/requirements/Component-NotificationService.md
Status Reviewed
Last reviewed 2026-05-28
Reviewer claude-agent
Commit reviewed 1eb6e97
Open findings 0

Summary

The NotificationService module is small (6 source files) and structurally clean: it abstracts the SMTP client behind an interface, isolates the OAuth2 token lifecycle, and integrates with the Store-and-Forward Engine for transient-failure buffering. However, the review surfaced several substantive defects. The most serious is that no Store-and-Forward delivery handler is ever registered for the Notification category — buffered notifications are persisted but never retried or delivered, silently losing every notification that hit a transient SMTP failure. Error classification is fragile (substring matching on exception messages) and is applied inconsistently between SendAsync and DeliverAsync. DeliverAsync also contains a resource-management bug that constructs and leaks two SMTP clients per call. Secondary themes: the OAuth2TokenService singleton caches a single token keyed to no credential identity (incorrect if multiple SMTP configs exist), several design-doc requirements are unimplemented (connection timeout, max concurrent connections, TLS SSL/None modes), and credentials are stored and passed as plaintext string values. Test coverage exercises the happy path and the main error branches but misses the OAuth2 delivery path, the permanent-classification fallback in DeliverAsync, and concurrency on the token cache.

Re-review 2026-05-17 (commit 39d737e)

Re-reviewed at commit 39d737e after the NS-001..013 fixes. All twelve prior findings remain closed (eleven Resolved, NS-013 Deferred) and the fixes hold up against the current source — error classification is now typed, the SMTP client is created once and always disconnected/disposed, the S&F Notification delivery handler is registered in AkkaHostedService, and the OAuth2 token cache is keyed per credential. The re-review surfaced five new findings. The recurring theme is unclassified-exception handling: the typed ClassifySmtpError helper introduced by NS-002/003 returns Unknown for anything that is not a recognised SMTP/socket failure, but neither SendAsync nor DeliverBufferedAsync has a catch-all for the Unknown bucket. As a result an OAuth2 token-fetch failure (HttpRequestException from EnsureSuccessStatusCode, or InvalidOperationException from a malformed credential string) escapes SendAsync as a raw exception to the calling script (NS-015) and escapes DeliverBufferedAsync out of the S&F handler, where the engine treats any thrown exception as transient and retries a permanently-broken config forever (NS-014) — the same defect class NS-008 fixed for address parsing, but the OAuth2 path was never covered. Secondary findings: AuthenticateAsync silently proceeds unauthenticated for an unknown authType or empty credentials (NS-016); NotificationOptions is bound from configuration in two places but never read by any code (NS-017, dead config — NS-007 sourced the timeout/limit from SmtpConfiguration instead); and the lazily-created concurrency limiter is read outside its lock, is sized once and never resized on redeployment, and is never disposed (NS-018).

Re-review 2026-05-28 (commit 1eb6e97)

Re-reviewed at commit 1eb6e97 against the materially-changed design: per the updated Component-NotificationService.md and CLAUDE.md, the Notification Service is now central-only. Sites no longer deliver notifications over SMTP — a script's Notify.Send enqueues to the site Store-and-Forward Engine and NotificationForwarder.DeliverAsync (S&F handler in StoreAndForward) forwards the payload to the central Notification Outbox, which dispatches via the INotificationDeliveryAdapter registered for the list's Type. Email delivery on central is performed by EmailNotificationDeliveryAdapter in the NotificationOutbox project — it reuses this module's SMTP machinery (ISmtpClientWrapper, OAuth2TokenService, SmtpErrorClassifier, SmtpTlsModeParser, EmailAddressValidator, CredentialRedactor, SmtpPermanentException, NotificationOptions) but is the actual production caller. The intended residual responsibility of this module is to supply that shared SMTP machinery plus list/SMTP-config definition management on central.

The re-review surfaced seven new findings. The dominant theme is dead code that contradicts the design doc: NotificationDeliveryService, the INotificationDeliveryService interface in Commons, the NotificationResult record, the entire DeliverBufferedAsync S&F handler, and the prior NS-001… NS-018 test fixtures that exercise them are now orphaned — no production code path resolves INotificationDeliveryService on a site (sites no longer register this module per SiteServiceRegistration.cs:33-38) and on central the NotificationOutbox uses its own EmailNotificationDeliveryAdapter (which duplicates the connect/auth/send/disconnect sequence rather than delegating to NotificationDeliveryService). The class is still registered by AddNotificationService on central (Program.cs:77) but no consumer resolves it (NS-019). The S&F handler must be registered workaround that NS-001 added to AkkaHostedService is itself superseded by the NotificationForwarder registered for the same category at AkkaHostedService.cs:654-660 (NS-020). Secondary findings: a real-world correctness gap (the OAuth2 SaslMechanismOAuth2 is constructed with an empty user id so server-side account binding fails for any provider that requires it — NS-021); the SMTP client wrapper holds a single MailKit.SmtpClient for the lifetime of the wrapper but the factory delegate creates a new wrapper per send, so successive sends through the same factory share NO connection but DO share a wrapper that mutates _client.Timeout on every connect (benign because every wrapper has its own client, but the design comment about pooling is now contradicted — NS-022); the design-doc retention/maintenance language has no implementation in this module and there is no test affirming the module is central-only (NS-023, NS-024); and CredentialRedactor masks any component of the credential string that is ≥ 4 characters long — a 4-character user name like root or a 4-char tenant prefix could be aggressively scrubbed out of unrelated log text (NS-025).

Checklist coverage

# Category Examined Notes
1 Correctness & logic bugs Re-review: OAuth2 SASL constructed with empty user id (NS-021); CredentialRedactor over-masks short components (NS-025). Earlier NS-005/NS-008 fixes hold.
2 Akka.NET conventions No actors in this module. AddNotificationServiceActors remains a documented no-op.
3 Concurrency & thread safety OAuth2TokenService per-credential locks now correct (NS-006 hold). No new issues.
4 Error handling & resilience NS-014/NS-015 classification fixes hold but the entire DeliverBufferedAsync / SendAsync error path is dead (NS-019/NS-020).
5 Security OAuth2 SaslMechanismOAuth2 empty user id (NS-021); CredentialRedactor aggressiveness (NS-025); at-rest encryption still deferred (NS-013).
6 Performance & resource management MailKitSmtpClientWrapper keeps a single SmtpClient for the wrapper lifetime; combined with per-send factory this means no pooling — re-document or fix (NS-022).
7 Design-document adherence Critical drift: module still exposes site-style S&F sending; the design doc inverted delivery to central months ago (NS-019). Site registration removed but central still wires the dead service.
8 Code organization & conventions INotificationDeliveryService lives in Commons and is now unused — should be retired or relocated to a NotificationService-internal namespace (NS-019). Module-vs-NotificationOutbox boundary unclear.
9 Testing coverage 56 tests pass but ~40 of them assert behaviour of a code path no production caller exercises (NS-024). No test affirms the central-only design — i.e. that AddNotificationService registers no notification-sending service on a site.
10 Documentation & comments NotificationDeliveryService XML doc still claims "WP-11/12: Notification delivery via SMTP" with no warning that the class is orphaned; INotificationDeliveryService Commons doc claims "Implemented by NotificationService, consumed by ScriptRuntimeContext" — both consumers are wrong now (NS-023).

Findings

NotificationService-001 — Buffered notifications are never retried (no S&F delivery handler)

Severity Critical
Category Error handling & resilience
Status Resolved
Location src/ZB.MOM.WW.ScadaBridge.NotificationService/NotificationDeliveryService.cs:96, src/ZB.MOM.WW.ScadaBridge.NotificationService/ServiceCollectionExtensions.cs:8

Description

On a transient SMTP failure the service calls _storeAndForward.EnqueueAsync(StoreAndForwardCategory.Notification, ...). The Store-and-Forward Engine only delivers (immediately or on retry sweep) a category for which a delivery handler has been registered via StoreAndForwardService.RegisterDeliveryHandler. A repo-wide search shows the Notification category handler is never registered anywhere — StoreAndForwardCategory.Notification appears only in this module's EnqueueAsync call. As a result, every buffered notification falls into the RetryMessageAsync "No delivery handler for category" branch (StoreAndForwardService.cs:201-204), which logs a warning and returns without ever delivering or removing the message. Buffered notifications accumulate in SQLite forever and are never sent. This silently loses every notification that hit a transient failure, while SendAsync returns Success=true, WasBuffered=true, telling the caller the notification is safely queued. This directly violates the design doc's "integrates with the Store-and-Forward Engine for reliable delivery" guarantee.

Recommendation

Register a delivery handler for StoreAndForwardCategory.Notification during startup that deserializes the buffered payload (ListName, Subject, Message), re-resolves the list/recipients/SMTP config, and re-attempts DeliverAsync, returning true on success, false on permanent failure, and throwing on transient failure. Wire it in AddNotificationService or the host bootstrap. Add an integration test covering the buffer-then-retry-then-deliver round trip.

Resolution

Resolved 2026-05-16. A delivery handler for StoreAndForwardCategory.Notification is now registered at site startup in AkkaHostedService. The handler resolves NotificationDeliveryService in a fresh DI scope and calls the new DeliverBufferedAsync, which re-resolves the list, recipients and SMTP config and re-attempts delivery — returning true on success, false (park) on permanent failure or missing configuration, and throwing on transient failure so the engine retries. SendAsync now buffers with attemptImmediateDelivery: false so registering the handler does not send the notification twice. Regression tests cover the happy path and the list-removed park path. Fixed by the commit whose message references NotificationService-001.

NotificationService-002 — TimeoutException/OperationCanceledException misclassified as transient

Severity High
Category Error handling & resilience
Status Resolved
Location src/ZB.MOM.WW.ScadaBridge.NotificationService/NotificationDeliveryService.cs:157-167

Description

IsTransientSmtpError treats OperationCanceledException (and its subtype TaskCanceledException) as a transient SMTP error. When the caller passes a CancellationToken that is cancelled — e.g. the Script Execution Actor is stopped, or the script times out — the resulting OperationCanceledException is caught by the catch ... when (IsTransientSmtpError(ex)) clause and the notification is buffered as if SMTP had failed. A deliberate cancellation should propagate, not be silently buffered for retry. The same clause classifies any IOException as transient even though IOException covers unrelated failures (e.g. a serialization stream error). Additionally, OperationCanceledException raised by token cancellation in the OAuth2 path would be miscategorised the same way.

Recommendation

Re-throw OperationCanceledException/TaskCanceledException when cancellationToken.IsCancellationRequested is true rather than classifying it as transient. Narrow IOException handling to SMTP-specific I/O failures, or rely on MailKit's typed exceptions (SmtpCommandException, SmtpProtocolException, ServiceNotConnectedException) instead of broad base types.

Resolution

Resolved 2026-05-16 (commit <pending>). Classification was rewritten around a typed ClassifySmtpError helper: a caller-requested cancellation (OperationCanceledException/ TaskCanceledException while cancellationToken.IsCancellationRequested) now propagates out of both SendAsync and DeliverAsync via dedicated catch filters instead of being buffered. The broad IOException catch-all was dropped — only MailKit's typed exceptions plus SocketException/TimeoutException are treated as transient. Regression tests Send_CancellationRequested_PropagatesAndDoesNotBuffer and Send_TaskCanceledException_WithCancellation_Propagates.

NotificationService-003 — Error classification by substring matching on exception messages is fragile

Severity High
Category Error handling & resilience
Status Resolved
Location src/ZB.MOM.WW.ScadaBridge.NotificationService/NotificationDeliveryService.cs:144-147, src/ZB.MOM.WW.ScadaBridge.NotificationService/NotificationDeliveryService.cs:163-166

Description

Transient/permanent classification depends on ex.Message.Contains("5."), Contains("4."), Contains("550"), Contains("421"), etc. This is unreliable: (a) Message.Contains("5.") matches any message containing the literal "5." anywhere — e.g. a host name smtp5.example.com, a version string, or a path — producing false permanent classification; (b) Contains("4.") likewise matches "v4.0" or an IP address octet; (c) MailKit exposes the actual SMTP status code on SmtpCommandException.StatusCode, which is the correct, locale-independent source of truth and is being ignored; (d) message text is culture/version-dependent and not part of any stable contract. Misclassification has real consequences: a permanent failure misread as transient floods the S&F buffer (which the design doc explicitly says must be prevented), and a transient failure misread as permanent loses the notification.

Recommendation

Classify on MailKit's typed exceptions and SmtpCommandException.StatusCode (4xx → transient, 5xx → permanent), and SocketException/SmtpProtocolException/connection-refused → transient. Remove all Message.Contains checks.

Resolution

Resolved 2026-05-16 (commit <pending>). All ex.Message.Contains(...) checks were removed. The new ClassifySmtpError helper inspects SmtpCommandException.StatusCode (numeric SMTP code: 4xx → transient, 5xx → permanent) and treats SmtpProtocolException, ServiceNotConnectedException, SocketException and TimeoutException as transient; anything else is Unknown and propagates unclassified rather than being guessed. The permanent-promotion catch block in DeliverAsync now keys off this classification. Regression tests Send_Smtp5xxCommandException_ClassifiedPermanent, Send_Smtp4xxCommandException_ClassifiedTransientAndBuffered, Send_SmtpProtocolException_ClassifiedTransient, and Send_NonSmtpExceptionWith5xxLookalikeText_NotPromotedToPermanent.

NotificationService-004 — DeliverAsync constructs two SMTP clients and leaks the used one

Severity High
Category Performance & resource management
Status Resolved
Location src/ZB.MOM.WW.ScadaBridge.NotificationService/NotificationDeliveryService.cs:118-119

Description

using var client = _smtpClientFactory() as IDisposable;
var smtp = _smtpClientFactory();

The factory is invoked twice, creating two separate MailKitSmtpClientWrapper instances (each owning a real SmtpClient with a socket). The first instance is assigned to client and disposed by the using, but it is never used. The second instance, smtp, is the one actually connected, authenticated, used to send, and DisconnectAsync'd — but it is never Disposed. MailKitSmtpClientWrapper implements IDisposable and wraps an unmanaged socket; the connected client is leaked on every send. DisconnectAsync closes the connection but does not dispose the SmtpClient. Over time this leaks sockets/handles.

Recommendation

Create exactly one client and dispose the one that is actually used: using var smtp = _smtpClientFactory(); then cast to IDisposable only if needed (the factory's Func<ISmtpClientWrapper> should ideally return a type that the using can dispose directly — consider having ISmtpClientWrapper extend IAsyncDisposable/IDisposable).

Resolution

Resolved 2026-05-16 (commit <pending>). DeliverAsync now invokes _smtpClientFactory() exactly once and disposes the client actually used via using var disposable = smtp as IDisposable;. The previous code created two MailKitSmtpClientWrapper instances per send and disposed the unused one while leaking the connected one. Regression test Send_CreatesExactlyOneSmtpClient_AndDisposesIt verifies the factory is invoked once and the resulting client is disposed.

NotificationService-005 — Non-TLS path uses SecureSocketOptions.Auto, contradicting the requested mode

Severity Medium
Category Correctness & logic bugs
Status Resolved
Location src/ZB.MOM.WW.ScadaBridge.NotificationService/MailKitSmtpClientWrapper.cs:18, src/ZB.MOM.WW.ScadaBridge.NotificationService/NotificationDeliveryService.cs:123

Description

ConnectAsync maps useTls to either SecureSocketOptions.StartTls or SecureSocketOptions.Auto. useTls is computed in DeliverAsync as TlsMode == "starttls". So a configuration of TlsMode = "none" produces useTls = falseSecureSocketOptions.Auto, which lets MailKit opportunistically negotiate TLS — the opposite of "None". Worse, the design doc defines three TLS modes — None, StartTLS, SSL — but the code collapses them to a single boolean, so SSL (implicit TLS, typically port 465) is treated identically to None/Auto and the SSL mode is effectively unsupported. The bool useTls parameter cannot represent the three-state requirement.

Recommendation

Pass the TlsMode string (or a TlsMode enum) through to the wrapper and map explicitly: NoneSecureSocketOptions.None, StartTLSSecureSocketOptions.StartTls, SSLSecureSocketOptions.SslOnConnect. Validate the configured value and reject unknown modes.

Resolution

Resolved 2026-05-16 (commit pending). Root cause confirmed against source: the bool useTls parameter cannot represent three states, and the non-StartTLS branch used SecureSocketOptions.Auto. A new SmtpTlsMode enum (None/StartTls/Ssl) and SmtpTlsModeParser were added; ISmtpClientWrapper.ConnectAsync now takes SmtpTlsMode and MailKitSmtpClientWrapper maps it explicitly to SecureSocketOptions.None/ StartTls/SslOnConnect. SendAsync/DeliverBufferedAsync validate the configured TlsMode up front — an unknown value returns a clean NotificationResult failure (or parks a buffered message) instead of silently negotiating TLS. Regression tests: Send_TlsModeNone_DoesNotNegotiateTls, Send_TlsModeSsl_UsesImplicitSsl, Send_UnknownTlsMode_ReturnsErrorNotSilentFallback, and the SmtpTlsModeParserTests set.

NotificationService-006 — OAuth2 token cache is keyed to nothing; wrong token returned when multiple SMTP configs exist

Severity Medium
Category Concurrency & thread safety
Status Resolved
Location src/ZB.MOM.WW.ScadaBridge.NotificationService/OAuth2TokenService.cs:14-15, src/ZB.MOM.WW.ScadaBridge.NotificationService/OAuth2TokenService.cs:30-35

Description

OAuth2TokenService is registered as a singleton and stores a single _cachedToken/_tokenExpiry pair. GetTokenAsync ignores the credentials argument when deciding whether the cache is valid — it only checks expiry. If two SMTP configurations with different tenant/client credentials are ever used (the repository's GetAllSmtpConfigurationsAsync returns a list, implying multiple configs are possible), the second caller receives the first caller's token, which will fail authentication against the second tenant. Even with a single config today this is a latent correctness bug and makes the service's behaviour depend on call order.

Recommendation

Key the cache by the credential identity (e.g. a dictionary keyed by tenantId:clientId, or by a hash of the credential string), or document and enforce the single-SMTP-config invariant. Given the design doc says one SMTP config is deployed per site, enforcing the invariant is acceptable but should be explicit.

Resolution

Resolved 2026-05-16 (commit pending). Root cause confirmed: the singleton held a single _cachedToken/_tokenExpiry pair and GetTokenAsync ignored the credentials argument when validating the cache, so a second SMTP config got the first config's token. OAuth2TokenService now stores a ConcurrentDictionary<string, CacheEntry> keyed by the SHA-256 hash of the credential string; each distinct tenant/client/secret gets its own cached token, expiry, and per-credential SemaphoreSlim (double-checked locking preserved). Regression tests: GetTokenAsync_DifferentCredentials_ReturnPerCredentialTokens and GetTokenAsync_SameCredentials_CachedPerCredential.

NotificationService-007 — Connection timeout and max-concurrent-connections from the design doc are not implemented

Severity Medium
Category Design-document adherence
Status Resolved
Location src/ZB.MOM.WW.ScadaBridge.NotificationService/NotificationOptions.cs:11-14, src/ZB.MOM.WW.ScadaBridge.NotificationService/MailKitSmtpClientWrapper.cs:16-20, src/ZB.MOM.WW.ScadaBridge.NotificationService/NotificationDeliveryService.cs:111-140

Description

The design doc specifies an SMTP "Connection timeout (default 30s)" and "Max concurrent connections (default 5)", and NotificationOptions/SmtpConfiguration both carry these fields. Neither is enforced: MailKitSmtpClientWrapper.ConnectAsync never sets SmtpClient.Timeout, so the connection relies on MailKit's default timeout rather than the configured value (only the caller's CancellationToken bounds it, and callers may pass default). There is no semaphore or other throttle limiting concurrent SMTP connections per site, so MaxConcurrentConnections has no effect. Both options exist but are dead configuration.

Recommendation

Set SmtpClient.Timeout from ConnectionTimeoutSeconds in ConnectAsync (and/or derive a linked CancellationTokenSource). Introduce a SemaphoreSlim(MaxConcurrentConnections) gating DeliverAsync. If these limits are intentionally deferred, mark the options [Obsolete]/document them as not-yet-enforced and note the gap in the design doc.

Resolution

Resolved 2026-05-16 (commit pending). Root cause confirmed: ConnectAsync never set SmtpClient.Timeout and no throttle gated DeliverAsync. ISmtpClientWrapper.ConnectAsync now takes a connectionTimeoutSeconds argument; MailKitSmtpClientWrapper sets SmtpClient.Timeout from SmtpConfiguration.ConnectionTimeoutSeconds. DeliverAsync acquires a lazily-created SemaphoreSlim sized to SmtpConfiguration.MaxConcurrentConnections (default 5 when non-positive) and releases it in a finally, so concurrent SMTP deliveries per site are bounded. The timeout is sourced from the deployed SmtpConfiguration rather than NotificationOptions; the NotificationOptions fields remain as operational fallback defaults. Regression tests: Send_PassesConfiguredConnectionTimeoutToClient and Send_MaxConcurrentConnections_LimitsConcurrentDeliveries.

NotificationService-008 — Recipient email addresses are not validated before send

Severity Medium
Category Correctness & logic bugs
Status Resolved
Location src/ZB.MOM.WW.ScadaBridge.NotificationService/NotificationDeliveryService.cs:136-137, src/ZB.MOM.WW.ScadaBridge.NotificationService/MailKitSmtpClientWrapper.cs:50-53

Description

SendAsync builds bccAddresses directly from recipient.EmailAddress and passes them to MailboxAddress.Parse. If any recipient row has a malformed address, MailboxAddress.Parse throws ParseException. ParseException is not a TimeoutException/SocketException/IOException and its message will not generally contain "4." or "5.", so it falls through DeliverAsync's outer catch ... when (... && !IsTransientSmtpError(ex)) filter, which re-throws it (:153); it then escapes SendAsync entirely as an unhandled exception (the SendAsync catch blocks only cover SmtpPermanentException and transient errors). A single bad address in a list therefore crashes the send with an exception type the calling script is not told to expect, instead of producing a clean NotificationResult error. The same applies to a malformed FromAddress.

Recommendation

Validate addresses up front (e.g. MailboxAddress.TryParse) and return a NotificationResult(false, ...) listing invalid recipients, or wrap DeliverAsync so any non-classified exception becomes a permanent NotificationResult failure rather than escaping. Consider validating addresses at definition time in the Central UI as well.

Resolution

Resolved 2026-05-16 (commit pending). Root cause confirmed: MailboxAddress.Parse of a malformed FromAddress/recipient threw ParseException, which is unclassified and escaped SendAsync as an unhandled exception. A new ValidateAddresses helper uses MailboxAddress.TryParse for the sender and every recipient; SendAsync now returns a clean NotificationResult(false, ...) listing the invalid address(es) before any SMTP attempt, and DeliverBufferedAsync parks a buffered message with a bad address (a fault retrying cannot fix). Regression tests: Send_MalformedRecipientAddress_ReturnsCleanError_DoesNotThrow and Send_MalformedFromAddress_ReturnsCleanError_DoesNotThrow. Definition-time validation in the Central UI is a separate component and out of this module's scope.

NotificationService-009 — Credentials handled as plaintext strings; OAuth2 client secret logged risk

Severity Medium — re-triaged: split into an in-scope log-leak fix (resolved) and a Commons-scoped at-rest-encryption / structured-credential follow-up (NotificationService-013, Deferred).
Category Security
Status Resolved
Location src/ZB.MOM.WW.ScadaBridge.NotificationService/NotificationDeliveryService.cs:127-134, src/ZB.MOM.WW.ScadaBridge.NotificationService/OAuth2TokenService.cs:30-65, src/ZB.MOM.WW.ScadaBridge.Commons/Entities/Notifications/SmtpConfiguration.cs:9

Description

SMTP credentials — Basic Auth user:pass and OAuth2 tenantId:clientId:clientSecret — are stored and passed as a single colon-delimited plaintext string (SmtpConfiguration.Credentials). There is no indication the value is encrypted at rest in SQLite or in the central config DB. The colon-delimited packing is also brittle: a password or client secret containing a : will be split incorrectly (Split(':', 2) / Split(':', 3)), silently corrupting the secret. Separately, while the current code does not log the secret directly, the substring-based error classification logs full exception messages (_logger.LogWarning(ex, ...), LogError(ex, ...)) and MailKit exceptions can echo back server responses; an authentication failure message could surface credential fragments into logs. There is no defensive scrubbing.

Recommendation

Store credentials encrypted at rest (DPAPI/Data Protection or a secret store) and model them as structured fields rather than a colon-packed string, so secrets containing : are safe. Ensure credential values are never written to logs; consider a redaction step on exception messages before logging.

Resolution

Resolved 2026-05-16 (commit pending). Root cause re-triaged against source: the finding conflates two concerns with different ownership.

  1. Log-leak risk (in scope — fixed). The original code logged whole exception objects (_logger.LogWarning(ex, ...) / LogError(ex, ...)); MailKit auth exceptions can echo server responses quoting the supplied credentials. A new internal CredentialRedactor masks every colon-delimited credential component out of any text. SendAsync and DeliverBufferedAsync now log a scrubbed message string (not the raw exception) and the permanent-failure NotificationResult is scrubbed before it returns to the caller. OAuth2TokenService logs the tenant id only — never the client secret or access token. Regression tests: CredentialRedactorTests and Send_PermanentError_RedactsCredentialFromResultMessage.
  2. At-rest encryption + structured-credential modelling (out of scope — Deferred). Encrypting SmtpConfiguration.Credentials at rest and replacing the brittle colon-packed string with structured fields requires editing src/ZB.MOM.WW.ScadaBridge.Commons/Entities/Notifications/SmtpConfiguration.cs and the ConfigurationDatabase EF layer — both outside this module. Tracked separately as NotificationService-013 (Deferred) so it is not lost.

NotificationService-013 — Encrypt SMTP credentials at rest; replace colon-packed string with structured fields

Severity Medium
Category Security
Status Deferred
Location src/ZB.MOM.WW.ScadaBridge.Commons/Entities/Notifications/SmtpConfiguration.cs:9, ConfigurationDatabase EF mapping

Description

Split out of NotificationService-009. SmtpConfiguration.Credentials packs Basic Auth user:pass and OAuth2 tenantId:clientId:clientSecret into a single plaintext colon-delimited string: (a) there is no encryption at rest in SQLite or the central config DB; (b) a password or client secret containing a : is split incorrectly by Split(':', 2) / Split(':', 3), silently corrupting the secret.

Recommendation

Model credentials as structured fields (or an encrypted blob) on the Commons entity and encrypt at rest via Data Protection / a secret store. The colon-delimited parsing in MailKitSmtpClientWrapper and OAuth2TokenService would then consume the structured fields directly.

Resolution

Deferred — requires changes to src/ZB.MOM.WW.ScadaBridge.Commons and the ConfigurationDatabase component, which are outside the NotificationService module. To be addressed in a Commons/ConfigurationDatabase-scoped change. The associated log-leak risk is resolved under NotificationService-009.

NotificationService-010 — DeliverAsync does not disconnect the SMTP client on failure

Severity Low
Category Performance & resource management
Status Resolved
Location src/ZB.MOM.WW.ScadaBridge.NotificationService/NotificationDeliveryService.cs:121-154

Description

DisconnectAsync is only called at :139, on the success path inside the try block. If AuthenticateAsync or SendAsync throws, control jumps to the catch filter at :141 and the method exits (re-throwing or wrapping) without ever calling DisconnectAsync. Combined with NS-004 (the client is never disposed either), a failed send leaves an open, authenticated SMTP connection until the socket is eventually reclaimed by finalization. Under sustained transient failures this can exhaust the SMTP server's connection slots.

Recommendation

Move disconnect/dispose into a finally block (or use await using once ISmtpClientWrapper supports IAsyncDisposable) so the connection is always torn down regardless of outcome.

Resolution

Resolved 2026-05-16 (commit pending). Root cause confirmed against the current source: DeliverAsync invoked smtp.DisconnectAsync only on the success path inside the try block, so a failed Connect/Authenticate/Send left the connection open until finalization. DisconnectAsync was moved into the existing finally block (which also releases the NS-007 concurrency slot), so the SMTP connection is always torn down regardless of outcome. The disconnect is best-effort — wrapped in its own try/catch that logs at debug level — so a disconnect failure (e.g. the connection is already dead) cannot mask the original delivery exception. Regression tests Send_TransientFailureDuringSend_StillDisconnectsClient and Send_FailureDuringAuthenticate_StillDisconnectsClient.

NotificationService-011 — SmtpPermanentException declared in the wrong file; module conventions

Severity Low
Category Code organization & conventions
Status Resolved
Location src/ZB.MOM.WW.ScadaBridge.NotificationService/NotificationDeliveryService.cs:173-177, src/ZB.MOM.WW.ScadaBridge.Commons/Entities/Notifications/SmtpConfiguration.cs:5-15

Description

Two minor convention issues. (1) SmtpPermanentException is a public exception type declared at the bottom of NotificationDeliveryService.cs rather than in its own file (SmtpPermanentException.cs), which is inconsistent with the one-type-per-file layout used elsewhere and makes it harder to locate. (2) SmtpConfiguration (a Commons POCO) declares non-nullable string properties (Host, AuthType, FromAddress) that are only guaranteed by the constructor; EF Core materialization or object-initializer use can leave them null while the type system says otherwise. These are persistence-ignorant POCO concerns but worth flagging because the delivery service dereferences config.Host, config.AuthType, config.FromAddress without null checks.

Recommendation

Move SmtpPermanentException to its own file. For SmtpConfiguration, either keep the constructor as the only path and document it, or use required members so the compiler enforces initialization.

Resolution

Resolved 2026-05-16 (commit pending). Both issues confirmed against source.

  1. SmtpPermanentException placement (in scope — fixed). The public exception type was extracted from the bottom of NotificationDeliveryService.cs into its own file, src/ZB.MOM.WW.ScadaBridge.NotificationService/SmtpPermanentException.cs, restoring the one-type-per-file layout. No behaviour change, so no dedicated regression test — the move is verified by the module test suite still compiling and passing (56 tests green).
  2. SmtpConfiguration non-nullable strings (out of scope — re-triaged). SmtpConfiguration lives in src/ZB.MOM.WW.ScadaBridge.Commons, which is outside the NotificationService module's edit scope; it cannot be changed here. This is the same Commons entity already flagged for follow-up under NotificationService-013 (Deferred). The required-members / documented-constructor change should be folded into that Commons/ConfigurationDatabase-scoped work. Note the delivery service's actual risk is bounded: SendAsync/DeliverBufferedAsync already validate TlsMode and addresses up front (NS-005/NS-008), and a null Host/AuthType from a malformed config row surfaces as a classified/clean failure rather than silent corruption.

NotificationService-012 — Test coverage gaps: OAuth2 delivery path, permanent-classification fallback, token-cache concurrency

Severity Low
Category Testing coverage
Status Resolved
Location tests/ZB.MOM.WW.ScadaBridge.NotificationService.Tests/NotificationDeliveryServiceTests.cs, tests/ZB.MOM.WW.ScadaBridge.NotificationService.Tests/OAuth2TokenServiceTests.cs

Description

The tests cover the happy path, list-not-found, no-recipients, no-SMTP-config, permanent failure, transient-without-S&F, and transient-with-S&F buffering. Notable untested paths: (1) the OAuth2 delivery branch in DeliverAsync:128-132 — every test uses tokenService: null and Basic Auth, so OAuth2 token resolution during a send is never exercised; (2) DeliverAsync's permanent-classification fallback (:144-149) that promotes a generic exception whose message contains "550"/"553"/"554" to SmtpPermanentException is never tested; (3) OAuth2TokenServiceTests never tests concurrent GetTokenAsync calls (the double-checked-locking path) or token expiry/refresh — the cache test uses a 3600s token so refresh never triggers; (4) no test covers the transient-with-S&F path actually delivering after retry (which would also have caught NS-001). Given NS-001 is a critical defect, the absence of an end-to-end buffer-and-retry test is significant.

Recommendation

Add tests for: OAuth2-authenticated send with a mocked OAuth2TokenService; the DeliverAsync 5xx-message permanent fallback; token expiry/refresh (short expires_in); concurrent token acquisition; and an end-to-end buffered-notification retry once a Notification S&F handler is registered.

Resolution

Resolved 2026-05-16 (commit pending). Each listed gap was re-triaged against the current source (the module was substantially rewritten by NS-001..009 since this finding was filed) and the genuinely-missing tests were added:

  1. OAuth2 delivery path (gap real — covered). New test Send_OAuth2Config_AuthenticatesWithResolvedAccessToken drives DeliverAsync with an oauth2 SmtpConfiguration and a real OAuth2TokenService (mocked IHttpClientFactory), asserting the SMTP client is authenticated with the resolved access token, not the raw tenant:client:secret triple.
  2. 5xx-message permanent fallback (gap obsolete — re-triaged). The substring-based permanent fallback this item describes no longer exists: NS-003 replaced message-text matching with typed-exception/SmtpStatusCode classification. The equivalent path is already covered by Send_Smtp5xxCommandException_ClassifiedPermanent, and Send_NonSmtpExceptionWith5xxLookalikeText_NotPromotedToPermanent proves a "5xx lookalike" message is no longer promoted. No new test needed for a removed branch.
  3. Token expiry/refresh and concurrent acquisition (gaps real — covered). New tests GetTokenAsync_ExpiredToken_RefreshesOnNextCall (uses expires_in: 60 so the token is stale immediately given the 60s refresh margin) and GetTokenAsync_ConcurrentCalls_MakeExactlyOneHttpRequest (20 racing callers collapse to a single token fetch, exercising the per-credential double-checked lock).
  4. End-to-end buffer-and-retry (gap already closed — re-triaged). The buffer-then-retry-then-deliver round trip was added when NS-001 was resolved (DeliverBuffered_HappyPath_ReturnsTrue, DeliverBuffered_ListNoLongerExists_…, plus AkkaHostedService handler registration), so no further test is required here.

Module test suite is green at 56 tests.

NotificationService-014 — OAuth2 token-fetch failure escapes DeliverBufferedAsync; a permanently-broken config is retried forever

Severity High
Category Error handling & resilience
Status Resolved
Location src/ZB.MOM.WW.ScadaBridge.NotificationService/NotificationDeliveryService.cs:214-228, src/ZB.MOM.WW.ScadaBridge.NotificationService/NotificationDeliveryService.cs:308-312, src/ZB.MOM.WW.ScadaBridge.NotificationService/OAuth2TokenService.cs:56-84

Description

DeliverBufferedAsync is the registered Store-and-Forward delivery handler for StoreAndForwardCategory.Notification. Per the S&F handler contract (StoreAndForwardService.cs:40-53), a handler must return true for success, false for permanent failure (park, no further retries), and throw only for a transient failure (retry). DeliverBufferedAsync's try block (:214-228) catches only SmtpPermanentException; the comment at :227 asserts "Transient SMTP errors propagate out of DeliverAsync". But DeliverAsync can also throw exceptions that ClassifySmtpError buckets as Unknown — most importantly the OAuth2 path at :308-312, where OAuth2TokenService.GetTokenAsync throws HttpRequestException (from EnsureSuccessStatusCode at OAuth2TokenService.cs:78) or InvalidOperationException (malformed credential triple, or missing access_token). None of these is an SmtpCommandException/SmtpProtocolException/SocketException/TimeoutException, so they are neither caught nor classified — they propagate straight out of DeliverBufferedAsync. The S&F engine treats any thrown exception as a transient failure, so a notification whose SMTP config has an invalid client secret (a 401 → HttpRequestException, a genuinely permanent condition) is retried on every sweep until MaxRetries is reached — burning OAuth2 token-endpoint calls and never parking promptly. A malformed credential string (InvalidOperationException, never fixable) is likewise retried instead of parked immediately.

Recommendation

Add a catch-all to DeliverBufferedAsync for exceptions that ClassifySmtpError classifies as Unknown: log and return false (park) for non-retryable causes such as InvalidOperationException from credential parsing, and decide deliberately whether an OAuth2 HttpRequestException should park or retry (a 5xx token endpoint is transient; a 401 is permanent — inspect HttpRequestException.StatusCode). Do not let an unclassified exception leave the handler and be silently reinterpreted as transient.

Resolution

Resolved 2026-05-17. Root cause confirmed against source — DeliverBufferedAsync caught only SmtpPermanentException, so an OAuth2 token-fetch HttpRequestException/InvalidOperationException escaped the handler and the S&F engine reinterpreted any throw as transient. Added a final catch (Exception ex) to DeliverBufferedAsync that decides deliberately: an HttpRequestException with a 5xx token-endpoint status re-throws (transient, retry); every other unclassified cause (a 401/4xx token rejection, a malformed-credential InvalidOperationException) returns false so the message parks immediately. Caller-cancellation and typed transient SMTP errors are re-thrown via dedicated filters above it. Tests DeliverBuffered_OAuth2MalformedCredentials_ReturnsFalseSoMessageParks, DeliverBuffered_OAuth2TokenEndpoint401_ReturnsFalseSoMessageParks, DeliverBuffered_OAuth2TokenEndpoint503_ThrowsSoEngineRetries.

NotificationService-015 — Unclassified exceptions (OAuth2 token fetch, non-cancellation OCE) escape SendAsync to the calling script

Severity High
Category Error handling & resilience
Status Resolved
Location src/ZB.MOM.WW.ScadaBridge.NotificationService/NotificationDeliveryService.cs:96-148, src/ZB.MOM.WW.ScadaBridge.NotificationService/NotificationDeliveryService.cs:308-312

Description

SendAsync's try/catch around DeliverAsync has exactly three catch clauses: SmtpPermanentException (:101), OperationCanceledException when cancellation was requested (:111), and Exception filtered by IsTransientSmtpError (:116). There is no catch-all for an exception that is none of those. ClassifySmtpError returns SmtpErrorClass.Unknown for anything it does not recognise, and IsTransientSmtpError only matches Transient, so an Unknown-classified exception falls through all three clauses and escapes SendAsync unhandled. Concretely, the OAuth2 branch at :308-312 calls OAuth2TokenService.GetTokenAsync, which throws HttpRequestException (token endpoint unreachable / 4xx / 5xx) or InvalidOperationException (malformed tenant:client:secret triple, or no access_token in the response). Both reach the calling script (an instance/alarm/shared script) as a raw exception of a type the INotificationDeliveryService contract never advertises — the design doc says delivery returns either success, a buffered transient result, or a permanent NotificationResult error. Notify.To().Send() against an OAuth2 config with a bad secret therefore crashes the script instead of returning a clean NotificationResult(false, ...). This is the same defect class NS-008 resolved for malformed addresses, but NS-008's ValidateAddresses fix never covered the OAuth2 token path. (A non-cancellation OperationCanceledException — e.g. an internal MailKit timeout surfaced as OCE while the caller's token is not cancelled — would escape the same way.)

Recommendation

Add a final catch (Exception ex) to SendAsync that converts any otherwise-unhandled exception into a permanent NotificationResult(false, ...) (credential-scrubbed, consistent with NS-009), and log it. Treating an unclassified failure as permanent is the safe default — it returns control to the script rather than crashing it; an OAuth2 transient (token endpoint 5xx) being reported as permanent is an acceptable trade against an unhandled crash, and can be refined by inspecting HttpRequestException.StatusCode if transient buffering of token failures is wanted.

Resolution

Resolved 2026-05-17. Root cause confirmed — SendAsync had only three catch clauses and an Unknown-classified exception (OAuth2 HttpRequestException/InvalidOperationException) fell through all of them and escaped to the calling script. Added a final catch (Exception ex) to SendAsync that converts any otherwise-unhandled exception into a credential-scrubbed NotificationResult(false, "Notification delivery failed: ...") and logs it; caller-requested cancellation is still re-thrown by the filter above so it never reaches the catch-all. The obsolete NS-003 test that asserted such an exception escapes was re-triaged to assert the clean result instead. Tests Send_OAuth2TokenFetchFails_ReturnsCleanError_DoesNotThrow, Send_OAuth2MalformedCredentials_ReturnsCleanError_DoesNotThrow, Send_UnclassifiedException_RedactsCredentialFromResult.

NotificationService-016 — AuthenticateAsync silently sends unauthenticated for an unknown auth type or empty credentials

Severity Medium
Category Security
Status Resolved
Location src/ZB.MOM.WW.ScadaBridge.NotificationService/MailKitSmtpClientWrapper.cs:46-67

Description

MailKitSmtpClientWrapper.AuthenticateAsync returns without authenticating in two silent paths. (1) if (string.IsNullOrEmpty(credentials)) return; (:48-49) — if an SMTP config has AuthType "basic" or "oauth2" but a null/empty Credentials value (a misconfigured or partially-deployed row), the method returns and DeliverAsync proceeds straight to SendAsync on an unauthenticated connection. (2) The switch (authType.ToLowerInvariant()) (:51-66) has cases only for "basic" and "oauth2" and no default — any other AuthType value (a typo, a future "ntlm", an empty string) falls through with no authentication attempted. Additionally, in the "basic" case, credentials.Split(':', 2) that yields fewer than 2 parts (a credential string with no colon) is silently skipped (:55-58). In every case the connection then attempts to send mail unauthenticated: against a relay that does not require auth this masks a misconfiguration; against a server that does require auth it fails later with a less obvious error, and at worst an unauthenticated send succeeds where it should not. Authentication being skipped should never be silent.

Recommendation

Make missing/unparseable credentials and an unrecognised AuthType hard errors: throw a typed exception (e.g. a permanent configuration exception so SendAsync/DeliverBufferedAsync surface it as a clean failure / park) rather than returning. Add a default: arm to the switch that throws ArgumentException naming the unsupported auth type, and reject a "basic" credential string that does not split into exactly two parts.

Resolution

Resolved 2026-05-17. Root cause confirmed — AuthenticateAsync returned silently for null/empty credentials, had no default: arm, and skipped a "basic" credential that did not split into two parts, so the connection sent mail unauthenticated. All three now throw SmtpPermanentException (a permanent configuration fault); because the exception is permanent, SendAsync returns a clean NotificationResult failure and DeliverBufferedAsync parks the buffered message — no unauthenticated send is ever attempted. Tests Authenticate_EmptyCredentials_Throws, Authenticate_UnknownAuthType_Throws, Authenticate_BasicCredentialWithoutColon_Throws in the new MailKitSmtpClientWrapperTests.

NotificationService-017 — NotificationOptions is bound from configuration but never read (dead config)

Severity Low
Category Code organization & conventions
Status Resolved
Location src/ZB.MOM.WW.ScadaBridge.NotificationService/NotificationOptions.cs:1-15, src/ZB.MOM.WW.ScadaBridge.NotificationService/ServiceCollectionExtensions.cs:10-11, src/ZB.MOM.WW.ScadaBridge.Host/SiteServiceRegistration.cs:70

Description

NotificationOptions (with ConnectionTimeoutSeconds and MaxConcurrentConnections) is bound from the ScadaBridge:Notification configuration section in two places — ServiceCollectionExtensions.AddNotificationService (AddOptions<NotificationOptions>().BindConfiguration(...)) and again in Host/SiteServiceRegistration.cs:70 (services.Configure<NotificationOptions>(...)). However, a repo-wide search shows no code ever injects IOptions<NotificationOptions> or otherwise reads either property. When NS-007 enforced the connection timeout and concurrency limit, it sourced both values from the per-site SmtpConfiguration entity, not from NotificationOptions — so this options class is now entirely dead configuration. Its XML doc still claims it "provides fallback defaults and operational limits", which is misleading: nothing falls back to it. The double binding is also redundant. Dead, falsely-documented configuration invites an operator to set ScadaBridge:Notification:ConnectionTimeoutSeconds and expect it to take effect, when it has no effect at all.

Recommendation

Either delete NotificationOptions and both of its registrations, or genuinely wire it in as the documented fallback (e.g. DeliverAsync uses NotificationOptions values when the SmtpConfiguration field is non-positive). If kept, remove the duplicate Configure call and correct the XML doc. The NS-007 resolution note already states the NotificationOptions fields "remain as operational fallback defaults" — that intent should be implemented or the class removed.

Resolution

Resolved 2026-05-17. Root cause confirmed — NotificationOptions was bound but never read. Implemented the documented-fallback intent rather than deleting it: NotificationDeliveryService now takes an optional IOptions<NotificationOptions> and uses its ConnectionTimeoutSeconds/MaxConcurrentConnections whenever the deployed SmtpConfiguration field is non-positive (a value on the row still wins). The misleading XML doc on NotificationOptions was corrected to describe the precedence accurately. The duplicate services.Configure<NotificationOptions> in Host/SiteServiceRegistration.cs:70 is harmless (DI keeps a single bound instance) and lives outside this module's edit scope, so it was left in place. Tests Send_SmtpConfigTimeoutUnset_FallsBackToNotificationOptions, Send_SmtpConfigTimeoutSet_OverridesNotificationOptions.

NotificationService-018 — Concurrency limiter: lock-free read of a non-volatile field, never resized on redeployment, never disposed

Severity Low
Category Concurrency & thread safety
Status Resolved
Location src/ZB.MOM.WW.ScadaBridge.NotificationService/NotificationDeliveryService.cs:237-255

Description

Three minor issues with the NS-007 concurrency limiter. (1) GetConcurrencyLimiter reads the plain (non-volatile) field _concurrencyLimiter at :242 outside _limiterLock as a fast path. The field is only written inside the lock (:252), so under the .NET memory model a concurrent thread is not guaranteed to observe the fully-constructed SemaphoreSlim via the lock-free read. In practice on x86/x64 this is benign and the double-checked write inside the lock is correct, but the unsynchronized read of a reference set under a lock is the classic lazy-init memory-model pitfall — use Volatile.Read/LazyInitializer/Lazy<T>. (2) The limiter is sized once from the first SmtpConfiguration ever seen and the ??= at :252 means it is never re-created. The design doc says one SMTP config is deployed per site, but that config can be redeployed with a different MaxConcurrentConnections; the new value is silently ignored for the lifetime of the (scoped, but potentially long-lived per request) service — and more importantly the limit is captured per NotificationDeliveryService instance, so behaviour depends on which scoped instance happens to create it first. (3) SemaphoreSlim implements IDisposable; _concurrencyLimiter is never disposed and NotificationDeliveryService does not implement IDisposable. A scoped service leaking one undisposed SemaphoreSlim per scope is a slow handle leak.

Recommendation

Replace the hand-rolled double-checked init with Lazy<SemaphoreSlim> or LazyInitializer.EnsureInitialized, or at minimum make the field access Volatile.Read/Volatile.Write. Consider hoisting the limiter to a singleton keyed by site (the limit is a per-site invariant, not a per-scope one) so redeployment with a changed limit and disposal are handled in one place; if it stays on the scoped service, implement IDisposable to dispose it.

Resolution

Resolved 2026-05-17. All three issues confirmed against source. The hand-rolled double-checked init was replaced with a Lazy<SemaphoreSlim> — its publication is correctly synchronised, eliminating the lock-free read of a non-volatile reference. NotificationDeliveryService now implements IDisposable and disposes the limiter (if created) under the existing lock, with idempotent re-entry and an ObjectDisposedException guard in SendAsync/GetConcurrencyLimiter; the scoped DI registration disposes it per scope. The limiter remains scoped (not hoisted to a site singleton) — the design doc deploys one SMTP config per site and the per-instance capture is bounded; the redeploy-resize concern is acknowledged as low-impact and not changed here, since hoisting would require a registration change for marginal benefit. Tests Service_Dispose_DisposesConcurrencyLimiter plus the existing Send_MaxConcurrentConnections_LimitsConcurrentDeliveries.

NotificationService-019 — NotificationDeliveryService and INotificationDeliveryService are orphaned by the central-only redesign

Severity High
Category Design-document adherence
Status Resolved
Location src/ZB.MOM.WW.ScadaBridge.NotificationService/NotificationDeliveryService.cs:18-442, src/ZB.MOM.WW.ScadaBridge.NotificationService/ServiceCollectionExtensions.cs:20-21, src/ZB.MOM.WW.ScadaBridge.Commons/Interfaces/Services/INotificationDeliveryService.cs:1-33, src/ZB.MOM.WW.ScadaBridge.Host/Program.cs:77

Resolution — Executed option 1. Deleted src/ZB.MOM.WW.ScadaBridge.NotificationService/NotificationDeliveryService.cs, src/ZB.MOM.WW.ScadaBridge.Commons/Interfaces/Services/INotificationDeliveryService.cs (also retires NotificationResult + BufferedNotification), and the orphaned tests/ZB.MOM.WW.ScadaBridge.NotificationService.Tests/NotificationDeliveryServiceTests.cs suite; reduced AddNotificationService to the shared SMTP primitives (OAuth2TokenService, Func<ISmtpClientWrapper>, NotificationOptions), updated CompositionRootTests (assert the primitives instead of the dead types), and removed the Notification_Send_MockSmtp_Delivers assertion in IntegrationSurfaceTests (central delivery is covered by EmailNotificationDeliveryAdapterTests). Grep-verified grep -rn "INotificationDeliveryService\|NotificationDeliveryService\|NotificationResult\|BufferedNotification\|DeliverBufferedAsync" --include="*.cs" src/ tests/ before delete: zero production callers (only XML-doc cross-references in NS, MailKit wrapper, NotificationOptions and EmailNotificationDeliveryAdapter, plus the dead test files); cross-reference comments updated to remove the stale class references. dotnet build ZB.MOM.WW.ScadaBridge.slnx succeeds (0 warnings, 0 errors); affected test projects all pass (NotificationService.Tests 52/52, NotificationOutbox.Tests 86/86 on rerun — one flaky timing-sensitive Akka.TestKit test unrelated to NS-019, Host.Tests 205/205); IntegrationTests 64/66 with two pre-existing failures in NotificationOutboxFlowTests (SQLite "near IF: syntax error", reproducible on pristine main, unrelated to NS-019).

Description

The updated Component-NotificationService.md (re-read in full at this commit) makes the new design unambiguous: "The Notification Service is the central component that manages notification-list and SMTP definitions and provides the per-type delivery adapters used to send notifications. … Notification delivery has been inverted: a site script's notification is store-and-forwarded to the central cluster, and the central Notification Outbox owns dispatch and delivery, calling an INotificationDeliveryAdapter supplied by this component." The doc explicitly states the service is "central cluster only", "no longer present at site clusters", and "no longer delivers notifications from sites".

The current source does not match. NotificationDeliveryService is a site-shaped notification sender: it accepts (listName, subject, message), performs an immediate SMTP DeliverAsync, catches transient failures and buffers them to a StoreAndForwardCategory.Notification row, and exposes DeliverBufferedAsync as the matching S&F handler. That is precisely the old site-side flow the design doc says was removed. The doc explicitly notes "there is no … local SQLite copy" of notification lists at sites, yet DeliverBufferedAsync re-resolves the list from a repository expected to be reachable on the buffering node.

Who actually calls it?

  • Sites do not. SiteServiceRegistration.cs:33-38 documents the deliberate omission: "AddNotificationService() is intentionally NOT registered on the site path." Sites register NotificationForwarder (in ZB.MOM.WW.ScadaBridge.StoreAndForward) as the S&F handler for StoreAndForwardCategory.Notification (AkkaHostedService.cs:654-660), which Asks the central comms actor and never touches SMTP. ScriptRuntimeContext.NotifyHelper (in SiteRuntime) enqueues directly to S&F as a serialized NotificationSubmit, not via INotificationDeliveryService.SendAsync.
  • Central registers it (Program.cs:77 calls AddNotificationService) but no central component resolves it. The central notification dispatcher is NotificationOutboxActorINotificationDeliveryAdapterEmailNotificationDeliveryAdapter. The adapter is a full re-implementation of the connect/auth/send/disconnect sequence (see EmailNotificationDeliveryAdapter.cs:163-222) — it deliberately does not call NotificationDeliveryService.DeliverAsync (XML-doc on the adapter says "Reuses the ZB.MOM.WW.ScadaBridge.NotificationService SMTP machinery — ISmtpClientWrapper, SmtpTlsModeParser, OAuth2TokenService and the typed SmtpPermanentException", i.e. only the leaf primitives).

The NotificationDeliveryService class, its DeliverBufferedAsync, the Func<ISmtpClientWrapper> registration consumed only by it, and the INotificationDeliveryService interface (still in Commons) and NotificationResult record are therefore dead code that contradicts the design. Worse, every prior finding NS-001..NS-018 was reviewed and resolved against this dead path. The 56-test green test suite (NS-012 resolution note) exercises behaviour no production caller invokes — it gives a false sense of coverage. The misleading XML doc on NotificationDeliveryService ("WP-11/12: Notification delivery via SMTP") tells a maintainer this is the delivery path; the registration on central does the same.

Risk: an operator following the design doc will look here for "the central email delivery code" and find a parallel implementation that is never called; a future feature change (e.g. retry policy tweak) made here will silently have no effect; the Notify script-API end-to-end behaviour now depends on NotificationOutbox + EmailNotificationDeliveryAdapter + NotificationForwarder, none of which are tested in this module's suite.

Recommendation

Decide and execute one of:

  1. Delete NotificationDeliveryService, DeliverBufferedAsync, the BufferedNotification payload type, the Func<ISmtpClientWrapper> scoped registration (move it to NotificationOutbox if still needed there — it already has its own), and INotificationDeliveryService/NotificationResult in Commons. Reduce AddNotificationService to registering the shared primitives — OAuth2TokenService, ISmtpClientWrapper factory, NotificationOptions. Delete the NS-001..NS-018 tests that target the orphaned path; rebase the ones that exercise primitives (SmtpErrorClassifier, SmtpTlsModeParser, CredentialRedactor, EmailAddressValidator, MailKitSmtpClientWrapper, OAuth2TokenService) which remain genuinely shared. Update CompositionRootTests (tests/ZB.MOM.WW.ScadaBridge.Host.Tests/CompositionRootTests.cs:208-209) and IntegrationSurfaceTests (tests/ZB.MOM.WW.ScadaBridge.IntegrationTests/IntegrationSurfaceTests.cs:122-135) to drop the stale assertions.

  2. Keep the class as the central-only Email delivery primitive and rewrite EmailNotificationDeliveryAdapter to delegate to it. This is the smaller diff but the larger semantic burden — NotificationDeliveryService.SendAsync returns NotificationResult (Success / WasBuffered) which cannot encode the three-way DeliveryOutcome (Success / Transient / Permanent) the outbox needs, so the contract still has to change.

Recommended path is option 1: the parallel implementation in EmailNotificationDeliveryAdapter is already complete and matches the new design's DeliveryOutcome model; salvaging the old class would re-introduce the very inversion this redesign removed.

NotificationService-020 — NS-001 fix superseded; AkkaHostedService would register two competing Notification S&F handlers if both code paths ran

Severity Medium
Category Correctness & logic bugs
Status Resolved
Location src/ZB.MOM.WW.ScadaBridge.Host/Actors/AkkaHostedService.cs:654-660, NS-001 resolution note (this file)

Resolution (2026-05-28): Added a _notificationDeliveryHandlerRegistered sentinel field on AkkaHostedService and gated the canonical NotificationForwarder registration with an InvalidOperationException guard — a future code path that re-introduces the dead NS-001 site-SMTP handler now fails fast at startup with an explicit NS-020 diagnostic, rather than silently overwriting RegisterDeliveryHandler's last-write-wins map and inverting the central-only design. The sentinel's XML doc cross-references NS-001/NS-019/NS-020 so a maintainer searching for the Notification S&F handler finds the one canonical registration and its history.

Description

NS-001 was resolved by registering an S&F → DeliverBufferedAsync handler for StoreAndForwardCategory.Notification at site startup in AkkaHostedService. The current source registers a different handler for the same category at AkkaHostedService.cs:654-660NotificationForwarder.DeliverAsync, which forwards to central instead of sending SMTP. StoreAndForwardService.RegisterDeliveryHandler (verified by reading StoreAndForward/StoreAndForwardService.cs around line 109) takes a single handler per category — last-write-wins or first-write-wins, either way the two registrations cannot both be active.

The NS-001 resolution note in this file describes a state of the code that no longer exists: it says the handler "is now registered at site startup in AkkaHostedService" and points to a handler resolving NotificationDeliveryService via a fresh DI scope. That registration is gone from the current AkkaHostedService (only ExternalSystem, CachedDbWrite, and the NotificationForwarder-based Notification registration are present at the current location). So the NS-001 fix has been silently rolled back / replaced as part of the central-only redesign.

The risk this finding tracks is not the current state per se — NotificationForwarder registration is correct under the new design — but the stale resolution note plus the fact that NotificationDeliveryService.DeliverBufferedAsync still exists in this module and is still tested as an S&F handler. A future merge or revert that re-introduces the NS-001-style registration (because it is what the test suite shape implies) would conflict with NotificationForwarder. The two handlers do diametrically opposite things (forward to central vs. send SMTP locally on a site where there is no SMTP config), so a misregistration would cause a silent regression of the design inversion.

Recommendation

Mark the NS-001 resolution note in this file as superseded by NS-019 with a one-line note explaining that the registration was removed when sites stopped delivering. Delete the orphan DeliverBufferedAsync and its tests as part of the NS-019 work. Add a comment on NotificationForwarder registration in AkkaHostedService cross-referencing NS-019/NS-020 so a maintainer searching for the Notification S&F handler finds the one canonical registration.

NotificationService-021 — OAuth2 SASL constructed with empty user identifier; M365 SMTP will reject the auth handshake

Severity High
Category Correctness & logic bugs
Status Resolved
Location src/ZB.MOM.WW.ScadaBridge.NotificationService/MailKitSmtpClientWrapper.cs:76-79

Description

case "oauth2":
    // OAuth2 token is passed directly as credentials (pre-fetched by token service)
    var oauth2 = new SaslMechanismOAuth2("", credentials);
    await _client.AuthenticateAsync(oauth2, cancellationToken);
    break;

SaslMechanismOAuth2(string userName, string token) — MailKit's XOAUTH2 mechanism — sends the SASL initial response as user=<userName>\x01auth=Bearer <token>\x01\x01. Microsoft 365 (and most OAuth2-enabled SMTP relays) require the userName field to be the From mailbox identity the token was issued for; an empty string is rejected with a server response like 535 5.7.3 Authentication unsuccessful ("Either the user identity does not match the principal in the token, or the user is empty"). Office 365's documentation for SMTP AUTH XOAUTH2 calls this out explicitly.

The token-fetch path supports this: OAuth2TokenService.GetTokenAsync issues a Client Credentials grant against login.microsoftonline.com/{tenantId}/oauth2/v2.0/token with scope=https://outlook.office365.com/.default, which is the Microsoft 365 SMTP send scope — meaning the intended target is M365 SMTP, which is precisely the server that rejects an empty user. The SmtpConfiguration.FromAddress field is exactly the user identity that should be passed.

This bug is not caught by tests because every existing test uses a fake ISmtpClientWrapper (Substitute.For<ISmtpClientWrapper>(), RecordingAuthClient, etc.) — MailKitSmtpClientWrapper.AuthenticateAsync is never exercised against a real SaslMechanismOAuth2. The OAuth2 delivery test (NS-012, Send_OAuth2Config_AuthenticatesWithResolvedAccessToken) only asserts the wrapper's AuthenticateAsync is invoked with ("oauth2", "<access-token>"); the wrapper itself is mocked out. The same defect is present in EmailNotificationDeliveryAdapter only because it routes through this same AuthenticateAsync method.

Recommendation

Pass the sender mailbox into the wrapper's AuthenticateAsync path. The cleanest fix is to thread config.FromAddress (or a dedicated oauth2UserName parameter) through ISmtpClientWrapper.AuthenticateAsync so the OAuth2 branch can construct new SaslMechanismOAuth2(config.FromAddress, credentials). Add an integration-style test that runs MailKitSmtpClientWrapper.AuthenticateAsync against a stub SmtpClient and asserts the XOAUTH2 initial-response bytes contain the expected user=<from> field, so this regression is caught next time.

NotificationService-022 — MailKitSmtpClientWrapper holds a long-lived SmtpClient; combined with per-send factory, the design comment about pooling is contradicted

Severity Low
Category Performance & resource management
Status Resolved
Location src/ZB.MOM.WW.ScadaBridge.NotificationService/MailKitSmtpClientWrapper.cs:14, src/ZB.MOM.WW.ScadaBridge.NotificationService/ServiceCollectionExtensions.cs:19

Resolution (2026-05-28): Took option (a) — updated the class-level XML doc on MailKitSmtpClientWrapper with an explicit lifetime section (NS-022) describing single-connection ownership and the per-delivery factory contract (NOT a pool; MailKit.Net.Smtp.SmtpClient holds one TCP/TLS connection and is not thread-safe; the DI Func<ISmtpClientWrapper> is a factory, callers run connect/auth/send/disconnect/dispose per send). A field-level comment was added to the _client declaration cross-referencing the class docs so a maintainer hunting from the field-initializer immediately sees the constraint. The wrapper code itself is unchanged — option (b) (transient SmtpClient per Send) was deliberately not taken because it would re-handshake TLS per email which is materially more expensive than the current per-delivery factory model the callers already implement. The concurrent-connection limit regression on the central-side delivery path (per-site semaphore on EmailNotificationDeliveryAdapter) is out of NotificationService scope and tracked separately.

Description

MailKitSmtpClientWrapper declares private readonly SmtpClient _client = new(); — a single SmtpClient is constructed when the wrapper is constructed and lives for the wrapper's lifetime. The DI registration is services.AddSingleton<Func<ISmtpClientWrapper>>(_ => () => new MailKitSmtpClientWrapper()); (ServiceCollectionExtensions.cs:19) — every invocation of the factory creates a new wrapper and therefore a new SmtpClient. NotificationDeliveryService.DeliverAsync (the orphan, per NS-019) and EmailNotificationDeliveryAdapter.SendAsync both invoke the factory per send and dispose the wrapper at end of send. So in practice there is no connection pooling — every send pays a full TCP+TLS handshake.

This is internally consistent (and matches MailKit guidance — SmtpClient is not thread-safe and reusing across deliveries needs careful guarding). However:

  1. The XML on the wrapper class says nothing about lifetime; the field-initializer new SmtpClient() implies a reusable connection. A maintainer might "fix" the factory to reuse a single wrapper (singleton) believing they are enabling pooling, and immediately introduce a concurrency bug: MailKit.SmtpClient rejects concurrent send calls and the wrapper carries no synchronization.
  2. ConnectAsync mutates _client.Timeout (MailKitSmtpClientWrapper.cs:39-42) every time it runs. If a wrapper is ever reused across deliveries with different SmtpConfiguration.ConnectionTimeoutSeconds values, the timeout is silently overwritten — not a current bug, but a latent footgun.
  3. The design doc requirement "Max concurrent connections (default 5)" is currently honoured by the NS-007 SemaphoreSlim on NotificationDeliveryService, but EmailNotificationDeliveryAdapter has no equivalent throttle — see EmailNotificationDeliveryAdapter.cs:163-222, no semaphore. So on central, where the actual delivery now happens, the design-doc concurrency limit is no longer enforced. This is a regression introduced by the redesign — the outbox does not carry NS-007's limiter forward.

Recommendation

Document the per-send lifecycle on MailKitSmtpClientWrapper (XML on the class: "one wrapper per delivery; the wrapper owns a single SmtpClient that is connected/authenticated/sent/disconnected/disposed once"). Either move the NS-007 SemaphoreSlim into a shared per-site holder consumed by EmailNotificationDeliveryAdapter, or accept the loss and update the design doc. Add [Obsolete] or internal to discourage re-using a wrapper across sends.

NotificationService-023 — XML docs on the orphaned classes still describe the removed site-delivery flow; misleading to maintainers

Severity Low
Category Documentation & comments
Status Resolved
Location src/ZB.MOM.WW.ScadaBridge.NotificationService/NotificationDeliveryService.cs:12-17, src/ZB.MOM.WW.ScadaBridge.Commons/Interfaces/Services/INotificationDeliveryService.cs:3-12, src/ZB.MOM.WW.ScadaBridge.NotificationService/ServiceCollectionExtensions.cs:8-9

Resolution (2026-05-28): Closed by NS-019 — both NotificationDeliveryService.cs and INotificationDeliveryService.cs were removed in commit ac96b83, and ServiceCollectionExtensions.AddNotificationService's XML doc was rewritten in the same commit to describe the central-only design (shared SMTP primitives consumed by EmailNotificationDeliveryAdapter, with an explicit NS-019 cross-reference and a note that sites no longer deliver notifications). No stale XML docs remain in this module.

Description

XML comments still claim the dead path is the live path:

  • NotificationDeliveryService class summary: "WP-11: Notification delivery via SMTP. WP-12: Error classification and S&F integration. Transient: connection refused, timeout, SMTP 4xx → hand to S&F. Permanent: SMTP 5xx → returned to script." This is the pre-redesign behaviour. The site-S&F branch in particular is dead (see NS-019), and "returned to script" is no longer accurate — Notify.Send is async and never returns a permanent error to the script per the design doc.
  • INotificationDeliveryService (Commons): "Interface for sending notifications. Implemented by NotificationService, consumed by ScriptRuntimeContext." Verified against source: ScriptRuntimeContext does not consume this interface — it enqueues directly to StoreAndForwardService (see SiteRuntime/Scripts/ScriptRuntimeContext.cs:1770-1774). The Commons-level claim therefore documents an interaction that no longer exists.
  • NotificationResult is a record returned only by the orphaned SendAsync. The Notification Outbox uses DeliveryOutcome instead, which encodes the Success/Transient/Permanent three-way that NotificationResult(Success, ErrorMessage, WasBuffered) cannot.
  • ServiceCollectionExtensions.AddNotificationService XML doc says "Registers the notification delivery services (SMTP, OAuth2 token, delivery adapter)" — no mention that the central-only redesign means most of what it registers is unused.

A reader following the XML docs from any entry point ends up at a path that does not run. The CLAUDE.md "External Integrations" section and Component-NotificationService.md describe the new design; the in-source docs contradict them.

Recommendation

Tied to NS-019: if the orphan classes are deleted, this finding closes itself. If they are kept temporarily, prepend each summary with "Obsolete — superseded by NotificationOutbox's EmailNotificationDeliveryAdapter. Retained for transitional compatibility; do not add new callers." and update INotificationDeliveryService's summary to reflect the inverted flow or remove the interface.

NotificationService-024 — No test affirms the central-only invariant; the orphaned-path tests give a false coverage signal

Severity Medium
Category Testing coverage
Status Resolved
Location tests/ZB.MOM.WW.ScadaBridge.NotificationService.Tests/NotificationDeliveryServiceTests.cs, tests/ZB.MOM.WW.ScadaBridge.IntegrationTests/IntegrationSurfaceTests.cs:118-136, tests/ZB.MOM.WW.ScadaBridge.Host.Tests/CompositionRootTests.cs:207-209

Resolution (2026-05-28): Closed by NS-019 — the orphaned NotificationDeliveryService class, its INotificationDeliveryService Commons interface, and the associated NotificationDeliveryServiceTests.cs test file (~40 tests asserting SendAsync/DeliverBufferedAsync behaviour against a code path no production caller resolves) were all deleted in the NS-019 fix commit. Verification: a directory listing of tests/ZB.MOM.WW.ScadaBridge.NotificationService.Tests/ shows only CredentialRedactorTests.cs, MailKitSmtpClientWrapperTests.cs, NotificationOptionsTests.cs, OAuth2TokenServiceTests.cs, SmtpErrorClassifierTests.cs, and SmtpTlsModeParserTests.cs — every retained file exercises a primitive that the central NotificationOutbox EmailNotificationDeliveryAdapter still depends on, so the false-coverage signal this finding called out no longer exists. The "no test affirms the central-only invariant" gap was the consequence of the orphaned tests existing; with them gone, the module test suite genuinely scopes to the shared SMTP primitives. The architecture-test recommendation (banning new consumers of INotificationDeliveryService) is moot once the interface itself is gone.

Description

The module test suite has 56 tests; counting NotificationDeliveryServiceTests.cs, ~40 of them exercise NotificationDeliveryService.SendAsync/DeliverBufferedAsync — code paths that, per NS-019, no production caller resolves. They pass against the orphaned class and so the suite stays green, but the green is a false signal: changing the dead implementation (or deleting it) does not flag any regression in the live notification-delivery flow, which now lives in EmailNotificationDeliveryAdapter (covered by NotificationOutbox's own tests) and NotificationForwarder (covered, if at all, by StoreAndForward's tests).

In particular there is no test in this module that affirms the central-only invariant the design doc requires:

  • No test that AddNotificationService() registered on a site role would be inert / no-op'd, or that SiteServiceRegistration.Configure does not call AddNotificationService (an obvious regression vector — re-adding it would silently restore the orphaned site-delivery path).
  • No test that confirms INotificationDeliveryService has no production consumer (i.e. an architecture test that fails if anyone re-introduces a constructor parameter or GetRequiredService<INotificationDeliveryService>() call).
  • The cross-module CompositionRootTests (tests/ZB.MOM.WW.ScadaBridge.Host.Tests/CompositionRootTests.cs:208-209) still asserts NotificationDeliveryService and INotificationDeliveryService are registered, locking in the orphan rather than catching it.
  • IntegrationSurfaceTests.cs:122-125 constructs NotificationDeliveryService directly to validate "the integration surface" — testing a surface that no script actually crosses.

Recommendation

After NS-019 is decided:

  1. If the orphan is deleted, remove the orphaned-path tests (NS-001/004/005/007/008/009/010/014/015/016/017/018-style tests targeting SendAsync/DeliverBufferedAsync). Retain SmtpErrorClassifierTests, SmtpTlsModeParserTests, CredentialRedactorTests, OAuth2TokenServiceTests, and MailKitSmtpClientWrapperTests (primitives genuinely shared). Update CompositionRootTests to drop the stale rows and IntegrationSurfaceTests to call the live path via INotificationDeliveryAdapter/EmailNotificationDeliveryAdapter.
  2. Add a one-shot architecture test in tests/ZB.MOM.WW.ScadaBridge.Architecture.Tests (if it exists, else this module) that scans for direct references to INotificationDeliveryService outside this project and the obsolete-interface declaration in Commons, failing if any new consumer reappears.

NotificationService-025 — CredentialRedactor over-masks: any 4-character credential component is masked anywhere it appears, including unrelated log text

Severity Low
Category Correctness & logic bugs
Status Resolved
Location src/ZB.MOM.WW.ScadaBridge.NotificationService/CredentialRedactor.cs:34-48

Resolution (2026-05-28): Tightened the policy per the recommendation — only the LAST colon-separated component (password in Basic / clientSecret in OAuth2) is considered a secret, AND only when it's plausibly secret-shaped (>= MinSecretLength = 12 chars). The full packed string is always masked regardless of length (its exact appearance can only come from the credential itself). A short user name like root (4 chars) or a sender alias like smtp (4 chars) no longer becomes a global redaction token that eats unrelated diagnostic text. 3 new negative tests added (Scrub_ShortUserName_IsNotMaskedOutsidePackedString, Scrub_TenantId_IsNotMaskedOutsidePackedString, Scrub_FullPackedCredential_IsAlwaysMaskedRegardlessOfLength); 1 existing test bumped its inline password length from 10→16 chars to stay above the new threshold.

Description

var parts = credentials.Split(':')
    .Where(p => p.Length >= 4)
    .Append(credentials)
    .Distinct()
    .OrderByDescending(p => p.Length);

foreach (var part in parts)
{
    result = result.Replace(part, Mask, StringComparison.Ordinal);
}

The threshold p.Length >= 4 is permissive enough that common short identifiers used by operators become aggressive global redaction tokens:

  • A Basic-Auth credential of root:hunter2 produces components ["root", "hunter2", "root:hunter2"]. Every literal root anywhere in the exception/log text is masked — including unrelated mentions like file paths (/root/.config) or default-account names in the server's reply. This obscures legitimate diagnostic information without protecting any additional secret.
  • An OAuth2 tenant id is a GUID (long, safe). The client id is typically a GUID. The client secret is the high-entropy part. The full tenant:client:secret is the actual sensitive triple. A tenant GUID embedded in unrelated text (a tenant-bound error code, a partial URL) will be masked even when the appearance is non-sensitive.
  • The user name in Basic Auth is sometimes the From address (scada-notifications@company.com) — masking the company's notification mailbox in every log line that mentions it has real operational cost.

The function also uses String.Replace ordinarily, not word-boundary aware — a 4-char prefix that happens to be a substring of a longer benign token gets eaten.

The threshold is a defence-in-depth choice; the existing tests assert that Hunter2pw! and Sup3rSecretValue are masked (good) and that null text/credentials are handled (good), but nothing pins the negative behaviour: e.g. a test that a 4-char user name root is not also masked when it appears in an unrelated path.

Recommendation

Tighten the redaction policy: mask only the obviously-secret components — the password (Basic), the client secret (OAuth2), and the whole packed string — not the user name / tenant / client id. The simplest implementation is to redact only the last colon-separated component (the secret) plus the full packed string. Bump the per-component minimum length to something high enough that a typical short user name does not match (≥ 12 chars is the usual heuristic for a password). Add a test asserting Scrub("/root/.config", "root:hunter2") does not mask /root/.config's root.