docs(code-reviews): re-review batch 3 at 39d737e — Host, InboundAPI, ManagementService, NotificationService, Security

21 new findings: Host-012..015, InboundAPI-014..017, ManagementService-014..017, NotificationService-014..018, Security-012..015.
This commit is contained in:
Joseph Doherty
2026-05-17 00:48:25 -04:00
parent 89636e2bbf
commit 3b3760f026
6 changed files with 873 additions and 41 deletions

View File

@@ -5,10 +5,10 @@
| Module | `src/ScadaLink.NotificationService` |
| Design doc | `docs/requirements/Component-NotificationService.md` |
| Status | Reviewed |
| Last reviewed | 2026-05-16 |
| Last reviewed | 2026-05-17 |
| Reviewer | claude-agent |
| Commit reviewed | `9c60592` |
| Open findings | 0 |
| Commit reviewed | `39d737e` |
| Open findings | 5 |
## Summary
@@ -30,6 +30,31 @@ plaintext `string` values. Test coverage exercises the happy path and the main
error branches but misses the OAuth2 delivery path, the permanent-classification
fallback in `DeliverAsync`, and concurrency on the token cache.
#### Re-review 2026-05-17 (commit `39d737e`)
Re-reviewed at commit `39d737e` after the NS-001..013 fixes. All twelve prior
findings remain closed (eleven Resolved, NS-013 Deferred) and the fixes hold up
against the current source — error classification is now typed, the SMTP client is
created once and always disconnected/disposed, the S&F `Notification` delivery
handler is registered in `AkkaHostedService`, and the OAuth2 token cache is keyed
per credential. The re-review surfaced **five new findings**. The recurring theme is
**unclassified-exception handling**: the typed `ClassifySmtpError` helper introduced
by NS-002/003 returns `Unknown` for anything that is not a recognised SMTP/socket
failure, but neither `SendAsync` nor `DeliverBufferedAsync` has a catch-all for the
`Unknown` bucket. As a result an OAuth2 token-fetch failure (`HttpRequestException`
from `EnsureSuccessStatusCode`, or `InvalidOperationException` from a malformed
credential string) escapes `SendAsync` as a raw exception to the calling script
(NS-015) and escapes `DeliverBufferedAsync` out of the S&F handler, where the engine
treats any thrown exception as transient and retries a permanently-broken config
forever (NS-014) — the same defect class NS-008 fixed for address parsing, but the
OAuth2 path was never covered. Secondary findings: `AuthenticateAsync` silently
proceeds unauthenticated for an unknown `authType` or empty credentials (NS-016);
`NotificationOptions` is bound from configuration in two places but never read by
any code (NS-017, dead config — NS-007 sourced the timeout/limit from
`SmtpConfiguration` instead); and the lazily-created concurrency limiter is read
outside its lock, is sized once and never resized on redeployment, and is never
disposed (NS-018).
## Checklist coverage
| # | Category | Examined | Notes |
@@ -465,3 +490,108 @@ filed) and the genuinely-missing tests were added:
`AkkaHostedService` handler registration), so no further test is required here.
Module test suite is green at 56 tests.
### NotificationService-014 — OAuth2 token-fetch failure escapes `DeliverBufferedAsync`; a permanently-broken config is retried forever
| | |
|--|--|
| Severity | High |
| Category | Error handling & resilience |
| Status | Open |
| Location | `src/ScadaLink.NotificationService/NotificationDeliveryService.cs:214-228`, `src/ScadaLink.NotificationService/NotificationDeliveryService.cs:308-312`, `src/ScadaLink.NotificationService/OAuth2TokenService.cs:56-84` |
**Description**
`DeliverBufferedAsync` is the registered Store-and-Forward delivery handler for `StoreAndForwardCategory.Notification`. Per the S&F handler contract (`StoreAndForwardService.cs:40-53`), a handler must return `true` for success, `false` for permanent failure (park, no further retries), and **throw only for a transient failure** (retry). `DeliverBufferedAsync`'s `try` block (`:214-228`) catches only `SmtpPermanentException`; the comment at `:227` asserts "Transient SMTP errors propagate out of `DeliverAsync`". But `DeliverAsync` can also throw exceptions that `ClassifySmtpError` buckets as `Unknown` — most importantly the OAuth2 path at `:308-312`, where `OAuth2TokenService.GetTokenAsync` throws `HttpRequestException` (from `EnsureSuccessStatusCode` at `OAuth2TokenService.cs:78`) or `InvalidOperationException` (malformed credential triple, or missing `access_token`). None of these is an `SmtpCommandException`/`SmtpProtocolException`/`SocketException`/`TimeoutException`, so they are neither caught nor classified — they propagate straight out of `DeliverBufferedAsync`. The S&F engine treats *any* thrown exception as a transient failure, so a notification whose SMTP config has an invalid client secret (a 401 → `HttpRequestException`, a genuinely **permanent** condition) is retried on every sweep until `MaxRetries` is reached — burning OAuth2 token-endpoint calls and never parking promptly. A malformed credential string (`InvalidOperationException`, never fixable) is likewise retried instead of parked immediately.
**Recommendation**
Add a catch-all to `DeliverBufferedAsync` for exceptions that `ClassifySmtpError` classifies as `Unknown`: log and return `false` (park) for non-retryable causes such as `InvalidOperationException` from credential parsing, and decide deliberately whether an OAuth2 `HttpRequestException` should park or retry (a 5xx token endpoint is transient; a 401 is permanent — inspect `HttpRequestException.StatusCode`). Do not let an unclassified exception leave the handler and be silently reinterpreted as transient.
**Resolution**
_Unresolved._
### NotificationService-015 — Unclassified exceptions (OAuth2 token fetch, non-cancellation OCE) escape `SendAsync` to the calling script
| | |
|--|--|
| Severity | High |
| Category | Error handling & resilience |
| Status | Open |
| Location | `src/ScadaLink.NotificationService/NotificationDeliveryService.cs:96-148`, `src/ScadaLink.NotificationService/NotificationDeliveryService.cs:308-312` |
**Description**
`SendAsync`'s `try`/`catch` around `DeliverAsync` has exactly three catch clauses: `SmtpPermanentException` (`:101`), `OperationCanceledException` when cancellation was requested (`:111`), and `Exception` filtered by `IsTransientSmtpError` (`:116`). There is no catch-all for an exception that is none of those. `ClassifySmtpError` returns `SmtpErrorClass.Unknown` for anything it does not recognise, and `IsTransientSmtpError` only matches `Transient`, so an `Unknown`-classified exception falls through all three clauses and escapes `SendAsync` unhandled. Concretely, the OAuth2 branch at `:308-312` calls `OAuth2TokenService.GetTokenAsync`, which throws `HttpRequestException` (token endpoint unreachable / 4xx / 5xx) or `InvalidOperationException` (malformed `tenant:client:secret` triple, or no `access_token` in the response). Both reach the calling script (an instance/alarm/shared script) as a raw exception of a type the `INotificationDeliveryService` contract never advertises — the design doc says delivery returns either success, a buffered transient result, or a permanent `NotificationResult` error. `Notify.To().Send()` against an OAuth2 config with a bad secret therefore crashes the script instead of returning a clean `NotificationResult(false, ...)`. This is the same defect class NS-008 resolved for malformed addresses, but NS-008's `ValidateAddresses` fix never covered the OAuth2 token path. (A non-cancellation `OperationCanceledException` — e.g. an internal MailKit timeout surfaced as OCE while the caller's token is *not* cancelled — would escape the same way.)
**Recommendation**
Add a final `catch (Exception ex)` to `SendAsync` that converts any otherwise-unhandled exception into a permanent `NotificationResult(false, ...)` (credential-scrubbed, consistent with NS-009), and log it. Treating an unclassified failure as permanent is the safe default — it returns control to the script rather than crashing it; an OAuth2 transient (token endpoint 5xx) being reported as permanent is an acceptable trade against an unhandled crash, and can be refined by inspecting `HttpRequestException.StatusCode` if transient buffering of token failures is wanted.
**Resolution**
_Unresolved._
### NotificationService-016 — `AuthenticateAsync` silently sends unauthenticated for an unknown auth type or empty credentials
| | |
|--|--|
| Severity | Medium |
| Category | Security |
| Status | Open |
| Location | `src/ScadaLink.NotificationService/MailKitSmtpClientWrapper.cs:46-67` |
**Description**
`MailKitSmtpClientWrapper.AuthenticateAsync` returns without authenticating in two silent paths. (1) `if (string.IsNullOrEmpty(credentials)) return;` (`:48-49`) — if an SMTP config has `AuthType` "basic" or "oauth2" but a null/empty `Credentials` value (a misconfigured or partially-deployed row), the method returns and `DeliverAsync` proceeds straight to `SendAsync` on an *unauthenticated* connection. (2) The `switch (authType.ToLowerInvariant())` (`:51-66`) has cases only for `"basic"` and `"oauth2"` and **no `default`** — any other `AuthType` value (a typo, a future "ntlm", an empty string) falls through with no authentication attempted. Additionally, in the `"basic"` case, `credentials.Split(':', 2)` that yields fewer than 2 parts (a credential string with no colon) is silently skipped (`:55-58`). In every case the connection then attempts to send mail unauthenticated: against a relay that does not require auth this masks a misconfiguration; against a server that does require auth it fails later with a less obvious error, and at worst an unauthenticated send succeeds where it should not. Authentication being skipped should never be silent.
**Recommendation**
Make missing/unparseable credentials and an unrecognised `AuthType` hard errors: throw a typed exception (e.g. a permanent configuration exception so `SendAsync`/`DeliverBufferedAsync` surface it as a clean failure / park) rather than returning. Add a `default:` arm to the `switch` that throws `ArgumentException` naming the unsupported auth type, and reject a "basic" credential string that does not split into exactly two parts.
**Resolution**
_Unresolved._
### NotificationService-017 — `NotificationOptions` is bound from configuration but never read (dead config)
| | |
|--|--|
| Severity | Low |
| Category | Code organization & conventions |
| Status | Open |
| Location | `src/ScadaLink.NotificationService/NotificationOptions.cs:1-15`, `src/ScadaLink.NotificationService/ServiceCollectionExtensions.cs:10-11`, `src/ScadaLink.Host/SiteServiceRegistration.cs:70` |
**Description**
`NotificationOptions` (with `ConnectionTimeoutSeconds` and `MaxConcurrentConnections`) is bound from the `ScadaLink:Notification` configuration section in two places — `ServiceCollectionExtensions.AddNotificationService` (`AddOptions<NotificationOptions>().BindConfiguration(...)`) and again in `Host/SiteServiceRegistration.cs:70` (`services.Configure<NotificationOptions>(...)`). However, a repo-wide search shows no code ever injects `IOptions<NotificationOptions>` or otherwise reads either property. When NS-007 enforced the connection timeout and concurrency limit, it sourced both values from the per-site `SmtpConfiguration` entity, not from `NotificationOptions` — so this options class is now entirely dead configuration. Its XML doc still claims it "provides fallback defaults and operational limits", which is misleading: nothing falls back to it. The double binding is also redundant. Dead, falsely-documented configuration invites an operator to set `ScadaLink:Notification:ConnectionTimeoutSeconds` and expect it to take effect, when it has no effect at all.
**Recommendation**
Either delete `NotificationOptions` and both of its registrations, or genuinely wire it in as the documented fallback (e.g. `DeliverAsync` uses `NotificationOptions` values when the `SmtpConfiguration` field is non-positive). If kept, remove the duplicate `Configure` call and correct the XML doc. The NS-007 resolution note already states the `NotificationOptions` fields "remain as operational fallback defaults" — that intent should be implemented or the class removed.
**Resolution**
_Unresolved._
### NotificationService-018 — Concurrency limiter: lock-free read of a non-volatile field, never resized on redeployment, never disposed
| | |
|--|--|
| Severity | Low |
| Category | Concurrency & thread safety |
| Status | Open |
| Location | `src/ScadaLink.NotificationService/NotificationDeliveryService.cs:237-255` |
**Description**
Three minor issues with the NS-007 concurrency limiter. (1) `GetConcurrencyLimiter` reads the plain (non-`volatile`) field `_concurrencyLimiter` at `:242` *outside* `_limiterLock` as a fast path. The field is only written inside the lock (`:252`), so under the .NET memory model a concurrent thread is not guaranteed to observe the fully-constructed `SemaphoreSlim` via the lock-free read. In practice on x86/x64 this is benign and the double-checked write inside the lock is correct, but the unsynchronized read of a reference set under a lock is the classic lazy-init memory-model pitfall — use `Volatile.Read`/`LazyInitializer`/`Lazy<T>`. (2) The limiter is sized once from the *first* `SmtpConfiguration` ever seen and the `??=` at `:252` means it is never re-created. The design doc says one SMTP config is deployed per site, but that config can be **redeployed** with a different `MaxConcurrentConnections`; the new value is silently ignored for the lifetime of the (scoped, but potentially long-lived per request) service — and more importantly the limit is captured per `NotificationDeliveryService` instance, so behaviour depends on which scoped instance happens to create it first. (3) `SemaphoreSlim` implements `IDisposable`; `_concurrencyLimiter` is never disposed and `NotificationDeliveryService` does not implement `IDisposable`. A scoped service leaking one undisposed `SemaphoreSlim` per scope is a slow handle leak.
**Recommendation**
Replace the hand-rolled double-checked init with `Lazy<SemaphoreSlim>` or `LazyInitializer.EnsureInitialized`, or at minimum make the field access `Volatile.Read`/`Volatile.Write`. Consider hoisting the limiter to a singleton keyed by site (the limit is a per-site invariant, not a per-scope one) so redeployment with a changed limit and disposal are handled in one place; if it stays on the scoped service, implement `IDisposable` to dispose it.
**Resolution**
_Unresolved._