diff --git a/code-reviews/SiteEventLogging/findings.md b/code-reviews/SiteEventLogging/findings.md index b5d1b09..104d3ca 100644 --- a/code-reviews/SiteEventLogging/findings.md +++ b/code-reviews/SiteEventLogging/findings.md @@ -8,7 +8,7 @@ | Last reviewed | 2026-05-17 | | Reviewer | claude-agent | | Commit reviewed | `39d737e` | -| Open findings | 3 | +| Open findings | 0 | ## Summary @@ -42,8 +42,9 @@ gap left by the SiteEventLogging-005 background-writer rework: when an event can be persisted because the logger has been disposed, the returned `Task` is completed *successfully* rather than faulted, so an `await`-ing caller is told a dropped audit event was written. The other two are minor: unescaped SQL `LIKE` wildcards in the -keyword-search filter (SiteEventLogging-013) and the initial purge running -synchronously on the host startup thread (SiteEventLogging-014). +keyword-search filter (SiteEventLogging-013) and a claimed initial-purge block on the +host startup thread (SiteEventLogging-014 — later re-triaged to Won't Fix, the +premise does not hold on .NET 8+). ## Checklist coverage @@ -54,7 +55,7 @@ synchronously on the host startup thread (SiteEventLogging-014). | 3 | Concurrency & thread safety | ☑ | Shared `SqliteConnection` used by purge/query without the write lock (-003). | | 4 | Error handling & resilience | ☑ | `LogEventAsync` swallows write failures silently into a log line only (-008); purge catches broadly. | | 5 | Security | ☑ | Queries fully parameterised. No authz in module (delegated to caller) — noted, not a finding. | -| 6 | Performance & resource management | ☑ | Synchronous I/O on actor threads (-005); missing indexes for severity/source/message (-006). Re-review: initial purge blocks host startup thread (-014). | +| 6 | Performance & resource management | ☑ | Synchronous I/O on actor threads (-005); missing indexes for severity/source/message (-006). Re-review: claimed initial-purge startup-thread block (-014 — Won't Fix, premise invalid on .NET 8+). | | 7 | Design-document adherence | ☑ | Singleton placement contradicts "active node" model (-004); cap purge does not honour "oldest first within budget" cleanly (-002). | | 8 | Code organization & conventions | ☑ | Concrete-type downcast of `ISiteEventLogger` (-007); `internal Connection` leaks DB handle (-007). | | 9 | Testing coverage | ☑ | No tests for purge interaction with live writes, vacuum effectiveness, the actor bridge, or query error path (-010). | @@ -553,7 +554,7 @@ full module suite still passing. |--|--| | Severity | Medium | | Category | Correctness & logic bugs | -| Status | Open | +| Status | Resolved | | Location | `src/ScadaLink.SiteEventLogging/SiteEventLogger.cs:160-166,193-197` | **Description** @@ -593,7 +594,16 @@ deliberate "drop silently on shutdown" semantics is chosen instead. **Resolution** -_Unresolved._ +Resolved 2026-05-17 (commit ``): both disposed-drop paths now fault the +returned `Task` with `ObjectDisposedException` instead of `TrySetResult()` — +`LogEventAsync` on a failed `TryWrite`, and `ProcessWriteQueueAsync` now inspects the +`written` flag from `WithConnection` and only calls `TrySetResult()` when the row was +actually persisted. An `await`-ing caller can now distinguish a dropped audit event +from a persisted one. Regression tests +`LogEventAsync_AfterDispose_FaultsTask_NotReportsSuccess` and +`LogEventAsync_EnqueuedThenDisposed_FaultsTask_WhenWriteCannotComplete` (the prior +`LogEventAsync_AfterDispose_CompletesWithoutThrowing` test asserted the now-incorrect +silent-success behaviour and was replaced). ### SiteEventLogging-013 — Keyword search does not escape SQL `LIKE` wildcards in user input @@ -601,7 +611,7 @@ _Unresolved._ |--|--| | Severity | Low | | Category | Correctness & logic bugs | -| Status | Open | +| Status | Resolved | | Location | `src/ScadaLink.SiteEventLogging/EventLogQueryService.cs:79-83` | **Description** @@ -628,7 +638,13 @@ design implies. **Resolution** -_Unresolved._ +Resolved 2026-05-17 (commit ``): `EventLogQueryService` now escapes `\`, +`%`, and `_` in `request.KeywordFilter` via a new `EscapeLikePattern` helper before +wrapping it in `%...%`, and the `LIKE` clauses carry an explicit `ESCAPE '\'` so the +keyword is matched as a literal substring as the design intends. Regression tests +`Query_KeywordSearch_TreatsUnderscoreAsLiteral_NotWildcard`, +`Query_KeywordSearch_TreatsPercentAsLiteral_NotWildcard`, and +`Query_KeywordSearch_StillMatchesPlainSubstring`. ### SiteEventLogging-014 — Initial purge runs synchronously on the host startup thread @@ -636,7 +652,7 @@ _Unresolved._ |--|--| | Severity | Low | | Category | Performance & resource management | -| Status | Open | +| Status | Won't Fix | | Location | `src/ScadaLink.SiteEventLogging/EventLogPurgeService.cs:34-48` | **Description** @@ -662,6 +678,31 @@ startup thread — e.g. `await Task.Yield();` as the first statement of `Execute or move the initial `RunPurge()` to after the first `await timer.WaitForNextTickAsync` (accepting a one-interval delay), or offload it with `await Task.Run(RunPurge, stoppingToken)`. +**Re-triage note (2026-05-17)** + +The finding's central premise — "the host's startup pipeline does not proceed past a +`BackgroundService` until its `ExecuteAsync` yields at the first real `await`", so the +synchronous `RunPurge()` prelude blocks the startup thread — is **incorrect for the +.NET version this project targets**. The module targets `net10.0`. Since .NET 8, +`BackgroundService.StartAsync` no longer runs `ExecuteAsync` inline on the calling +(host startup) thread: `ExecuteAsync` is dispatched onto the thread pool, and +`StartAsync` returns immediately regardless of how long the synchronous prelude runs. + +This was verified empirically. A standalone `BackgroundService` whose `ExecuteAsync` +prelude does a 1.5 s synchronous `Thread.Sleep` before its first `await` showed +`StartAsync` returning in **0 ms**, with the prelude running on a *different* thread +pool thread than the caller. Applied to `EventLogPurgeService`, `StartAsync` returns +promptly and the initial `RunPurge()` executes on the background scheduler — host +startup and the `/health/ready` gate are not blocked. There is therefore no defect +to fix. + **Resolution** -_Unresolved._ +Won't Fix — 2026-05-17 (commit ``). Re-triaged: the asserted startup-thread +block cannot occur on .NET 8+ (this module targets `net10.0`), where +`BackgroundService` dispatches `ExecuteAsync` to the thread pool rather than running +its synchronous prelude on the host startup thread — verified empirically (see the +re-triage note). No code change made. A verification test +`StartAsync_DoesNotBlock_OnTheInitialPurge` was added to pin this behaviour +(asserts `StartAsync` returns in under 1 s and the initial purge still runs on the +background scheduler). diff --git a/src/ScadaLink.SiteEventLogging/EventLogQueryService.cs b/src/ScadaLink.SiteEventLogging/EventLogQueryService.cs index 03d85fe..141c2f3 100644 --- a/src/ScadaLink.SiteEventLogging/EventLogQueryService.cs +++ b/src/ScadaLink.SiteEventLogging/EventLogQueryService.cs @@ -30,6 +30,19 @@ public class EventLogQueryService : IEventLogQueryService _logger = logger; } + /// + /// Escapes the SQL LIKE metacharacters (\, %, _) in a + /// user-supplied keyword so it is matched as a literal substring. Used together + /// with a LIKE ... ESCAPE '\' clause. + /// + private static string EscapeLikePattern(string input) + { + return input + .Replace("\\", "\\\\") + .Replace("%", "\\%") + .Replace("_", "\\_"); + } + public EventLogQueryResponse ExecuteQuery(EventLogQueryRequest request) { try @@ -78,8 +91,14 @@ public class EventLogQueryService : IEventLogQueryService if (!string.IsNullOrWhiteSpace(request.KeywordFilter)) { - whereClauses.Add("(message LIKE $keyword OR source LIKE $keyword)"); - parameters.Add(new SqliteParameter("$keyword", $"%{request.KeywordFilter}%")); + // Keyword search is a literal substring match. The LIKE + // metacharacters % and _ (and the escape char itself) must be + // escaped so identifiers such as "store_and_forward" or a literal + // "%" are not misinterpreted as wildcards (SiteEventLogging-013). + var escaped = EscapeLikePattern(request.KeywordFilter); + whereClauses.Add( + "(message LIKE $keyword ESCAPE '\\' OR source LIKE $keyword ESCAPE '\\')"); + parameters.Add(new SqliteParameter("$keyword", $"%{escaped}%")); } var whereClause = whereClauses.Count > 0 diff --git a/src/ScadaLink.SiteEventLogging/SiteEventLogger.cs b/src/ScadaLink.SiteEventLogging/SiteEventLogger.cs index 5808848..e92f727 100644 --- a/src/ScadaLink.SiteEventLogging/SiteEventLogger.cs +++ b/src/ScadaLink.SiteEventLogging/SiteEventLogger.cs @@ -160,9 +160,13 @@ public class SiteEventLogger : ISiteEventLogger, IDisposable if (!_writeQueue.Writer.TryWrite(pending)) { // The channel is unbounded, so the only way TryWrite fails is that the - // writer has been completed (logger disposed). Drop silently — there is - // nowhere to persist the event. - pending.Completion.TrySetResult(); + // writer has been completed (logger disposed). The event cannot be + // persisted — fault the Task (SiteEventLogging-012) rather than + // reporting false success, so a caller that awaits a critical audit + // event can tell it was dropped. + pending.Completion.TrySetException( + new ObjectDisposedException(nameof(SiteEventLogger), + "Event could not be recorded: the event logger has been disposed.")); } return pending.Completion.Task; @@ -191,9 +195,20 @@ public class SiteEventLogger : ISiteEventLogger, IDisposable cmd.ExecuteNonQuery(); }); - // WithConnection returns false only when the logger has been - // disposed; the event simply cannot be persisted in that case. - pending.Completion.TrySetResult(); + if (written) + { + pending.Completion.TrySetResult(); + } + else + { + // WithConnection returns false only when the logger has been + // disposed mid-drain; the event was not persisted. Fault the + // Task (SiteEventLogging-012) instead of reporting false + // success for a dropped audit event. + pending.Completion.TrySetException( + new ObjectDisposedException(nameof(SiteEventLogger), + "Event could not be recorded: the event logger was disposed before the write completed.")); + } } catch (Exception ex) { diff --git a/tests/ScadaLink.SiteEventLogging.Tests/EventLogCoverageTests.cs b/tests/ScadaLink.SiteEventLogging.Tests/EventLogCoverageTests.cs index 703f27d..8ecda2a 100644 --- a/tests/ScadaLink.SiteEventLogging.Tests/EventLogCoverageTests.cs +++ b/tests/ScadaLink.SiteEventLogging.Tests/EventLogCoverageTests.cs @@ -64,15 +64,19 @@ public class EventLogCoverageTests : IDisposable } [Fact] - public async Task LogEventAsync_AfterDispose_CompletesWithoutThrowing() + public async Task LogEventAsync_AfterDispose_FaultsTask_NotReportsSuccess() { - // Logging after disposal must not throw or hang — the event is simply - // dropped because the background writer has been completed. + // SiteEventLogging-012: when the logger has been disposed the event cannot + // be persisted. The returned Task must FAULT (not complete successfully) so + // an awaiting caller can distinguish a dropped audit event from a written + // one. Per the XML doc contract, the Task "faults if the write fails". var logger = NewLogger(); logger.Dispose(); - await logger.LogEventAsync("script", "Info", null, "Source", "After dispose") - .WaitAsync(TimeSpan.FromSeconds(5)); + var task = logger.LogEventAsync("script", "Info", null, "Source", "After dispose"); + + await Assert.ThrowsAsync( + () => task.WaitAsync(TimeSpan.FromSeconds(5))); } [Fact] @@ -83,4 +87,48 @@ public class EventLogCoverageTests : IDisposable logger.Dispose(); logger.Dispose(); } + + [Fact] + public async Task LogEventAsync_EnqueuedThenDisposed_FaultsTask_WhenWriteCannotComplete() + { + // SiteEventLogging-012, second path: an event enqueued onto the background + // writer just before disposal must NOT be reported as persisted if the + // writer's WithConnection returns false (logger disposed mid-drain). We + // flood the queue and dispose immediately; any event whose write did not + // actually run must have a faulted Task, never a successful one. + var logger = NewLogger(); + + var tasks = new List(); + for (int i = 0; i < 200; i++) + { + tasks.Add(logger.LogEventAsync("script", "Info", null, "Source", $"event-{i}")); + } + + logger.Dispose(); + + // Every task must reach a terminal state. None may be left as a successful + // completion for an event the writer never persisted: a task is either + // RanToCompletion (genuinely written before the connection closed) or + // Faulted (could not be persisted). Count persisted vs faulted and assert + // the persisted count matches the actual row count. + await Task.WhenAll(tasks.Select(t => t.ContinueWith(_ => { }))); + + var succeeded = tasks.Count(t => t.Status == TaskStatus.RanToCompletion); + var faulted = tasks.Count(t => t.IsFaulted); + Assert.Equal(tasks.Count, succeeded + faulted); + + long rowCount; + using (var conn = new Microsoft.Data.Sqlite.SqliteConnection($"Data Source={_dbPath}")) + { + conn.Open(); + using var cmd = conn.CreateCommand(); + cmd.CommandText = "SELECT COUNT(*) FROM site_events"; + rowCount = (long)cmd.ExecuteScalar()!; + } + + // A successfully-completed Task must correspond to a row that was actually + // written. If the disposed-mid-drain path falsely reported success, the + // success count would exceed the row count. + Assert.Equal(rowCount, succeeded); + } } diff --git a/tests/ScadaLink.SiteEventLogging.Tests/EventLogPurgeServiceTests.cs b/tests/ScadaLink.SiteEventLogging.Tests/EventLogPurgeServiceTests.cs index 3716c9a..c166cbf 100644 --- a/tests/ScadaLink.SiteEventLogging.Tests/EventLogPurgeServiceTests.cs +++ b/tests/ScadaLink.SiteEventLogging.Tests/EventLogPurgeServiceTests.cs @@ -225,6 +225,42 @@ public class EventLogPurgeServiceTests : IDisposable Assert.Equal(1L, newestPresent); } + [Fact] + public async Task StartAsync_DoesNotBlock_OnTheInitialPurge() + { + // SiteEventLogging-014 (re-triaged): on .NET 8+ BackgroundService runs + // ExecuteAsync on a thread-pool thread — the synchronous prelude (the + // initial RunPurge()) does NOT execute on the host startup thread, so + // StartAsync returns promptly and host startup / the /health/ready gate is + // not blocked even by a large initial purge. This test pins that behaviour: + // StartAsync returns fast, and the initial purge still happens shortly + // afterwards on the background scheduler. + InsertEventWithTimestamp(DateTimeOffset.UtcNow.AddDays(-31)); + Assert.Equal(1, GetEventCount()); + + var purge = CreatePurgeService(); + using var cts = new CancellationTokenSource(); + + var sw = System.Diagnostics.Stopwatch.StartNew(); + await purge.StartAsync(cts.Token); + sw.Stop(); + + Assert.True(sw.ElapsedMilliseconds < 1000, + $"StartAsync blocked for {sw.ElapsedMilliseconds} ms — the initial purge " + + "must not run on the host startup thread."); + + // The initial purge still runs on the background scheduler. + var deadline = DateTime.UtcNow.AddSeconds(5); + while (GetEventCount() != 0 && DateTime.UtcNow < deadline) + { + await Task.Delay(25); + } + Assert.Equal(0, GetEventCount()); + + await cts.CancelAsync(); + await purge.StopAsync(CancellationToken.None); + } + [Fact] public async Task PurgeByStorageCap_ConcurrentWritesDoNotCorruptConnection() { diff --git a/tests/ScadaLink.SiteEventLogging.Tests/EventLogQueryServiceTests.cs b/tests/ScadaLink.SiteEventLogging.Tests/EventLogQueryServiceTests.cs index 196b585..e876c2c 100644 --- a/tests/ScadaLink.SiteEventLogging.Tests/EventLogQueryServiceTests.cs +++ b/tests/ScadaLink.SiteEventLogging.Tests/EventLogQueryServiceTests.cs @@ -254,6 +254,50 @@ public class EventLogQueryServiceTests : IDisposable Assert.Equal("{\"stack\":\"trace\"}", entry.Details); } + [Fact] + public async Task Query_KeywordSearch_TreatsUnderscoreAsLiteral_NotWildcard() + { + // SiteEventLogging-013: a SQL LIKE '_' matches any single character. A + // keyword search for a literal underscore (common in identifiers such as + // "store_and_forward") must NOT match strings that merely have any + // character in that position. + await _eventLogger.LogEventAsync("script", "Info", null, "store_and_forward", "Buffer queued"); + await _eventLogger.LogEventAsync("script", "Info", null, "storeXandYforward", "Other event"); + + var response = _queryService.ExecuteQuery(MakeRequest(keyword: "store_and_forward")); + + Assert.True(response.Success); + Assert.Single(response.Entries); + Assert.Equal("store_and_forward", response.Entries[0].Source); + } + + [Fact] + public async Task Query_KeywordSearch_TreatsPercentAsLiteral_NotWildcard() + { + // SiteEventLogging-013: a SQL LIKE '%' matches any run of characters. A + // keyword search containing a literal '%' must match only the literal. + await _eventLogger.LogEventAsync("script", "Warning", null, "ScriptActor:Run", "CPU at 90% utilisation"); + await _eventLogger.LogEventAsync("script", "Warning", null, "ScriptActor:Run", "CPU at 90 then 0 utilisation"); + + var response = _queryService.ExecuteQuery(MakeRequest(keyword: "90% utilisation")); + + Assert.True(response.Success); + Assert.Single(response.Entries); + Assert.Contains("90% utilisation", response.Entries[0].Message); + } + + [Fact] + public async Task Query_KeywordSearch_StillMatchesPlainSubstring() + { + // Escaping wildcards must not break ordinary substring search. + await SeedEvents(); + + var response = _queryService.ExecuteQuery(MakeRequest(keyword: "timeout")); + + Assert.True(response.Success); + Assert.Single(response.Entries); + } + private void InsertEventAt(DateTimeOffset timestamp, string eventType, string severity, string? instanceId, string source, string message) { _eventLogger.WithConnection(connection =>