fix(site-event-logging): resolve SiteEventLogging-012..014 — fault dropped-event tasks, escape LIKE wildcards, re-triage startup-purge finding (Won't Fix)

This commit is contained in:
Joseph Doherty
2026-05-17 03:18:41 -04:00
parent a58cec5776
commit 6d63fef934
6 changed files with 226 additions and 23 deletions

View File

@@ -8,7 +8,7 @@
| Last reviewed | 2026-05-17 |
| Reviewer | claude-agent |
| Commit reviewed | `39d737e` |
| Open findings | 3 |
| Open findings | 0 |
## Summary
@@ -42,8 +42,9 @@ gap left by the SiteEventLogging-005 background-writer rework: when an event can
be persisted because the logger has been disposed, the returned `Task` is completed
*successfully* rather than faulted, so an `await`-ing caller is told a dropped audit
event was written. The other two are minor: unescaped SQL `LIKE` wildcards in the
keyword-search filter (SiteEventLogging-013) and the initial purge running
synchronously on the host startup thread (SiteEventLogging-014).
keyword-search filter (SiteEventLogging-013) and a claimed initial-purge block on the
host startup thread (SiteEventLogging-014 — later re-triaged to Won't Fix, the
premise does not hold on .NET 8+).
## Checklist coverage
@@ -54,7 +55,7 @@ synchronously on the host startup thread (SiteEventLogging-014).
| 3 | Concurrency & thread safety | ☑ | Shared `SqliteConnection` used by purge/query without the write lock (-003). |
| 4 | Error handling & resilience | ☑ | `LogEventAsync` swallows write failures silently into a log line only (-008); purge catches broadly. |
| 5 | Security | ☑ | Queries fully parameterised. No authz in module (delegated to caller) — noted, not a finding. |
| 6 | Performance & resource management | ☑ | Synchronous I/O on actor threads (-005); missing indexes for severity/source/message (-006). Re-review: initial purge blocks host startup thread (-014). |
| 6 | Performance & resource management | ☑ | Synchronous I/O on actor threads (-005); missing indexes for severity/source/message (-006). Re-review: claimed initial-purge startup-thread block (-014 — Won't Fix, premise invalid on .NET 8+). |
| 7 | Design-document adherence | ☑ | Singleton placement contradicts "active node" model (-004); cap purge does not honour "oldest first within budget" cleanly (-002). |
| 8 | Code organization & conventions | ☑ | Concrete-type downcast of `ISiteEventLogger` (-007); `internal Connection` leaks DB handle (-007). |
| 9 | Testing coverage | ☑ | No tests for purge interaction with live writes, vacuum effectiveness, the actor bridge, or query error path (-010). |
@@ -553,7 +554,7 @@ full module suite still passing.
|--|--|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Status | Open |
| Status | Resolved |
| Location | `src/ScadaLink.SiteEventLogging/SiteEventLogger.cs:160-166,193-197` |
**Description**
@@ -593,7 +594,16 @@ deliberate "drop silently on shutdown" semantics is chosen instead.
**Resolution**
_Unresolved._
Resolved 2026-05-17 (commit `<pending>`): both disposed-drop paths now fault the
returned `Task` with `ObjectDisposedException` instead of `TrySetResult()`
`LogEventAsync` on a failed `TryWrite`, and `ProcessWriteQueueAsync` now inspects the
`written` flag from `WithConnection` and only calls `TrySetResult()` when the row was
actually persisted. An `await`-ing caller can now distinguish a dropped audit event
from a persisted one. Regression tests
`LogEventAsync_AfterDispose_FaultsTask_NotReportsSuccess` and
`LogEventAsync_EnqueuedThenDisposed_FaultsTask_WhenWriteCannotComplete` (the prior
`LogEventAsync_AfterDispose_CompletesWithoutThrowing` test asserted the now-incorrect
silent-success behaviour and was replaced).
### SiteEventLogging-013 — Keyword search does not escape SQL `LIKE` wildcards in user input
@@ -601,7 +611,7 @@ _Unresolved._
|--|--|
| Severity | Low |
| Category | Correctness & logic bugs |
| Status | Open |
| Status | Resolved |
| Location | `src/ScadaLink.SiteEventLogging/EventLogQueryService.cs:79-83` |
**Description**
@@ -628,7 +638,13 @@ design implies.
**Resolution**
_Unresolved._
Resolved 2026-05-17 (commit `<pending>`): `EventLogQueryService` now escapes `\`,
`%`, and `_` in `request.KeywordFilter` via a new `EscapeLikePattern` helper before
wrapping it in `%...%`, and the `LIKE` clauses carry an explicit `ESCAPE '\'` so the
keyword is matched as a literal substring as the design intends. Regression tests
`Query_KeywordSearch_TreatsUnderscoreAsLiteral_NotWildcard`,
`Query_KeywordSearch_TreatsPercentAsLiteral_NotWildcard`, and
`Query_KeywordSearch_StillMatchesPlainSubstring`.
### SiteEventLogging-014 — Initial purge runs synchronously on the host startup thread
@@ -636,7 +652,7 @@ _Unresolved._
|--|--|
| Severity | Low |
| Category | Performance & resource management |
| Status | Open |
| Status | Won't Fix |
| Location | `src/ScadaLink.SiteEventLogging/EventLogPurgeService.cs:34-48` |
**Description**
@@ -662,6 +678,31 @@ startup thread — e.g. `await Task.Yield();` as the first statement of `Execute
or move the initial `RunPurge()` to after the first `await timer.WaitForNextTickAsync`
(accepting a one-interval delay), or offload it with `await Task.Run(RunPurge, stoppingToken)`.
**Re-triage note (2026-05-17)**
The finding's central premise — "the host's startup pipeline does not proceed past a
`BackgroundService` until its `ExecuteAsync` yields at the first real `await`", so the
synchronous `RunPurge()` prelude blocks the startup thread — is **incorrect for the
.NET version this project targets**. The module targets `net10.0`. Since .NET 8,
`BackgroundService.StartAsync` no longer runs `ExecuteAsync` inline on the calling
(host startup) thread: `ExecuteAsync` is dispatched onto the thread pool, and
`StartAsync` returns immediately regardless of how long the synchronous prelude runs.
This was verified empirically. A standalone `BackgroundService` whose `ExecuteAsync`
prelude does a 1.5 s synchronous `Thread.Sleep` before its first `await` showed
`StartAsync` returning in **0 ms**, with the prelude running on a *different* thread
pool thread than the caller. Applied to `EventLogPurgeService`, `StartAsync` returns
promptly and the initial `RunPurge()` executes on the background scheduler — host
startup and the `/health/ready` gate are not blocked. There is therefore no defect
to fix.
**Resolution**
_Unresolved._
Won't Fix — 2026-05-17 (commit `<pending>`). Re-triaged: the asserted startup-thread
block cannot occur on .NET 8+ (this module targets `net10.0`), where
`BackgroundService` dispatches `ExecuteAsync` to the thread pool rather than running
its synchronous prelude on the host startup thread — verified empirically (see the
re-triage note). No code change made. A verification test
`StartAsync_DoesNotBlock_OnTheInitialPurge` was added to pin this behaviour
(asserts `StartAsync` returns in under 1 s and the initial purge still runs on the
background scheduler).

View File

@@ -30,6 +30,19 @@ public class EventLogQueryService : IEventLogQueryService
_logger = logger;
}
/// <summary>
/// Escapes the SQL <c>LIKE</c> metacharacters (<c>\</c>, <c>%</c>, <c>_</c>) in a
/// user-supplied keyword so it is matched as a literal substring. Used together
/// with a <c>LIKE ... ESCAPE '\'</c> clause.
/// </summary>
private static string EscapeLikePattern(string input)
{
return input
.Replace("\\", "\\\\")
.Replace("%", "\\%")
.Replace("_", "\\_");
}
public EventLogQueryResponse ExecuteQuery(EventLogQueryRequest request)
{
try
@@ -78,8 +91,14 @@ public class EventLogQueryService : IEventLogQueryService
if (!string.IsNullOrWhiteSpace(request.KeywordFilter))
{
whereClauses.Add("(message LIKE $keyword OR source LIKE $keyword)");
parameters.Add(new SqliteParameter("$keyword", $"%{request.KeywordFilter}%"));
// Keyword search is a literal substring match. The LIKE
// metacharacters % and _ (and the escape char itself) must be
// escaped so identifiers such as "store_and_forward" or a literal
// "%" are not misinterpreted as wildcards (SiteEventLogging-013).
var escaped = EscapeLikePattern(request.KeywordFilter);
whereClauses.Add(
"(message LIKE $keyword ESCAPE '\\' OR source LIKE $keyword ESCAPE '\\')");
parameters.Add(new SqliteParameter("$keyword", $"%{escaped}%"));
}
var whereClause = whereClauses.Count > 0

View File

@@ -160,9 +160,13 @@ public class SiteEventLogger : ISiteEventLogger, IDisposable
if (!_writeQueue.Writer.TryWrite(pending))
{
// The channel is unbounded, so the only way TryWrite fails is that the
// writer has been completed (logger disposed). Drop silently — there is
// nowhere to persist the event.
pending.Completion.TrySetResult();
// writer has been completed (logger disposed). The event cannot be
// persisted — fault the Task (SiteEventLogging-012) rather than
// reporting false success, so a caller that awaits a critical audit
// event can tell it was dropped.
pending.Completion.TrySetException(
new ObjectDisposedException(nameof(SiteEventLogger),
"Event could not be recorded: the event logger has been disposed."));
}
return pending.Completion.Task;
@@ -191,9 +195,20 @@ public class SiteEventLogger : ISiteEventLogger, IDisposable
cmd.ExecuteNonQuery();
});
// WithConnection returns false only when the logger has been
// disposed; the event simply cannot be persisted in that case.
pending.Completion.TrySetResult();
if (written)
{
pending.Completion.TrySetResult();
}
else
{
// WithConnection returns false only when the logger has been
// disposed mid-drain; the event was not persisted. Fault the
// Task (SiteEventLogging-012) instead of reporting false
// success for a dropped audit event.
pending.Completion.TrySetException(
new ObjectDisposedException(nameof(SiteEventLogger),
"Event could not be recorded: the event logger was disposed before the write completed."));
}
}
catch (Exception ex)
{

View File

@@ -64,15 +64,19 @@ public class EventLogCoverageTests : IDisposable
}
[Fact]
public async Task LogEventAsync_AfterDispose_CompletesWithoutThrowing()
public async Task LogEventAsync_AfterDispose_FaultsTask_NotReportsSuccess()
{
// Logging after disposal must not throw or hang — the event is simply
// dropped because the background writer has been completed.
// SiteEventLogging-012: when the logger has been disposed the event cannot
// be persisted. The returned Task must FAULT (not complete successfully) so
// an awaiting caller can distinguish a dropped audit event from a written
// one. Per the XML doc contract, the Task "faults if the write fails".
var logger = NewLogger();
logger.Dispose();
await logger.LogEventAsync("script", "Info", null, "Source", "After dispose")
.WaitAsync(TimeSpan.FromSeconds(5));
var task = logger.LogEventAsync("script", "Info", null, "Source", "After dispose");
await Assert.ThrowsAsync<ObjectDisposedException>(
() => task.WaitAsync(TimeSpan.FromSeconds(5)));
}
[Fact]
@@ -83,4 +87,48 @@ public class EventLogCoverageTests : IDisposable
logger.Dispose();
logger.Dispose();
}
[Fact]
public async Task LogEventAsync_EnqueuedThenDisposed_FaultsTask_WhenWriteCannotComplete()
{
// SiteEventLogging-012, second path: an event enqueued onto the background
// writer just before disposal must NOT be reported as persisted if the
// writer's WithConnection returns false (logger disposed mid-drain). We
// flood the queue and dispose immediately; any event whose write did not
// actually run must have a faulted Task, never a successful one.
var logger = NewLogger();
var tasks = new List<Task>();
for (int i = 0; i < 200; i++)
{
tasks.Add(logger.LogEventAsync("script", "Info", null, "Source", $"event-{i}"));
}
logger.Dispose();
// Every task must reach a terminal state. None may be left as a successful
// completion for an event the writer never persisted: a task is either
// RanToCompletion (genuinely written before the connection closed) or
// Faulted (could not be persisted). Count persisted vs faulted and assert
// the persisted count matches the actual row count.
await Task.WhenAll(tasks.Select(t => t.ContinueWith(_ => { })));
var succeeded = tasks.Count(t => t.Status == TaskStatus.RanToCompletion);
var faulted = tasks.Count(t => t.IsFaulted);
Assert.Equal(tasks.Count, succeeded + faulted);
long rowCount;
using (var conn = new Microsoft.Data.Sqlite.SqliteConnection($"Data Source={_dbPath}"))
{
conn.Open();
using var cmd = conn.CreateCommand();
cmd.CommandText = "SELECT COUNT(*) FROM site_events";
rowCount = (long)cmd.ExecuteScalar()!;
}
// A successfully-completed Task must correspond to a row that was actually
// written. If the disposed-mid-drain path falsely reported success, the
// success count would exceed the row count.
Assert.Equal(rowCount, succeeded);
}
}

View File

@@ -225,6 +225,42 @@ public class EventLogPurgeServiceTests : IDisposable
Assert.Equal(1L, newestPresent);
}
[Fact]
public async Task StartAsync_DoesNotBlock_OnTheInitialPurge()
{
// SiteEventLogging-014 (re-triaged): on .NET 8+ BackgroundService runs
// ExecuteAsync on a thread-pool thread — the synchronous prelude (the
// initial RunPurge()) does NOT execute on the host startup thread, so
// StartAsync returns promptly and host startup / the /health/ready gate is
// not blocked even by a large initial purge. This test pins that behaviour:
// StartAsync returns fast, and the initial purge still happens shortly
// afterwards on the background scheduler.
InsertEventWithTimestamp(DateTimeOffset.UtcNow.AddDays(-31));
Assert.Equal(1, GetEventCount());
var purge = CreatePurgeService();
using var cts = new CancellationTokenSource();
var sw = System.Diagnostics.Stopwatch.StartNew();
await purge.StartAsync(cts.Token);
sw.Stop();
Assert.True(sw.ElapsedMilliseconds < 1000,
$"StartAsync blocked for {sw.ElapsedMilliseconds} ms — the initial purge " +
"must not run on the host startup thread.");
// The initial purge still runs on the background scheduler.
var deadline = DateTime.UtcNow.AddSeconds(5);
while (GetEventCount() != 0 && DateTime.UtcNow < deadline)
{
await Task.Delay(25);
}
Assert.Equal(0, GetEventCount());
await cts.CancelAsync();
await purge.StopAsync(CancellationToken.None);
}
[Fact]
public async Task PurgeByStorageCap_ConcurrentWritesDoNotCorruptConnection()
{

View File

@@ -254,6 +254,50 @@ public class EventLogQueryServiceTests : IDisposable
Assert.Equal("{\"stack\":\"trace\"}", entry.Details);
}
[Fact]
public async Task Query_KeywordSearch_TreatsUnderscoreAsLiteral_NotWildcard()
{
// SiteEventLogging-013: a SQL LIKE '_' matches any single character. A
// keyword search for a literal underscore (common in identifiers such as
// "store_and_forward") must NOT match strings that merely have any
// character in that position.
await _eventLogger.LogEventAsync("script", "Info", null, "store_and_forward", "Buffer queued");
await _eventLogger.LogEventAsync("script", "Info", null, "storeXandYforward", "Other event");
var response = _queryService.ExecuteQuery(MakeRequest(keyword: "store_and_forward"));
Assert.True(response.Success);
Assert.Single(response.Entries);
Assert.Equal("store_and_forward", response.Entries[0].Source);
}
[Fact]
public async Task Query_KeywordSearch_TreatsPercentAsLiteral_NotWildcard()
{
// SiteEventLogging-013: a SQL LIKE '%' matches any run of characters. A
// keyword search containing a literal '%' must match only the literal.
await _eventLogger.LogEventAsync("script", "Warning", null, "ScriptActor:Run", "CPU at 90% utilisation");
await _eventLogger.LogEventAsync("script", "Warning", null, "ScriptActor:Run", "CPU at 90 then 0 utilisation");
var response = _queryService.ExecuteQuery(MakeRequest(keyword: "90% utilisation"));
Assert.True(response.Success);
Assert.Single(response.Entries);
Assert.Contains("90% utilisation", response.Entries[0].Message);
}
[Fact]
public async Task Query_KeywordSearch_StillMatchesPlainSubstring()
{
// Escaping wildcards must not break ordinary substring search.
await SeedEvents();
var response = _queryService.ExecuteQuery(MakeRequest(keyword: "timeout"));
Assert.True(response.Success);
Assert.Single(response.Entries);
}
private void InsertEventAt(DateTimeOffset timestamp, string eventType, string severity, string? instanceId, string source, string message)
{
_eventLogger.WithConnection(connection =>