fix(site-event-logging): resolve SiteEventLogging-012..014 — fault dropped-event tasks, escape LIKE wildcards, re-triage startup-purge finding (Won't Fix)
This commit is contained in:
@@ -8,7 +8,7 @@
|
||||
| Last reviewed | 2026-05-17 |
|
||||
| Reviewer | claude-agent |
|
||||
| Commit reviewed | `39d737e` |
|
||||
| Open findings | 3 |
|
||||
| Open findings | 0 |
|
||||
|
||||
## Summary
|
||||
|
||||
@@ -42,8 +42,9 @@ gap left by the SiteEventLogging-005 background-writer rework: when an event can
|
||||
be persisted because the logger has been disposed, the returned `Task` is completed
|
||||
*successfully* rather than faulted, so an `await`-ing caller is told a dropped audit
|
||||
event was written. The other two are minor: unescaped SQL `LIKE` wildcards in the
|
||||
keyword-search filter (SiteEventLogging-013) and the initial purge running
|
||||
synchronously on the host startup thread (SiteEventLogging-014).
|
||||
keyword-search filter (SiteEventLogging-013) and a claimed initial-purge block on the
|
||||
host startup thread (SiteEventLogging-014 — later re-triaged to Won't Fix, the
|
||||
premise does not hold on .NET 8+).
|
||||
|
||||
## Checklist coverage
|
||||
|
||||
@@ -54,7 +55,7 @@ synchronously on the host startup thread (SiteEventLogging-014).
|
||||
| 3 | Concurrency & thread safety | ☑ | Shared `SqliteConnection` used by purge/query without the write lock (-003). |
|
||||
| 4 | Error handling & resilience | ☑ | `LogEventAsync` swallows write failures silently into a log line only (-008); purge catches broadly. |
|
||||
| 5 | Security | ☑ | Queries fully parameterised. No authz in module (delegated to caller) — noted, not a finding. |
|
||||
| 6 | Performance & resource management | ☑ | Synchronous I/O on actor threads (-005); missing indexes for severity/source/message (-006). Re-review: initial purge blocks host startup thread (-014). |
|
||||
| 6 | Performance & resource management | ☑ | Synchronous I/O on actor threads (-005); missing indexes for severity/source/message (-006). Re-review: claimed initial-purge startup-thread block (-014 — Won't Fix, premise invalid on .NET 8+). |
|
||||
| 7 | Design-document adherence | ☑ | Singleton placement contradicts "active node" model (-004); cap purge does not honour "oldest first within budget" cleanly (-002). |
|
||||
| 8 | Code organization & conventions | ☑ | Concrete-type downcast of `ISiteEventLogger` (-007); `internal Connection` leaks DB handle (-007). |
|
||||
| 9 | Testing coverage | ☑ | No tests for purge interaction with live writes, vacuum effectiveness, the actor bridge, or query error path (-010). |
|
||||
@@ -553,7 +554,7 @@ full module suite still passing.
|
||||
|--|--|
|
||||
| Severity | Medium |
|
||||
| Category | Correctness & logic bugs |
|
||||
| Status | Open |
|
||||
| Status | Resolved |
|
||||
| Location | `src/ScadaLink.SiteEventLogging/SiteEventLogger.cs:160-166,193-197` |
|
||||
|
||||
**Description**
|
||||
@@ -593,7 +594,16 @@ deliberate "drop silently on shutdown" semantics is chosen instead.
|
||||
|
||||
**Resolution**
|
||||
|
||||
_Unresolved._
|
||||
Resolved 2026-05-17 (commit `<pending>`): both disposed-drop paths now fault the
|
||||
returned `Task` with `ObjectDisposedException` instead of `TrySetResult()` —
|
||||
`LogEventAsync` on a failed `TryWrite`, and `ProcessWriteQueueAsync` now inspects the
|
||||
`written` flag from `WithConnection` and only calls `TrySetResult()` when the row was
|
||||
actually persisted. An `await`-ing caller can now distinguish a dropped audit event
|
||||
from a persisted one. Regression tests
|
||||
`LogEventAsync_AfterDispose_FaultsTask_NotReportsSuccess` and
|
||||
`LogEventAsync_EnqueuedThenDisposed_FaultsTask_WhenWriteCannotComplete` (the prior
|
||||
`LogEventAsync_AfterDispose_CompletesWithoutThrowing` test asserted the now-incorrect
|
||||
silent-success behaviour and was replaced).
|
||||
|
||||
### SiteEventLogging-013 — Keyword search does not escape SQL `LIKE` wildcards in user input
|
||||
|
||||
@@ -601,7 +611,7 @@ _Unresolved._
|
||||
|--|--|
|
||||
| Severity | Low |
|
||||
| Category | Correctness & logic bugs |
|
||||
| Status | Open |
|
||||
| Status | Resolved |
|
||||
| Location | `src/ScadaLink.SiteEventLogging/EventLogQueryService.cs:79-83` |
|
||||
|
||||
**Description**
|
||||
@@ -628,7 +638,13 @@ design implies.
|
||||
|
||||
**Resolution**
|
||||
|
||||
_Unresolved._
|
||||
Resolved 2026-05-17 (commit `<pending>`): `EventLogQueryService` now escapes `\`,
|
||||
`%`, and `_` in `request.KeywordFilter` via a new `EscapeLikePattern` helper before
|
||||
wrapping it in `%...%`, and the `LIKE` clauses carry an explicit `ESCAPE '\'` so the
|
||||
keyword is matched as a literal substring as the design intends. Regression tests
|
||||
`Query_KeywordSearch_TreatsUnderscoreAsLiteral_NotWildcard`,
|
||||
`Query_KeywordSearch_TreatsPercentAsLiteral_NotWildcard`, and
|
||||
`Query_KeywordSearch_StillMatchesPlainSubstring`.
|
||||
|
||||
### SiteEventLogging-014 — Initial purge runs synchronously on the host startup thread
|
||||
|
||||
@@ -636,7 +652,7 @@ _Unresolved._
|
||||
|--|--|
|
||||
| Severity | Low |
|
||||
| Category | Performance & resource management |
|
||||
| Status | Open |
|
||||
| Status | Won't Fix |
|
||||
| Location | `src/ScadaLink.SiteEventLogging/EventLogPurgeService.cs:34-48` |
|
||||
|
||||
**Description**
|
||||
@@ -662,6 +678,31 @@ startup thread — e.g. `await Task.Yield();` as the first statement of `Execute
|
||||
or move the initial `RunPurge()` to after the first `await timer.WaitForNextTickAsync`
|
||||
(accepting a one-interval delay), or offload it with `await Task.Run(RunPurge, stoppingToken)`.
|
||||
|
||||
**Re-triage note (2026-05-17)**
|
||||
|
||||
The finding's central premise — "the host's startup pipeline does not proceed past a
|
||||
`BackgroundService` until its `ExecuteAsync` yields at the first real `await`", so the
|
||||
synchronous `RunPurge()` prelude blocks the startup thread — is **incorrect for the
|
||||
.NET version this project targets**. The module targets `net10.0`. Since .NET 8,
|
||||
`BackgroundService.StartAsync` no longer runs `ExecuteAsync` inline on the calling
|
||||
(host startup) thread: `ExecuteAsync` is dispatched onto the thread pool, and
|
||||
`StartAsync` returns immediately regardless of how long the synchronous prelude runs.
|
||||
|
||||
This was verified empirically. A standalone `BackgroundService` whose `ExecuteAsync`
|
||||
prelude does a 1.5 s synchronous `Thread.Sleep` before its first `await` showed
|
||||
`StartAsync` returning in **0 ms**, with the prelude running on a *different* thread
|
||||
pool thread than the caller. Applied to `EventLogPurgeService`, `StartAsync` returns
|
||||
promptly and the initial `RunPurge()` executes on the background scheduler — host
|
||||
startup and the `/health/ready` gate are not blocked. There is therefore no defect
|
||||
to fix.
|
||||
|
||||
**Resolution**
|
||||
|
||||
_Unresolved._
|
||||
Won't Fix — 2026-05-17 (commit `<pending>`). Re-triaged: the asserted startup-thread
|
||||
block cannot occur on .NET 8+ (this module targets `net10.0`), where
|
||||
`BackgroundService` dispatches `ExecuteAsync` to the thread pool rather than running
|
||||
its synchronous prelude on the host startup thread — verified empirically (see the
|
||||
re-triage note). No code change made. A verification test
|
||||
`StartAsync_DoesNotBlock_OnTheInitialPurge` was added to pin this behaviour
|
||||
(asserts `StartAsync` returns in under 1 s and the initial purge still runs on the
|
||||
background scheduler).
|
||||
|
||||
@@ -30,6 +30,19 @@ public class EventLogQueryService : IEventLogQueryService
|
||||
_logger = logger;
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// Escapes the SQL <c>LIKE</c> metacharacters (<c>\</c>, <c>%</c>, <c>_</c>) in a
|
||||
/// user-supplied keyword so it is matched as a literal substring. Used together
|
||||
/// with a <c>LIKE ... ESCAPE '\'</c> clause.
|
||||
/// </summary>
|
||||
private static string EscapeLikePattern(string input)
|
||||
{
|
||||
return input
|
||||
.Replace("\\", "\\\\")
|
||||
.Replace("%", "\\%")
|
||||
.Replace("_", "\\_");
|
||||
}
|
||||
|
||||
public EventLogQueryResponse ExecuteQuery(EventLogQueryRequest request)
|
||||
{
|
||||
try
|
||||
@@ -78,8 +91,14 @@ public class EventLogQueryService : IEventLogQueryService
|
||||
|
||||
if (!string.IsNullOrWhiteSpace(request.KeywordFilter))
|
||||
{
|
||||
whereClauses.Add("(message LIKE $keyword OR source LIKE $keyword)");
|
||||
parameters.Add(new SqliteParameter("$keyword", $"%{request.KeywordFilter}%"));
|
||||
// Keyword search is a literal substring match. The LIKE
|
||||
// metacharacters % and _ (and the escape char itself) must be
|
||||
// escaped so identifiers such as "store_and_forward" or a literal
|
||||
// "%" are not misinterpreted as wildcards (SiteEventLogging-013).
|
||||
var escaped = EscapeLikePattern(request.KeywordFilter);
|
||||
whereClauses.Add(
|
||||
"(message LIKE $keyword ESCAPE '\\' OR source LIKE $keyword ESCAPE '\\')");
|
||||
parameters.Add(new SqliteParameter("$keyword", $"%{escaped}%"));
|
||||
}
|
||||
|
||||
var whereClause = whereClauses.Count > 0
|
||||
|
||||
@@ -160,9 +160,13 @@ public class SiteEventLogger : ISiteEventLogger, IDisposable
|
||||
if (!_writeQueue.Writer.TryWrite(pending))
|
||||
{
|
||||
// The channel is unbounded, so the only way TryWrite fails is that the
|
||||
// writer has been completed (logger disposed). Drop silently — there is
|
||||
// nowhere to persist the event.
|
||||
pending.Completion.TrySetResult();
|
||||
// writer has been completed (logger disposed). The event cannot be
|
||||
// persisted — fault the Task (SiteEventLogging-012) rather than
|
||||
// reporting false success, so a caller that awaits a critical audit
|
||||
// event can tell it was dropped.
|
||||
pending.Completion.TrySetException(
|
||||
new ObjectDisposedException(nameof(SiteEventLogger),
|
||||
"Event could not be recorded: the event logger has been disposed."));
|
||||
}
|
||||
|
||||
return pending.Completion.Task;
|
||||
@@ -191,9 +195,20 @@ public class SiteEventLogger : ISiteEventLogger, IDisposable
|
||||
cmd.ExecuteNonQuery();
|
||||
});
|
||||
|
||||
// WithConnection returns false only when the logger has been
|
||||
// disposed; the event simply cannot be persisted in that case.
|
||||
pending.Completion.TrySetResult();
|
||||
if (written)
|
||||
{
|
||||
pending.Completion.TrySetResult();
|
||||
}
|
||||
else
|
||||
{
|
||||
// WithConnection returns false only when the logger has been
|
||||
// disposed mid-drain; the event was not persisted. Fault the
|
||||
// Task (SiteEventLogging-012) instead of reporting false
|
||||
// success for a dropped audit event.
|
||||
pending.Completion.TrySetException(
|
||||
new ObjectDisposedException(nameof(SiteEventLogger),
|
||||
"Event could not be recorded: the event logger was disposed before the write completed."));
|
||||
}
|
||||
}
|
||||
catch (Exception ex)
|
||||
{
|
||||
|
||||
@@ -64,15 +64,19 @@ public class EventLogCoverageTests : IDisposable
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task LogEventAsync_AfterDispose_CompletesWithoutThrowing()
|
||||
public async Task LogEventAsync_AfterDispose_FaultsTask_NotReportsSuccess()
|
||||
{
|
||||
// Logging after disposal must not throw or hang — the event is simply
|
||||
// dropped because the background writer has been completed.
|
||||
// SiteEventLogging-012: when the logger has been disposed the event cannot
|
||||
// be persisted. The returned Task must FAULT (not complete successfully) so
|
||||
// an awaiting caller can distinguish a dropped audit event from a written
|
||||
// one. Per the XML doc contract, the Task "faults if the write fails".
|
||||
var logger = NewLogger();
|
||||
logger.Dispose();
|
||||
|
||||
await logger.LogEventAsync("script", "Info", null, "Source", "After dispose")
|
||||
.WaitAsync(TimeSpan.FromSeconds(5));
|
||||
var task = logger.LogEventAsync("script", "Info", null, "Source", "After dispose");
|
||||
|
||||
await Assert.ThrowsAsync<ObjectDisposedException>(
|
||||
() => task.WaitAsync(TimeSpan.FromSeconds(5)));
|
||||
}
|
||||
|
||||
[Fact]
|
||||
@@ -83,4 +87,48 @@ public class EventLogCoverageTests : IDisposable
|
||||
logger.Dispose();
|
||||
logger.Dispose();
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task LogEventAsync_EnqueuedThenDisposed_FaultsTask_WhenWriteCannotComplete()
|
||||
{
|
||||
// SiteEventLogging-012, second path: an event enqueued onto the background
|
||||
// writer just before disposal must NOT be reported as persisted if the
|
||||
// writer's WithConnection returns false (logger disposed mid-drain). We
|
||||
// flood the queue and dispose immediately; any event whose write did not
|
||||
// actually run must have a faulted Task, never a successful one.
|
||||
var logger = NewLogger();
|
||||
|
||||
var tasks = new List<Task>();
|
||||
for (int i = 0; i < 200; i++)
|
||||
{
|
||||
tasks.Add(logger.LogEventAsync("script", "Info", null, "Source", $"event-{i}"));
|
||||
}
|
||||
|
||||
logger.Dispose();
|
||||
|
||||
// Every task must reach a terminal state. None may be left as a successful
|
||||
// completion for an event the writer never persisted: a task is either
|
||||
// RanToCompletion (genuinely written before the connection closed) or
|
||||
// Faulted (could not be persisted). Count persisted vs faulted and assert
|
||||
// the persisted count matches the actual row count.
|
||||
await Task.WhenAll(tasks.Select(t => t.ContinueWith(_ => { })));
|
||||
|
||||
var succeeded = tasks.Count(t => t.Status == TaskStatus.RanToCompletion);
|
||||
var faulted = tasks.Count(t => t.IsFaulted);
|
||||
Assert.Equal(tasks.Count, succeeded + faulted);
|
||||
|
||||
long rowCount;
|
||||
using (var conn = new Microsoft.Data.Sqlite.SqliteConnection($"Data Source={_dbPath}"))
|
||||
{
|
||||
conn.Open();
|
||||
using var cmd = conn.CreateCommand();
|
||||
cmd.CommandText = "SELECT COUNT(*) FROM site_events";
|
||||
rowCount = (long)cmd.ExecuteScalar()!;
|
||||
}
|
||||
|
||||
// A successfully-completed Task must correspond to a row that was actually
|
||||
// written. If the disposed-mid-drain path falsely reported success, the
|
||||
// success count would exceed the row count.
|
||||
Assert.Equal(rowCount, succeeded);
|
||||
}
|
||||
}
|
||||
|
||||
@@ -225,6 +225,42 @@ public class EventLogPurgeServiceTests : IDisposable
|
||||
Assert.Equal(1L, newestPresent);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task StartAsync_DoesNotBlock_OnTheInitialPurge()
|
||||
{
|
||||
// SiteEventLogging-014 (re-triaged): on .NET 8+ BackgroundService runs
|
||||
// ExecuteAsync on a thread-pool thread — the synchronous prelude (the
|
||||
// initial RunPurge()) does NOT execute on the host startup thread, so
|
||||
// StartAsync returns promptly and host startup / the /health/ready gate is
|
||||
// not blocked even by a large initial purge. This test pins that behaviour:
|
||||
// StartAsync returns fast, and the initial purge still happens shortly
|
||||
// afterwards on the background scheduler.
|
||||
InsertEventWithTimestamp(DateTimeOffset.UtcNow.AddDays(-31));
|
||||
Assert.Equal(1, GetEventCount());
|
||||
|
||||
var purge = CreatePurgeService();
|
||||
using var cts = new CancellationTokenSource();
|
||||
|
||||
var sw = System.Diagnostics.Stopwatch.StartNew();
|
||||
await purge.StartAsync(cts.Token);
|
||||
sw.Stop();
|
||||
|
||||
Assert.True(sw.ElapsedMilliseconds < 1000,
|
||||
$"StartAsync blocked for {sw.ElapsedMilliseconds} ms — the initial purge " +
|
||||
"must not run on the host startup thread.");
|
||||
|
||||
// The initial purge still runs on the background scheduler.
|
||||
var deadline = DateTime.UtcNow.AddSeconds(5);
|
||||
while (GetEventCount() != 0 && DateTime.UtcNow < deadline)
|
||||
{
|
||||
await Task.Delay(25);
|
||||
}
|
||||
Assert.Equal(0, GetEventCount());
|
||||
|
||||
await cts.CancelAsync();
|
||||
await purge.StopAsync(CancellationToken.None);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task PurgeByStorageCap_ConcurrentWritesDoNotCorruptConnection()
|
||||
{
|
||||
|
||||
@@ -254,6 +254,50 @@ public class EventLogQueryServiceTests : IDisposable
|
||||
Assert.Equal("{\"stack\":\"trace\"}", entry.Details);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task Query_KeywordSearch_TreatsUnderscoreAsLiteral_NotWildcard()
|
||||
{
|
||||
// SiteEventLogging-013: a SQL LIKE '_' matches any single character. A
|
||||
// keyword search for a literal underscore (common in identifiers such as
|
||||
// "store_and_forward") must NOT match strings that merely have any
|
||||
// character in that position.
|
||||
await _eventLogger.LogEventAsync("script", "Info", null, "store_and_forward", "Buffer queued");
|
||||
await _eventLogger.LogEventAsync("script", "Info", null, "storeXandYforward", "Other event");
|
||||
|
||||
var response = _queryService.ExecuteQuery(MakeRequest(keyword: "store_and_forward"));
|
||||
|
||||
Assert.True(response.Success);
|
||||
Assert.Single(response.Entries);
|
||||
Assert.Equal("store_and_forward", response.Entries[0].Source);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task Query_KeywordSearch_TreatsPercentAsLiteral_NotWildcard()
|
||||
{
|
||||
// SiteEventLogging-013: a SQL LIKE '%' matches any run of characters. A
|
||||
// keyword search containing a literal '%' must match only the literal.
|
||||
await _eventLogger.LogEventAsync("script", "Warning", null, "ScriptActor:Run", "CPU at 90% utilisation");
|
||||
await _eventLogger.LogEventAsync("script", "Warning", null, "ScriptActor:Run", "CPU at 90 then 0 utilisation");
|
||||
|
||||
var response = _queryService.ExecuteQuery(MakeRequest(keyword: "90% utilisation"));
|
||||
|
||||
Assert.True(response.Success);
|
||||
Assert.Single(response.Entries);
|
||||
Assert.Contains("90% utilisation", response.Entries[0].Message);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task Query_KeywordSearch_StillMatchesPlainSubstring()
|
||||
{
|
||||
// Escaping wildcards must not break ordinary substring search.
|
||||
await SeedEvents();
|
||||
|
||||
var response = _queryService.ExecuteQuery(MakeRequest(keyword: "timeout"));
|
||||
|
||||
Assert.True(response.Success);
|
||||
Assert.Single(response.Entries);
|
||||
}
|
||||
|
||||
private void InsertEventAt(DateTimeOffset timestamp, string eventType, string severity, string? instanceId, string source, string message)
|
||||
{
|
||||
_eventLogger.WithConnection(connection =>
|
||||
|
||||
Reference in New Issue
Block a user