docs(code-reviews): re-review batch 4 at 39d737e — SiteEventLogging, SiteRuntime, StoreAndForward, TemplateEngine

11 new findings: SiteEventLogging-012..014, SiteRuntime-017..019, StoreAndForward-015..017, TemplateEngine-015..016.
2026-05-17 00:51:58 -04:00
parent 3b3760f026
commit 0ba4e49e11
5 changed files with 613 additions and 27 deletions
--- a/code-reviews/SiteEventLogging/findings.md
+++ b/code-reviews/SiteEventLogging/findings.md
@@ -5,10 +5,10 @@
 | Module | `src/ScadaLink.SiteEventLogging` |
 | Design doc | `docs/requirements/Component-SiteEventLogging.md` |
 | Status | Reviewed |
-| Last reviewed | 2026-05-16 |
+| Last reviewed | 2026-05-17 |
 | Reviewer | claude-agent |
-| Commit reviewed | `9c60592` |
-| Open findings | 0 |
+| Commit reviewed | `39d737e` |
+| Open findings | 3 |

 ## Summary

@@ -28,16 +28,33 @@ cluster-singleton placement of the handler actor (which can pin to the standby
 node), missing indexes for common query filters, retention/cap purge not enforcing
 the requirement strictly, and several documentation/maintainability issues.

+#### Re-review 2026-05-17 (commit `39d737e`)
+
+Re-reviewed the module at commit `39d737e`. All eleven prior findings remain closed
+(SiteEventLogging-001..003, 005..011 Resolved; 004 Won't Fix) and the resolutions
+hold up under inspection — the background writer, lock-guarded `WithConnection`,
+`auto_vacuum = INCREMENTAL` plus logical-size measurement, the severity index, and
+the concrete-recorder DI wiring are all present and correct at this commit. The
+module source is byte-identical between `39d737e` and current `HEAD`, so this review
+reflects the live code. Three new findings were recorded, all low-to-medium and none
+regressions of prior fixes. The most notable (SiteEventLogging-012) is a correctness
+gap left by the SiteEventLogging-005 background-writer rework: when an event cannot
+be persisted because the logger has been disposed, the returned `Task` is completed
+*successfully* rather than faulted, so an `await`-ing caller is told a dropped audit
+event was written. The other two are minor: unescaped SQL `LIKE` wildcards in the
+keyword-search filter (SiteEventLogging-013) and the initial purge running
+synchronously on the host startup thread (SiteEventLogging-014).
+
 ## Checklist coverage

 | # | Category | Examined | Notes |
 |---|----------|----------|-------|
-| 1 | Correctness & logic bugs | ☑ | `incremental_vacuum` no-op breaks cap purge (-001); over-delete on cap (-002). |
+| 1 | Correctness & logic bugs | ☑ | `incremental_vacuum` no-op breaks cap purge (-001); over-delete on cap (-002). Re-review: dropped events report success (-012); `LIKE` wildcards unescaped in keyword search (-013). |
 | 2 | Akka.NET conventions | ☑ | Handler actor has no supervision/correlation concerns of its own; singleton placement issue (-004). `Ask` boundary is appropriate. |
 | 3 | Concurrency & thread safety | ☑ | Shared `SqliteConnection` used by purge/query without the write lock (-003). |
 | 4 | Error handling & resilience | ☑ | `LogEventAsync` swallows write failures silently into a log line only (-008); purge catches broadly. |
 | 5 | Security | ☑ | Queries fully parameterised. No authz in module (delegated to caller) — noted, not a finding. |
-| 6 | Performance & resource management | ☑ | Synchronous I/O on actor threads (-005); missing indexes for severity/source/message (-006). |
+| 6 | Performance & resource management | ☑ | Synchronous I/O on actor threads (-005); missing indexes for severity/source/message (-006). Re-review: initial purge blocks host startup thread (-014). |
 | 7 | Design-document adherence | ☑ | Singleton placement contradicts "active node" model (-004); cap purge does not honour "oldest first within budget" cleanly (-002). |
 | 8 | Code organization & conventions | ☑ | Concrete-type downcast of `ISiteEventLogger` (-007); `internal Connection` leaks DB handle (-007). |
 | 9 | Testing coverage | ☑ | No tests for purge interaction with live writes, vacuum effectiveness, the actor bridge, or query error path (-010). |
@@ -529,3 +546,122 @@ explanatory note added to `AddSiteEventLogging` pointing readers to where the ac
 is actually registered. Documentation/dead-code change only; no regression test was
 added — the change is a method removal verified by the compiler (no callers) and the
 full module suite still passing.
+
+### SiteEventLogging-012 — Dropped events report success: `Task` is completed, not faulted, when the event cannot be persisted
+
+| | |
+|--|--|
+| Severity | Medium |
+| Category | Correctness & logic bugs |
+| Status | Open |
+| Location | `src/ScadaLink.SiteEventLogging/SiteEventLogger.cs:160-166,193-197` |
+
+**Description**
+
+`LogEventAsync` returns a `Task` that, per the interface XML doc (corrected under
+SiteEventLogging-009), "completes once the event is durably persisted and faults if
+the write fails, so callers that `await` it observe success or failure." Two paths
+break that contract by signalling **success** for an event that was never written:
+
+1. In `LogEventAsync`, if `_writeQueue.Writer.TryWrite(pending)` fails (the channel
+   has been completed because the logger was disposed), the code calls
+   `pending.Completion.TrySetResult()` — completing the `Task` successfully — even
+   though the comment immediately above acknowledges "there is nowhere to persist the
+   event."
+2. In `ProcessWriteQueueAsync`, `WithConnection` returns `false` when the logger has
+   been disposed mid-drain. The code does not inspect the returned `written` flag and
+   unconditionally calls `pending.Completion.TrySetResult()`, again reporting success
+   for an event the comment admits "simply cannot be persisted."
+
+The event log is the site's diagnostic audit trail. A caller that `await`s
+`LogEventAsync` to confirm a critical event (deployment applied, alarm activated) was
+recorded will observe a *successful* completion for an event that was silently
+dropped. This is the same class of defect SiteEventLogging-008 fixed for write
+*errors* — but the disposed-drop path was left reporting false success. The window
+is the disposal/shutdown interval, during which shutdown-related events (graceful
+singleton handover, instance disable) are exactly the events most likely to be
+enqueued and lost.
+
+**Recommendation**
+
+For both paths, fault the `Task` (or complete it with a sentinel failure) instead of
+`TrySetResult()` — e.g. `pending.Completion.TrySetException(new ObjectDisposedException(...))`
+— so an `await`-ing caller can distinguish a dropped event from a persisted one.
+Inspect the `written` flag returned by `WithConnection` in `ProcessWriteQueueAsync`
+and only call `TrySetResult()` when `written` is `true`. Update the XML doc if a
+deliberate "drop silently on shutdown" semantics is chosen instead.
+
+**Resolution**
+
+_Unresolved._
+
+### SiteEventLogging-013 — Keyword search does not escape SQL `LIKE` wildcards in user input
+
+| | |
+|--|--|
+| Severity | Low |
+| Category | Correctness & logic bugs |
+| Status | Open |
+| Location | `src/ScadaLink.SiteEventLogging/EventLogQueryService.cs:79-83` |
+
+**Description**
+
+The keyword-search filter builds the `LIKE` pattern as `$"%{request.KeywordFilter}%"`
+and binds it as a parameter. Parameterisation correctly prevents SQL injection, but
+it does **not** neutralise the `LIKE` metacharacters `%` and `_` inside the
+user-supplied keyword. A search for a literal `_` (common in event sources and
+identifiers such as `store_and_forward`, `PLC_1`, or instance IDs) is interpreted as
+"match any single character", and a `%` matches any run of characters. The design
+calls keyword search "free-text search on message and source fields ... Useful for
+finding events by script name, alarm name, or error message" — users will reasonably
+expect a literal substring match, so a query for `store_and_forward` silently returns
+events containing `storeXandYforward` and similar false positives. There is no way
+for the caller to search for a literal underscore or percent.
+
+**Recommendation**
+
+Escape `%`, `_`, and the escape character itself in `request.KeywordFilter` before
+wrapping it in `%...%`, and append an `ESCAPE` clause to the `LIKE` expression
+(e.g. `... LIKE $keyword ESCAPE '\'`). Alternatively document that the keyword field
+accepts `LIKE` wildcard syntax, but a literal-substring match is the behaviour the
+design implies.
+
+**Resolution**
+
+_Unresolved._
+
+### SiteEventLogging-014 — Initial purge runs synchronously on the host startup thread
+
+| | |
+|--|--|
+| Severity | Low |
+| Category | Performance & resource management |
+| Status | Open |
+| Location | `src/ScadaLink.SiteEventLogging/EventLogPurgeService.cs:34-48` |
+
+**Description**
+
+`EventLogPurgeService.ExecuteAsync` calls `RunPurge()` (a fully synchronous method
+that runs `PurgeByRetention` and `PurgeByStorageCap`) *before* the first `await`
+(`await timer.WaitForNextTickAsync(...)`). A `BackgroundService`'s `ExecuteAsync` is
+invoked from `StartAsync`, and the host's startup pipeline does not proceed past a
+`BackgroundService` until its `ExecuteAsync` yields at the first real `await`. Because
+`RunPurge()` precedes any `await`, the entire initial purge — including a cap-purge
+that deletes rows in 1000-row batches and runs `PRAGMA incremental_vacuum` until a
+near-1 GB database is back under the cap — executes inline on the startup thread,
+blocking host startup (and therefore the `/health/ready` gate) for as long as the
+purge takes. On a site that has accumulated a large log this can be a multi-second
+stall during every node start/failover. The class doc states the service "runs on a
+background thread and does not block event recording" — the startup-thread block is
+inconsistent with that intent.
+
+**Recommendation**
+
+Yield before the initial purge so it runs on the background scheduler rather than the
+startup thread — e.g. `await Task.Yield();` as the first statement of `ExecuteAsync`,
+or move the initial `RunPurge()` to after the first `await timer.WaitForNextTickAsync`
+(accepting a one-interval delay), or offload it with `await Task.Run(RunPurge, stoppingToken)`.
+
+**Resolution**
+
+_Unresolved._
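
Note on SiteEventLogging-014 (outside the diff): the first recommended option, yielding before the initial purge, could look roughly like the sketch below. It is only an illustration under stated assumptions and is not the module's actual code. `PurgeServiceSketch`, the one-hour interval, and the empty `RunPurge` body are hypothetical stand-ins; the use of `PeriodicTimer` is inferred from the `timer.WaitForNextTickAsync(...)` call quoted in the finding.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Hosting;

// Hypothetical stand-in for EventLogPurgeService; not the real class.
public sealed class PurgeServiceSketch : BackgroundService
{
    // Assumed value; the real purge interval is not stated in the finding.
    private readonly TimeSpan _interval = TimeSpan.FromHours(1);

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        // Yield before doing any work so the synchronous part of ExecuteAsync
        // ends here and the caller (StartAsync) gets control back immediately.
        // With no SynchronizationContext, the continuation runs on the thread pool.
        await Task.Yield();

        // Initial purge now runs off the startup thread.
        RunPurge();

        using var timer = new PeriodicTimer(_interval);
        try
        {
            while (await timer.WaitForNextTickAsync(stoppingToken))
            {
                RunPurge();
            }
        }
        catch (OperationCanceledException)
        {
            // Host is shutting down; exit the loop quietly.
        }
    }

    // Placeholder for the synchronous retention and storage-cap purge.
    private void RunPurge()
    {
    }
}
```

`Task.Yield()` keeps the "purge immediately on start" behaviour, whereas moving the first `RunPurge()` after the first tick would delay it by one interval. If the host ever ran with a `SynchronizationContext`, `Task.Run` would be the safer of the three options, since `Task.Yield()` would post the continuation back to that context.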