fix(site-event-logging): resolve SiteEventLogging-005,007,008,010 — background async writer, drop concrete downcast, surface write failures, test coverage
@@ -8,7 +8,7 @@
 | Last reviewed | 2026-05-16 |
 | Reviewer | claude-agent |
 | Commit reviewed | `9c60592` |
-| Open findings | 7 |
+| Open findings | 3 |
 
 ## Summary
 
@@ -241,7 +241,7 @@ on the active node. No code change made; see the re-triage note above.
 |--|--|
 | Severity | Medium |
 | Category | Performance & resource management |
-| Status | Open |
+| Status | Resolved |
 | Location | `src/ScadaLink.SiteEventLogging/SiteEventLogger.cs:57-99` |
 
 **Description**
@@ -267,7 +267,16 @@ background flush is preferable.
 
 **Resolution**
 
-_Unresolved._
+Resolved 2026-05-16 (commit `pending`): event recording is now offloaded to a
+dedicated background writer. `SiteEventLogger` owns an unbounded `Channel<T>` and a
+single background consumer thread; `LogEventAsync` only validates its arguments and
+enqueues, so caller threads (Akka actor threads on hot paths) never block on the
+SQLite write or on contention for the write lock. The returned `Task` completes once
+the event is durably persisted (so `await` callers still observe write ordering) and
+faults if the write fails. `Dispose` completes the channel and drains the writer.
+Regression tests `LogEventAsync_DoesNotBlockCaller_WhenWriteIsSlow` (verifies that the
+caller returns in under 500 ms while the database is held busy) and
+`LogEventAsync_TaskCompletes_AfterEventIsPersisted`.
 
 ### SiteEventLogging-006 — Missing indexes for severity and keyword-search query paths
 
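The enqueue-and-drain pattern in the SiteEventLogging-005 resolution can be sketched as below. This is a minimal, hypothetical Python analogue, not the ScadaLink code. It makes the following assumptions:

- `queue.Queue` stands in for the unbounded `Channel<T>`.
- A `concurrent.futures.Future` stands in for the `Task` returned by `LogEventAsync`.
- A caller-supplied `write` callable stands in for the SQLite `INSERT`.
- The class and method names are invented for illustration.

```python
import queue
import threading
from concurrent.futures import Future

_STOP = object()  # sentinel that tells the writer thread to exit


class BackgroundEventWriter:
    """Hypothetical analogue of the background writer described for SiteEventLogging-005."""

    def __init__(self, write):
        self._write = write            # assumption: stands in for the SQLite INSERT
        self._queue = queue.Queue()    # unbounded, like an unbounded Channel<T>
        self._thread = threading.Thread(target=self._drain, daemon=True)
        self._thread.start()

    def log_event(self, message):
        """Validate and enqueue, then return at once; never waits on the database."""
        if not message:
            raise ValueError("message must be non-empty")
        done = Future()
        self._queue.put((message, done))
        return done                    # resolves after the write, faults if it fails

    def _drain(self):
        # Single consumer: events are written one at a time, in enqueue order.
        while True:
            item = self._queue.get()
            if item is _STOP:
                return
            message, done = item
            try:
                self._write(message)
            except Exception as exc:
                done.set_exception(exc)
            else:
                done.set_result(None)

    def close(self):
        """Drain everything already queued, then stop the writer thread.

        Call once, after the final log_event; this sketch has no disposed-state guard.
        """
        self._queue.put(_STOP)
        self._thread.join()
```

A caller that needs ordering calls `future.result()`, much as a C# caller awaits the `Task`. A fire-and-forget caller simply drops the future and never blocks on the write. Having a single consumer is what keeps writes serialised without callers contending for a lock.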
@@ -299,36 +308,51 @@ cost.
 
 _Unresolved._
 
-### SiteEventLogging-007 — `ISiteEventLogger` consumers downcast to the concrete type and reach into the DB connection
+### SiteEventLogging-007 — `ISiteEventLogger` consumers downcast to the concrete type
 
 | | |
 |--|--|
-| Severity | Medium |
+| Severity | Medium (partially re-triaged 2026-05-16; see Re-triage note) |
 | Category | Code organization & conventions |
-| Status | Open |
-| Location | `src/ScadaLink.SiteEventLogging/EventLogPurgeService.cs:25`, `src/ScadaLink.SiteEventLogging/EventLogQueryService.cs:26`, `src/ScadaLink.SiteEventLogging/SiteEventLogger.cs:34` |
+| Status | Resolved |
+| Location | `src/ScadaLink.SiteEventLogging/EventLogPurgeService.cs:21-30`, `src/ScadaLink.SiteEventLogging/EventLogQueryService.cs:20-28`, `src/ScadaLink.SiteEventLogging/ServiceCollectionExtensions.cs:10-23` |
 
 **Description**
 
-Both `EventLogPurgeService` and `EventLogQueryService` take `ISiteEventLogger` via
-DI and immediately downcast it: `_eventLogger = (SiteEventLogger)eventLogger;`. They
-then access the `internal SqliteConnection Connection` property to run arbitrary SQL.
-This defeats the purpose of the interface abstraction, makes the registration
-fragile (any `ISiteEventLogger` that is not exactly `SiteEventLogger` causes an
-`InvalidCastException` at construction), and leaks the database handle and raw SQL
-surface out of the recorder. It is also the root cause of the unsynchronised
-connection sharing in SiteEventLogging-003.
+Both `EventLogPurgeService` and `EventLogQueryService` took `ISiteEventLogger` via
+DI and immediately downcast it: `_eventLogger = (SiteEventLogger)eventLogger;`. This
+made the registration fragile (any `ISiteEventLogger` that is not exactly
+`SiteEventLogger`, such as a test double or a decorator, caused an `InvalidCastException` at
+construction) and defeated the purpose of the interface abstraction.
 
+**Re-triage note (2026-05-16)**
+
+The finding as originally written also claimed the services "access the `internal
+SqliteConnection Connection` property to run arbitrary SQL" and called itself "the
+root cause of the unsynchronised connection sharing in SiteEventLogging-003". That
+part is stale: the resolution of SiteEventLogging-003 had already removed the
+`internal Connection` property and replaced it with lock-guarded `WithConnection`
+overloads. At the time this finding was actioned, the only remaining defect was the
+concrete-type downcast itself. Severity stays Medium; the description is corrected to
+the downcast-only scope.
+
 **Recommendation**
 
-Introduce a proper data-access abstraction (e.g. an `IEventLogStore` with
-`Insert`, `Query`, `PurgeOlderThan`, `PurgeToSize`, `GetSizeBytes`) that owns the
-connection and its locking, and inject that into the recorder, query, and purge
-services. Remove the `internal Connection` property and the concrete-type downcasts.
+Have the purge and query services depend on the concrete `SiteEventLogger` directly
+(it is the type that owns the lock-guarded `WithConnection`), and register the
+concrete type in DI with the interface forwarded to the same singleton. Remove the
+fragile downcasts.
 
 **Resolution**
 
-_Unresolved._
+Resolved 2026-05-16 (commit `pending`): `AddSiteEventLogging` now registers
+`SiteEventLogger` as a concrete singleton and forwards `ISiteEventLogger` to that
+same instance. `EventLogPurgeService` and `EventLogQueryService` take a
+`SiteEventLogger` constructor parameter directly, eliminating the
+`(SiteEventLogger)eventLogger` downcast and its `InvalidCastException` risk. All
+three services still share one connection/lock. Regression tests
+`AddSiteEventLogging_ResolvesAllServices_SharingOneRecorderInstance` and
+`PurgeAndQueryServices_AcceptConcreteRecorder_WithoutDowncast`.
 
 ### SiteEventLogging-008 — Event-recording write failures are silently swallowed
 
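The SiteEventLogging-007 resolution combines three ideas:

- The concrete recorder owns a lock-guarded `WithConnection`.
- The query and purge services take the concrete type directly.
- The interface registration is forwarded to the same singleton.

In Microsoft.Extensions.DependencyInjection this forwarding is commonly written as `services.AddSingleton<SiteEventLogger>();` followed by `services.AddSingleton<ISiteEventLogger>(sp => sp.GetRequiredService<SiteEventLogger>());`. The diff does not show whether `AddSiteEventLogging` uses exactly that form.

The Python sketch below models the idea under several assumptions. `Container` is a toy singleton container, not a real DI library. A list guarded by a `threading.Lock` stands in for the SQLite connection. All names are hypothetical analogues of the C# types, and only the query service is shown (the purge service would be wired the same way).

```python
import threading
from typing import Callable, Dict


class ISiteEventLogger:
    """Stand-in for the interface: what ordinary callers see."""

    def log_event(self, message: str) -> None:
        raise NotImplementedError


class SiteEventLogger(ISiteEventLogger):
    """Concrete recorder: owns the shared 'connection' and the lock guarding it."""

    def __init__(self):
        self._lock = threading.Lock()
        self._rows = []                # assumption: stands in for the SQLite connection

    def log_event(self, message):
        self.with_connection(lambda rows: rows.append(message))

    def with_connection(self, action):
        # Every access to the shared connection runs under the same lock.
        with self._lock:
            return action(self._rows)


class EventLogQueryService:
    def __init__(self, recorder: SiteEventLogger):  # concrete type: no downcast needed
        self._recorder = recorder

    def count(self):
        return self._recorder.with_connection(len)


class Container:
    """Tiny hypothetical singleton container, not a real DI library."""

    def __init__(self):
        self._factories: Dict[type, Callable] = {}
        self._instances: Dict[type, object] = {}

    def add_singleton(self, key, factory):
        self._factories[key] = factory

    def resolve(self, key):
        if key not in self._instances:
            self._instances[key] = self._factories[key](self)
        return self._instances[key]


def add_site_event_logging(container):
    container.add_singleton(SiteEventLogger, lambda c: SiteEventLogger())
    # Forward the interface to the same concrete singleton instead of building a second one.
    container.add_singleton(ISiteEventLogger, lambda c: c.resolve(SiteEventLogger))
    container.add_singleton(
        EventLogQueryService, lambda c: EventLogQueryService(c.resolve(SiteEventLogger))
    )
```

Because the interface entry resolves the concrete entry instead of constructing its own object, callers using the interface and services using the concrete type share one lock. A test double can still be registered for `ISiteEventLogger` alone without breaking any constructor.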
@@ -336,7 +360,7 @@ _Unresolved._
 |--|--|
 | Severity | Medium |
 | Category | Error handling & resilience |
-| Status | Open |
+| Status | Resolved |
 | Location | `src/ScadaLink.SiteEventLogging/SiteEventLogger.cs:92-95` |
 
 **Description**
@@ -359,7 +383,13 @@ a Warning/Error health metric rather than only a local log line.
 
 **Resolution**
 
-_Unresolved._
+Resolved 2026-05-16 (commit `pending`): write failures are no longer swallowed. The
+background writer (introduced for SiteEventLogging-005) now does two things when an `INSERT` fails:
+(a) it increments a new `Interlocked`-guarded counter exposed as the public
+`SiteEventLogger.FailedWriteCount` property, which Health Monitoring can poll to
+detect a logging outage, and (b) it faults the `Task` returned by `LogEventAsync` with
+the exception instead of returning `Task.CompletedTask`. The error is still logged.
+Regression test `LogEventAsync_FaultsTask_AndCountsFailure_OnWriteError`.
 
 ### SiteEventLogging-009 — XML doc on `LogEventAsync` claims asynchronous behaviour
 
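The failure path in the SiteEventLogging-008 resolution does three things on a failed write. It counts the failure somewhere pollable, it still logs it, and it faults the caller's pending result. The minimal Python sketch below models that path under the following assumptions:

- A `threading.Lock` around an integer plays the role of `Interlocked.Increment`.
- A `concurrent.futures.Future` plays the role of the returned `Task`.
- `write_one` is a hypothetical per-event step called by the single background consumer.
- `failed_write_count` mirrors `FailedWriteCount` by analogy only.

```python
import threading
from concurrent.futures import Future


class FailureCountingWriter:
    """Hypothetical sketch of the write-failure handling described for SiteEventLogging-008."""

    def __init__(self, write, log=print):
        self._write = write            # assumption: stands in for the SQLite INSERT
        self._log = log                # assumption: stands in for the ILogger call
        self._lock = threading.Lock()  # plays the role of Interlocked.Increment
        self._failed = 0

    @property
    def failed_write_count(self):
        """Pollable health signal, analogous to FailedWriteCount."""
        with self._lock:
            return self._failed

    def write_one(self, message, done: Future):
        """Persist one dequeued event and settle the caller's future accordingly."""
        try:
            self._write(message)
        except Exception as exc:
            with self._lock:
                self._failed += 1                       # (a) count the failure
            self._log(f"event log write failed: {exc!r}")  # still logged locally
            done.set_exception(exc)                     # (b) the caller's future faults
        else:
            done.set_result(None)
```

A health check can sample `failed_write_count` periodically and raise an alert when it increases. This is what turns a silent local log line into an observable outage.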
@@ -395,7 +425,7 @@ _Unresolved._
 |--|--|
 | Severity | Medium |
 | Category | Testing coverage |
-| Status | Open |
+| Status | Resolved |
 | Location | `tests/ScadaLink.SiteEventLogging.Tests/` |
 
 **Description**
@@ -423,7 +453,19 @@ oldest-first deletion, and a query-error-path test (e.g. corrupt/closed connecti
 
 **Resolution**
 
-_Unresolved._
+Resolved 2026-05-16 (commit `pending`): the remaining coverage gaps are now filled.
+`EventLogHandlerActorTests` (using `Akka.TestKit.Xunit2`) exercises the actor message
+contract (`EventLogQueryRequest` -> `EventLogQueryResponse` via `Sender.Tell`) for
+both success and error responses. `EventLogCoverageTests` covers the previously
+untested `EventLogQueryService.ExecuteQuery` catch block
+(`ExecuteQuery_ReturnsFailureResponse_WhenDatabaseUnavailable`) and the recorder's
+disposed-state semantics (`LogEventAsync_AfterDispose_CompletesWithoutThrowing`,
+`Dispose_IsIdempotent`). The purge/write concurrency, realistic-cap, and
+oldest-first behaviours were already covered by the tests added when
+SiteEventLogging-001/-002/-003 were resolved
+(`PurgeByStorageCap_ConcurrentWritesDoNotCorruptConnection`,
+`PurgeByStorageCap_StopsWhenUnderCap_DoesNotEmptyTable`,
+`PurgeByStorageCap_RemovesOldestEventsFirst`).
 
 ### SiteEventLogging-011 — Stale "Phase 4+" placeholder in `ServiceCollectionExtensions`
 
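The disposed-state semantics named by `LogEventAsync_AfterDispose_CompletesWithoutThrowing` and `Dispose_IsIdempotent` can be pinned down with a small Python model. It is a hypothetical sketch, not the recorder's code. Two behaviours are assumptions inferred only from the test names:

- An event logged after dispose is silently dropped, while the call still completes normally.
- A second dispose does no further teardown.

The `teardowns` counter exists only to make idempotence observable.

```python
import threading
from concurrent.futures import Future


class DisposableRecorder:
    """Hypothetical sketch of the disposed-state semantics the new tests describe."""

    def __init__(self):
        self._lock = threading.Lock()
        self._disposed = False
        self.events = []       # assumption: stands in for persisted rows
        self.teardowns = 0     # counts how often real teardown work ran

    def log_event(self, message):
        done = Future()
        with self._lock:
            if not self._disposed:
                self.events.append(message)  # stands in for enqueue + write
        # Assumption: after dispose the event is dropped, but the call still completes.
        done.set_result(None)
        return done

    def dispose(self):
        with self._lock:
            if self._disposed:
                return         # second and later calls are no-ops
            self._disposed = True
            self.teardowns += 1  # a real recorder would complete the channel and drain here
```

Guarding both the flag check and the teardown with one lock means that concurrent `dispose` calls still run the teardown exactly once.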