532 lines
28 KiB
Markdown
532 lines
28 KiB
Markdown
# Code Review — SiteEventLogging
|
|
|
|
| Field | Value |
|
|
|-------|-------|
|
|
| Module | `src/ScadaLink.SiteEventLogging` |
|
|
| Design doc | `docs/requirements/Component-SiteEventLogging.md` |
|
|
| Status | Reviewed |
|
|
| Last reviewed | 2026-05-16 |
|
|
| Reviewer | claude-agent |
|
|
| Commit reviewed | `9c60592` |
|
|
| Open findings | 0 |
|
|
|
|
## Summary
|
|
|
|
The SiteEventLogging module is small and broadly well-structured: a SQLite-backed
|
|
recorder (`SiteEventLogger`), a query service with keyset pagination, a background
|
|
purge service, and a thin Akka actor bridge. The query path is parameterised
|
|
correctly (no SQL injection) and reasonably well tested. However, the storage-cap
|
|
enforcement is functionally broken: `PRAGMA incremental_vacuum` is a no-op because
|
|
`auto_vacuum = INCREMENTAL` is never set, so the cap-purge loop never sees the
|
|
database shrink and over-deletes the entire table when triggered. There is also a
|
|
genuine concurrency hazard: the purge service and query service share the single
|
|
`SqliteConnection` owned by `SiteEventLogger` but bypass its `_writeLock`, so a purge
|
|
running on the background thread can collide with a write or a query on another
|
|
thread. The `LogEventAsync` API is synchronous despite its name and `Task` return,
|
|
which silently blocks Akka actor threads on disk I/O. Other findings concern the
|
|
cluster-singleton placement of the handler actor (which can pin to the standby
|
|
node), missing indexes for common query filters, retention/cap purge not enforcing
|
|
the requirement strictly, and several documentation/maintainability issues.
|
|
|
|
## Checklist coverage
|
|
|
|
| # | Category | Examined | Notes |
|
|
|---|----------|----------|-------|
|
|
| 1 | Correctness & logic bugs | ☑ | `incremental_vacuum` no-op breaks cap purge (-001); over-delete on cap (-002). |
|
|
| 2 | Akka.NET conventions | ☑ | Handler actor has no supervision/correlation concerns of its own; singleton placement issue (-004). `Ask` boundary is appropriate. |
|
|
| 3 | Concurrency & thread safety | ☑ | Shared `SqliteConnection` used by purge/query without the write lock (-003). |
|
|
| 4 | Error handling & resilience | ☑ | `LogEventAsync` swallows write failures silently into a log line only (-008); purge catches broadly. |
|
|
| 5 | Security | ☑ | Queries fully parameterised. No authz in module (delegated to caller) — noted, not a finding. |
|
|
| 6 | Performance & resource management | ☑ | Synchronous I/O on actor threads (-005); missing indexes for severity/source/message (-006). |
|
|
| 7 | Design-document adherence | ☑ | Singleton placement contradicts "active node" model (-004); cap purge does not honour "oldest first within budget" cleanly (-002). |
|
|
| 8 | Code organization & conventions | ☑ | Concrete-type downcast of `ISiteEventLogger` (-007); `internal Connection` leaks DB handle (-007). |
|
|
| 9 | Testing coverage | ☑ | No tests for purge interaction with live writes, vacuum effectiveness, the actor bridge, or query error path (-010). |
|
|
| 10 | Documentation & comments | ☑ | `LogEventAsync` XML doc says "asynchronously" but is synchronous (-009); stale "Phase 4+" placeholder (-011). |
|
|
|
|
## Findings
|
|
|
|
### SiteEventLogging-001 — `PRAGMA incremental_vacuum` is a no-op; storage cap cannot reclaim space
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | High |
|
|
| Category | Correctness & logic bugs |
|
|
| Status | Resolved |
|
|
| Location | `src/ScadaLink.SiteEventLogging/EventLogPurgeService.cs:100-102`, `src/ScadaLink.SiteEventLogging/SiteEventLogger.cs:36-55` |
|
|
|
|
**Description**
|
|
|
|
`PurgeByStorageCap` issues `PRAGMA incremental_vacuum` after each delete batch to
|
|
reclaim space, then re-measures the database size via `page_count * page_size`.
|
|
`incremental_vacuum` only has any effect when the database was created with
|
|
`auto_vacuum = INCREMENTAL`. `InitializeSchema` never sets `auto_vacuum`, so the
|
|
database uses the SQLite default (`auto_vacuum = NONE`). With `NONE`,
|
|
`incremental_vacuum` is silently ignored and `page_count` does not decrease when
|
|
rows are deleted (free pages are retained in the file). Consequently the
|
|
`while (currentSizeBytes > capBytes)` loop never observes the size dropping. The
|
|
storage-cap feature required by the design ("configurable maximum database size...
|
|
oldest events are purged first") is therefore non-functional — it cannot bring the
|
|
file back under the cap.
|
|
|
|
**Recommendation**
|
|
|
|
Set `PRAGMA auto_vacuum = INCREMENTAL` in `InitializeSchema` before any tables are
|
|
created (it must be set before table creation or followed by a full `VACUUM` to take
|
|
effect on an existing database). Alternatively, run a full `VACUUM` after cap-purge
|
|
deletes, or measure logical data size (e.g. `page_count - freelist_count` times
|
|
`page_size`) instead of relying on `incremental_vacuum`.
|
|
|
|
**Resolution**
|
|
|
|
Resolved 2026-05-16 (commit `<pending>`): `InitializeSchema` now sets
|
|
`PRAGMA auto_vacuum = INCREMENTAL` before any table is created, and
|
|
`GetDatabaseSizeBytes` measures logical size as `(page_count - freelist_count) *
|
|
page_size` so reclaimed pages no longer mask the size drop. The cap-purge loop now
|
|
reliably observes the database shrinking. Regression test
|
|
`PurgeByStorageCap_StopsWhenUnderCap_DoesNotEmptyTable`.
|
|
|
|
### SiteEventLogging-002 — Storage-cap purge deletes the entire table when space is not reclaimed
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | High |
|
|
| Category | Design-document adherence |
|
|
| Status | Resolved |
|
|
| Location | `src/ScadaLink.SiteEventLogging/EventLogPurgeService.cs:87-105` |
|
|
|
|
**Description**
|
|
|
|
Because of SiteEventLogging-001 the on-disk size never shrinks after a delete batch,
|
|
so `currentSizeBytes` stays above `capBytes`. The loop then keeps deleting 1000-row
|
|
batches on every iteration until `ExecuteNonQuery` returns 0 — i.e. until the table
|
|
is completely empty. The design states the cap should purge "the oldest events...
|
|
first" to stay within budget, not wipe the whole log. When the cap is hit (e.g.
|
|
during an alarm storm) this destroys all retained diagnostic history rather than
|
|
trimming it to the budget. The unit test `PurgeByStorageCap_DeletesOldestWhenOverCap`
|
|
masks the problem because it uses `MaxStorageMb = 0`, which legitimately expects an
|
|
empty table, so the over-delete behaviour is never exercised against a realistic cap.
|
|
|
|
**Recommendation**
|
|
|
|
Fix the size measurement / vacuum (SiteEventLogging-001) so the loop terminates when
|
|
the file is genuinely under the cap. Add a guard so the loop stops once
|
|
`currentSizeBytes` has stopped decreasing across iterations, and add a test with a
|
|
non-zero cap and a known oversized dataset to assert that only the oldest events are
|
|
removed.
|
|
|
|
**Resolution**
|
|
|
|
Resolved 2026-05-16 (commit `<pending>`): with the size measurement fixed
|
|
(SiteEventLogging-001) the cap loop terminates when the file is genuinely under the
|
|
cap. An additional guard stops the loop if the on-disk size fails to decrease across
|
|
an iteration, so a cap that can never be met no longer empties the whole table.
|
|
Regression tests `PurgeByStorageCap_StopsWhenUnderCap_DoesNotEmptyTable` (asserts the
|
|
table is not emptied and the file ends under a realistic non-zero cap) and
|
|
`PurgeByStorageCap_RemovesOldestEventsFirst` (asserts only the oldest events are
|
|
removed).
|
|
|
|
### SiteEventLogging-003 — Shared `SqliteConnection` used by purge and query without the write lock
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | High |
|
|
| Category | Concurrency & thread safety |
|
|
| Status | Resolved |
|
|
| Location | `src/ScadaLink.SiteEventLogging/EventLogPurgeService.cs:64,90,100,110,114`, `src/ScadaLink.SiteEventLogging/EventLogQueryService.cs:36`, `src/ScadaLink.SiteEventLogging/SiteEventLogger.cs:34,72` |
|
|
|
|
**Description**
|
|
|
|
`SiteEventLogger` owns a single `SqliteConnection` and serialises its own writes via
|
|
`lock (_writeLock)`. `EventLogPurgeService` and `EventLogQueryService` both reach
|
|
into `_eventLogger.Connection` and execute commands directly, without acquiring
|
|
`_writeLock`. The purge runs on a `BackgroundService` thread (a different thread from
|
|
event-recording callers and from the actor that drives the query service). A single
|
|
`SqliteConnection` / `SqliteCommand` is not thread-safe; concurrent use from the
|
|
purge thread and a recording thread (or query thread) can throw
|
|
`SqliteException`/`InvalidOperationException` ("DataReader already open",
|
|
"connection busy") or corrupt command state. The purge `DELETE` and the recorder
|
|
`INSERT` racing is the most likely collision because event recording is continuous.
|
|
|
|
**Recommendation**
|
|
|
|
Funnel all access to the connection through a single synchronisation point: either
|
|
expose lock-guarded methods on `SiteEventLogger` for purge/query to call, or give the
|
|
purge and query services their own dedicated `SqliteConnection` instances (SQLite
|
|
supports multiple connections to the same file; `Cache=Shared` plus a `busy_timeout`
|
|
makes this safer). Do not share one `SqliteConnection` across threads.
|
|
|
|
**Resolution**
|
|
|
|
Resolved 2026-05-16 (commit `<pending>`): the raw `internal Connection` property was
|
|
removed from `SiteEventLogger` and replaced with lock-guarded `WithConnection(...)`
|
|
overloads that hold the existing `_writeLock` for the duration of the caller's
|
|
delegate. `EventLogPurgeService`, `EventLogQueryService`, and `LogEventAsync` all now
|
|
access the connection exclusively through `WithConnection`, so the purge thread,
|
|
query thread, and recording threads are serialised on a single lock. `Dispose` was
|
|
also brought under the lock to avoid a dispose/use race. Regression test
|
|
`PurgeByStorageCap_ConcurrentWritesDoNotCorruptConnection` exercises purge running
|
|
concurrently with multiple writer threads.
|
|
|
|
### SiteEventLogging-004 — Event-log handler runs as a cluster singleton that can land on the standby node
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | Low |
|
|
| Original severity | High (re-triaged down to Low on 2026-05-16 — see Re-triage note) |
|
|
| Category | Design-document adherence |
|
|
| Status | Won't Fix |
|
|
| Location | `src/ScadaLink.Host/Actors/AkkaHostedService.cs:313-336`, `src/ScadaLink.SiteEventLogging/EventLogHandlerActor.cs:21-25` |
|
|
|
|
**Description**
|
|
|
|
`EventLogHandlerActor` is hosted as a `ClusterSingletonManager` singleton with the
|
|
stated intent that "queries always reach the active node". However, an Akka.NET
|
|
cluster singleton is pinned to the *oldest* member of the role, which is not the
|
|
same concept as the SCADA "active node" (the node currently running the Deployment
|
|
Manager singleton / serving live traffic). The design doc is explicit: "Only the
|
|
active node generates and stores events... the new active node starts logging to its
|
|
own SQLite database." The event-log SQLite file is node-local and unreplicated.
|
|
Nothing guarantees the event-log singleton co-locates with the active node, so a
|
|
remote query can be served by the standby node and read that node's near-empty
|
|
database, returning no events even though the active node has a full log. The
|
|
explanatory comment in `AkkaHostedService.cs` asserts the opposite of what actually
|
|
happens.
|
|
|
|
**Recommendation**
|
|
|
|
Either (a) host the query handler as a normal per-node actor and route queries to
|
|
the active node explicitly (the node owning the Deployment Manager singleton), or
|
|
(b) make the event-log writer follow the same singleton so the writer and the query
|
|
handler are guaranteed co-located. Reconcile the design doc and the inline comment
|
|
with whichever model is chosen.
|
|
|
|
**Re-triage note (2026-05-16)**
|
|
|
|
The finding's central claim — that a remote query "can be served by the standby
|
|
node and read that node's near-empty database" — is incorrect for the query path.
|
|
In `AkkaHostedService.cs` the `event-log-handler` `ClusterSingletonManager` and the
|
|
`deployment-manager` `ClusterSingletonManager` are created with the **same role**
|
|
(`siteRole`) in the **same cluster**. Akka.NET pins every cluster singleton of a
|
|
given role to the *oldest member of that role* — so all same-role singletons in a
|
|
cluster co-locate on one node. The "active node" in this codebase is, by definition,
|
|
the node hosting the `deployment-manager` singleton; the event-log query singleton
|
|
is therefore *guaranteed* to run on that same node. A `ClusterClient` query cannot
|
|
land on the standby. The inline comment in `AkkaHostedService.cs` is accurate, not
|
|
"the opposite of what happens".
|
|
|
|
A real but distinct concern exists: the *writer* (`SiteEventLogger`) is registered
|
|
as a plain per-node DI singleton (`AddSiteEventLogging`), so it records to a local
|
|
SQLite file on **every** node, including the standby. That wastes storage on the
|
|
standby but does **not** cause the query-returns-nothing symptom the finding
|
|
describes, because the query singleton always reads the *active* node's (populated)
|
|
database. Gating the writer to the active node would be a `ScadaLink.Host` wiring
|
|
change, outside this module's scope, and is a minor optimisation rather than a
|
|
correctness defect.
|
|
|
|
Re-triaged from High to Low and closed as **Won't Fix**: the High-severity
|
|
correctness claim does not hold. Any residual cleanup (gate the standby-node writer;
|
|
the comment needs no change) can be raised as a fresh Low finding against
|
|
`ScadaLink.Host` if desired.
|
|
|
|
**Resolution**
|
|
|
|
Won't Fix — 2026-05-16 (commit `<pending>`). Re-triaged: the asserted defect (query
|
|
served by standby returning an empty log) cannot occur because the event-log query
|
|
singleton and the deployment-manager singleton share a role and so always co-locate
|
|
on the active node. No code change made; see the re-triage note above.
|
|
|
|
### SiteEventLogging-005 — `LogEventAsync` performs synchronous disk I/O on the caller's thread
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | Medium |
|
|
| Category | Performance & resource management |
|
|
| Status | Resolved |
|
|
| Location | `src/ScadaLink.SiteEventLogging/SiteEventLogger.cs:57-99` |
|
|
|
|
**Description**
|
|
|
|
`LogEventAsync` is declared `async`-shaped (returns `Task`, `Async` suffix) but its
|
|
body is entirely synchronous: it takes `lock (_writeLock)`, runs
|
|
`cmd.ExecuteNonQuery()` (a blocking SQLite write), then returns `Task.CompletedTask`.
|
|
Callers across the codebase invoke it fire-and-forget as `_ = LogEventAsync(...)`
|
|
(e.g. `ScriptExecutionActor.cs:133`, `DataConnectionActor.cs:292`,
|
|
`ScriptActor.cs:250`) expecting it to be non-blocking. In reality the SQLite write,
|
|
and any contention on `_writeLock`, executes inline on the Akka actor thread of the
|
|
calling subsystem. Under an event burst (alarm storm, script failure loop) this
|
|
serialises actor threads on disk I/O and the global write lock, degrading the
|
|
hot-path subsystems the design intends to keep responsive.
|
|
|
|
**Recommendation**
|
|
|
|
Either make recording genuinely asynchronous (offload to a dedicated single-threaded
|
|
writer / `Channel<T>` consumer so callers truly fire-and-forget), or rename the
|
|
method to `LogEvent` and document that it blocks, so callers can decide. Given the
|
|
design's emphasis on not impacting runtime subsystems, an internal queue with a
|
|
background flush is preferable.
|
|
|
|
**Resolution**
|
|
|
|
Resolved 2026-05-16 (commit `pending`): event recording is now offloaded to a
|
|
dedicated background writer. `SiteEventLogger` owns an unbounded `Channel<T>` and a
|
|
single background consumer thread; `LogEventAsync` only validates its arguments and
|
|
enqueues, so caller threads (Akka actor threads on hot paths) never block on the
|
|
SQLite write or on contention for the write lock. The returned `Task` completes once
|
|
the event is durably persisted (so `await` callers still observe write ordering) and
|
|
faults if the write fails. `Dispose` completes the channel and drains the writer.
|
|
Regression test `LogEventAsync_DoesNotBlockCaller_WhenWriteIsSlow` (verifies the
|
|
caller returns in <500 ms while the database is held busy) plus
|
|
`LogEventAsync_TaskCompletes_AfterEventIsPersisted`.
|
|
|
|
### SiteEventLogging-006 — Missing indexes for severity and keyword-search query paths
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | Low |
|
|
| Category | Performance & resource management |
|
|
| Status | Resolved |
|
|
| Location | `src/ScadaLink.SiteEventLogging/SiteEventLogger.cs:50-52`, `src/ScadaLink.SiteEventLogging/EventLogQueryService.cs:65-81` |
|
|
|
|
**Description**
|
|
|
|
`InitializeSchema` creates indexes on `timestamp`, `event_type`, and `instance_id`.
|
|
The query service also filters on `severity` (`severity = $severity`) and performs
|
|
`message LIKE '%...%'` / `source LIKE '%...%'` keyword search. `severity` has no
|
|
index, and a leading-wildcard `LIKE` cannot use a normal index at all. With up to a
|
|
1 GB database and a 500-row page size, severity-filtered and keyword queries do full
|
|
table scans on every page. The design explicitly lists keyword search as a supported,
|
|
expected query type.
|
|
|
|
**Recommendation**
|
|
|
|
Add an index on `severity` (or a composite index aligned with common filter
|
|
combinations such as `(event_type, severity, id)`). For keyword search, consider an
|
|
FTS5 virtual table over `message` and `source`, or accept the scan but document the
|
|
cost.
|
|
|
|
**Resolution**
|
|
|
|
Resolved 2026-05-16 (commit `pending`): `InitializeSchema` now creates
|
|
`idx_events_severity` on `site_events(severity)`, so severity-filtered queries use an
|
|
index instead of full-scanning. The leading-wildcard `LIKE` keyword search cannot use
|
|
a B-tree index by construction; rather than adding an FTS5 table, that scan is
|
|
accepted and now documented with an inline comment next to the index DDL. Regression
|
|
tests `Schema_HasIndexOnSeverity` and `SeverityFilteredQuery_UsesIndex_NotFullScan`
|
|
(the latter asserts the SQLite query plan uses `idx_events_severity` and does not
|
|
`SCAN site_events`).
|
|
|
|
### SiteEventLogging-007 — `ISiteEventLogger` consumers downcast to the concrete type
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | Medium (partially re-triaged 2026-05-16 — see Re-triage note) |
|
|
| Category | Code organization & conventions |
|
|
| Status | Resolved |
|
|
| Location | `src/ScadaLink.SiteEventLogging/EventLogPurgeService.cs:21-30`, `src/ScadaLink.SiteEventLogging/EventLogQueryService.cs:20-28`, `src/ScadaLink.SiteEventLogging/ServiceCollectionExtensions.cs:10-23` |
|
|
|
|
**Description**
|
|
|
|
Both `EventLogPurgeService` and `EventLogQueryService` took `ISiteEventLogger` via
|
|
DI and immediately downcast it: `_eventLogger = (SiteEventLogger)eventLogger;`. This
|
|
made the registration fragile — any `ISiteEventLogger` that is not exactly
|
|
`SiteEventLogger` (a test double, a decorator) caused an `InvalidCastException` at
|
|
construction — and defeated the purpose of the interface abstraction.
|
|
|
|
**Re-triage note (2026-05-16)**
|
|
|
|
The finding as originally written also claimed the services "access the `internal
|
|
SqliteConnection Connection` property to run arbitrary SQL" and called itself "the
|
|
root cause of the unsynchronised connection sharing in SiteEventLogging-003". That
|
|
part is stale: the resolution of SiteEventLogging-003 had already removed the
|
|
`internal Connection` property and replaced it with lock-guarded `WithConnection`
|
|
overloads. At the time this finding was actioned, the only remaining defect was the
|
|
concrete-type downcast itself. Severity stays Medium; the description is corrected to
|
|
the downcast-only scope.
|
|
|
|
**Recommendation**
|
|
|
|
Have the purge and query services depend on the concrete `SiteEventLogger` directly
|
|
(it is the type that owns the lock-guarded `WithConnection`), and register the
|
|
concrete type in DI with the interface forwarded to the same singleton. Remove the
|
|
fragile downcasts.
|
|
|
|
**Resolution**
|
|
|
|
Resolved 2026-05-16 (commit `pending`): `AddSiteEventLogging` now registers
|
|
`SiteEventLogger` as a concrete singleton and forwards `ISiteEventLogger` to that
|
|
same instance. `EventLogPurgeService` and `EventLogQueryService` take a
|
|
`SiteEventLogger` constructor parameter directly, eliminating the
|
|
`(SiteEventLogger)eventLogger` downcast and its `InvalidCastException` risk. All
|
|
three services still share one connection/lock. Regression tests
|
|
`AddSiteEventLogging_ResolvesAllServices_SharingOneRecorderInstance` and
|
|
`PurgeAndQueryServices_AcceptConcreteRecorder_WithoutDowncast`.
|
|
|
|
### SiteEventLogging-008 — Event-recording write failures are silently swallowed
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | Medium |
|
|
| Category | Error handling & resilience |
|
|
| Status | Resolved |
|
|
| Location | `src/ScadaLink.SiteEventLogging/SiteEventLogger.cs:92-95` |
|
|
|
|
**Description**
|
|
|
|
If `ExecuteNonQuery` throws (disk full, database locked, file corruption), the
|
|
exception is caught, written to `ILogger`, and discarded; `LogEventAsync` still
|
|
returns `Task.CompletedTask` as if successful. Callers fire-and-forget the result so
|
|
they cannot detect failure. The event log is the site's diagnostic audit trail; a
|
|
sustained write failure (for example a locked-database storm caused by the
|
|
unsynchronised purge in SiteEventLogging-003) means events vanish with no signal to
|
|
operators except a local log line that nobody is watching. There is no failure
|
|
counter, no health-metric hook, and no retry.
|
|
|
|
**Recommendation**
|
|
|
|
Expose a failure signal: increment a counter that the Health Monitoring component
|
|
can surface (the design notes script/alarm error rates are derived from the event
|
|
log — a logging outage should be visible). At minimum, escalate repeated failures to
|
|
a Warning/Error health metric rather than only a local log line.
|
|
|
|
**Resolution**
|
|
|
|
Resolved 2026-05-16 (commit `pending`): write failures are no longer swallowed. The
|
|
background writer (introduced for SiteEventLogging-005) now, on an `INSERT` failure,
|
|
(a) increments a new `Interlocked`-guarded counter exposed as the public
|
|
`SiteEventLogger.FailedWriteCount` property — which Health Monitoring can poll to
|
|
detect a logging outage — and (b) faults the `Task` returned by `LogEventAsync` with
|
|
the exception instead of returning `Task.CompletedTask`. The error is still logged.
|
|
Regression test `LogEventAsync_FaultsTask_AndCountsFailure_OnWriteError`.
|
|
|
|
### SiteEventLogging-009 — XML doc on `LogEventAsync` claims asynchronous behaviour
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | Low |
|
|
| Category | Documentation & comments |
|
|
| Status | Resolved |
|
|
| Location | `src/ScadaLink.SiteEventLogging/ISiteEventLogger.cs:8-10`, `src/ScadaLink.SiteEventLogging/SiteEventLogger.cs:57` |
|
|
|
|
**Description**
|
|
|
|
The interface XML doc states "Record an event asynchronously." and the method is
|
|
named `LogEventAsync`, but the implementation is fully synchronous (see
|
|
SiteEventLogging-005). The documentation and naming are misleading: a reader will
|
|
reasonably assume the write is offloaded and the caller's thread is not blocked,
|
|
which is false. The `details` parameter doc says "Optional JSON details" but nothing
|
|
validates or requires JSON, so callers may pass arbitrary text.
|
|
|
|
**Re-triage note (2026-05-16)**
|
|
|
|
The finding's central premise is now stale. SiteEventLogging-005 was resolved by
|
|
introducing a dedicated background writer: `LogEventAsync` enqueues onto an unbounded
|
|
`Channel<T>` and returns without blocking, so the method is now *genuinely*
|
|
asynchronous and the name `LogEventAsync` is accurate. No rename is warranted. The
|
|
only residual documentation defect was the imprecise XML doc (the one-line summary
|
|
did not describe the enqueue/await semantics) and the `details` doc claiming "JSON"
|
|
when nothing enforces it. Scope corrected to those doc fixes only.
|
|
|
|
**Recommendation**
|
|
|
|
Align the name, signature, and documentation with the actual behaviour — either make
|
|
the method genuinely asynchronous or rename to `LogEvent` and correct the doc.
|
|
Clarify that `details` is free-form text unless JSON is actually enforced.
|
|
|
|
**Resolution**
|
|
|
|
Resolved 2026-05-16 (commit `pending`): the `ISiteEventLogger.LogEventAsync` XML doc
|
|
now describes the actual behaviour — the call enqueues onto a background writer and
|
|
returns without blocking, and the returned `Task` completes on durable persistence
|
|
(or faults on write failure). The `details` parameter doc was corrected to "Optional
|
|
free-form detail text ... JSON is conventional but not validated or enforced". No
|
|
rename was needed: post-SiteEventLogging-005 the method is genuinely asynchronous, so
|
|
`LogEventAsync` is accurate (see Re-triage note). Documentation-only change; no
|
|
regression test added (behaviour is already covered by the SiteEventLogging-005
|
|
tests `LogEventAsync_DoesNotBlockCaller_WhenWriteIsSlow` and
|
|
`LogEventAsync_TaskCompletes_AfterEventIsPersisted`).
|
|
|
|
### SiteEventLogging-010 — Test coverage gaps: actor bridge, purge/write concurrency, vacuum effectiveness, query error path
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | Medium |
|
|
| Category | Testing coverage |
|
|
| Status | Resolved |
|
|
| Location | `tests/ScadaLink.SiteEventLogging.Tests/` |
|
|
|
|
**Description**
|
|
|
|
The test suite covers recording, query filtering/pagination, and basic purge, but
|
|
several critical behaviours are untested:
|
|
|
|
- `EventLogHandlerActor` has no test — the actor message contract
|
|
(`EventLogQueryRequest` -> `EventLogQueryResponse`, `Sender.Tell`) is unverified.
|
|
- No test exercises purge running concurrently with active writes/queries, so the
|
|
connection-sharing race (SiteEventLogging-003) is invisible to CI.
|
|
- `PurgeByStorageCap` is only tested with `MaxStorageMb = 0`, which hides the
|
|
no-op-vacuum / over-delete bug (SiteEventLogging-001, -002). No test asserts the
|
|
file shrinks or that only oldest events are removed under a realistic cap.
|
|
- `EventLogQueryService.ExecuteQuery`'s catch block (`Success: false`,
|
|
`ErrorMessage`) has no test.
|
|
- `SiteEventLogger.Dispose` semantics (logging after dispose returns
|
|
`Task.CompletedTask`) and re-entrant dispose are untested.
|
|
|
|
**Recommendation**
|
|
|
|
Add tests for the actor bridge, a concurrency stress test (purge + write + query in
|
|
parallel), a realistic non-zero-cap purge test asserting size reduction and
|
|
oldest-first deletion, and a query-error-path test (e.g. corrupt/closed connection).
|
|
|
|
**Resolution**
|
|
|
|
Resolved 2026-05-16 (commit `pending`): the remaining coverage gaps are now filled.
|
|
`EventLogHandlerActorTests` (using `Akka.TestKit.Xunit2`) exercises the actor message
|
|
contract — `EventLogQueryRequest` -> `EventLogQueryResponse` via `Sender.Tell`, for
|
|
both success and error responses. `EventLogCoverageTests` covers the previously
|
|
untested `EventLogQueryService.ExecuteQuery` catch block
|
|
(`ExecuteQuery_ReturnsFailureResponse_WhenDatabaseUnavailable`) and the recorder's
|
|
disposed-state semantics (`LogEventAsync_AfterDispose_CompletesWithoutThrowing`,
|
|
`Dispose_IsIdempotent`). The purge/write concurrency, realistic-cap, and
|
|
oldest-first behaviours were already covered by the tests added when
|
|
SiteEventLogging-001/-002/-003 were resolved
|
|
(`PurgeByStorageCap_ConcurrentWritesDoNotCorruptConnection`,
|
|
`PurgeByStorageCap_StopsWhenUnderCap_DoesNotEmptyTable`,
|
|
`PurgeByStorageCap_RemovesOldestEventsFirst`).
|
|
|
|
### SiteEventLogging-011 — Stale "Phase 4+" placeholder in `ServiceCollectionExtensions`
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | Low |
|
|
| Category | Documentation & comments |
|
|
| Status | Resolved |
|
|
| Location | `src/ScadaLink.SiteEventLogging/ServiceCollectionExtensions.cs:18-22` |
|
|
|
|
**Description**
|
|
|
|
`AddSiteEventLoggingActors` is an empty method with a comment "Placeholder for Akka
|
|
actor registration (Phase 4+)". The actor (`EventLogHandlerActor`) is in fact already
|
|
implemented and is registered directly in `AkkaHostedService.cs:313-336`, not through
|
|
this method. The placeholder is dead code: it is either never called or called with
|
|
no effect, and the comment is stale. A reader looking for where the event-log actor
|
|
is wired up will be misdirected.
|
|
|
|
**Recommendation**
|
|
|
|
Either implement the actor registration here and have `AkkaHostedService` call it
|
|
(centralising the wiring), or delete `AddSiteEventLoggingActors` entirely and remove
|
|
the misleading comment.
|
|
|
|
**Resolution**
|
|
|
|
Resolved 2026-05-16 (commit `pending`): confirmed dead code — a repo-wide search
|
|
found zero callers of `AddSiteEventLoggingActors`, and `EventLogHandlerActor` is in
|
|
fact wired up directly in `ScadaLink.Host/Actors/AkkaHostedService.cs` as a cluster
|
|
singleton (it must be created inside the `ActorSystem` with a resolved
|
|
`IEventLogQueryService`, which a `IServiceCollection` extension cannot do). The empty
|
|
placeholder method and its stale "Phase 4+" comment were deleted, and a short
|
|
explanatory note added to `AddSiteEventLogging` pointing readers to where the actor
|
|
is actually registered. Documentation/dead-code change only; no regression test was
|
|
added — the change is a method removal verified by the compiler (no callers) and the
|
|
full module suite still passing.
|