code-review: 2026-05-28 baseline re-review of all 23 modules at 1eb6e97
Re-applies the full 10-category checklist to every src/ project — including
first-time reviews of the four newer components (AuditLog, NotificationOutbox,
SiteCallAudit, Transport) — so the code-reviews/ index reflects today's
codebase rather than the 2026-05-16 baseline. 172 new Open findings (0
Critical, 18 High, 62 Medium, 92 Low); 481 findings total across 23 modules.
regen-readme.py now derives each module's Last reviewed + Commit from its
findings.md header instead of hard-coding 2026-05-16 / 9c60592, so future
single-module re-reviews show their own date in the Module Status table.
@@ -5,10 +5,10 @@
 | Module | `src/ScadaLink.SiteEventLogging` |
 | Design doc | `docs/requirements/Component-SiteEventLogging.md` |
 | Status | Reviewed |
-| Last reviewed | 2026-05-17 |
+| Last reviewed | 2026-05-28 |
 | Reviewer | claude-agent |
-| Commit reviewed | `39d737e` |
-| Open findings | 0 |
+| Commit reviewed | `1eb6e97` |
+| Open findings | 9 |
 
 ## Summary
 
@@ -46,6 +46,31 @@ keyword-search filter (SiteEventLogging-013) and a claimed initial-purge block o
host startup thread (SiteEventLogging-014 — later re-triaged to Won't Fix, the
premise does not hold on .NET 8+).

#### Re-review 2026-05-28 (commit `1eb6e97`)

Re-reviewed the module at commit `1eb6e97`. All fourteen prior findings remain closed
and their resolutions hold up under inspection: the lock-guarded `WithConnection`
overloads, the background-writer `Channel<T>` with disposed-mid-drain fault
propagation, the `auto_vacuum = INCREMENTAL` schema + logical-size measurement, the
severity index, the `LIKE` keyword-search escaping, and the concrete-recorder DI
wiring are all present and correct at this commit. Nine new findings were recorded —
none are regressions of prior fixes. The most notable (SiteEventLogging-016, **High**)
is a correctness defect in the query path: timestamps are stored as ISO 8601 strings
generated from `DateTimeOffset.UtcNow` (so they always have a `+00:00` offset suffix),
but the `From`/`To` filters are stringified verbatim via `request.From.Value.ToString("o")`
without normalising to UTC, so a central client that sends a non-UTC `DateTimeOffset`
gets a broken lexicographic comparison and either spuriously includes or excludes
events. The next-most-notable findings are SiteEventLogging-015 (unbounded background
write queue can grow without limit under sustained writer slowness — sister
`SqliteAuditWriter` uses a bounded channel) and SiteEventLogging-017 (the central
client's `PageSize` is used verbatim with no upper-bound clamp, defeating the design's
"prevents broad queries from overwhelming the communication channel" rationale). The
remaining findings are low-severity hygiene / documentation: an unused
`FailedWriteCount` metric, untyped severity/event-type fields, non-invariant culture
parsing, the purge service running on the standby node, the redundant `Cache=Shared`
on a single-connection logger, and a non-volatile stop flag in a concurrency stress
test.

## Checklist coverage

| # | Category | Examined | Notes |
@@ -61,6 +86,21 @@ premise does not hold on .NET 8+).
| 9 | Testing coverage | ☑ | No tests for purge interaction with live writes, vacuum effectiveness, the actor bridge, or query error path (-010). |
| 10 | Documentation & comments | ☑ | `LogEventAsync` XML doc says "asynchronously" but is synchronous (-009); stale "Phase 4+" placeholder (-011). |

_Re-review (2026-05-28, `1eb6e97`):_

| # | Category | Examined | Notes |
|---|----------|----------|-------|
| 1 | Correctness & logic bugs | ☑ | `From`/`To` filters compare non-normalised ISO 8601 strings against UTC-stored timestamps (-016); `DateTimeOffset.Parse` without invariant culture is culture-sensitive (-021); severity/event-type accept any non-empty string with no schema enforcement (-020). |
| 2 | Akka.NET conventions | ☑ | `EventLogHandlerActor` is a simple `Receive`/`Tell` bridge with no supervision concerns of its own; no new findings. |
| 3 | Concurrency & thread safety | ☑ | Concurrent-write stress test uses a non-volatile `stop` flag (-023). The shared-connection lock pattern is correct post-SiteEventLogging-003. |
| 4 | Error handling & resilience | ☑ | `FailedWriteCount` is exposed but nothing in Health Monitoring polls it — the metric is unobserved (-018). |
| 5 | Security | ☑ | Queries are fully parameterised. `PageSize` and `KeywordFilter` from the central client are not bounded (-017) — a hostile or buggy central could request `int.MaxValue` rows or multi-MB `LIKE` patterns. |
| 6 | Performance & resource management | ☑ | Background write queue is unbounded (-015); `Cache=Shared` is redundant for a single-connection logger (-022); upper-bound on `PageSize` missing (-017). |
| 7 | Design-document adherence | ☑ | `EventLogPurgeService` is registered as a per-host `BackgroundService` and runs on the standby too, but the design says "the daily background job runs on the active node" (-019). |
| 8 | Code organization & conventions | ☑ | `FailedWriteCount` is on the concrete `SiteEventLogger`, not on `ISiteEventLogger`, so any future non-concrete consumer cannot read it (-018). |
| 9 | Testing coverage | ☑ | Non-volatile `stop` flag in `PurgeByStorageCap_ConcurrentWritesDoNotCorruptConnection` (-023). No tests for `PageSize` bounds, `From`/`To` timezone handling, or unobserved `FailedWriteCount`. |
| 10 | Documentation & comments | ☑ | `FailedWriteCount` XML doc claims "Health Monitoring can poll" but nothing does (-018). Severity / event-type docs enumerate values that are not enforced (-020). |

## Findings

### SiteEventLogging-001 — `PRAGMA incremental_vacuum` is a no-op; storage cap cannot reclaim space
@@ -706,3 +746,341 @@ re-triage note). No code change made. A verification test
`StartAsync_DoesNotBlock_OnTheInitialPurge` was added to pin this behaviour
(asserts `StartAsync` returns in under 1 s and the initial purge still runs on the
background scheduler).

### SiteEventLogging-015 — Background write queue is unbounded; can grow without limit under sustained writer slowness

| | |
|--|--|
| Severity | Medium |
| Category | Performance & resource management |
| Status | Open |
| Location | `src/ScadaLink.SiteEventLogging/SiteEventLogger.cs:58-63` |

**Description**

`SiteEventLogger` creates its background-writer feeder as
`Channel.CreateUnbounded<PendingEvent>(...)`. The writer thread funnels every write
through the shared `_writeLock` (acquired by `WithConnection`), so any condition that
makes a single iteration slow — a long-running query in `EventLogQueryService`
holding the lock, a `PurgeByStorageCap` run that takes the lock for batched
`DELETE` + `PRAGMA incremental_vacuum`, a disk stall, or a sustained event burst
from an alarm storm / script failure loop — drives the queue arbitrarily large.
Every queued `PendingEvent` retains its `TaskCompletionSource` and its payload
strings, so there is no upper bound on how much memory the recorder can hold.

The sister centralized-audit component `ScadaLink.AuditLog/Site/SqliteAuditWriter.cs`
addresses the same hot-path-writer problem with
`Channel.CreateBounded<...>(new BoundedChannelOptions(_options.ChannelCapacity) { ..., FullMode = BoundedChannelFullMode.Wait })`,
giving back-pressure to producers. Site event logging picked the riskier choice for
a component that — per the design — is fed by every site subsystem (script, alarm,
deployment, DCL, store-and-forward, instance lifecycle, notification) and has both
a 30-day retention sweep and a 1 GB cap-purge competing for the same lock.

**Recommendation**

Switch to `Channel.CreateBounded<PendingEvent>(...)` with a configurable capacity
(default on the order of 10,000 — large enough to absorb a normal alarm burst,
small enough to bound memory). Pick a `FullMode` that matches policy: `Wait` for
back-pressure (callers `await` and serialise their actor thread on the queue —
defeats some of the SiteEventLogging-005 win but is safe), or `DropOldest` /
`DropWrite` with a counter (drop-and-count is closer to "best-effort audit"). Add
the dropped-event counter to `FailedWriteCount` or a sibling metric. Document the
chosen policy on `ISiteEventLogger.LogEventAsync`.
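
The drop-and-count option can be sketched as follows. This is a minimal illustration, not the module's code: `PendingEvent` here is a hypothetical two-field stand-in, `BoundedEventQueue` is an invented wrapper, and the capacity of 2 is chosen only to make the drops visible. With `BoundedChannelFullMode.DropWrite`, `TryWrite` still returns `true` for an item that is dropped, so the sketch counts drops through the `itemDropped` callback overload of `Channel.CreateBounded` rather than through the return value.

```csharp
using System;
using System.Threading;
using System.Threading.Channels;

// Demo: capacity 2, nothing draining the queue, five writes.
var queue = new BoundedEventQueue(capacity: 2);
for (var i = 0; i < 5; i++)
{
    queue.Enqueue(new PendingEvent("alarm", $"event {i}"));
}

Console.WriteLine($"queued={queue.QueuedCount} dropped={queue.DroppedCount}");

// Hypothetical stand-in for the logger's queued item.
public sealed record PendingEvent(string EventType, string Message);

// Hypothetical wrapper showing a bounded channel with a drop counter.
public sealed class BoundedEventQueue
{
    private readonly Channel<PendingEvent> _channel;
    private long _dropped;

    public BoundedEventQueue(int capacity)
    {
        var options = new BoundedChannelOptions(capacity)
        {
            SingleReader = true,
            // When full, discard the item being written instead of blocking the caller.
            FullMode = BoundedChannelFullMode.DropWrite,
        };

        // The callback runs once for every discarded item.
        _channel = Channel.CreateBounded<PendingEvent>(
            options,
            _ => Interlocked.Increment(ref _dropped));
    }

    public long DroppedCount => Interlocked.Read(ref _dropped);

    public int QueuedCount => _channel.Reader.Count;

    public void Enqueue(PendingEvent item) => _channel.Writer.TryWrite(item);
}
```

Choosing `Wait` instead would replace `TryWrite` with an awaited `WriteAsync` and remove the need for a drop counter, at the cost of back-pressure on callers.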

### SiteEventLogging-016 — `From`/`To` filters compare non-normalised ISO 8601 strings against UTC-stored timestamps

| | |
|--|--|
| Severity | High |
| Category | Correctness & logic bugs |
| Status | Open |
| Location | `src/ScadaLink.SiteEventLogging/EventLogQueryService.cs:67-77`, `src/ScadaLink.SiteEventLogging/SiteEventLogger.cs:159`, `src/ScadaLink.SiteEventLogging/EventLogPurgeService.cs:72-78` |

**Description**

Event rows are persisted with `timestamp` = `DateTimeOffset.UtcNow.ToString("o")`,
which always emits the round-trip ISO 8601 form ending in the literal offset
`+00:00` (e.g. `2026-05-28T12:34:56.7890123+00:00`). The query path filters by
range using a direct string comparison:

```
whereClauses.Add("timestamp >= $from");
parameters.Add(new SqliteParameter("$from", request.From.Value.ToString("o")));
```

`request.From` is a `DateTimeOffset?` and `ToString("o")` preserves whatever offset
the caller passed in. If a central client passes a non-UTC `DateTimeOffset` — for
example the result of `DateTimeOffset.Now` in a `UTC+05:00` timezone — the produced
string is `"2026-05-28T17:34:56.0000000+05:00"`, which is lexicographically *greater*
than the equivalent UTC instant string `"2026-05-28T12:34:56.0000000+00:00"`. The
comparison `timestamp >= $from` is then evaluated as a byte-by-byte string compare
(SQLite default `BINARY` collation), so the query either spuriously excludes events
that genuinely occurred in the range, or spuriously includes events from a wholly
different hour. The same defect applies to `To`. The retention purge does
`DateTimeOffset.UtcNow.AddDays(-N).ToString("o")` (UTC) so it is safe; only the
central query path is vulnerable.

The design explicitly states "All timestamps are UTC throughout the system" but the
boundary between a central `DateTimeOffset` and the SQLite store is not enforced.
A central UI rendered in a non-UTC timezone is the most likely trigger, and the
defect silently corrupts every query that filters by time range — exactly the
filter most likely to be set on a "show me what happened around the failover" query.

**Recommendation**

Normalise `From` / `To` to UTC before serialising:
`request.From.Value.ToUniversalTime().ToString("o")`, so the produced offset is
always `+00:00`. Do not substitute `.UtcDateTime.ToString("o")`: a UTC `DateTime`
round-trips with a `Z` suffix rather than `+00:00`, so the parameter would again
differ textually from the stored strings. Add a regression test that filters with
a `DateTimeOffset` carrying a non-zero offset and asserts the matching events are
returned. Optionally also store timestamps as Unix-epoch `INTEGER` and let SQLite
compare numerically, eliminating the lexicographic-comparison hazard structurally.
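
The failure and the fix can be reproduced without SQLite, because `BINARY` collation on these ASCII strings behaves like an ordinal string compare. The sketch below is illustrative only: `TimestampFilter` and its members are hypothetical names, and `MatchesFrom` merely mimics the `timestamp >= $from` predicate. It also shows why formatting via `UtcDateTime` is not a drop-in substitute, since a UTC `DateTime` formats with a `Z` suffix.

```csharp
using System;
using System.Globalization;

// A stored row at 12:34:56 UTC, formatted the way the recorder writes it.
var stored = new DateTimeOffset(2026, 5, 28, 12, 34, 56, TimeSpan.Zero)
    .ToString("o", CultureInfo.InvariantCulture);

// The same instant as a client in UTC+05:00 would send it.
var from = new DateTimeOffset(2026, 5, 28, 17, 34, 56, TimeSpan.FromHours(5));

Console.WriteLine($"stored      : {stored}");
Console.WriteLine($"raw         : {TimestampFilter.Raw(from)}");
Console.WriteLine($"normalised  : {TimestampFilter.Normalised(from)}");
Console.WriteLine($"UtcDateTime : {TimestampFilter.ViaUtcDateTime(from)}");

// An event at exactly the From instant should match the filter.
Console.WriteLine(TimestampFilter.MatchesFrom(stored, TimestampFilter.Raw(from)));            // False
Console.WriteLine(TimestampFilter.MatchesFrom(stored, TimestampFilter.Normalised(from)));     // True
Console.WriteLine(TimestampFilter.MatchesFrom(stored, TimestampFilter.ViaUtcDateTime(from))); // False

// Hypothetical helper; not part of the reviewed module.
public static class TimestampFilter
{
    // Current behaviour: the caller's offset is preserved in the string.
    public static string Raw(DateTimeOffset value) =>
        value.ToString("o", CultureInfo.InvariantCulture);

    // Recommended behaviour: same instant, offset forced to +00:00.
    public static string Normalised(DateTimeOffset value) =>
        value.ToUniversalTime().ToString("o", CultureInfo.InvariantCulture);

    // A UTC DateTime formats with a trailing "Z", not "+00:00".
    public static string ViaUtcDateTime(DateTimeOffset value) =>
        value.UtcDateTime.ToString("o", CultureInfo.InvariantCulture);

    // Mimics `timestamp >= $from` under a byte-wise (ordinal) comparison.
    public static bool MatchesFrom(string storedTimestamp, string fromParameter) =>
        string.CompareOrdinal(storedTimestamp, fromParameter) >= 0;
}
```

A regression test along the lines the recommendation asks for could assert exactly these three outcomes against the real query service.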

### SiteEventLogging-017 — Central client's `PageSize` is unbounded; defeats the "configurable page size" design rationale

| | |
|--|--|
| Severity | Medium |
| Category | Security |
| Status | Open |
| Location | `src/ScadaLink.SiteEventLogging/EventLogQueryService.cs:55`, `src/ScadaLink.Commons/Messages/RemoteQuery/EventLogQueryRequest.cs:18` |

**Description**

`EventLogQueryService.ExecuteQuery` resolves the effective page size as
`var pageSize = request.PageSize > 0 ? request.PageSize : _options.QueryPageSize;`
and uses it directly as the SQL `LIMIT $limit` (passing `pageSize + 1` to detect
"has more"). There is no upper bound. A central client — buggy or hostile — can
send `PageSize = int.MaxValue`, in which case the query attempts to materialise the
entire (up to 1 GB) event log into a single `List<EventLogEntry>` while holding the
shared write lock. This:

- Builds a worst-case ~1 GB managed allocation that, depending on Akka.NET cluster
  message serialisation limits, will then be serialised into an
  `EventLogQueryResponse` and pushed over the ClusterClient pipe.
- Blocks all writes (purge, recorder hot path) for the duration of the scan
  because the read holds `_writeLock`.
- Stalls the singleton `EventLogHandlerActor`, also blocking subsequent legitimate
  queries.

The design explicitly justifies pagination as preventing exactly this — "Results
are paginated with a configurable page size (default: 500 events) ... This prevents
broad queries from overwhelming the communication channel." The code honours the
*default* but does not enforce an *upper bound* on a client-supplied override.

**Recommendation**

Clamp `pageSize` to a configurable maximum (e.g. `SiteEventLogOptions.MaxQueryPageSize`,
default 5000) before using it. Also bound `KeywordFilter.Length` (e.g. 256 chars) —
a leading-wildcard `LIKE` of an unbounded pattern is itself an expensive operation
that runs under the same lock. Add a `Success: false, ErrorMessage: "PageSize
exceeds maximum"` reject path so a misbehaving central is told why its query is
refused.
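
One possible shape for the reject path is sketched below. `PageSizePolicy` and `PageSizeResult` are hypothetical names, and the limits 500 and 5000 are the design default and the example maximum quoted above, passed in as plain arguments rather than read from the real options class. Bounding the value also keeps the later `pageSize + 1` from overflowing, which it would for `int.MaxValue` in an unchecked context.

```csharp
#nullable enable
using System;

// Demo with the default (500) and example maximum (5000) quoted in the finding.
Console.WriteLine(PageSizePolicy.Resolve(0, 500, 5000));
Console.WriteLine(PageSizePolicy.Resolve(200, 500, 5000));
Console.WriteLine(PageSizePolicy.Resolve(int.MaxValue, 500, 5000));

// Hypothetical result type; mirrors the Success / ErrorMessage pair on the response.
public readonly record struct PageSizeResult(bool Success, int PageSize, string? ErrorMessage);

// Hypothetical helper; not part of the reviewed module.
public static class PageSizePolicy
{
    public static PageSizeResult Resolve(int requested, int defaultSize, int maxSize)
    {
        // Zero or negative means "use the configured default", as today.
        if (requested <= 0)
        {
            return new PageSizeResult(true, defaultSize, null);
        }

        // Refuse oversized requests so the caller learns why, instead of
        // silently receiving fewer rows than it asked for.
        if (requested > maxSize)
        {
            return new PageSizeResult(false, 0, $"PageSize exceeds maximum of {maxSize}");
        }

        return new PageSizeResult(true, requested, null);
    }
}
```

If silent clamping is preferred over rejection, the middle branch could instead return `maxSize` with `Success` set to `true`.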

### SiteEventLogging-018 — `FailedWriteCount` is exposed but never consumed by Health Monitoring

| | |
|--|--|
| Severity | Low |
| Category | Documentation & comments |
| Status | Open |
| Location | `src/ScadaLink.SiteEventLogging/SiteEventLogger.cs:67-71,225-226` |

**Description**

`SiteEventLogger.FailedWriteCount` was added under SiteEventLogging-008 with the
XML doc statement "Surfaced so Health Monitoring can detect a logging outage
instead of relying on a local log line nobody is watching." The implementation is
correct (`Interlocked.Increment` on write failure, `Interlocked.Read` getter), but
a repo-wide search shows **no** caller anywhere in `src/` reads the property —
neither `ScadaLink.HealthMonitoring`, the central health collector, nor the host's
`/health` endpoint. The metric is dead-letter: a logging outage still goes
unnoticed in production, contradicting the original finding's resolution claim.

The property is also exposed only on the concrete `SiteEventLogger`, not on
`ISiteEventLogger`, so even if Health Monitoring were wired up it would have to
take a concrete-type dependency (`internal Connection` removed, but
`FailedWriteCount` remained concrete-only).

**Recommendation**

Either (a) wire `FailedWriteCount` into the existing Health Monitoring metric
pipeline (e.g. publish it alongside other 30-second-interval site metrics, and
promote a sustained non-zero value to a Warning), and add it to `ISiteEventLogger`
so the consumer doesn't downcast; or (b) acknowledge the metric is unobserved by
softening the XML doc to "Available for future Health Monitoring integration" and
file a tracking item for the wiring. The current doc claim is misleading.

### SiteEventLogging-019 — `EventLogPurgeService` runs on every host node; design says "active node"

| | |
|--|--|
| Severity | Low |
| Category | Design-document adherence |
| Status | Open |
| Location | `src/ScadaLink.SiteEventLogging/ServiceCollectionExtensions.cs:21`, `docs/requirements/Component-SiteEventLogging.md:45` |

**Description**

`AddSiteEventLogging` calls `services.AddHostedService<EventLogPurgeService>()`,
which registers the purge `BackgroundService` per host. On a 2-node site cluster
both `node-a` and `node-b` start the service independently, so each runs its own
30-day retention purge and 1 GB cap purge against its own local
`site_events.db`. The design states only "A daily background job runs on the
active node and deletes all events older than 30 days." (Component-SiteEventLogging,
Storage section). In practice the standby node receives no writes, so its purge
finds nothing to delete and is harmless — but the implementation does not match the
documented "active node" gating, and the resolution note on SiteEventLogging-004
already flagged that the *writer* runs on the standby too. The purge has the same
shape.

Aligning to the design is also a defence against a future change that does write
to the standby (e.g. local heartbeats), and removes the per-node wake-ups that
contribute to `Microsoft.Extensions.Hosting` shutdown latency.

**Recommendation**

Either (a) gate the purge service on "this node is the active member of `siteRole`"
(check the cluster singleton ownership before each `RunPurge()`, or host the
purge inside the same cluster singleton as `EventLogHandlerActor`), or (b) reword
the design doc to "the purge runs on every node against its own local database;
on the standby it is a no-op". Pick one; the current mismatch is a doc-vs-code
defect.

### SiteEventLogging-020 — `severity` and `eventType` are unvalidated free-form strings; doc enumerates a set that is not enforced

| | |
|--|--|
| Severity | Low |
| Category | Correctness & logic bugs |
| Status | Open |
| Location | `src/ScadaLink.SiteEventLogging/SiteEventLogger.cs:144-156`, `src/ScadaLink.SiteEventLogging/ISiteEventLogger.cs:14-15` |

**Description**

`LogEventAsync` validates `eventType` and `severity` only for non-empty/non-whitespace.
The XML doc enumerates the allowed values: `eventType` ∈ {script, alarm,
deployment, connection, store_and_forward, instance_lifecycle}, `severity` ∈
{Info, Warning, Error}. Nothing in the code enforces either set. Any caller can
pass `"SCRIPT"`, `"Script"`, `"warn"`, `"ERR"`, or a typo and the row is inserted
verbatim. Two follow-on consequences:

1. The `EventLogQueryService.Severity` filter is `severity = $severity` (exact
   match, case-sensitive by SQLite default `BINARY` collation). A row stored as
   `"error"` will not be returned for a query filtering on `"Error"`. The design
   lists severity as a first-class filter and the central UI will reasonably
   normalise to one casing — every row stored with a different casing is silently
   invisible to that filter.
2. The `Events Logged` table in the design implicitly relies on a stable
   `event_type` enumeration to drive UI grouping; a typo'd `event_type` slips in
   silently and is hard to detect later.

**Recommendation**

Validate `eventType` and `severity` against a known set (or accept `enum`s on the
interface, converting to canonical string at the call site). Reject unknown values
with `ArgumentException` and log a single-shot warning during construction if a
deployment is found to be using an unexpected value. Alternatively, normalise
casing (`severity = severity.ToLowerInvariant()`) so the query filter is
case-insensitive. Update the XML doc to match the enforced contract.
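
A minimal sketch of the validate-and-canonicalise option follows. `EventVocabulary` is a hypothetical helper, and the two value sets are copied from the XML doc as quoted above rather than from any enforced schema. Lookup is ordinal and case-insensitive, so `"error"` and `"ERROR"` both map to the canonical `"Error"`, while an abbreviation such as `"warn"` is rejected.

```csharp
using System;
using System.Collections.Generic;

Console.WriteLine(EventVocabulary.CanonicalSeverity("error"));   // Error
Console.WriteLine(EventVocabulary.CanonicalEventType("SCRIPT")); // script

try
{
    EventVocabulary.CanonicalSeverity("warn");
}
catch (ArgumentException ex)
{
    Console.WriteLine($"rejected: {ex.Message}");
}

// Hypothetical helper; not part of the reviewed module.
public static class EventVocabulary
{
    // Value sets as enumerated in the XML doc quoted in the finding.
    private static readonly HashSet<string> Severities =
        new(StringComparer.OrdinalIgnoreCase) { "Info", "Warning", "Error" };

    private static readonly HashSet<string> EventTypes =
        new(StringComparer.OrdinalIgnoreCase)
        {
            "script", "alarm", "deployment", "connection",
            "store_and_forward", "instance_lifecycle",
        };

    public static string CanonicalSeverity(string value) =>
        Canonicalise(Severities, value, nameof(value));

    public static string CanonicalEventType(string value) =>
        Canonicalise(EventTypes, value, nameof(value));

    private static string Canonicalise(HashSet<string> known, string value, string paramName)
    {
        // TryGetValue hands back the stored spelling, i.e. the canonical casing.
        if (value is not null && known.TryGetValue(value.Trim(), out var canonical))
        {
            return canonical;
        }

        throw new ArgumentException($"Unknown value '{value}'.", paramName);
    }
}
```

Storing only the canonical spelling keeps the exact-match `severity = $severity` filter correct without changing the column's collation.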

### SiteEventLogging-021 — `DateTimeOffset.Parse` uses the current culture; can throw on non-default locales

| | |
|--|--|
| Severity | Low |
| Category | Correctness & logic bugs |
| Status | Open |
| Location | `src/ScadaLink.SiteEventLogging/EventLogQueryService.cs:138` |

**Description**

`ExecuteQuery` materialises rows via
`DateTimeOffset.Parse(reader.GetString(1))`. `DateTimeOffset.Parse(string)` uses
`CultureInfo.CurrentCulture` and `DateTimeStyles.None`. The stored format is ISO
8601 round-trip (`"o"`), which is *usually* parseable in any culture — but a
production node running with a non-default culture (e.g. Turkish "tr-TR", which
has historically broken case-insensitive ASCII comparisons via the
"Turkish-I" issue, or any culture that overrides the date/time separators) can
parse incorrectly or throw `FormatException`. The exception is caught by the outer
`try`, so the entire query is converted to a `Success: false` response — but the
failure mode is silent and culture-dependent.

The recorder side stores via `DateTimeOffset.UtcNow.ToString("o")`. The `"o"`
standard format specifier is defined to produce the same output regardless of
culture, so the written strings do not vary with the node's locale; the culture
dependence sits on the parse side only. Passing `CultureInfo.InvariantCulture` to
the emitters is still worthwhile for explicitness, so that both halves of the
round-trip visibly pin the same culture.

**Recommendation**

Parse with explicit invariant culture and round-trip style:
`DateTimeOffset.Parse(reader.GetString(1), CultureInfo.InvariantCulture,
DateTimeStyles.RoundtripKind)`, and, for explicitness, pass `InvariantCulture` to
the `ToString("o")` emitters in `SiteEventLogger.LogEventAsync` and
`EventLogPurgeService.PurgeByRetention` as well.
Alternatively switch the schema to store `timestamp` as Unix-epoch `INTEGER` and
avoid all string-parsing.
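
A culture-pinned round-trip could look like the sketch below. `TimestampCodec` is a hypothetical helper, and it uses `ParseExact` with the `"o"` format as a stricter variant of the `Parse` call suggested above: anything that is not in the round-trip format is rejected instead of being interpreted by the culture's general parsing rules.

```csharp
using System;
using System.Globalization;

var written = DateTimeOffset.UtcNow;
var text = TimestampCodec.Format(written);
var read = TimestampCodec.Parse(text);

Console.WriteLine(text);
Console.WriteLine(read == written);            // same instant after the round-trip
Console.WriteLine(read.Offset == TimeSpan.Zero); // offset is preserved as +00:00

// Hypothetical helper; not part of the reviewed module.
public static class TimestampCodec
{
    // Always emit UTC in the round-trip ("o") format with an explicit culture.
    public static string Format(DateTimeOffset value) =>
        value.ToUniversalTime().ToString("o", CultureInfo.InvariantCulture);

    // Accept only the round-trip format; never consult the current culture.
    public static DateTimeOffset Parse(string text) =>
        DateTimeOffset.ParseExact(
            text,
            "o",
            CultureInfo.InvariantCulture,
            DateTimeStyles.None);
}
```

Because `"o"` carries seven fractional digits, which is the full tick resolution of `DateTimeOffset`, the parsed value equals the original instant exactly.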

### SiteEventLogging-022 — `Cache=Shared` is redundant for a single-connection logger

| | |
|--|--|
| Severity | Low |
| Category | Code organization & conventions |
| Status | Open |
| Location | `src/ScadaLink.SiteEventLogging/SiteEventLogger.cs:52` |

**Description**

The connection string is built as
`$"Data Source={options.Value.DatabasePath};Cache=Shared"`. SQLite's
shared-cache mode is a *cross-connection* optimisation: it lets multiple
`SqliteConnection`s in the same process share an in-process page cache. This
logger owns exactly one `SqliteConnection` and serialises all access through
`_writeLock`, so `Cache=Shared` cannot share with anything — the mode is dormant.
At best it is dead configuration; at worst it adds (very small) per-statement
lock overhead inside SQLite. The sister `SqliteAuditWriter` carries the same
unused option, so the smell is a copy-and-paste pattern.

Shared-cache mode also subtly changes the semantics of `PRAGMA busy_timeout` and
`PRAGMA locking_mode`, so leaving it on while *not* using it is a small future
foot-gun if anyone later opens a second connection to the same file from another
component in the same process (e.g. a tooling read-only viewer).

**Recommendation**

Drop `Cache=Shared` from the connection string — the logger is single-connection
and gains nothing from it. If a future need to share the DB across connections in
the same process arises, reintroduce it deliberately together with the `busy_timeout`
and `locking_mode` review that should accompany it.

### SiteEventLogging-023 — Concurrent-stress test uses a non-volatile `stop` flag

| | |
|--|--|
| Severity | Low |
| Category | Testing coverage |
| Status | Open |
| Location | `tests/ScadaLink.SiteEventLogging.Tests/EventLogPurgeServiceTests.cs:282-308` |

**Description**

`PurgeByStorageCap_ConcurrentWritesDoNotCorruptConnection` uses a plain `bool stop = false;`
that the main test thread mutates after the purge task completes
(`stop = true;`) while four background writer tasks are spin-checking `while (!stop)`.
The flag is not declared `volatile`, not wrapped in `Volatile.Read/Volatile.Write`,
and not behind a memory barrier. On a release build with a relaxed memory model
the writer threads are permitted to keep using a stale `stop == false` read
indefinitely, which means in theory the test can hang until the test run times
out instead of asserting `Empty(exceptions)`. The test relies on observed
JIT/runtime behaviour that today happens to refresh the value across the
`await _eventLogger.LogEventAsync` boundary, but that is an implementation detail
rather than a contract.

The test is a regression test for SiteEventLogging-003; a flaky / hang-prone
version of it can mask the very behaviour it is meant to pin.

**Recommendation**

Use a `CancellationTokenSource` (`while (!cts.IsCancellationRequested)`), or access
`stop` through `Volatile.Read` / `Volatile.Write`, or promote it to a `volatile bool`
field (C# does not allow the `volatile` modifier on a local variable).
`CancellationTokenSource` is the canonical .NET pattern and also makes it easy to
put a timeout on the test's `Task.WhenAll` wait.
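
The `CancellationTokenSource` variant might look like the sketch below. It is a stand-alone illustration rather than the actual test: `StressLoop` is a hypothetical name, `Task.Yield()` stands in for the awaited `LogEventAsync` call, and `Task.Delay` stands in for the purge. `Task.WaitAsync(TimeSpan)` requires .NET 6 or later.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

var total = await StressLoop.RunAsync(writerCount: 4, runFor: TimeSpan.FromMilliseconds(50));
Console.WriteLine($"writers stopped cleanly after {total} iterations");

// Hypothetical stand-in for the stress test body; not the project's test code.
public static class StressLoop
{
    public static async Task<long> RunAsync(int writerCount, TimeSpan runFor)
    {
        using var cts = new CancellationTokenSource();
        long iterations = 0;

        // Each writer loops until cancellation is requested. The token is
        // designed for cross-thread signalling, so no volatile flag is needed.
        var writers = Enumerable.Range(0, writerCount)
            .Select(_ => Task.Run(async () =>
            {
                while (!cts.IsCancellationRequested)
                {
                    await Task.Yield(); // placeholder for the awaited write
                    Interlocked.Increment(ref iterations);
                }
            }))
            .ToArray();

        await Task.Delay(runFor); // placeholder for the purge under test
        cts.Cancel();

        // Bound the wait so a stuck writer fails the test with a
        // TimeoutException instead of hanging the run.
        await Task.WhenAll(writers).WaitAsync(TimeSpan.FromSeconds(5));

        return Interlocked.Read(ref iterations);
    }
}
```

In the real test the exception list and its `Empty(exceptions)` assertion would stay as they are; only the stop signal changes.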