# Code Review — KpiHistory

| Field | Value |
|-------|-------|
| Module | `src/ZB.MOM.WW.ScadaBridge.KpiHistory` |
| Design doc | `docs/requirements/Component-KpiHistory.md` |
| Status | Reviewed |
| Last reviewed | 2026-06-24 |
| Reviewer | claude-agent |
| Commit reviewed | `1f9de8a2` |
| Open findings | 0 |

## Summary

KpiHistory is a small, well-built observability module: a single ~305-line recorder
singleton (`KpiHistoryRecorderActor`), a strongly-typed options class with a fail-fast
validator, and a thin DI composition root. The owned code is clean — the actor is
textbook best-effort: every sample pass and purge sweep runs off the actor thread via
`PipeTo`, per-source faults are isolated so one throwing `IKpiSampleSource` never aborts
a tick or the other sources, the repository write/purge is guarded, no exception escapes
either tick handler, and a lifecycle `CancellationTokenSource` is cancelled in `PostStop`
so an in-flight pass observes shutdown promptly. The singleton is wired correctly in the
Host (ClusterSingletonManager + proxy + `PhaseClusterLeave` drain) and is deliberately
absent from the readiness barrier, exactly as the design requires. The options validator,
the EAV table mapping, both named indexes, and all four `IKpiSampleSource` implementations
exist and are registered on the central host as designed.

The dominant theme is **unbounded work under load**, in two places. First, the recorder
has **no in-flight guard** on its sample timer — directly contradicting its own XML-doc
claim to "mirror the NotificationOutboxActor timer + scope-per-tick + PipeTo pattern,"
because the NotificationOutbox dispatcher *does* hold an in-flight guard and the recorder
does not. When a sample pass runs longer than `SampleInterval` (slow/recovering DB), Akka
periodic timers enqueue, not coalesce, so overlapping `RunSamplePass` tasks pile up,
multiplying DB load at exactly the moment the store is struggling and double-writing
samples for overlapping windows. Second, `GetRawSeriesAsync` has **no server-side row
cap**: the design's `DefaultMaxSeriesPoints` ceiling is applied by `KpiSeriesBucketer`
only *after* the full raw window is materialised into memory — a 7-day window at the
default 60 s cadence is ~10 080 rows per series pulled to the Central UI before
downsampling, defeating the stated intent of the cap. A secondary theme is **bucketer
contract drift** (the "largest-timestamp-wins for unsorted input" doc claim is not what
the code does, and the short-series early-return emits raw capture timestamps where the
downsample path emits bucket-boundary timestamps) — both live in Commons but are core to
this module's query reducer. No Critical findings; one High, two Medium, three Low.

## Checklist coverage

| # | Category | Examined | Notes |
|---|----------|----------|-------|
| 1 | Correctness & logic bugs | Yes | `capturedAt`/cut-off captured on the actor thread (correct). `KpiSeriesBucketer` short-series early-return emits raw capture timestamps vs bucket-boundary timestamps on the downsample path (KpiHistory-005). Bucketer "largest-timestamp-wins for unsorted input" doc claim is false — it is last-in-iteration (KpiHistory-006). |
| 2 | Akka.NET conventions | Yes | `PipeTo` + scope-per-tick + off-thread I/O + `IWithTimers` all correct; sender not captured across awaits; messages immutable singletons. But no in-flight guard despite the XML claiming to mirror NotificationOutbox (KpiHistory-001). |
| 3 | Concurrency & thread safety | Yes | Actor state is effectively stateless; `_shutdownCts` lifecycle is correct. The missing in-flight guard allows overlapping sample passes under DB latency (KpiHistory-001). |
| 4 | Error handling & resilience | Yes | Per-source isolation, write/purge guards, and `OperationCanceledException` shutdown handling are all correct and tested. No issues beyond KpiHistory-001's load amplification. |
| 5 | Security | Yes | No injection surface — all queries are parameterised LINQ; `Source`/`Metric`/`Scope`/`ScopeKey` are equality predicates, never interpolated SQL. Scope isolation via `ScopeKey == scopeKey` (incl. `IS NULL` for Global) is correct. No issues found. |
| 6 | Performance & resource management | Yes | `GetRawSeriesAsync` returns the entire window with no server-side cap before bucketing (KpiHistory-002). `RecordSamplesAsync` short-circuits empty batches (good). Purge is set-based `ExecuteDeleteAsync` but is a single unbatched statement (KpiHistory-004). |
| 7 | Design-document adherence | Yes | The recorder XML claims to mirror NotificationOutbox's pattern but omits its in-flight guard (KpiHistory-001). `DefaultMaxSeriesPoints`-as-a-true-cap intent is undermined by KpiHistory-002. Otherwise faithful: singleton, not-readiness-gated, daily purge, EAV schema, indexes. |
| 8 | Code organization & conventions | Yes | Options class owned by the component (correct); validator co-located; `IKpiHistoryRepository`/`IKpiSampleSource`/`KpiSample`/bucketer in Commons, impl in ConfigurationDatabase (correct); singleton Props built in Host. No issues found. |
| 9 | Testing coverage | Yes | Recorder isolation, faulted-tick recovery, and purge cut-off are tested; validator bounds fully covered; bucketer has strong sorted-input coverage. Gaps: no overlapping-tick test, no unsorted-input bucketer test, no short-series timestamp-semantics assertion (KpiHistory-003). |
| 10 | Documentation & comments | Yes | XML is generally excellent. Two drift points: the "mirror NotificationOutbox" claim (KpiHistory-001) and the bucketer unsorted-input claim (KpiHistory-006). `SampleComplete`/`PurgeComplete` no-op handlers are documented. |

## Findings

### KpiHistory-001 — Recorder has no in-flight guard; overlapping sample passes pile up under DB latency

| | |
|--|--|
| Severity | High |
| Category | Akka.NET conventions |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.KpiHistory/KpiHistoryRecorderActor.cs:89-92`, `:103-107`, `:143-159` |

**Description**

The sample timer is a plain Akka periodic timer
(`Timers.StartPeriodicTimer(SampleTimerKey, SampleTick.Instance, …, interval: _options.SampleInterval)`).
`HandleSampleTick` launches `RunSamplePass(...)` off-thread via `PipeTo` and returns
immediately; the piped-back `SampleComplete` is a deliberate no-op
(`Receive<SampleComplete>(_ => { })`). There is **no in-flight guard** — nothing prevents
a second `SampleTick` from starting a second `RunSamplePass` while the first is still
awaiting its DB round-trip.

Akka periodic timers do not coalesce missed ticks — they enqueue. So when a sample pass
takes longer than `SampleInterval` (a slow, contended, or recovering central MS SQL —
exactly the regime where observability matters most), each subsequent tick spawns *another*
concurrent pass. Each pass opens its own DI scope + `DbContext` and issues its own
`AddRange` + `SaveChangesAsync`. The result is a self-amplifying load spiral against the
struggling store, plus duplicate sample rows whose `CapturedAtUtc` values straddle the same
real-time window (the design's "one shared tick timestamp" invariant assumes one pass per
tick).

The actor's own XML-doc states it "mirrors the `NotificationOutboxActor` timer +
scope-per-tick + PipeTo pattern" (lines 37-39) — but `NotificationOutboxActor` holds an
explicit in-flight boolean cleared on `DispatchComplete` precisely to serialise sweeps.
This recorder dropped that half of the pattern. The piped-back `SampleComplete` message is
the natural place the guard would be lowered, and its current empty body is the tell that
the guard was intended but omitted.

**Recommendation**

Add an in-flight guard: set a `_sampleInFlight` flag at the top of `HandleSampleTick`, skip
(and log at debug) the tick if already set, and clear it in the `SampleComplete` handler
(both success and failure projections already route there). This matches the
NotificationOutbox pattern the doc claims to follow and bounds the recorder to one pass per
tick. Add a regression test that drives two `SampleTick`s before the first pass completes
(e.g. a repository whose `RecordSamplesAsync` blocks on a gate) and asserts only one pass
ran. The purge tick is daily + idempotent so a guard there is optional, but consider the
same treatment for symmetry.

**Resolution**

Resolved 2026-06-20 (commit `fd618cf1`): added a `_sampleInFlight` guard to the recorder — `HandleSampleTick` skips (debug-logs) if a pass is in flight; the flag is cleared via a `SampleComplete` message piped on BOTH success and failure, so a faulted source can't wedge the guard. Overlapping-tick regression test added.

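
To make the pile-up in KpiHistory-001 concrete, the following is a minimal, language-neutral Python model rather than the module's C# code. `RecorderModel`, `simulate`, the 60 s interval, and the 150 s pass duration are illustrative assumptions. It only tracks how many passes are in flight when each tick arrives, with and without a skip-if-busy guard.

```python
from dataclasses import dataclass, field


@dataclass
class RecorderModel:
    """Toy stand-in for the recorder (hypothetical, not the actor's real code).

    Ticks arrive on a fixed interval; every started pass takes
    `pass_duration` seconds before its completion message would arrive.
    """
    pass_duration: float
    guarded: bool
    in_flight_until: list = field(default_factory=list)  # completion times of running passes
    started: int = 0
    skipped: int = 0
    max_concurrent: int = 0

    def on_tick(self, now: float) -> None:
        # Forget passes that finished before this tick (models SampleComplete).
        self.in_flight_until = [t for t in self.in_flight_until if t > now]
        if self.guarded and self.in_flight_until:
            self.skipped += 1  # a pass is still running: skip this tick
            return
        self.in_flight_until.append(now + self.pass_duration)
        self.started += 1
        self.max_concurrent = max(self.max_concurrent, len(self.in_flight_until))


def simulate(guarded: bool, ticks: int = 10, interval: float = 60.0,
             pass_duration: float = 150.0) -> RecorderModel:
    """Deliver `ticks` periodic ticks to a model whose passes outlast the interval."""
    model = RecorderModel(pass_duration=pass_duration, guarded=guarded)
    for i in range(ticks):
        model.on_tick(now=i * interval)
    return model


unguarded = simulate(guarded=False)
guarded = simulate(guarded=True)
print("unguarded:", unguarded.started, "passes, peak concurrency", unguarded.max_concurrent)
print("guarded:  ", guarded.started, "passes,", guarded.skipped, "ticks skipped,",
      "peak concurrency", guarded.max_concurrent)
```

Under these assumed timings the unguarded model starts a pass on every tick and reaches three concurrent passes, while the guarded model never exceeds one and skips the ticks that arrive mid-pass.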
### KpiHistory-002 — `GetRawSeriesAsync` materialises the entire window with no server-side cap; `DefaultMaxSeriesPoints` is applied only after the fact

| | |
|--|--|
| Severity | Medium |
| Category | Performance & resource management |
| Status | Deferred |
| Location | `src/ZB.MOM.WW.ScadaBridge.Commons/Interfaces/Repositories/IKpiHistoryRepository.cs:39-41` (contract); `src/ZB.MOM.WW.ScadaBridge.ConfigurationDatabase/Repositories/KpiHistoryRepository.cs:45-62` (impl); consumed by `src/ZB.MOM.WW.ScadaBridge.CentralUI/Services/KpiHistoryQueryService.cs:77-93` |

**Description**

`GetRawSeriesAsync` has no `limit`/`maxPoints`/`Take` parameter. The implementation issues a
`Where(...).OrderBy(CapturedAtUtc).Select(...).ToListAsync()` that pulls **every** sample in
`[fromUtc, toUtc]` for the series into memory. The `DefaultMaxSeriesPoints` ceiling
(default 200, design-described as the cap that prevents "a single trend query [from
streaming] an arbitrarily large series to the Central UI" — `KpiHistoryOptionsValidator`
XML) is enforced by `KpiSeriesBucketer.Bucket` **after** the full raw set has already been
fetched across the wire and allocated as a `List<KpiSeriesPoint>` in the query service.

At the default 60 s sample cadence, a 24 h window is ~1 440 rows/series, a 7 d window is
~10 080 rows/series, and a 30 d window ~43 200 rows/series — per chart, with up to four
trend panels on a page each issuing its own query. The cap that the design relies on for
back-pressure provides none at the data tier; it only trims the in-memory result the UI
binds to. The `IX_KpiSample_Series` index makes the *range scan* efficient, but the row
count returned is still unbounded by anything except the window the parent page chooses.

**Recommendation**

Push the downsampling toward the store, or at least bound the fetch. Cheapest correct fix:
add an optional `int? maxRows` to `GetRawSeriesAsync` and have the query service pass a
generous multiple of `effectiveMax` (e.g. `effectiveMax * k`) so the bucketer still has
enough density to pick representative last-values while the DB-side `Take` caps the
transfer. A more thorough fix is a server-side bucketed aggregation (GROUP BY a computed
bucket index, MAX(CapturedAtUtc) per bucket), but that is a larger change the design
explicitly deferred ("v1 ships exactly one aggregation"). At minimum, document the
unbounded-fetch behaviour and the practical window ceiling so an operator does not point a
30 d chart at a busy multi-site deployment.

**Resolution**

Deferred 2026-06-20: bounding the raw window fetch (cheap `Take`-cap vs. server-side bucketed aggregation) is a design decision and the code lives in ConfigurationDatabase/Commons, not this project. Recorded as a query-path enhancement; the practical window ceiling should be documented when addressed.

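
The row counts quoted in KpiHistory-002 follow directly from window length divided by sample cadence. The short Python sketch below reproduces that arithmetic and shows what an `effectiveMax * k` row cap would do to the transfer size. The multiple `OVERSAMPLE_K = 5` and the helper names are hypothetical; the review does not specify a value for `k`.

```python
from datetime import timedelta

SAMPLE_INTERVAL = timedelta(seconds=60)  # default cadence quoted in the finding
MAX_POINTS = 200                         # DefaultMaxSeriesPoints default quoted in the finding
OVERSAMPLE_K = 5                         # hypothetical multiple; the finding only says "effectiveMax * k"


def rows_per_series(window: timedelta, interval: timedelta = SAMPLE_INTERVAL) -> int:
    """Rows one series accumulates over `window` at one sample per `interval`."""
    return window // interval


def capped_fetch(window: timedelta, max_rows: int = MAX_POINTS * OVERSAMPLE_K) -> int:
    """Rows transferred if the fetch were bounded at `max_rows`."""
    return min(rows_per_series(window), max_rows)


for days in (1, 7, 30):
    window = timedelta(days=days)
    print(f"{days:>2} d window: {rows_per_series(window):>6} rows unbounded, "
          f"{capped_fetch(window):>5} rows with a cap of {MAX_POINTS * OVERSAMPLE_K}")
```

One caveat for any real implementation: a plain row limit applied to an ascending query keeps the oldest rows of the window and drops the newest, so how the cap is applied is part of the deferred design decision.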
### KpiHistory-003 — Missing tests: overlapping ticks, unsorted-input bucketing, short-series timestamp semantics

| | |
|--|--|
| Severity | Medium |
| Category | Testing coverage |
| Status | Resolved |
| Location | `tests/ZB.MOM.WW.ScadaBridge.KpiHistory.Tests/KpiHistoryRecorderActorTests.cs`; `tests/ZB.MOM.WW.ScadaBridge.Commons.Tests/Kpi/KpiSeriesBucketerTests.cs` |

**Description**

The recorder and bucketer tests are otherwise strong (per-source isolation, faulted-tick
recovery, purge cut-off, validator bounds, sorted-input bucketing, right-edge handling,
empty-bucket omission, out-of-window filtering). Three behaviour gaps remain, each tied to
a finding here:

1. **No overlapping-tick test** (KpiHistory-001). Every recorder test sends a single tick
   and awaits its effect; nothing exercises a second `SampleTick` arriving while the first
   pass is still in flight, so the missing in-flight guard is invisible to the suite. A
   gated-repository test (block `RecordSamplesAsync`, send two ticks, count passes) would
   pin the intended one-pass-per-tick behaviour.
2. **No unsorted-input bucketer test** (KpiHistory-006). All bucketer tests pass ascending
   input, so the doc's "largest-timestamp-wins for unsorted input" claim is never checked —
   and it is in fact wrong.
3. **No short-series timestamp-semantics assertion** (KpiHistory-005).
   `Bucket_BucketStartUtc_IsSetToBucketStart…` covers only the downsample path; no test
   asserts what `BucketStartUtc` the early-return (`raw.Count <= maxPoints`) path emits, so
   the inconsistency between the two paths is untested.

**Recommendation**

Add the three tests above. The overlapping-tick test belongs in
`KpiHistoryRecorderActorTests` (it is a recorder behaviour); the two bucketer tests belong
in `KpiSeriesBucketerTests`.

**Resolution**

Resolved 2026-06-20 (commit `fd618cf1`): added the overlapping-tick gated-repository test plus unsorted-input and short-series bucketer tests (the latter pin the documented short-series behaviour).

### KpiHistory-004 — Retention purge is a single unbatched `ExecuteDeleteAsync`; a large backlog deletes in one transaction

| | |
|--|--|
| Severity | Low |
| Category | Performance & resource management |
| Status | Deferred |
| Location | `src/ZB.MOM.WW.ScadaBridge.ConfigurationDatabase/Repositories/KpiHistoryRepository.cs:65-71` |

**Description**

`PurgeOlderThanAsync` runs one set-based `Where(s => s.CapturedAtUtc < before).ExecuteDeleteAsync(...)`.
For the steady-state daily cadence this deletes one day of expired rows and is fine. But if
the purge has not run for an extended period (singleton down across a long failover window,
`PurgeInterval` mis-set and later corrected, or `RetentionDays` shortened), a single
unbounded `DELETE` can touch a very large row count in one transaction — lock escalation on
`KpiSample`, a long-running transaction, and transaction-log growth, which on the shared
central MS SQL can affect operational tables. The Audit Log purge path, by contrast, uses a
bounded batched delete for exactly this reason.

This is observability data on a non-partitioned `[PRIMARY]` table, so the blast radius is
bounded and the severity is Low — but the unbatched delete is a latent operational hazard
on the shared store.

**Recommendation**

Loop a bounded delete (`DELETE TOP (N) … WHERE CapturedAtUtc < @before`, or EF Core's
batching) until zero rows are affected, mirroring the Audit Log purge shape. Keep the
returned total for the existing log line.

**Resolution**

Deferred 2026-06-20: batching the retention purge is a shared-MS-SQL-store tradeoff (vs. the Audit Log's batched-delete precedent); low severity for non-partitioned observability data. Recorded as a future enhancement.

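
The loop shape recommended for KpiHistory-004 can be sketched as follows. This is an illustrative model only: it uses Python's built-in `sqlite3` as a stand-in for the central MS SQL store, a two-column table instead of the real `KpiSample` schema, integer timestamps, and a hypothetical `purge_older_than` helper. SQLite has no `DELETE TOP (N)`, so the batch bound is expressed with a `LIMIT` subquery.

```python
import sqlite3


def purge_older_than(conn: sqlite3.Connection, before: int, batch_size: int) -> int:
    """Delete expired rows in bounded batches; return the total deleted."""
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM KpiSample WHERE rowid IN ("
            "  SELECT rowid FROM KpiSample WHERE CapturedAtUtc < ? LIMIT ?)",
            (before, batch_size),
        )
        conn.commit()  # each batch commits on its own, keeping transactions short
        if cur.rowcount == 0:
            break  # nothing left to purge
        total += cur.rowcount
    return total


# Stand-in table: 1000 samples with integer timestamps 0..999.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE KpiSample (CapturedAtUtc INTEGER NOT NULL, Value REAL NOT NULL)")
conn.executemany("INSERT INTO KpiSample VALUES (?, ?)", [(t, 0.0) for t in range(1000)])
conn.commit()

deleted = purge_older_than(conn, before=750, batch_size=200)
remaining = conn.execute("SELECT COUNT(*) FROM KpiSample").fetchone()[0]
print(f"deleted {deleted} rows in batches of 200; {remaining} rows remain")
```

The returned total plays the role of the value the finding suggests keeping for the existing log line.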
### KpiHistory-005 — `KpiSeriesBucketer` short-series early-return emits raw capture timestamps where the downsample path emits bucket-boundary timestamps

| | |
|--|--|
| Severity | Low |
| Category | Correctness & logic bugs |
| Status | Deferred |
| Location | `src/ZB.MOM.WW.ScadaBridge.Commons/Types/Kpi/KpiSeriesBucketer.cs:57-58` (early return) vs `:93-94` (downsample) |

**Description**

`KpiSeriesBucketer.Bucket` has two return paths that disagree on what `BucketStartUtc`
means. When `raw.Count <= maxPoints` it returns `raw` unchanged — those points carry the
raw `CapturedAtUtc` as their `BucketStartUtc` (the repository builds
`new KpiSeriesPoint(s.CapturedAtUtc, s.Value)`). When `raw.Count > maxPoints` it returns
points whose `BucketStartUtc` is the **bucket boundary**
(`fromUtc + bucketIndex * bucketWidthTicks`), as the dedicated test
`Bucket_BucketStartUtc_IsSetToBucketStartNotRawPointTimestamp` asserts.

So the same series, charted at a density below vs above `maxPoints`, plots its points at
different x-positions: actual capture instants in the sparse case, evenly-spaced bucket
starts in the dense case. The downstream `KpiTrendChart` normalises X across
`[min(BucketStartUtc), max(BucketStartUtc)]`, so the visual impact is minor (the time range
is essentially the same), but the contract is inconsistent and the x-axis "tick spacing"
subtly changes as a window crosses the cap. This is the bucketer that the KpiHistory design
defines as the module's query reducer, so the inconsistency is in-scope even though the file
lives in Commons.

**Recommendation**

Make the two paths agree. Either document the difference explicitly on `Bucket` (the
short-series path returns raw capture instants; the downsample path returns bucket starts),
or — cleaner — have the short-series path also project onto a consistent timestamp basis if
exact bucket-start semantics are part of the contract. Add the short-series timestamp test
from KpiHistory-003.

**Resolution**

Deferred 2026-06-20: whether the bucketer's short-series early-return and the downsample path must agree on `BucketStartUtc` semantics (vs. documenting the difference) is a contract decision for the design owner. The doc was corrected (KpiHistory-006) to describe current behaviour; the contract change is deferred.

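
The two timestamp bases described in KpiHistory-005 can be seen side by side in a simplified Python model. This is a sketch of the described behaviour, not a port of `KpiSeriesBucketer`: the `bucket` function, the right-edge clamp, and the four sample points are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def bucket(raw, from_utc, to_utc, max_points):
    """Simplified model; `raw` is an ascending list of (timestamp, value)."""
    if len(raw) <= max_points:
        return list(raw)  # early return: raw capture timestamps pass through
    width = (to_utc - from_utc) / max_points
    best = {}
    for ts, value in raw:
        # Clamp so a point exactly at the window's right edge lands in the last bucket.
        idx = min(int((ts - from_utc) / width), max_points - 1)
        best[idx] = (from_utc + idx * width, value)  # emit the bucket start, keep last value
    return [best[i] for i in sorted(best)]


t0 = datetime(2026, 6, 1, tzinfo=timezone.utc)
window_end = t0 + timedelta(seconds=60)
raw = [(t0 + timedelta(seconds=s), float(s)) for s in (7, 23, 41, 58)]

sparse = bucket(raw, t0, window_end, max_points=4)  # 4 points <= cap: early return
dense = bucket(raw, t0, window_end, max_points=2)   # 4 points > cap: downsample

sparse_offsets = [int((ts - t0).total_seconds()) for ts, _ in sparse]
dense_offsets = [int((ts - t0).total_seconds()) for ts, _ in dense]
print("sparse path x-offsets (s):", sparse_offsets)
print("dense path x-offsets (s): ", dense_offsets)
```

In this model the sparse path reports the capture instants themselves, while the dense path reports evenly spaced bucket starts, which is the contract difference the finding asks the design owner to settle.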
### KpiHistory-006 — `KpiSeriesBucketer` doc claims "largest timestamp within each bucket is selected" for unsorted input; the code selects last-in-iteration

| | |
|--|--|
| Severity | Low |
| Category | Documentation & comments |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.Commons/Types/Kpi/KpiSeriesBucketer.cs:20-21` (doc), `:88-97` (code) |

**Description**

The `raw` param doc states: *"If not sorted, the point with the largest timestamp within
each bucket is selected."* The code does not do this. When a point is stored into a bucket,
`best[bucketIndex].BucketStartUtc` is set to the **bucket-start** timestamp
(`fromUtc + bucketIndex * bucketWidthTicks`), not the raw point's timestamp. The
last-value comparison for a subsequent point in the same bucket is then
`point.BucketStartUtc > best[bucketIndex].BucketStartUtc` — i.e. it compares the new raw
point's capture time against the *bucket start*, which (for any in-bucket point) is almost
always true. The effect is that each later-in-iteration point overwrites the previous one
regardless of their relative timestamps, so the **last point in iteration order** wins, not
the point with the largest timestamp.

For the production caller this is harmless: `KpiHistoryRepository.GetRawSeriesAsync` always
`OrderBy(CapturedAtUtc)`, so iteration order equals time order and last-in-iteration is the
largest timestamp. But the documented contract for unsorted input is simply false, and the
"if ties, keep first encountered — stable" comment (line 89) is also inaccurate — the
overwrite triggers on equal-as-well-as-greater for any in-bucket point. A future caller that
trusts the unsorted-input guarantee will get wrong results silently.

**Recommendation**

Either (a) fix the comparison to track the selected raw point's actual timestamp (store the
raw `point.BucketStartUtc` alongside the emitted value and compare against that), making the
"largest timestamp wins" claim true for unsorted input; or (b) tighten the doc to state the
method requires ascending-sorted input and selects last-in-iteration otherwise, and drop the
inaccurate "largest timestamp" / "stable ties" language. Pair with the unsorted-input test
from KpiHistory-003.

**Resolution**

Resolved 2026-06-20 (commit `fd618cf1`): corrected the `KpiSeriesBucketer` param XML doc — dropped the false 'largest-timestamp-wins / stable ties' claim; now states it requires ascending-sorted input and selects last-in-iteration otherwise. Doc-only.

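
The comparison issue in KpiHistory-006 is easy to reproduce in a small Python model. Both functions below are hypothetical illustrations using integer timestamps: the first mirrors the comparison as the finding describes it (new raw time against the stored bucket start), and the second is one possible shape of option (a), which tracks the raw timestamp of the selected point.

```python
def bucket_as_described(points, from_t, width):
    """Compare each raw time against the stored *bucket start* (described behaviour)."""
    best = {}  # bucket index -> (bucket_start, value)
    for t, value in points:
        idx = (t - from_t) // width
        start = from_t + idx * width
        if idx not in best or t > best[idx][0]:
            best[idx] = (start, value)
    return best


def bucket_largest_timestamp(points, from_t, width):
    """Option (a) sketch: remember the selected point's raw time and compare against it."""
    best = {}  # bucket index -> (raw_time, bucket_start, value)
    for t, value in points:
        idx = (t - from_t) // width
        start = from_t + idx * width
        if idx not in best or t > best[idx][0]:
            best[idx] = (t, start, value)
    return {idx: (start, value) for idx, (_, start, value) in best.items()}


# Two points in the same bucket [0, 60), deliberately not in ascending order.
unsorted_points = [(50, "newest"), (20, "older")]

described = bucket_as_described(unsorted_points, from_t=0, width=60)
fixed = bucket_largest_timestamp(unsorted_points, from_t=0, width=60)
print("described comparison keeps:", described[0][1])
print("raw-timestamp comparison keeps:", fixed[0][1])
```

With unsorted input the described comparison keeps whichever point was iterated last, whereas comparing raw timestamps keeps the genuinely newest point. For ascending input the two functions agree, which is why the production caller is unaffected.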
## Re-review — 2026-06-24 (commit `1f9de8a2`)

Focused re-review of the changes since the prior review — verifying the code-review remediation + feature fixes are sound and regression-free. Reviewed by a per-module workflow agent; findings code-verified by the orchestrator.

**Changes reviewed:** The diff adds an in-flight guard (`_sampleInFlight` bool field) to `KpiHistoryRecorderActor` so overlapping sample passes can never run concurrently. `HandleSampleTick` now skips (coalesces) a `SampleTick` and logs at debug if a pass is already in flight; otherwise it raises the guard before launching the off-thread `RunSamplePass`. The `SampleComplete` receive handler lowers the guard (it fires on both success and fault paths via the PipeTo projection). XML doc comments were updated accordingly. Tests: a new deterministic regression test (`OverlappingTick_WhileFirstPassInFlight_DoesNotStartSecondPass`) uses a gated repository to prove the second tick is skipped and the guard correctly resets; the pre-existing recovery test was hardened to re-send the tick per poll to avoid racing the asynchronous guard reset.

**Verdict:** The change is sound, minimal, and regression-free. It faithfully mirrors the already-shipped `NotificationOutboxActor` dispatch in-flight-guard pattern (skip-if-busy, raise-before-launch, lower-on-PipeTo-completion). The guard field is read and written only on the actor thread, so there is no thread-safety hazard; it cannot wedge permanently because `RunSamplePass` never throws and both PipeTo success and failure projections emit `SampleComplete`. The skip-on-overlap behaviour is consistent with the design doc, which describes best-effort per-tick sampling with no strict "a sample must land every interval" guarantee. The new behaviour is covered by a deterministic regression test, and all 4 actor tests pass. No new issues found.

| # | Category | Examined | Notes |
|---|----------|----------|-------|
| 1 | Correctness & logic bugs | ☑ | Guard raised before launch, lowered on both success+fault PipeTo paths; cannot wedge (RunSamplePass never throws). 1:1 raise/lower pairing — no double-lower. No issues found. |
| 2 | Akka.NET conventions | ☑ | Off-thread work via PipeTo to Self; guard mutated only on the actor thread in the SampleComplete handler. Matches the documented NotificationOutboxActor pattern. No issues found. |
| 3 | Concurrency & thread safety | ☑ | _sampleInFlight is touched only on the actor thread (HandleSampleTick + SampleComplete receive); no shared mutable state or captured this/sender in the off-thread pass. No issues found. |
| 4 | Error handling & resilience | ☑ | Faulted pass still produces SampleComplete (belt-and-braces failure projection), so the guard always clears; best-effort observability contract preserved. No issues found. |
| 5 | Security | ☑ | No new I/O, no secrets, no user input, no injection surface introduced. Debug-level skip log carries no sensitive data. No issues found. |
| 6 | Performance & resource management | ☑ | The guard's purpose is precisely to avoid load amplification on a slow/recovering DB by preventing overlapping passes. No new allocations or leaks; field is reset on restart. No issues found. |
| 7 | Design-document adherence | ☑ | Component-KpiHistory.md describes best-effort per-tick sampling with no strict per-interval landing guarantee; coalescing overlaps is consistent and the doc already says the recorder mirrors NotificationOutboxActor. No drift. |
| 8 | Code organization & conventions | ☑ | Single new private bool field with thorough XML doc; inline comments accurate. Consistent with the sibling actor's naming/structure. No issues found. |
| 9 | Testing coverage | ☑ | New deterministic gated-repository regression test pins one-pass-per-tick and guard reset; existing recovery test hardened against the async-reset race. All 4 tests pass. Coverage is adequate for the delta. |
| 10 | Documentation & comments | ☑ | Field, handler, and SampleComplete XML docs updated to explain the guard, the enqueue-not-coalesce timer rationale, and the lower-on-fault behaviour. Accurate and clear. No issues found. |

_No new findings — the changes in this module are clean._