docs(code-review): re-review 17 changed modules at 1f9de8a2 — 8 new findings
Re-reviewed the modules whose source changed since the last review baseline (full-review remediationfd618cf1+ InboundAPI Database-helper fixesb3c90143), focused on whether the fixes are sound and regression-free. 9 of 17 modules clean; 8 new findings (0 Critical, 0 High, 4 Medium, 4 Low), all code-verified by the orchestrator before recording: - DataConnectionLayer-029 (Med): DCL-023's unsubscribe-clears-in-flight reopens a double-subscribe window that leaks an orphaned alarm feed; the alarm completion handler overwrites the subscription id without the tag-path guard at line 908. - InboundAPI-031 (Med): WaitForAttribute's 5s grace backstop is tighter than the CommunicationService Ask's timeout+IntegrationTimeout (30s) round-trip slack, so a slow-but-valid timed-out 'false' arriving in the 5-30s window is cancelled into an unhandled OperationCanceledException/500 (contradicts spec 6 + its own comment). - SiteRuntime-032 (Med): SiteRuntime-029's wasPresent guard skips the deployed-count decrement when deleting a DISABLED instance (absent from both maps), drifting the health-dashboard tally; self-heals on singleton restart (observational, hence Med). - StoreAndForward-028 (Med): StoreAndForward-025 resets the register-guard but not _bufferedCount, so a same-instance Stop->Start re-seeds the depth gauge to ~2N. - AuditLog-017, CentralUI-037, ScriptAnalysis-009, SiteRuntime-033 (Low): a test-coverage gap plus stale doc-comments/spec following the remediation. Header commit/date bumped to1f9de8a2/ 2026-06-24 on all 17 modules; README regenerated (8 pending / 576 total).
This commit is contained in:
@@ -5,10 +5,10 @@
|
||||
| Module | `src/ZB.MOM.WW.ScadaBridge.StoreAndForward` |
|
||||
| Design doc | `docs/requirements/Component-StoreAndForward.md` |
|
||||
| Status | Reviewed |
|
||||
| Last reviewed | 2026-06-20 |
|
||||
| Last reviewed | 2026-06-24 |
|
||||
| Reviewer | claude-agent |
|
||||
| Commit reviewed | `4307c381` |
|
||||
| Open findings | 0 (025 Resolved; 026, 027 Deferred; 5 Deferred total: 002, 011, 012, 026, 027; all prior Open findings resolved) |
|
||||
| Commit reviewed | `1f9de8a2` |
|
||||
| Open findings | 1 (025 Resolved; 026, 027 Deferred; 5 Deferred total: 002, 011, 012, 026, 027; all prior Open findings resolved) |
|
||||
|
||||
## Summary
|
||||
|
||||
@@ -1736,3 +1736,46 @@ not a forced fix. Options, in rough order of preference:
|
||||
|
||||
Deferred 2026-06-20: the per-retry `"Retried"` site-event volume is spec-conformant (Component-StoreAndForward.md lists 'retried' as a logged activity) and the site log is capped, so the residual concern is history eviction, not disk exhaustion. Whether to drop/sample/threshold the event vs. document the volume is a design-vs-scale decision for the owner.
|
||||
|
||||
## Re-review — 2026-06-24 (commit `1f9de8a2`)
|
||||
|
||||
Focused re-review of the changes since the prior review — verifying the code-review remediation + feature fixes are sound and regression-free. Reviewed by a per-module workflow agent; findings code-verified by the orchestrator.
|
||||
|
||||
**Changes reviewed:** The diff implements the StoreAndForward-025 remediation in StoreAndForwardService.cs: it adds a `_queueDepthProvider` field that retains the exact delegate registered with the process-global `ScadaBridgeTelemetry` queue-depth gauge in `StartAsync`, and reworks `StopAsync` so that after the in-flight retry sweep drains it deregisters that delegate by identity via the new `ScadaBridgeTelemetry.ClearQueueDepthProvider` (compare-and-clear), nulls `_queueDepthProvider`, and resets the `_queueDepthProviderRegistered` one-time guard to 0. The sweep-wait block was also refactored from an early-return guard into an `if (inflight is not null && !inflight.IsCompleted)` block (behaviour-preserving). A companion `ClearQueueDepthProvider` was added to ScadaBridgeTelemetry and three regression tests were added to QueueDepthGaugeTests.
|
||||
|
||||
**Verdict:** The StoreAndForward-025 fix is largely sound: the identity-checked compare-and-clear correctly prevents a late stop from stomping a successor instance's provider (proven by the passing late-stop test), the reference-type `Interlocked.CompareExchange` overload binds correctly, the project builds clean, and all five QueueDepthGaugeTests pass. The StopAsync refactor preserves the StoreAndForward-024 sweep-wait semantics. However, the change introduces one regression in the restart path it newly enables: resetting `_queueDepthProviderRegistered` to 0 without resetting `_bufferedCount` means a Stop→Start cycle on a non-empty buffer re-runs the additive seed and double-counts still-buffered rows into the gauge — contradicting the fix's own comment that a later StartAsync "re-registers cleanly." Impact is gauge-accuracy only (the metric is documented as approximate), so it is bounded, but it is a real behavioural drift versus the prior permanent-latch guard.
|
||||
|
||||
| # | Category | Examined | Notes |
|
||||
|---|----------|----------|-------|
|
||||
| 1 | Correctness & logic bugs | ☑ | StopAsync resets _queueDepthProviderRegistered to 0 but never resets _bufferedCount, so a same-instance Stop→Start over a non-empty buffer re-runs Interlocked.Add(pending) and double-counts the gauge. See finding. |
|
||||
| 2 | Akka.NET conventions | ☑ | No issues found. StoreAndForwardService is a plain service, not an actor; no actor-state, Tell/Ask, or supervision concerns touched by the diff. |
|
||||
| 3 | Concurrency & thread safety | ☑ | Telemetry clear uses Interlocked.CompareExchange (reference overload) and the gauge reads via Volatile — sync-safe. _queueDepthProvider is read/written only on the same instance during the documented single-threaded Start/Stop lifecycle. No data race in the changed code. |
|
||||
| 4 | Error handling & resilience | ☑ | StopAsync's bounded WaitAsync with TimeoutException + catch-all (swallow + log Warning) is preserved through the refactor; the provider clear runs unconditionally afterward so a timed-out sweep still releases the gauge slot. No issues found. |
|
||||
| 5 | Security | ☑ | No issues found. No secrets, no SQL/LDAP, no script-trust surface, no logging of sensitive data in the changed code. |
|
||||
| 6 | Performance & resource management | ☑ | The fix correctly addresses the original StoreAndForward-025 leak: the global delegate slot no longer pins the dead service after a graceful StopAsync, and the gauge stops reporting a frozen depth. |
|
||||
| 7 | Design-document adherence | ☑ | Component-StoreAndForward.md references buffer-depth metrics only generically; the gauge provider lifecycle is an implementation detail. No spec drift. Note: Host AkkaHostedService.StopAsync never calls StoreAndForwardService.StopAsync, so the clear path is not exercised on real host shutdown — pre-existing wiring gap, out of scope of this file. |
|
||||
| 8 | Code organization & conventions | ☑ | No issues found. New field, telemetry method, and XML docs follow existing patterns; the explicit (Func<long>) cast on the seed delegate is needed for overload disambiguation. |
|
||||
| 9 | Testing coverage | ☑ | Three new tests cover clear-on-stop, late-stop-doesn't-clobber-successor, and the additive seed. No test covers a Stop→Start restart on the same instance with a non-empty buffer, which is exactly the double-count path the finding describes. |
|
||||
| 10 | Documentation & comments | ☑ | The StopAsync XML doc and the inline comment at lines 405-406/451-452 claim a later StartAsync 're-registers cleanly', which is inaccurate for _bufferedCount — the comment overstates the cleanliness of the restart path. Tied to the finding. |
|
||||
|
||||
**New findings from this re-review (1):**
|
||||
|
||||
### StoreAndForward-028 — StopAsync resets register-guard but not _bufferedCount, double-counting gauge on restart
|
||||
|
||||
| | |
|
||||
|--|--|
|
||||
| Severity | Medium |
|
||||
| Category | Correctness & logic bugs |
|
||||
| Status | Open |
|
||||
| Location | `src/ZB.MOM.WW.ScadaBridge.StoreAndForward/StoreAndForwardService.cs:459` |
|
||||
|
||||
**Description**
|
||||
|
||||
The StoreAndForward-025 fix added `Interlocked.Exchange(ref _queueDepthProviderRegistered, 0)` in StopAsync (line 459) so a later StartAsync re-registers — and the new XML doc/comment (lines 405-406, 451-452) explicitly claims StartAsync then 're-registers cleanly.' But StopAsync never resets `_bufferedCount` (verified: it is only ever Increment/Decrement/Add, never reset — lines 154/359/619/678/712/751/973). On a same-instance Stop→Start cycle the guard CompareExchange at line 357 now succeeds again, so `Interlocked.Add(ref _bufferedCount, pending)` (line 359) re-adds the freshly-read durable Pending count on top of the leftover in-memory value from the prior lifecycle. When the service stops with a non-empty buffer (the normal case — S&F stops with queued rows), `_bufferedCount` is ~N at stop and stays N; the restart adds another N, so the gauge reports ~2N. Before this change the guard was a permanent latch (never reset), so a second StartAsync short-circuited and the seed never re-ran — no double-count. The service is a DI singleton and the Host comment notes the test harness triggers second start/stop interleavings, so this restart path is reachable. Impact is confined to gauge accuracy (the depth metric is documented as approximate, dashboard-only — no data loss), but it is a real regression in the path this change newly enables and mis-documents.
|
||||
|
||||
**Recommendation**
|
||||
|
||||
Reset the cached counter on stop alongside the guard, e.g. `Interlocked.Exchange(ref _bufferedCount, 0);` immediately before/after the `Interlocked.Exchange(ref _queueDepthProviderRegistered, 0)` at line 459, so a subsequent StartAsync seeds from a clean zero base. Then add a regression test to QueueDepthGaugeTests that buffers N rows, calls StopAsync, calls StartAsync again on the same instance, and asserts the gauge reports N (not 2N). Alternatively, correct the 're-registers cleanly' comment to document that restart is unsupported and keep the guard a permanent latch.
|
||||
|
||||
**Resolution**
|
||||
|
||||
_Unresolved._
|
||||
|
||||
Reference in New Issue
Block a user