docs(code-review): re-review 17 changed modules at 1f9de8a2 — 8 new findings
Re-reviewed the modules whose source changed since the last review baseline (full-review remediation fd618cf1 + InboundAPI Database-helper fixes b3c90143), focused on whether the fixes are sound and regression-free.

9 of 17 modules clean; 8 new findings (0 Critical, 0 High, 4 Medium, 4 Low), all code-verified by the orchestrator before recording:

- DataConnectionLayer-029 (Med): DCL-023's unsubscribe-clears-in-flight reopens a double-subscribe window that leaks an orphaned alarm feed; the alarm completion handler overwrites the subscription id without the tag-path guard at line 908.
- InboundAPI-031 (Med): WaitForAttribute's 5s grace backstop is tighter than the CommunicationService Ask's timeout+IntegrationTimeout (30s) round-trip slack, so a slow-but-valid timed-out 'false' arriving in the 5-30s window is cancelled into an unhandled OperationCanceledException/500 (contradicts spec 6 + its own comment).
- SiteRuntime-032 (Med): SiteRuntime-029's wasPresent guard skips the deployed-count decrement when deleting a DISABLED instance (absent from both maps), drifting the health-dashboard tally; self-heals on singleton restart (observational, hence Med).
- StoreAndForward-028 (Med): StoreAndForward-025 resets the register-guard but not _bufferedCount, so a same-instance Stop->Start re-seeds the depth gauge to ~2N.
- AuditLog-017, CentralUI-037, ScriptAnalysis-009, SiteRuntime-033 (Low): a test-coverage gap plus stale doc-comments/spec following the remediation.

Header commit/date bumped to 1f9de8a2 / 2026-06-24 on all 17 modules; README regenerated (8 pending / 576 total).
@@ -5,10 +5,10 @@
| Module | `src/ZB.MOM.WW.ScadaBridge.DataConnectionLayer` |
| Design doc | `docs/requirements/Component-DataConnectionLayer.md` |
| Status | Reviewed |
-| Last reviewed | 2026-06-20 |
+| Last reviewed | 2026-06-24 |
| Reviewer | claude-agent |
-| Commit reviewed | `4307c381` |
-| Open findings | 0 |
+| Commit reviewed | `1f9de8a2` |
+| Open findings | 1 |

## Summary

@@ -1504,3 +1504,47 @@ decided explicitly.
**Resolution**

Resolved 2026-06-20 (commit `fd618cf1`): the `_alarmSourceFilter` comment corrected from 'first subscriber wins' to 'last subscriber wins' (and notes co-subscribers share the single filter).

## Re-review — 2026-06-24 (commit `1f9de8a2`)
Focused re-review of the changes since the prior review, verifying that the code-review remediation and feature fixes are sound and regression-free. Reviewed by a per-module workflow agent; findings code-verified by the orchestrator.

**Changes reviewed:** Two remediation fixes. (1) DCL-023 in `DataConnectionActor.cs`: the native-alarm subscribe path now inherits the DCL-021 obsolete-completion guard. `HandleAlarmSubscribeCompleted` releases the just-created adapter feed when no subscriber remains (the last subscriber unsubscribed mid-flight), and `HandleUnsubscribeAlarms` now clears the `_alarmSubscribesInFlight` marker so a late completion is recognised as orphaned. (2) DCL-025 in `MxGatewayDataConnection.cs`: `DisposeAsync` now tears down the alarm stream (cancel/dispose `_alarmCts`, reset `_alarmSubCount`) under `_alarmLock`, mirroring `DisconnectAsync`, because the actor disposes adapters fire-and-forget without first calling `DisconnectAsync`. A focused regression test (`DCL023_...`) and a comment-only doc tweak on `_alarmSourceFilter` were added.

**Verdict:** Both fixes target genuine resource-leak bugs and are largely well-constructed. DCL-025 correctly closes a real MxGateway alarm-stream/CTS leak on the fire-and-forget `DisposeAsync` path (lock-guarded, idempotent, and OPC UA correctly does not share the leak since its client owns the alarm subscription), and DCL-023 correctly handles the documented "last subscriber unsubscribed while subscribe in flight" case with good test coverage. However, the DCL-023 change introduces a new, narrower race: clearing `_alarmSubscribesInFlight` on unsubscribe re-opens the in-flight gate, so a re-subscribe to the same source that arrives before the late completion fires triggers a second concurrent adapter subscribe, and the first subscription id is then silently overwritten and leaked. The tag-path completion handler guards against exactly this overwrite (line 908); the alarm completion handler does not. Project builds clean with no warnings.

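As a rough illustration of the DCL-025 teardown pattern described above, the following C# sketch shows one way a lock-guarded, idempotent alarm-stream teardown can be shared between `DisconnectAsync` and `DisposeAsync`. It is a simplified stand-in, not the `MxGatewayDataConnection` code: the class name `AlarmStreamOwner`, the `OpenAlarmStream` method, the `IsAlarmStreamOpen` property, and the use of a plain `lock` object for `_alarmLock` are assumptions made for illustration.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for an adapter that owns an alarm stream.
public sealed class AlarmStreamOwner : IAsyncDisposable
{
    private readonly object _alarmLock = new object(); // assumed to be a plain monitor lock
    private CancellationTokenSource _alarmCts;          // null while no stream is open
    private int _alarmSubCount;

    // Opens the stream for the first subscriber and ref-counts later ones.
    public void OpenAlarmStream()
    {
        lock (_alarmLock)
        {
            if (_alarmCts == null) _alarmCts = new CancellationTokenSource();
            _alarmSubCount++;
        }
    }

    public bool IsAlarmStreamOpen
    {
        get { lock (_alarmLock) { return _alarmCts != null; } }
    }

    // Shared teardown: safe to call repeatedly and from either exit path.
    private void TearDownAlarmStream()
    {
        lock (_alarmLock)
        {
            var cts = _alarmCts;
            _alarmCts = null;
            _alarmSubCount = 0;
            if (cts == null) return; // already torn down, nothing to release
            cts.Cancel();
            cts.Dispose();
        }
    }

    public Task DisconnectAsync()
    {
        TearDownAlarmStream();
        return Task.CompletedTask;
    }

    // Does not rely on DisconnectAsync having been called first.
    public ValueTask DisposeAsync()
    {
        TearDownAlarmStream();
        return default;
    }
}
```

Because both exit paths funnel into the same teardown, a caller that disposes the adapter without disconnecting first still releases the stream, and calling either method a second time is a no-op.
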
| # | Category | Examined | Notes |
|---|----------|----------|-------|
| 1 | Correctness & logic bugs | ☑ | DCL-023 guard logic is correct for the documented case, but newly enables a double-subscribe leak on resubscribe-during-orphan because `HandleAlarmSubscribeCompleted` overwrites `_alarmSubscriptionIds` with no already-present guard. See finding. |
| 2 | Akka.NET conventions | ☑ | State mutated only on the actor thread; adapter subscribe uses `ContinueWith`+`PipeTo` (no captured `Self`/`Sender` misuse); subscriber/`Self`/generation captured into locals before the async continuation. Compliant. |
| 3 | Concurrency & thread safety | ☑ | DCL-025 teardown is correctly serialized via `_alarmLock` against concurrent Subscribe/UnsubscribeAlarmsAsync; cancel-after-dispose-and-null is safe. The DCL-023 ordering hazard is logical (single-threaded actor), not a data race; covered under Correctness. |
| 4 | Error handling & resilience | ☑ | Orphaned-feed release is fire-and-forget (acceptable, best-effort cleanup); failed/late completions handled; warning logged. No issues. |
| 5 | Security | ☑ | No secrets, SQL/LDAP, or untrusted input in the delta; no script-trust-forbidden APIs introduced. No issues found. |
| 6 | Performance & resource management | ☑ | DCL-025 fixes a real CTS/Task leak on MxGateway failover/stop. DCL-023 fixes the original orphan leak but leaves a residual leak path (the resubscribe-overwrite case in the finding). |
| 7 | Design-document adherence | ☑ | `Component-DataConnectionLayer.md` ref-counted 'open once, tear down on last unsubscribe' invariant is honoured; fixes reinforce it. No doc drift introduced. |
| 8 | Code organization & conventions | ☑ | Fixes are localized, follow the established DCL-0xx tagged-comment convention, and mirror the existing DCL-021 / `DisconnectAsync` patterns. Consistent. |
| 9 | Testing coverage | ☑ | New DCL023 test exercises the fixed path well. Gaps: no test for the resubscribe-during-orphan double-subscribe race (the new finding), and no test for DCL-025 `DisposeAsync` alarm teardown. |
| 10 | Documentation & comments | ☑ | Comments on both fixes are accurate and explain intent/cross-references. The `_alarmSourceFilter` doc update (first→last subscriber wins) correctly matches the unconditional overwrite in `HandleSubscribeAlarms`. |

**New findings from this re-review (1):**
### DataConnectionLayer-029 — Resubscribe during orphaned in-flight subscribe leaks an alarm feed
| | |
|--|--|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Status | Open |
| Location | `src/ZB.MOM.WW.ScadaBridge.DataConnectionLayer/Actors/DataConnectionActor.cs:1823` |

**Description**

The DCL-023 fix clears `_alarmSubscribesInFlight` in `HandleUnsubscribeAlarms` (line 1910). This re-opens the in-flight gate, so the following interleaving is now reachable:

1. Subscribe A for source S starts adapter subscribe #1; in-flight = {S}.
2. The last subscriber unsubscribes. `HandleUnsubscribeAlarms` removes the subscriber set and clears the in-flight marker (no subscription id is stored yet, so there is no teardown).
3. A new Subscribe B for the same S arrives before completion #1 fires. `HandleSubscribeAlarms` sees neither `_alarmSubscriptionIds.ContainsKey(S)` nor `_alarmSubscribesInFlight.Contains(S)` (both cleared), so it issues a second concurrent adapter subscribe #2.
4. Completion #1 fires. `_alarmSourceSubscribers` contains S again (B re-added it), so the orphan guard at line 1806 does not trigger and lines 1823-1825 store `_alarmSubscriptionIds[S] = subId#1`.
5. Completion #2 fires and overwrites `_alarmSubscriptionIds[S] = subId#2`. subId#1 is now leaked (never unsubscribed) and, since the adapter generation is unchanged, feed #1 keeps streaming routable transitions (duplicate delivery plus resource leak).

Before DCL-023, step 3 short-circuited because the in-flight marker survived, so the second subscribe was never issued; this leak path is newly introduced by the delta. The analogous tag-path completion handler guards against exactly this overwrite at line 908 (`if (result.AlreadySubscribed || _subscriptionIds.ContainsKey(result.TagPath)) continue;`); the alarm completion handler has no equivalent guard.

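The interleaving in the description can be replayed in a small single-threaded model. The sketch below is a deliberate simplification and not the actor's code: `AlarmBookkeepingModel`, `LeakDemo`, `OpenFeeds`, and the integer subscription ids are hypothetical, adapter calls are replaced by in-memory bookkeeping, and clearing the in-flight marker on completion is an assumption. Only the three collections named in the description are modelled.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, synchronous model of the alarm-subscribe bookkeeping.
public sealed class AlarmBookkeepingModel
{
    private readonly Dictionary<string, HashSet<string>> _alarmSourceSubscribers =
        new Dictionary<string, HashSet<string>>();
    private readonly HashSet<string> _alarmSubscribesInFlight = new HashSet<string>();
    private readonly Dictionary<string, int> _alarmSubscriptionIds = new Dictionary<string, int>();
    private int _nextSubId;

    // Feeds the (imaginary) adapter currently has open.
    public HashSet<int> OpenFeeds { get; } = new HashSet<int>();

    // Returns the new adapter subscription id, or null if the gate short-circuited.
    public int? Subscribe(string source, string subscriber)
    {
        if (!_alarmSourceSubscribers.TryGetValue(source, out var set))
        {
            set = new HashSet<string>();
            _alarmSourceSubscribers[source] = set;
        }
        set.Add(subscriber);

        if (_alarmSubscriptionIds.ContainsKey(source) || _alarmSubscribesInFlight.Contains(source))
            return null; // feed already open or already being opened

        _alarmSubscribesInFlight.Add(source);
        int id = ++_nextSubId;
        OpenFeeds.Add(id); // adapter has created the feed; completion is still pending
        return id;
    }

    public void Unsubscribe(string source, string subscriber)
    {
        if (!_alarmSourceSubscribers.TryGetValue(source, out var set)) return;
        set.Remove(subscriber);
        if (set.Count > 0) return;

        _alarmSourceSubscribers.Remove(source);
        _alarmSubscribesInFlight.Remove(source); // the DCL-023 behaviour under review
        if (_alarmSubscriptionIds.Remove(source, out int id)) OpenFeeds.Remove(id);
    }

    public void Complete(string source, int subId)
    {
        _alarmSubscribesInFlight.Remove(source); // assumed: completion clears the marker
        if (!_alarmSourceSubscribers.ContainsKey(source))
        {
            OpenFeeds.Remove(subId); // orphan guard: nobody is listening, release the feed
            return;
        }
        _alarmSubscriptionIds[source] = subId; // unguarded overwrite
    }
}

public static class LeakDemo
{
    // Replays steps 1-5, then unsubscribes B. Any id still open afterwards is leaked.
    public static IReadOnlyCollection<int> Run()
    {
        var model = new AlarmBookkeepingModel();
        int first = model.Subscribe("S", "A").Value;  // step 1
        model.Unsubscribe("S", "A");                   // step 2
        int second = model.Subscribe("S", "B").Value; // step 3: gate is open again
        model.Complete("S", first);                    // step 4: orphan guard does not fire
        model.Complete("S", second);                   // step 5: overwrites the first id
        model.Unsubscribe("S", "B");                   // only the tracked id is released
        return model.OpenFeeds;
    }
}
```

In this model `LeakDemo.Run()` ends with feed 1 still open even though no subscriber remains, which is the leak the finding describes. If `Unsubscribe` left the in-flight marker in place, the second `Subscribe` call would return null instead of starting a second feed.
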
**Recommendation**

In `HandleAlarmSubscribeCompleted`, before storing the subscription id, guard against an already-present feed: if `_alarmSubscriptionIds.ContainsKey(msg.SourceReference)` (or a concurrent in-flight subscribe re-added the marker), release the just-created subId via `UnsubscribeAlarmsAsync` instead of overwriting, mirroring the tag-path re-check at line 908. Add a regression test that fires an unsubscribe and then a fresh subscribe for the same source while the first adapter subscribe is still parked, then releases both, asserting that exactly one feed survives and the extra one is released.

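A minimal sketch of the recommended guard, under stated assumptions, might look like the following. It is not the actor's implementation: `GuardedAlarmCompletion`, `AddSubscriber`, and `TryGetFeed` are hypothetical names, the `releaseFeed` delegate stands in for the call to `UnsubscribeAlarmsAsync`, and subscription ids are plain integers for brevity.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, Akka-free model of the alarm completion handler with the proposed guard.
public sealed class GuardedAlarmCompletion
{
    private readonly HashSet<string> _sourcesWithSubscribers = new HashSet<string>();
    private readonly Dictionary<string, int> _alarmSubscriptionIds = new Dictionary<string, int>();
    private readonly Action<int> _releaseFeed; // stand-in for UnsubscribeAlarmsAsync

    public GuardedAlarmCompletion(Action<int> releaseFeed)
    {
        _releaseFeed = releaseFeed ?? throw new ArgumentNullException(nameof(releaseFeed));
    }

    public void AddSubscriber(string source) => _sourcesWithSubscribers.Add(source);

    public bool TryGetFeed(string source, out int subId) =>
        _alarmSubscriptionIds.TryGetValue(source, out subId);

    public void HandleAlarmSubscribeCompleted(string source, int subId)
    {
        // Orphan guard: no subscriber is left, so the new feed has no owner.
        if (!_sourcesWithSubscribers.Contains(source))
        {
            _releaseFeed(subId);
            return;
        }

        // Proposed guard: a feed is already stored for this source, so release
        // the newcomer instead of overwriting (and leaking) the stored id.
        if (_alarmSubscriptionIds.ContainsKey(source))
        {
            _releaseFeed(subId);
            return;
        }

        _alarmSubscriptionIds[source] = subId;
    }
}
```

With this shape, two completions for the same source keep whichever id was stored first and release the second, so exactly one feed survives. A regression test along the lines of the recommendation could assert on the released id and on the id that remains tracked.
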
**Resolution**

_Unresolved._