docs(code-review): re-review 17 changed modules at 1f9de8a2 — 8 new findings

Re-reviewed the modules whose source changed since the last review baseline
(full-review remediation fd618cf1 + InboundAPI Database-helper fixes b3c90143),
focused on whether the fixes are sound and regression-free. 9 of 17 modules
clean; 8 new findings (0 Critical, 0 High, 4 Medium, 4 Low), all code-verified
by the orchestrator before recording:

- DataConnectionLayer-029 (Med): DCL-023's unsubscribe-clears-in-flight change reopens a
  double-subscribe window that leaks an orphaned alarm feed; the alarm completion
  handler overwrites the stored subscription id with no equivalent of the tag-path
  guard at line 908.
- InboundAPI-031 (Med): WaitForAttribute's 5s grace backstop is tighter than the
  CommunicationService Ask's round-trip slack (timeout + IntegrationTimeout, 30s), so
  a slow-but-valid timed-out 'false' arriving 5-30s after the timeout is cancelled
  into an unhandled OperationCanceledException (HTTP 500), contradicting spec 6 and
  the method's own comment.
- SiteRuntime-032 (Med): SiteRuntime-029's wasPresent guard skips the deployed-count
  decrement when deleting a DISABLED instance (absent from both maps), drifting the
  health-dashboard tally; self-heals on singleton restart (observational, hence Med).
- StoreAndForward-028 (Med): StoreAndForward-025 resets the register-guard but not
  _bufferedCount, so a Stop->Start on the same instance re-seeds on top of the stale
  count and the depth gauge reads ~2N for N buffered messages.
- AuditLog-017, CentralUI-037, ScriptAnalysis-009, SiteRuntime-033 (Low): a
  test-coverage gap plus stale doc-comments/spec following the remediation.
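A small sketch of the InboundAPI-031 timing window. It assumes two things, both taken from the bullet: reply delays are measured from the wait's own timeout, and the Ask adds IntegrationTimeout (30 s) of slack on top of that timeout. The names `classify_reply`, `lost_window_s`, `grace_s` and `ask_slack_s` are hypothetical. This models the arithmetic only, not WaitForAttribute itself.

```python
# Hypothetical model of the InboundAPI-031 window; not the WaitForAttribute implementation.
GRACE_S = 5.0       # WaitForAttribute's grace backstop (per the finding)
ASK_SLACK_S = 30.0  # extra time the Ask allows past the wait timeout (per the finding)


def classify_reply(delay_s: float, grace_s: float = GRACE_S,
                   ask_slack_s: float = ASK_SLACK_S) -> str:
    """Classify a timed-out 'false' reply arriving delay_s seconds after the wait timeout."""
    if delay_s > ask_slack_s:
        return "no-valid-reply"    # past the Ask deadline: there is no valid reply to lose
    if delay_s > grace_s:
        return "lost-to-backstop"  # valid reply, but the backstop already cancelled (-> 500)
    return "delivered"             # reply lands inside the grace period


def lost_window_s(grace_s: float = GRACE_S, ask_slack_s: float = ASK_SLACK_S) -> float:
    """Width of the window in which a valid reply is cancelled; 0 once grace >= slack."""
    return max(0.0, ask_slack_s - grace_s)


if __name__ == "__main__":
    for delay in (3.0, 10.0, 31.0):
        print(delay, classify_reply(delay))
    print("lost window:", lost_window_s(), "s")
```

In this model, any grace of at least the Ask slack closes the window, so `lost_window_s(30.0)` is 0.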
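The SiteRuntime-032 drift can be pictured with a hypothetical model built on three assumptions:

- The deployed count includes disabled instances.
- Only enabled instances sit in the two runtime maps. The finding does not name these maps, so `map_a` and `map_b` are placeholders.
- A singleton restart recomputes the tally from scratch.

The `fixed` flag shows one possible correction, not necessarily the one the module will adopt.

```python
# Hypothetical model of the SiteRuntime-032 tally drift; not the SiteRuntime code.
class DeployedTally:
    def __init__(self) -> None:
        self.map_a: dict[str, object] = {}  # placeholder for the first runtime map
        self.map_b: dict[str, object] = {}  # placeholder for the second runtime map
        self.disabled: set[str] = set()     # deployed but disabled: in neither map
        self.deployed_count = 0             # the health-dashboard tally

    def deploy(self, name: str, enabled: bool = True) -> None:
        self.deployed_count += 1
        if enabled:
            self.map_a[name] = object()
            self.map_b[name] = object()
        else:
            self.disabled.add(name)

    def delete(self, name: str, fixed: bool = False) -> None:
        # wasPresent-style guard: true only if the instance was in a runtime map.
        was_present = self.map_a.pop(name, None) is not None
        was_present = (self.map_b.pop(name, None) is not None) or was_present
        was_disabled = name in self.disabled
        self.disabled.discard(name)
        if was_present or (fixed and was_disabled):
            self.deployed_count -= 1

    def recount(self) -> None:
        # Stands in for a singleton restart rebuilding the tally (the self-heal).
        self.deployed_count = len(self.map_a) + len(self.disabled)


if __name__ == "__main__":
    t = DeployedTally()
    t.deploy("a")
    t.deploy("b", enabled=False)
    t.delete("b")            # disabled instance: the guard skips the decrement
    print(t.deployed_count)  # drifted tally
    t.recount()
    print(t.deployed_count)  # healed tally
```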
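A hypothetical model of StoreAndForward-028. It assumes that Start seeds the depth gauge by adding the N messages already in storage to `_bufferedCount`; the bullet's ~2N symptom implies additive seeding. It also assumes Stop resets only the register-guard. `reset_count_on_stop` is an illustrative switch for one possible fix, and all names are placeholders.

```python
# Hypothetical model of the StoreAndForward-028 gauge drift; not the real service.
class BufferGauge:
    def __init__(self, persisted: int, reset_count_on_stop: bool = False) -> None:
        self.persisted = persisted              # N messages already in durable storage
        self.reset_count_on_stop = reset_count_on_stop
        self._registered = False                # the register-guard
        self._buffered_count = 0                # backs the depth gauge

    def start(self) -> None:
        if self._registered:
            return                              # guard: already started
        self._registered = True
        self._buffered_count += self.persisted  # seed from storage (assumed additive)

    def stop(self) -> None:
        self._registered = False                # StoreAndForward-025: guard is reset
        if self.reset_count_on_stop:
            self._buffered_count = 0            # the reset the finding says is missing

    @property
    def depth(self) -> int:
        return self._buffered_count


if __name__ == "__main__":
    g = BufferGauge(persisted=100)
    g.start()
    g.stop()
    g.start()
    print(g.depth)  # 200, i.e. ~2N
```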

Header commit/date bumped to 1f9de8a2 / 2026-06-24 on all 17 modules; README
regenerated (8 pending / 576 total).
Joseph Doherty
2026-06-24 09:20:03 -04:00
parent 1f9de8a2b5
commit c42bb48585
18 changed files with 635 additions and 66 deletions
+47 -3
@@ -5,10 +5,10 @@
| Module | `src/ZB.MOM.WW.ScadaBridge.DataConnectionLayer` |
| Design doc | `docs/requirements/Component-DataConnectionLayer.md` |
| Status | Reviewed |
| Last reviewed | 2026-06-20 |
| Last reviewed | 2026-06-24 |
| Reviewer | claude-agent |
| Commit reviewed | `4307c381` |
| Open findings | 0 |
| Commit reviewed | `1f9de8a2` |
| Open findings | 1 |
## Summary
@@ -1504,3 +1504,47 @@ decided explicitly.
**Resolution**
Resolved 2026-06-20 (commit `fd618cf1`): the `_alarmSourceFilter` comment corrected from 'first subscriber wins' to 'last subscriber wins' (and notes co-subscribers share the single filter).
## Re-review — 2026-06-24 (commit `1f9de8a2`)
Focused re-review of the changes since the prior review, verifying that the code-review remediation and feature fixes are sound and regression-free. Reviewed by a per-module workflow agent; findings code-verified by the orchestrator.
**Changes reviewed:** Two remediation fixes. (1) DCL-023 in DataConnectionActor.cs: the native-alarm subscribe path now inherits the DCL-021 obsolete-completion guard — HandleAlarmSubscribeCompleted releases the just-created adapter feed when no subscriber remains (last subscriber unsubscribed mid-flight), and HandleUnsubscribeAlarms now clears the _alarmSubscribesInFlight marker so a late completion is recognised as orphaned. (2) DCL-025 in MxGatewayDataConnection.cs: DisposeAsync now tears down the alarm stream (cancel/dispose _alarmCts, reset _alarmSubCount) under _alarmLock, mirroring DisconnectAsync, because the actor disposes adapters fire-and-forget without first calling DisconnectAsync. A focused regression test (DCL023_...) and a comment-only doc tweak on _alarmSourceFilter were added.
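The DCL-025 teardown can be pictured with a minimal Python model. A `threading.Lock` stands in for `_alarmLock`, and a `threading.Event` stands in for the `_alarmCts` cancellation source, where set means cancelled. Method names are hypothetical. The point is that dispose and disconnect share one lock-guarded, idempotent teardown, so a dispose never preceded by a disconnect still cancels the stream.

```python
import threading
from typing import Optional


class AlarmStream:
    """Hypothetical stand-in for the MxGateway alarm-stream state; not the adapter code."""

    def __init__(self) -> None:
        self._alarm_lock = threading.Lock()
        self._alarm_cts: Optional[threading.Event] = None  # set() == cancelled
        self._alarm_sub_count = 0

    def subscribe_alarms(self) -> threading.Event:
        with self._alarm_lock:
            if self._alarm_cts is None:
                self._alarm_cts = threading.Event()  # first subscriber opens the stream
            self._alarm_sub_count += 1
            return self._alarm_cts

    def _teardown_locked(self) -> None:
        # Caller holds _alarm_lock. Detach first, then cancel, so a repeat call is a no-op.
        cts, self._alarm_cts = self._alarm_cts, None
        self._alarm_sub_count = 0
        if cts is not None:
            cts.set()

    def disconnect(self) -> None:
        with self._alarm_lock:
            self._teardown_locked()

    def dispose(self) -> None:
        # Same teardown as disconnect: the caller may dispose without disconnecting first.
        with self._alarm_lock:
            self._teardown_locked()


if __name__ == "__main__":
    stream = AlarmStream()
    token = stream.subscribe_alarms()
    workers = [threading.Thread(target=stream.dispose) for _ in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(token.is_set(), stream._alarm_sub_count)
```

Concurrent dispose calls serialize on the lock. Only the first call finds a live token to cancel; the rest see `None` and do nothing.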
**Verdict:** Both fixes target genuine resource-leak bugs and are largely well-constructed. DCL-025 correctly closes a real MxGateway alarm-stream/CTS leak on the fire-and-forget DisposeAsync path; the teardown is lock-guarded and idempotent, and OPC UA correctly does not share the leak because its client owns the alarm subscription. DCL-023 correctly handles the documented "last subscriber unsubscribed while subscribe in flight" case with good test coverage. However, the DCL-023 change introduces a new, narrower race. Clearing _alarmSubscribesInFlight on unsubscribe re-opens the in-flight gate, so a re-subscribe to the same source that arrives before the late completion fires triggers a second concurrent adapter subscribe. The first subscription id is then silently overwritten and leaked. The tag-path completion handler guards against exactly this overwrite (line 908); the alarm completion handler does not. The project builds cleanly with no warnings.
| # | Category | Examined | Notes |
|---|----------|----------|-------|
| 1 | Correctness & logic bugs | ☑ | DCL-023 guard logic is correct for the documented case, but newly enables a double-subscribe leak on resubscribe-during-orphan because HandleAlarmSubscribeCompleted overwrites _alarmSubscriptionIds with no already-present guard. See finding. |
| 2 | Akka.NET conventions | ☑ | State mutated only on the actor thread; adapter subscribe uses ContinueWith+PipeTo (no captured Self/Sender misuse); subscriber/Self/generation captured into locals before the async continuation. Compliant. |
| 3 | Concurrency & thread safety | ☑ | DCL-025 teardown is correctly serialized via _alarmLock against concurrent Subscribe/UnsubscribeAlarmsAsync; cancel-after-dispose-and-null is safe. The DCL-023 ordering hazard is logical (single-threaded actor), not a data race; covered under Correctness. |
| 4 | Error handling & resilience | ☑ | Orphaned-feed release is fire-and-forget (acceptable, best-effort cleanup); failed/late completions handled; warning logged. No issues. |
| 5 | Security | ☑ | No secrets, SQL/LDAP, or untrusted input in the delta; no script-trust-forbidden APIs introduced. No issues found. |
| 6 | Performance & resource management | ☑ | DCL-025 fixes a real CTS/Task leak on MxGateway failover/stop. DCL-023 fixes the original orphan leak but leaves a residual leak path (the resubscribe-overwrite case in the finding). |
| 7 | Design-document adherence | ☑ | Component-DataConnectionLayer.md ref-counted 'open once, tear down on last unsubscribe' invariant is honoured; fixes reinforce it. No doc drift introduced. |
| 8 | Code organization & conventions | ☑ | Fixes are localized, follow the established DCL-0xx tagged-comment convention, and mirror the existing DCL-021 / DisconnectAsync patterns. Consistent. |
| 9 | Testing coverage | ☑ | New DCL023 test exercises the fixed path well. Gaps: no test for the resubscribe-during-orphan double-subscribe race (the new finding), and no test for DCL-025 DisposeAsync alarm teardown. |
| 10 | Documentation & comments | ☑ | Comments on both fixes are accurate and explain intent/cross-references. The _alarmSourceFilter doc update (first→last subscriber wins) correctly matches the unconditional overwrite in HandleSubscribeAlarms. |
**New findings from this re-review (1):**
### DataConnectionLayer-029 — Resubscribe during orphaned in-flight subscribe leaks an alarm feed
| | |
|--|--|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Status | Open |
| Location | `src/ZB.MOM.WW.ScadaBridge.DataConnectionLayer/Actors/DataConnectionActor.cs:1823` |
**Description**
The DCL-023 fix clears _alarmSubscribesInFlight in HandleUnsubscribeAlarms (line 1910). This re-opens the in-flight gate, so the following interleaving is now reachable:

1. Subscribe A for source S starts adapter subscribe #1; in-flight={S}.
2. The last subscriber unsubscribes. HandleUnsubscribeAlarms removes the subscriber set and clears in-flight. No subscription id is stored yet, so there is nothing to tear down.
3. A new Subscribe B for the same S arrives before completion #1 fires. HandleSubscribeAlarms sees neither _alarmSubscriptionIds.ContainsKey(S) nor _alarmSubscribesInFlight.Contains(S) (both cleared), so it issues a second concurrent adapter subscribe, #2.
4. Completion #1 fires. _alarmSourceSubscribers contains S again (B re-added it), so the orphan guard at line 1806 does not trigger, and lines 1823-1825 store _alarmSubscriptionIds[S]=subId#1.
5. Completion #2 fires and overwrites _alarmSubscriptionIds[S]=subId#2. subId#1 is now leaked (never unsubscribed). Because the adapter generation is unchanged, feed #1 keeps streaming routable transitions, causing duplicate delivery as well as a resource leak.

Before DCL-023, step 3 short-circuited because the in-flight marker survived the unsubscribe, so the second subscribe was never issued. This leak path is therefore newly introduced by the delta. The analogous tag-path completion handler guards against exactly this overwrite at line 908 (`if (result.AlreadySubscribed || _subscriptionIds.ContainsKey(result.TagPath)) continue;`); the alarm completion handler has no equivalent guard.
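The five-step interleaving can be replayed against a toy model. The model assumes a single-threaded actor, a fake adapter that opens one feed per subscribe and delivers completions in order, and simplified bookkeeping. Class and method names are hypothetical and only echo the fields the finding names. The model reproduces the leak; it is not the actor's real code.

```python
# Hypothetical single-threaded model of the DataConnectionLayer-029 interleaving.
# Field names echo the finding; this is NOT the DataConnectionActor code.
class AlarmActorModel:
    def __init__(self) -> None:
        self.subscribers: dict[str, set[str]] = {}  # ~ _alarmSourceSubscribers
        self.in_flight: set[str] = set()            # ~ _alarmSubscribesInFlight
        self.sub_ids: dict[str, int] = {}           # ~ _alarmSubscriptionIds
        self.live_feeds: set[int] = set()           # feeds the fake adapter holds open
        self.parked: list[tuple[str, int]] = []     # completions not yet delivered
        self._next_id = 1

    def subscribe(self, source: str, who: str) -> None:
        self.subscribers.setdefault(source, set()).add(who)
        if source in self.sub_ids or source in self.in_flight:
            return                                  # join the existing or pending feed
        self.in_flight.add(source)
        sub_id, self._next_id = self._next_id, self._next_id + 1
        self.live_feeds.add(sub_id)                 # adapter opens this feed
        self.parked.append((source, sub_id))        # its completion is delivered later

    def unsubscribe(self, source: str, who: str) -> None:
        remaining = self.subscribers.get(source, set())
        remaining.discard(who)
        if remaining:
            return
        self.subscribers.pop(source, None)
        self.in_flight.discard(source)              # DCL-023: clear the in-flight marker
        sub_id = self.sub_ids.pop(source, None)
        if sub_id is not None:
            self.live_feeds.discard(sub_id)         # tear down a stored feed

    def complete_next(self) -> None:
        source, sub_id = self.parked.pop(0)
        self.in_flight.discard(source)
        if source not in self.subscribers:          # DCL-023 orphan guard
            self.live_feeds.discard(sub_id)
            return
        self.sub_ids[source] = sub_id               # no already-present check: may overwrite

    def leaked(self) -> set[int]:
        return self.live_feeds - set(self.sub_ids.values())


def replay_finding() -> AlarmActorModel:
    m = AlarmActorModel()
    m.subscribe("S", "A")    # (1) adapter subscribe #1, in-flight = {S}
    m.unsubscribe("S", "A")  # (2) last subscriber leaves; in-flight cleared
    m.subscribe("S", "B")    # (3) gate is open again: adapter subscribe #2
    m.complete_next()        # (4) S has a subscriber again, so subId 1 is stored
    m.complete_next()        # (5) subId 2 overwrites subId 1
    return m


if __name__ == "__main__":
    m = replay_finding()
    print(m.sub_ids, m.leaked())
```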
**Recommendation**
In HandleAlarmSubscribeCompleted, before storing the subscription id, guard against an already-present feed: if _alarmSubscriptionIds.ContainsKey(msg.SourceReference) (or a concurrent in-flight subscribe re-added the marker), release the just-created subId via UnsubscribeAlarmsAsync instead of overwriting — mirroring the tag-path re-check at line 908. Add a regression test that fires unsubscribe then a fresh subscribe for the same source while the first adapter subscribe is still parked, then releases both, asserting exactly one feed survives and the extra one is released.
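A sketch of the recommended guard on a toy model. It assumes a single-threaded actor and a fake adapter that delivers completions in order. Only the already-present check is modelled, mirroring the tag-path check at line 908; the in-flight-marker variant is not. All names are hypothetical. `regression_scenario` follows the shape of the regression test the recommendation asks for: unsubscribe, then a fresh subscribe while the first subscribe is parked, then both completions are released.

```python
# Hypothetical model of the guarded alarm completion; NOT the DataConnectionActor code.
class GuardedAlarmActorModel:
    def __init__(self) -> None:
        self.subscribers: dict[str, set[str]] = {}  # ~ _alarmSourceSubscribers
        self.in_flight: set[str] = set()            # ~ _alarmSubscribesInFlight
        self.sub_ids: dict[str, int] = {}           # ~ _alarmSubscriptionIds
        self.live_feeds: set[int] = set()           # feeds the fake adapter holds open
        self.parked: list[tuple[str, int]] = []     # completions not yet delivered
        self._next_id = 1

    def subscribe(self, source: str, who: str) -> None:
        self.subscribers.setdefault(source, set()).add(who)
        if source in self.sub_ids or source in self.in_flight:
            return
        self.in_flight.add(source)
        sub_id, self._next_id = self._next_id, self._next_id + 1
        self.live_feeds.add(sub_id)
        self.parked.append((source, sub_id))

    def unsubscribe(self, source: str, who: str) -> None:
        remaining = self.subscribers.get(source, set())
        remaining.discard(who)
        if remaining:
            return
        self.subscribers.pop(source, None)
        self.in_flight.discard(source)
        sub_id = self.sub_ids.pop(source, None)
        if sub_id is not None:
            self.live_feeds.discard(sub_id)

    def complete_next(self) -> None:
        source, sub_id = self.parked.pop(0)
        self.in_flight.discard(source)
        if source not in self.subscribers:          # orphan guard (DCL-023)
            self.live_feeds.discard(sub_id)
            return
        if source in self.sub_ids:                  # already-present guard (recommended)
            self.live_feeds.discard(sub_id)         # release the extra feed, keep the stored one
            return
        self.sub_ids[source] = sub_id


def regression_scenario() -> GuardedAlarmActorModel:
    m = GuardedAlarmActorModel()
    m.subscribe("S", "A")
    m.unsubscribe("S", "A")
    m.subscribe("S", "B")   # first subscribe still parked
    m.complete_next()       # stores subId 1
    m.complete_next()       # subId 2 is released instead of overwriting
    return m


if __name__ == "__main__":
    m = regression_scenario()
    print(m.sub_ids, m.live_feeds)
```

Which feed survives is an ordering detail. The property the regression test should pin down is that exactly one feed stays open and it is the one recorded in the subscription map.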
**Resolution**
_Unresolved._