review(Core.ScriptedAlarms): stop shelving timer on failed reload + drop dead branch
Re-review at 7286d320. -015: dispose shelving timer at top of LoadAsync so a failed
reload doesn't leave it firing against partially-cleared state + test. -014: make
pendingEmissions required (removes unreachable fire-under-gate branch that could
reintroduce the -003 deadlock).
@@ -4,8 +4,8 @@
|---|---|
| Module | `src/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms` |
| Reviewer | Claude Code |
| Review date | 2026-05-23 |
| Commit reviewed | `a9be809` |
| Review date | 2026-06-19 |
| Commit reviewed | `7286d320` |
| Status | Reviewed |
| Open findings | 0 |

@@ -49,6 +49,38 @@ re-examined; existing closed findings stay as audit trail.
| 9 | Testing coverage | No issues found |
| 10 | Documentation & comments | Core.ScriptedAlarms-013 |

### 2026-06-19 re-review (commit `7286d320`)

Full re-review at HEAD, covering all commits since `a9be809`: `AlarmPredicateContext`
moved to `Core.Scripting.Abstractions`; `ScriptedAlarmEvent` gained `Comment` and
`HistorizeToAveva` fields; `AlarmAcknowledgeRequest.OperatorUser` wired through
`ScriptedAlarmSource`; `CompiledScriptCache` routing added; `_alarms.Clear()` in
`Dispose` re-enabled under `-016`; mass XML-doc backfill; CVE patches. All prior
findings remain Resolved. A cross-module observation is noted below the checklist
(not recorded as a module finding because its root cause is in Server).

| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | No issues found |
| 2 | OtOpcUa conventions | No issues found |
| 3 | Concurrency & thread safety | No issues found |
| 4 | Error handling & resilience | Core.ScriptedAlarms-015 |
| 5 | Security | No issues found |
| 6 | Performance & resource management | No issues found |
| 7 | Design-document adherence | No issues found |
| 8 | Code organization & conventions | Core.ScriptedAlarms-014 |
| 9 | Testing coverage | No issues found |
| 10 | Documentation & comments | No issues found |

**Cross-module observation (2026-06-19):** `OtOpcUaNodeManager.cs:677` constructs an
`AlarmCommand` with `User: string.Empty` for the system-timer `OnTimedUnshelve` bypass
path. The empty string flows through `ScriptedAlarmHostActor.OnAlarmCommand` →
`_engine.UnshelveAsync("", ...)` → `Part9StateMachine.ApplyUnshelve`, which throws
`ArgumentException` when `IsNullOrWhiteSpace(user)` is true. The exception is caught
and logged at the actor level, so nothing crashes, but the auto-unshelve silently
fails. The root cause is in the Server module (`OtOpcUaNodeManager.cs`); the engine's
user-validation guard is correct, so the fix belongs in a Server module review.
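
The failure chain can be reproduced in miniature. The sketch below is Python, not the project's C#, and every name in it (`AlarmState`, `apply_unshelve`, `on_alarm_command`, the logger name) is a hypothetical stand-in for `Part9StateMachine.ApplyUnshelve` and `ScriptedAlarmHostActor.OnAlarmCommand`. It assumes only what the observation describes: an empty or whitespace-only user is rejected, and the actor catches and logs the exception instead of rethrowing it.

```python
import logging

log = logging.getLogger("ScriptedAlarmHostActor")  # hypothetical logger name


class AlarmState:
    """Hypothetical stand-in for one alarm's Part 9 shelving state."""

    def __init__(self) -> None:
        self.shelved = True


def apply_unshelve(state: AlarmState, user: str) -> None:
    """Models the engine guard: reject a None, empty, or whitespace-only user."""
    if not user or not user.strip():
        raise ValueError("user must not be empty or whitespace")
    state.shelved = False


def on_alarm_command(state: AlarmState, user: str) -> bool:
    """Models the actor handler: the exception is caught and logged, not rethrown."""
    try:
        apply_unshelve(state, user)
        return True
    except ValueError as exc:
        log.warning("Unshelve rejected: %s", exc)
        return False
```

In this model, `on_alarm_command(state, "")` (what the system-timer path effectively sends) returns `False` and leaves `state.shelved` unchanged. The only trace is the warning in the log, which is why the failure reads as silent.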

## Findings

### Core.ScriptedAlarms-001

@@ -275,3 +307,74 @@ concurrent-read-while-writer scenario against a plain `Dictionary`
comparisons + single-key reads against a quiesced engine. No code change
required: the engine's correctness was always there; only the test-side
contract was undocumented.

### Core.ScriptedAlarms-014

| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `ScriptedAlarmEngine.cs:539-540` (pre-fix) |
| Status | Resolved |

**Description:** `EvaluatePredicateToStateAsync` had a nullable optional parameter
`List<ScriptedAlarmEvent>? pendingEmissions = null` with a fallback branch
`else FireEvent(evt); // LoadAsync path: no caller-supplied list, fire here.`
The fallback was dead code: both callers (`LoadAsync` at line 267 and
`ReevaluateAsync` at line 453) always pass an explicit `pending` list, so the
`pendingEmissions is null` branch was never reachable. The comment was actively
misleading, because it implied a valid caller path that did not exist. Had a future
developer called the method with `null` (or added a new call site that omitted the
argument), the event would have been fired under `_evalGate`, directly
reintroducing the deadlock hazard fixed by Core.ScriptedAlarms-003.

**Recommendation:** Remove the nullable default and the dead branch, and make the
parameter required (`List<ScriptedAlarmEvent> pendingEmissions`) so the compiler
enforces the contract that every caller must supply a pending list.
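
The pattern the required parameter protects can be sketched as a small model. This is hypothetical Python, not the engine's code: `Engine`, `reevaluate`, `snapshot`, `_evaluate_predicate_to_state`, and the `ScriptedAlarmEvent` dataclass are illustrative stand-ins, and `threading.Lock` plays the role of `_evalGate` on the assumption that the gate is not reentrant. Events are only collected into the caller's list while the gate is held and are fired after it is released, so a handler that calls back into the engine never waits on a gate its own thread already holds. Python has no compile-time check, so the closest analogue to making the list required is giving the parameter no default: omitting it raises `TypeError` at the call site rather than silently taking a fire-under-the-gate path.

```python
import threading
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ScriptedAlarmEvent:
    alarm_id: str
    active: bool


class Engine:
    def __init__(self, on_event: Callable[["Engine", ScriptedAlarmEvent], None]) -> None:
        # Non-reentrant stand-in for _evalGate (assumption: the real gate is not reentrant).
        self._eval_gate = threading.Lock()
        self._states: Dict[str, bool] = {}
        self._on_event = on_event

    def _evaluate_predicate_to_state(
        self, alarm_id: str, active: bool, pending_emissions: List[ScriptedAlarmEvent]
    ) -> None:
        # No default for pending_emissions: this method only records transitions
        # and never fires them, so there is no "fire here" fallback branch.
        if self._states.get(alarm_id) != active:
            self._states[alarm_id] = active
            pending_emissions.append(ScriptedAlarmEvent(alarm_id, active))

    def reevaluate(self, alarm_id: str, active: bool) -> None:
        pending: List[ScriptedAlarmEvent] = []
        with self._eval_gate:
            self._evaluate_predicate_to_state(alarm_id, active, pending)
        # The gate has been released, so handlers may safely call back into the engine.
        for evt in pending:
            self._on_event(self, evt)

    def snapshot(self) -> Dict[str, bool]:
        with self._eval_gate:
            return dict(self._states)
```

If `reevaluate` instead invoked `self._on_event` inside the `with self._eval_gate:` block, a handler that calls `snapshot()` would try to acquire a lock its own thread already holds and block forever, which illustrates why firing under `_evalGate` is a deadlock hazard.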

**Resolution:** Resolved 2026-06-19. Changed `pendingEmissions` from
`List<ScriptedAlarmEvent>? pendingEmissions = null` to the required
`List<ScriptedAlarmEvent> pendingEmissions`; removed the dead `else FireEvent(evt)`
branch and its misleading comment; added a `<remarks>` doc explaining why the
parameter is required (to prevent a future caller from accidentally firing events
under the gate). Both existing callers already passed a non-null list, so no call
sites changed. The build succeeds and all 71 tests pass.

---

### Core.ScriptedAlarms-015

| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | `ScriptedAlarmEngine.cs:284-298` (pre-fix) / `ScriptedAlarmEngine.cs:184-198` (post-fix) |
| Status | Resolved |

**Description:** `LoadAsync` disposed `_shelvingTimer` (the prior generation's 5-second
timer) only AFTER the `compileFailures` guard on line 241. If a reload threw at the
compile step (all alarms fail to compile), execution never reached the
`_shelvingTimer?.Dispose()` call, and the prior-generation timer kept firing against
whatever `_alarms` remained after the partial clear and partial recompile. This left a
window of inconsistency between the failed reload and the eventual `Dispose()` call.
`Dispose()` itself calls `_shelvingTimer?.Dispose()` on line 734, so there is no
permanent resource leak; the risk is unexpected shelving-check callbacks operating on
partially cleared state during that window. On a first-load failure (where `_loaded`
remains `false`) the issue is benign, since `EnsureLoaded()` gates all public API
calls. On a reload failure (previously loaded), the timer runs against whatever alarms
happened to compile before the bad ones.

**Recommendation:** Move `_shelvingTimer?.Dispose(); _shelvingTimer = null;` to the
very start of the `try` block, alongside `UnsubscribeFromUpstream()`, so the prior
timer is always stopped before `_alarms` is cleared, regardless of whether the
reload succeeds or fails.
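
The corrected ordering can be modelled deterministically without real timers. In this Python sketch, `FakeTimer`, `CompileError`, `Engine.load`, and the rule that a definition whose name starts with `bad` fails to compile are all hypothetical. The sketch also assumes that a compile failure aborts the load only after the valid alarms have been added, matching the "partial recompile" described above. What it demonstrates is that disposing the prior timer first means a throwing reload cannot leave that timer alive.

```python
from typing import Dict, List, Optional


class CompileError(Exception):
    """Hypothetical stand-in for the exception raised by the compile-failure guard."""


class FakeTimer:
    """Stand-in for the 5-second shelving timer; it only records disposal."""

    def __init__(self) -> None:
        self.disposed = False

    def dispose(self) -> None:
        self.disposed = True


class Engine:
    def __init__(self) -> None:
        self._alarms: Dict[str, str] = {}
        self._shelving_timer: Optional[FakeTimer] = None

    def load(self, definitions: List[str]) -> None:
        # Fixed ordering: stop the prior generation's timer before touching _alarms,
        # so nothing below can leave it running if the reload throws.
        if self._shelving_timer is not None:
            self._shelving_timer.dispose()
        self._shelving_timer = None
        self._alarms.clear()

        failures = []
        for name in definitions:
            # Hypothetical compile rule: names starting with "bad" fail to compile.
            if name.startswith("bad"):
                failures.append(name)
            else:
                self._alarms[name] = "compiled"
        if failures:
            # Assumption: a failure aborts the load after the valid alarms were added.
            raise CompileError(f"{len(failures)} alarm(s) failed to compile")

        # _shelving_timer is guaranteed to be None here; start the new generation.
        self._shelving_timer = FakeTimer()
```

Under the pre-fix ordering (disposal just before the new-timer assignment), the failing second `load` would raise before reaching that line, so the first timer would still be live while `_alarms` held only the alarms that compiled: the inconsistency window described above.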

**Resolution:** Resolved 2026-06-19. Moved `_shelvingTimer?.Dispose(); _shelvingTimer = null;`
to the start of the `try` block in `LoadAsync`, before `UnsubscribeFromUpstream()` and
`_alarms.Clear()`. Removed the now-redundant `_shelvingTimer?.Dispose()` that previously
appeared just before the new-timer assignment, and updated the comment at the new-timer
assignment site to note that `_shelvingTimer` is guaranteed to be null there. Added the
regression test `Failed_reload_leaves_engine_recoverable_and_disposes_cleanly`, which
documents that after a failed reload followed by a successful third load, `Dispose()`
completes cleanly with no background tasks outliving the engine.
