fix(scripted-alarms): update findings.md for resolved Medium findings

Mark Core.ScriptedAlarms-002, -004, -005, -007, -012 as Resolved with one-line descriptions. Update open-findings count from 11 to 6.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 08:24:54 -04:00
parent 69994f9cf6
commit 4dcfaace62
1 changed file with 11 additions and 11 deletions
@@ -7,7 +7,7 @@
 | Review date | 2026-05-22 |
 | Commit reviewed | `76d35d1` |
 | Status | Reviewed |
-| Open findings | 11 |
+| Open findings | 6 |

 ## Checklist coverage

@@ -51,13 +51,13 @@ a category produced nothing rather than leaving it blank.
 | Severity | Medium |
 | Category | Correctness & logic bugs |
 | Location | `ScriptedAlarmEngine.cs:162`, `ScriptedAlarmEngine.cs:90` |
-| Status | Open |
+| Status | Resolved |

 **Description:** `LoadAsync` is written to be re-callable — it begins by calling `UnsubscribeFromUpstream()`, `_alarms.Clear()`, and `_alarmsReferencing.Clear()` (lines 90-92), which only makes sense if a reload is supported. But at line 162 it unconditionally assigns `_shelvingTimer = new Timer(...)` without disposing the timer created by a previous `LoadAsync` call. A second `LoadAsync` therefore leaks the old `Timer` and leaves two timers running concurrently against the same `_alarms`/`_evalGate`. The old timer's `RunShelvingCheck` keeps firing forever.

 **Recommendation:** Dispose any existing `_shelvingTimer` before reassigning it, e.g. `_shelvingTimer?.Dispose();` immediately before line 162, inside the `_evalGate` critical section. If reload is genuinely not supported, instead guard `LoadAsync` against a second call and document it as one-shot.

-**Resolution:** _(open)_
+**Resolution:** Resolved 2026-05-22 — added `_shelvingTimer?.Dispose()` before the timer reassignment in `LoadAsync` so a second load call does not leak the previous timer.

 ### Core.ScriptedAlarms-003

@@ -81,13 +81,13 @@ a category produced nothing rather than leaving it blank.
 | Severity | Medium |
 | Category | Concurrency & thread safety |
 | Location | `ScriptedAlarmEngine.cs:138-143`, `ScriptedAlarmEngine.cs:227-234` |
-| Status | Open |
+| Status | Resolved |

 **Description:** During `LoadAsync`, `_upstream.SubscribeTag(path, OnUpstreamChange)` is called inside the `_evalGate` critical section (line 142). If an upstream implementation delivers an initial value synchronously from inside `SubscribeTag` (a common pattern, and the `ITagUpstreamSource` contract does not forbid it), the observer callback `OnUpstreamChange` runs on the calling thread, schedules `ReevaluateAsync`, which calls `_evalGate.WaitAsync`. That does not deadlock (the reevaluation task simply blocks until `LoadAsync` releases the gate), but it can cause a re-evaluation to run against a half-initialised `_alarms`/index, and the value written to `_valueCache` on line 141 may be immediately overwritten by the subscription's synchronous push with no defined ordering. The cold-start guard partly masks this, but the ordering between the seed read (line 141) and the subscription push is unspecified and may seed a stale value.

 **Recommendation:** Subscribe to all upstream tags after the seed reads and after `_loaded = true`, or capture the subscription's first push into the cache and treat `SubscribeTag` as the single source of truth (drop the separate `ReadTag` seed). Document the expected `ITagUpstreamSource` delivery semantics (does `SubscribeTag` push an initial value?).

-**Resolution:** _(open)_
+**Resolution:** Resolved 2026-05-22 — split the seed/subscribe loop: `ReadTag` seeds `_valueCache`, persisted-state restore runs, `_loaded = true` is set, then `SubscribeTag` is called; any synchronous initial push now arrives after `_alarms` is fully initialised and correctly queues behind the gate.

 ### Core.ScriptedAlarms-005

@@ -96,13 +96,13 @@ a category produced nothing rather than leaving it blank.
 | Severity | Medium |
 | Category | Concurrency & thread safety |
 | Location | `ScriptedAlarmEngine.cs:365-369`, `ScriptedAlarmEngine.cs:416-424` |
-| Status | Open |
+| Status | Resolved |

 **Description:** `Dispose` sets `_disposed = true`, disposes `_shelvingTimer`, and clears `_alarms`. A `RunShelvingCheck` callback already in flight on a thread-pool thread can have passed its `if (_disposed) return;` check (line 367) before `Dispose` ran, then proceed into `ShelvingCheckAsync`, which awaits `_evalGate` and mutates `_alarms` — concurrently with `Dispose`'s `_alarms.Clear()` at line 422 (which runs outside `_evalGate`). `Timer.Dispose()` does not wait for the running callback to finish. The result is a possible `InvalidOperationException` from a dictionary mutated during enumeration, or a save of stale state to the store after dispose. The same applies to a `ReevaluateAsync` in flight from a late upstream push.

 **Recommendation:** Use `Timer.Dispose(WaitHandle)` (or `DisposeAsync`) to wait for the callback to drain, and perform `_alarms.Clear()` under `_evalGate` (or simply drop the clear — the object is being discarded). Also have `ShelvingCheckAsync`/`ReevaluateAsync` re-check `_disposed` after acquiring the gate before mutating/saving.

-**Resolution:** _(open)_
+**Resolution:** Resolved 2026-05-22 — added `_disposed` re-checks in `ReevaluateAsync` and `ShelvingCheckAsync` after acquiring `_evalGate` so late callbacks bail out cleanly; dropped the unsynchronised `_alarms.Clear()` from `Dispose` since the object is being discarded and the clear raced concurrent reads.

 ### Core.ScriptedAlarms-006

@@ -126,13 +126,13 @@ a category produced nothing rather than leaving it blank.
 | Severity | Medium |
 | Category | Error handling & resilience |
 | Location | `ScriptedAlarmEngine.cs:216`, `ScriptedAlarmEngine.cs:251`, `ScriptedAlarmEngine.cs:154`, `ScriptedAlarmEngine.cs:387` |
-| Status | Open |
+| Status | Resolved |

 **Description:** Every state mutation calls `await _store.SaveAsync(...)` and relies on it succeeding. If the production SQL-backed `IAlarmStateStore` (Stream E) throws — transient SQL outage, deadlock, timeout — the exception propagates: in `ApplyAsync` it surfaces to the Part 9 method caller *after* the in-memory `_alarms` entry was already updated (line 215 runs before the save on line 216), leaving the in-memory state and the persisted state divergent; in `ReevaluateAsync`/`ShelvingCheckAsync` it is caught and logged, but again the in-memory `_alarms` entry was already advanced (lines 250/386) so the persisted store silently falls behind the live state. After a restart, startup recovery reloads the stale persisted state and operators can see a re-raised or re-ackable alarm. The docs claim "the store's view is always consistent with the in-memory state" (`docs/ScriptedAlarms.md` State persistence) — that invariant is not actually enforced.

 **Recommendation:** Save before committing the in-memory update, or roll back the in-memory entry if `SaveAsync` fails, so the two never diverge. Classify transient store failures and retry, and surface a hard error/health-degraded signal if persistence is permanently failing rather than silently logging and continuing.

-**Resolution:** _(open)_
+**Resolution:** Resolved 2026-05-22 — reordered `SaveAsync`/`_alarms[id]=` in `ApplyAsync`, `ReevaluateAsync`, and `ShelvingCheckAsync` so persistence happens before the in-memory update; a store failure now leaves both views at the prior state rather than diverging.

 ### Core.ScriptedAlarms-008

@@ -201,10 +201,10 @@ a category produced nothing rather than leaving it blank.
 | Severity | Medium |
 | Category | Testing coverage |
 | Location | `tests/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms.Tests/ScriptedAlarmEngineTests.cs` |
-| Status | Open |
+| Status | Resolved |

 **Description:** Several engine behaviours central to the module have no test coverage: (1) the 5-second shelving timer / timed-shelve auto-expiry through the *engine* — only the pure `Part9StateMachine.ApplyShelvingCheck` is tested, never `ScriptedAlarmEngine` driving the timer with an injectable clock; (2) `ConfirmAsync`, `TimedShelveAsync`, `UnshelveAsync`, `EnableAsync` engine methods (only `Acknowledge`, `OneShotShelve`, `Disable`, `AddComment` are exercised); (3) `OnEvent` subscriber-throws isolation (`EmitEvent` catch on line 357); (4) `IAlarmStateStore.SaveAsync` failure handling (finding 007); (5) re-entrant `LoadAsync` and the timer leak (finding 002); (6) the cold-start `AreInputsReady` guard with Bad / null / Uncertain inputs. The `clock` and `scriptTimeout` constructor parameters exist specifically to make timer/timeout tests deterministic but no test uses them.

 **Recommendation:** Add engine-level tests that inject a controllable `Func<DateTime>` clock to drive `RunShelvingCheck`, cover the remaining Part 9 engine methods end-to-end, assert subscriber-exception isolation, and add a store-failure fake to lock in the chosen persistence-failure semantics from finding 007.

-**Resolution:** _(open)_
+**Resolution:** Resolved 2026-05-22 — added 8 new engine-level tests covering all 6 gap areas: injectable-clock timed-shelve expiry via `RunShelvingCheckForTest`, `ConfirmAsync`/`TimedShelveAsync`/`UnshelveAsync`/`EnableAsync` end-to-end, subscriber-exception isolation, store-failure invariant, second-`LoadAsync` timer-leak regression, and `AreInputsReady` Bad/Uncertain guard; exposed `RunShelvingCheckForTest()` internal hook on the engine.
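
The resolutions for findings 005 and 007 describe two orderings: re-check `_disposed` after acquiring `_evalGate`, and call `SaveAsync` before assigning `_alarms[id]`. A minimal sketch of how those two orderings could fit together is shown below. It is not the engine's actual code. `IStateStoreSketch`, `EngineSketch`, the string-typed state, and the `bool` return value are hypothetical stand-ins; only the names `_evalGate`, `_alarms`, `_store`, `_disposed`, `ApplyAsync`, and `SaveAsync` come from the findings.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for IAlarmStateStore; the real interface may differ.
public interface IStateStoreSketch
{
    Task SaveAsync(string id, string state, CancellationToken ct);
}

public sealed class EngineSketch : IDisposable
{
    private readonly SemaphoreSlim _evalGate = new SemaphoreSlim(1, 1);
    private readonly Dictionary<string, string> _alarms = new Dictionary<string, string>();
    private readonly IStateStoreSketch _store;
    private volatile bool _disposed;

    public EngineSketch(IStateStoreSketch store) => _store = store;

    // Returns false when the engine was disposed while the caller waited for the gate.
    public async Task<bool> ApplyAsync(string id, string newState, CancellationToken ct = default)
    {
        await _evalGate.WaitAsync(ct).ConfigureAwait(false);
        try
        {
            // Re-check after acquiring the gate: Dispose may have run while we waited.
            if (_disposed) return false;

            // Persist first. If SaveAsync throws, _alarms is untouched,
            // so the store and the in-memory view both stay at the prior state.
            await _store.SaveAsync(id, newState, ct).ConfigureAwait(false);

            // Commit the in-memory update only after the save succeeded.
            _alarms[id] = newState;
            return true;
        }
        finally
        {
            _evalGate.Release();
        }
    }

    // Assumption for this sketch: Dispose only flips the flag. It neither clears
    // _alarms nor disposes the gate, so an in-flight caller can still release it.
    public void Dispose() => _disposed = true;
}
```

With this ordering, a store-failure fake that throws from `SaveAsync` should leave `_alarms` without the new entry, which is the invariant the finding 012 store-failure test is described as locking in.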