fix(scripted-alarms): reuse per-alarm evaluation scratch on the hot path

Core.ScriptedAlarms-009 resolution: replace the per-call Dictionary +
AlarmPredicateContext allocation with a per-alarm reusable AlarmScratch held
in _scratchByAlarmId, refilled in place under _evalGate on each evaluation.
The hot path no longer allocates per upstream tag change.

Why this matters:
On a busy line where many tags feeding many alarms change frequently, the old
BuildReadCache allocated a fresh dictionary + context on every predicate
evaluation: a steady stream of short-lived allocations the GC eventually has
to reclaim. With the reuse, the dictionary and context are allocated once per
alarm (on first evaluation) and refilled in place across every subsequent
re-eval.

Implementation:
- New private AlarmScratch class holds the reusable
  Dictionary<string, DataValueSnapshot> read cache (pre-sized to the alarm's
  Inputs.Count) and the AlarmPredicateContext that wraps it by reference. The
  context observes refilled values without being re-created.
- ConcurrentDictionary<string, AlarmScratch> _scratchByAlarmId on the engine,
  cleared in LoadAsync alongside _alarms so a config-publish drops the prior
  generation's scratch (Inputs / Logger may change).
- EvaluatePredicateToStateAsync looks up scratch via GetOrAdd, calls the new
  RefillReadCache(Dictionary, IReadOnlySet) helper to clear + repopulate the
  dictionary in place, then runs the predicate against the reused context.
- BuildReadCache removed.

Safety:
Reuse is serialised under _evalGate, which guarantees no two threads ever
observe the same scratch in a half-refilled state. The AlarmPredicateContext
is bound to the scratch dictionary by reference, so the predicate's
ctx.GetTag(path) sees the freshly-refilled values rather than a stale
snapshot.

Verification:
- All 66 ScriptedAlarms tests pass (was 63; three new regression tests lock
  the reuse contract).
- All 56 VirtualTags tests still pass (unchanged).
- All 104 Core.Scripting tests still pass (unchanged).

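The reuse contract described under Implementation and Safety can be sketched in a few lines. This is a minimal illustration, not the engine's code: Snapshot, PredicateContext, Scratch, ScratchReuseSketch and Demo are hypothetical stand-ins for DataValueSnapshot, AlarmPredicateContext, AlarmScratch and the engine, and a plain lock stands in for _evalGate, whose actual type is not shown in this commit. The real evaluation path is asynchronous; the sketch is synchronous to keep it short.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical stand-in for DataValueSnapshot.
public readonly record struct Snapshot(object? Value);

// Hypothetical stand-in for AlarmPredicateContext: it keeps a reference to the
// read cache rather than a copy, so later refills are visible through it.
public sealed class PredicateContext
{
    private readonly Dictionary<string, Snapshot> _cache;
    public PredicateContext(Dictionary<string, Snapshot> cache) => _cache = cache;
    public Snapshot GetTag(string path) => _cache[path];
}

// Hypothetical stand-in for AlarmScratch: one dictionary + one context per alarm.
public sealed class Scratch
{
    public Dictionary<string, Snapshot> ReadCache { get; }
    public PredicateContext Context { get; }

    public Scratch(int inputCount)
    {
        ReadCache = new Dictionary<string, Snapshot>(inputCount, StringComparer.Ordinal);
        Context = new PredicateContext(ReadCache);
    }
}

public sealed class ScratchReuseSketch
{
    private readonly ConcurrentDictionary<string, Scratch> _scratchByAlarmId = new();
    private readonly object _evalGate = new(); // assumption: any mutual-exclusion gate works here

    public bool Evaluate(
        string alarmId,
        IReadOnlySet<string> inputs,
        Func<string, object?> readUpstream,
        Func<PredicateContext, bool> predicate)
    {
        lock (_evalGate)
        {
            // The factory-argument overload avoids capturing 'inputs' in a closure.
            var scratch = _scratchByAlarmId.GetOrAdd(
                alarmId, static (_, count) => new Scratch(count), inputs.Count);

            // Refill in place: Clear() keeps the dictionary's allocated capacity.
            scratch.ReadCache.Clear();
            foreach (var path in inputs)
                scratch.ReadCache[path] = new Snapshot(readUpstream(path));

            return predicate(scratch.Context);
        }
    }

    // Mirrors the LoadAsync behaviour: drop the prior generation's scratch.
    public void Reload()
    {
        lock (_evalGate) _scratchByAlarmId.Clear();
    }

    public Scratch? TryGetScratch(string alarmId)
        => _scratchByAlarmId.TryGetValue(alarmId, out var s) ? s : null;
}

public static class Demo
{
    // Returns true when the scratch survives re-evaluation and is dropped on reload.
    public static bool Run()
    {
        var engine = new ScratchReuseSketch();
        var inputs = new HashSet<string> { "Temp" };
        var temp = 50;
        Func<PredicateContext, bool> highTemp = ctx => (int)ctx.GetTag("Temp").Value! > 100;

        var first = engine.Evaluate("HighTemp", inputs, _ => temp, highTemp);
        var before = engine.TryGetScratch("HighTemp");

        temp = 150;
        var second = engine.Evaluate("HighTemp", inputs, _ => temp, highTemp);
        var after = engine.TryGetScratch("HighTemp");

        engine.Reload();
        engine.Evaluate("HighTemp", inputs, _ => temp, highTemp);
        var reborn = engine.TryGetScratch("HighTemp");

        return !first && second
            && ReferenceEquals(before, after)
            && !ReferenceEquals(after, reborn);
    }
}
```

Demo.Run() walks the same three properties the new regression tests lock in: the scratch instance is reused across evaluations, the predicate sees the refilled value through the reused context, and a reload produces a fresh scratch.
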
New tests in ScriptedAlarmEngineTests:
- Reevaluation_reuses_the_same_read_cache_dictionary: asserts
  ReferenceEquals(scratch_before, scratch_after) across two evaluations of the
  same alarm.
- Reevaluation_reuses_the_same_predicate_context: same, for the context.
- LoadAsync_drops_the_prior_generations_scratch: asserts a config publish
  wipes the prior scratch (so a stale Logger / Inputs can't leak into the new
  generation).

Internal test hooks TryGetScratchReadCacheForTest /
TryGetScratchContextForTest added via the existing InternalsVisibleTo for the
tests project. Kept internal; not part of the public engine surface.

Docs:
- docs/v2/Galaxy.Performance.md "Scripted-alarm engine" section rewritten as
  "hot-path allocation reuse" documenting the new contract + reuse safety
  reasoning + the three regression tests.
- code-reviews/Core.ScriptedAlarms/findings.md -009 flipped Won't Fix →
  Resolved.
- code-reviews/README.md regenerated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 16:10:09 -04:00
parent 7b6ab2ec6f
commit 0001cdd579
5 changed files with 202 additions and 13 deletions
@@ -925,4 +925,94 @@ public sealed class ScriptedAlarmEngineTests
        public Task RemoveAsync(string alarmId, CancellationToken ct)
            => _inner.RemoveAsync(alarmId, ct);
    }
+
+    // --- Core.ScriptedAlarms-009: per-alarm evaluation-scratch reuse ---
+
+    [Fact]
+    public async Task Reevaluation_reuses_the_same_read_cache_dictionary()
+    {
+        // Pre-009 the engine allocated a fresh Dictionary<string, DataValueSnapshot>
+        // on every upstream-change tick; on a busy line this was a steady allocation
+        // stream on the hot path. The fix: one dictionary per alarm, refilled in place
+        // under _evalGate. Test asserts the dictionary instance is identical across
+        // two consecutive evaluations of the same alarm.
+        var up = new FakeUpstream();
+        up.Set("Temp", 50);
+        using var eng = Build(up, out _);
+        await eng.LoadAsync(
+            [Alarm("HighTemp", """return (int)ctx.GetTag("Temp").Value > 100;""")],
+            TestContext.Current.CancellationToken);
+
+        // First evaluation runs during LoadAsync. Capture the scratch reference now.
+        var scratchAfterLoad = eng.TryGetScratchReadCacheForTest("HighTemp");
+        scratchAfterLoad.ShouldNotBeNull(
+            "the scratch should have been allocated during LoadAsync's initial evaluation");
+
+        // Trigger a re-evaluation by pushing an upstream change.
+        up.Push("Temp", 150);
+        await WaitForAsync(() =>
+            eng.GetState("HighTemp")!.Active == AlarmActiveState.Active);
+
+        var scratchAfterPush = eng.TryGetScratchReadCacheForTest("HighTemp");
+        ReferenceEquals(scratchAfterLoad, scratchAfterPush).ShouldBeTrue(
+            "the read-cache dictionary must be the *same* instance across evaluations " +
+            "(Core.ScriptedAlarms-009) — a per-call allocation would defeat the fix.");
+        scratchAfterPush!["Temp"].Value.ShouldBe(150, "refill must update the existing dictionary in place");
+    }
+
+    [Fact]
+    public async Task Reevaluation_reuses_the_same_predicate_context()
+    {
+        // The context wraps the read-cache by reference; refilling the dictionary
+        // updates what the script sees. Reusing the context spares a per-call object
+        // allocation as well as the dictionary one.
+        var up = new FakeUpstream();
+        up.Set("Temp", 50);
+        using var eng = Build(up, out _);
+        await eng.LoadAsync(
+            [Alarm("HighTemp", """return (int)ctx.GetTag("Temp").Value > 100;""")],
+            TestContext.Current.CancellationToken);
+
+        var ctxAfterLoad = eng.TryGetScratchContextForTest("HighTemp");
+        ctxAfterLoad.ShouldNotBeNull();
+
+        up.Push("Temp", 150);
+        await WaitForAsync(() =>
+            eng.GetState("HighTemp")!.Active == AlarmActiveState.Active);
+
+        var ctxAfterPush = eng.TryGetScratchContextForTest("HighTemp");
+        ReferenceEquals(ctxAfterLoad, ctxAfterPush).ShouldBeTrue(
+            "the AlarmPredicateContext must be reused across evaluations (Core.ScriptedAlarms-009).");
+    }
+
+    [Fact]
+    public async Task LoadAsync_drops_the_prior_generations_scratch()
+    {
+        // A config-publish recreates AlarmStates with potentially different Inputs +
+        // Loggers; reusing the prior generation's scratch would attach an outdated
+        // logger to the new alarm. LoadAsync must clear _scratchByAlarmId so the
+        // next evaluation lazily re-allocates against the fresh AlarmState.
+        var up = new FakeUpstream();
+        up.Set("Temp", 50);
+        using var eng = Build(up, out _);
+        await eng.LoadAsync(
+            [Alarm("HighTemp", """return (int)ctx.GetTag("Temp").Value > 100;""")],
+            TestContext.Current.CancellationToken);
+
+        var scratchAfterFirstLoad = eng.TryGetScratchReadCacheForTest("HighTemp");
+        scratchAfterFirstLoad.ShouldNotBeNull();
+
+        // Second LoadAsync — same alarm id, same predicate, but the scratch should be
+        // wiped and re-allocated on the next evaluation. (LoadAsync itself triggers a
+        // first evaluation, so the scratch is reborn before we look.)
+        await eng.LoadAsync(
+            [Alarm("HighTemp", """return (int)ctx.GetTag("Temp").Value > 100;""")],
+            TestContext.Current.CancellationToken);
+
+        var scratchAfterSecondLoad = eng.TryGetScratchReadCacheForTest("HighTemp");
+        scratchAfterSecondLoad.ShouldNotBeNull();
+        ReferenceEquals(scratchAfterFirstLoad, scratchAfterSecondLoad).ShouldBeFalse(
+            "LoadAsync must drop the prior generation's scratch — reuse across a publish " +
+            "would attach a stale Logger / Inputs to the new alarm definition.");
+    }
 }