fix(scripted-alarms): reuse per-alarm evaluation scratch on the hot path

Core.ScriptedAlarms-009 resolution: replace the per-call Dictionary +
AlarmPredicateContext allocation with a per-alarm reusable AlarmScratch
held in _scratchByAlarmId, refilled in place under _evalGate on each
evaluation. The hot path no longer allocates a read-cache dictionary or a
predicate context per upstream tag change.

Why this matters:
  On a busy line where many tags feeding many alarms change frequently,
  the old path allocated a fresh dictionary (BuildReadCache) plus a fresh
  AlarmPredicateContext on every predicate evaluation: a steady stream of
  short-lived allocations the GC eventually has to reclaim. With the
  reuse, the dictionary and context are allocated once per alarm (on
  first evaluation) and refilled in place across every subsequent re-eval.
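  A minimal, framework-free sketch of the two read-cache shapes. Snapshot
  is a hypothetical stand-in for DataValueSnapshot, and the read delegate
  stands in for the _valueCache / ReadTag fallback; neither is the
  engine's real type.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for DataValueSnapshot.
public readonly record struct Snapshot(double Value, bool Good);

public static class ReadCacheShapes
{
    // Old shape: a brand-new dictionary per evaluation, garbage once the eval ends.
    public static IReadOnlyDictionary<string, Snapshot> BuildFresh(
        IReadOnlyCollection<string> inputs, Func<string, Snapshot> read)
    {
        var d = new Dictionary<string, Snapshot>(StringComparer.Ordinal);
        foreach (var p in inputs) d[p] = read(p);
        return d;
    }

    // New shape: clear and repopulate a caller-owned dictionary. Clear() keeps the
    // internal bucket arrays, so refilling with no more keys than before does not resize.
    public static void RefillInPlace(
        Dictionary<string, Snapshot> cache, IReadOnlyCollection<string> inputs,
        Func<string, Snapshot> read)
    {
        cache.Clear();
        foreach (var p in inputs) cache[p] = read(p);
    }
}
```

  The refill variant only pays off because the dictionary outlives the
  call; the caller (one scratch per alarm) owns it.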

Implementation:
  - New private AlarmScratch class holds the reusable
    Dictionary<string, DataValueSnapshot> read cache (pre-sized to the
    alarm's Inputs.Count) and the AlarmPredicateContext that wraps it by
    reference. The context observes refilled values without being
    re-created.
  - ConcurrentDictionary<string, AlarmScratch> _scratchByAlarmId on the
    engine, cleared in LoadAsync alongside _alarms so a config-publish
    drops the prior generation's scratch (Inputs / Logger may change).
  - EvaluatePredicateToStateAsync looks up scratch via GetOrAdd, calls
    the new RefillReadCache(Dictionary, IReadOnlySet) helper to clear +
    repopulate the dictionary in place, then runs the predicate against
    the reused context.
  - BuildReadCache removed.
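  A condensed sketch of the per-alarm scratch registry above, under
  simplified assumptions: Ctx is a hypothetical stand-in for
  AlarmPredicateContext, values are plain doubles, and the value source is
  a delegate. It uses the ConcurrentDictionary GetOrAdd(key, factory,
  factoryArgument) overload with a static lambda. A factory lambda that
  captures locals or this (such as the _ => new AlarmScratch(state.Inputs,
  state.Logger, _clock) factory in the diff below) costs a closure and a
  delegate allocation on every call, even when the key already exists; the
  static-lambda form captures nothing, so the compiler caches its delegate.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical stand-in for AlarmPredicateContext: wraps the read cache by reference.
public sealed class Ctx
{
    private readonly IReadOnlyDictionary<string, double> _reads;
    public Ctx(IReadOnlyDictionary<string, double> reads) => _reads = reads;
    public double GetTag(string path) => _reads[path];
}

public sealed class Scratch
{
    public Dictionary<string, double> ReadCache { get; }
    public Ctx Context { get; }

    public Scratch(IReadOnlyCollection<string> inputs)
    {
        // Pre-size so the first refill does not grow the table.
        ReadCache = new Dictionary<string, double>(inputs.Count, StringComparer.Ordinal);
        // The context keeps a reference to the same dictionary, never a copy.
        Context = new Ctx(ReadCache);
    }
}

public sealed class ScratchRegistry
{
    private readonly ConcurrentDictionary<string, Scratch> _byId = new(StringComparer.Ordinal);

    // Allocates the alarm's scratch only on first use; the static lambda captures
    // nothing, and the inputs travel as the factory argument instead.
    public Scratch Get(string alarmId, IReadOnlyCollection<string> inputs) =>
        _byId.GetOrAdd(alarmId, static (_, ins) => new Scratch(ins), inputs);

    // Clear + repopulate in place; callers must serialise this per alarm.
    public Scratch Refill(string alarmId, IReadOnlyCollection<string> inputs, Func<string, double> read)
    {
        var s = Get(alarmId, inputs);
        s.ReadCache.Clear();
        foreach (var p in inputs) s.ReadCache[p] = read(p);
        return s;
    }

    // Mirrors the LoadAsync behaviour: a new config generation drops every scratch.
    public void Reset() => _byId.Clear();

    public bool Contains(string alarmId) => _byId.ContainsKey(alarmId);
}
```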

Safety:
  Reuse is serialised under _evalGate, which guarantees no two threads
  ever observe the same scratch in a half-refilled state. The
  AlarmPredicateContext is bound to the scratch dictionary by reference,
  so the predicate's ctx.GetTag(path) sees the freshly-refilled values
  rather than a stale snapshot.
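  A sketch of why the gate makes in-place refill safe, assuming _evalGate
  is a SemaphoreSlim(1, 1) (the commit does not state its type; an async
  evaluation path suggests something awaitable). GatedEvaluator and its
  members are hypothetical. The clear, the refill, and the predicate run
  as one critical section, and the predicate reads through a view that is
  the same dictionary instance.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public sealed class GatedEvaluator
{
    // Assumption: a SemaphoreSlim(1, 1) stands in for _evalGate.
    private readonly SemaphoreSlim _evalGate = new(1, 1);
    private readonly Dictionary<string, double> _readCache = new(StringComparer.Ordinal);
    private readonly IReadOnlyDictionary<string, double> _contextView;

    // The "context" holds the cache by reference, not a snapshot copy.
    public GatedEvaluator() => _contextView = _readCache;

    public async Task<bool> EvaluateAsync(
        IReadOnlyCollection<string> inputs, Func<string, double> read,
        Func<IReadOnlyDictionary<string, double>, bool> predicate, CancellationToken ct = default)
    {
        await _evalGate.WaitAsync(ct).ConfigureAwait(false);
        try
        {
            // No other evaluation can enter between Clear() and the predicate,
            // so nobody sees a half-filled cache.
            _readCache.Clear();
            foreach (var p in inputs) _readCache[p] = read(p);
            return predicate(_contextView);
        }
        finally
        {
            _evalGate.Release();
        }
    }
}
```

  Without the gate, two concurrent evaluations sharing one scratch could
  interleave Clear() and the writes, and a predicate could read a mix of
  both refills.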

Verification:
  - All 66 ScriptedAlarms tests pass (was 63 — three new regression tests
    locking the reuse contract).
  - All 56 VirtualTags tests still pass (unchanged).
  - All 104 Core.Scripting tests still pass (unchanged).

New tests in ScriptedAlarmEngineTests:
  - Reevaluation_reuses_the_same_read_cache_dictionary — asserts
    ReferenceEquals(scratch_before, scratch_after) across two
    evaluations of the same alarm.
  - Reevaluation_reuses_the_same_predicate_context — same, for the
    context.
  - LoadAsync_drops_the_prior_generations_scratch — asserts a config
    publish wipes the prior scratch (so a stale Logger / Inputs can't
    leak into the new generation).

Internal test hooks TryGetScratchReadCacheForTest /
TryGetScratchContextForTest added; the tests project reaches them through
the existing InternalsVisibleTo. Kept internal, not part of the public
engine surface.
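A sketch of the test-hook shape and the reuse assertions, using a
hypothetical TinyEngine in place of ScriptedAlarmEngine. The
InternalsVisibleTo attribute is real (System.Runtime.CompilerServices);
the "ScriptedAlarms.Tests" assembly name is a placeholder, since the
commit does not name the tests project.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Placeholder test-assembly name (hypothetical).
[assembly: InternalsVisibleTo("ScriptedAlarms.Tests")]

public sealed class TinyEngine
{
    private sealed class Scratch
    {
        public Dictionary<string, double> ReadCache { get; } = new(StringComparer.Ordinal);
    }

    private readonly ConcurrentDictionary<string, Scratch> _scratchById = new(StringComparer.Ordinal);

    public void Evaluate(string alarmId, string tag, double value)
    {
        var s = _scratchById.GetOrAdd(alarmId, static _ => new Scratch());
        s.ReadCache.Clear();
        s.ReadCache[tag] = value;
    }

    // Stand-in for LoadAsync: a new generation drops all scratch.
    public void Load() => _scratchById.Clear();

    // Same shape as TryGetScratchReadCacheForTest: null before the alarm's first
    // evaluation, then the same instance on every call until the next Load().
    internal IReadOnlyDictionary<string, double>? TryGetScratchReadCacheForTest(string alarmId)
        => _scratchById.TryGetValue(alarmId, out var s) ? s.ReadCache : null;
}
```

A reuse test then evaluates twice and checks ReferenceEquals on the two
hook results; a generation test calls Load() and expects null.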

Docs:
  - docs/v2/Galaxy.Performance.md "Scripted-alarm engine" section
    rewritten as "hot-path allocation reuse" documenting the new
    contract + reuse safety reasoning + the three regression tests.
  - code-reviews/Core.ScriptedAlarms/findings.md -009 flipped
    Won't Fix → Resolved.
  - code-reviews/README.md regenerated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Joseph Doherty
2026-05-23 16:10:09 -04:00
parent 7b6ab2ec6f
commit 0001cdd579
5 changed files with 202 additions and 13 deletions


@@ -48,6 +48,37 @@ public sealed class ScriptedAlarmEngine : IDisposable
     // snapshot enumeration safe. The only write shapes are indexer-set and Clear,
     // both of which ConcurrentDictionary supports atomically. (Core.ScriptedAlarms-001)
     private readonly ConcurrentDictionary<string, AlarmState> _alarms = new(StringComparer.Ordinal);
+    /// <summary>
+    /// Per-alarm reusable evaluation scratch. The read-cache dictionary and the
+    /// <see cref="AlarmPredicateContext"/> instance are both allocated once per
+    /// alarm (on first evaluation) and reused across every subsequent re-eval;
+    /// the hot path no longer allocates a fresh dictionary + context per upstream
+    /// tag change. Safe because <see cref="EvaluatePredicateToStateAsync"/> only
+    /// runs under <see cref="_evalGate"/>, which serialises every evaluation:
+    /// two threads can never observe the same scratch in a half-refilled state.
+    /// Cleared in <see cref="LoadAsync"/> alongside <see cref="_alarms"/>.
+    /// (Core.ScriptedAlarms-009)
+    /// </summary>
+    private readonly ConcurrentDictionary<string, AlarmScratch> _scratchByAlarmId =
+        new(StringComparer.Ordinal);
+    /// <summary>
+    /// Test-only diagnostic: returns the per-alarm scratch read-cache dictionary
+    /// if one has been allocated, else null. Used by Core.ScriptedAlarms-009
+    /// regression tests to assert the scratch is reused across evaluations
+    /// (two reads return the same instance).
+    /// </summary>
+    internal IReadOnlyDictionary<string, DataValueSnapshot>? TryGetScratchReadCacheForTest(string alarmId)
+        => _scratchByAlarmId.TryGetValue(alarmId, out var s) ? s.ReadCache : null;
+    /// <summary>
+    /// Test-only diagnostic: returns the per-alarm <see cref="AlarmPredicateContext"/>
+    /// if one has been allocated, else null. Companion to
+    /// <see cref="TryGetScratchReadCacheForTest"/>.
+    /// </summary>
+    internal AlarmPredicateContext? TryGetScratchContextForTest(string alarmId)
+        => _scratchByAlarmId.TryGetValue(alarmId, out var s) ? s.Context : null;
     private readonly ConcurrentDictionary<string, DataValueSnapshot> _valueCache
         = new(StringComparer.Ordinal);
     private readonly Dictionary<string, HashSet<string>> _alarmsReferencing
@@ -108,6 +139,10 @@ public sealed class ScriptedAlarmEngine : IDisposable
         UnsubscribeFromUpstream();
         _alarms.Clear();
         _alarmsReferencing.Clear();
+        // Drop the prior generation's per-alarm scratch buffers: definitions may
+        // have changed (different Inputs, different Logger), so any reuse would be
+        // unsafe. (Core.ScriptedAlarms-009)
+        _scratchByAlarmId.Clear();
         var compileFailures = new List<string>();
         foreach (var def in definitions)
@@ -354,7 +389,13 @@ public sealed class ScriptedAlarmEngine : IDisposable
         AlarmState state, AlarmConditionState seed, DateTime nowUtc, CancellationToken ct,
         List<ScriptedAlarmEvent>? pendingEmissions = null)
     {
-        var inputs = BuildReadCache(state.Inputs);
+        // Look up (or lazily allocate) the per-alarm scratch and refill its read cache
+        // in place. The dictionary + context survive across evaluations, so the hot path
+        // no longer allocates either of them per upstream tag change. (Core.ScriptedAlarms-009)
+        var scratch = _scratchByAlarmId.GetOrAdd(
+            state.Definition.AlarmId,
+            _ => new AlarmScratch(state.Inputs, state.Logger, _clock));
+        RefillReadCache(scratch.ReadCache, state.Inputs);
         // Cold-start guard — skip the predicate when any referenced upstream tag has no
         // cached value yet (the upstream subscription hasn't delivered its first push).
@@ -362,9 +403,9 @@ public sealed class ScriptedAlarmEngine : IDisposable
         // every tick until the cache fills, spamming the log with identical stack traces.
         // Bad quality is treated the same: the input isn't available at the predicate's
         // expected type, so the only defensible move is to hold the prior condition state.
-        if (!AreInputsReady(inputs)) return seed;
+        if (!AreInputsReady(scratch.ReadCache)) return seed;
-        var context = new AlarmPredicateContext(inputs, state.Logger, _clock);
+        var context = scratch.Context;
         bool predicateTrue;
         try
@@ -399,12 +440,20 @@ public sealed class ScriptedAlarmEngine : IDisposable
         return result.State;
     }
-    private IReadOnlyDictionary<string, DataValueSnapshot> BuildReadCache(IReadOnlySet<string> inputs)
+    /// <summary>
+    /// Refill <paramref name="cache"/> in place from <c>_valueCache</c>, falling
+    /// back to a synchronous <c>ITagUpstreamSource.ReadTag</c> for paths whose
+    /// first upstream push hasn't arrived yet. Must be called with <c>_evalGate</c>
+    /// held, so no other evaluation can observe the dictionary half-refilled.
+    /// Replaces the old <c>BuildReadCache</c>, which allocated a fresh dictionary
+    /// on every call (Core.ScriptedAlarms-009).
+    /// </summary>
+    private void RefillReadCache(
+        Dictionary<string, DataValueSnapshot> cache, IReadOnlySet<string> inputs)
     {
-        var d = new Dictionary<string, DataValueSnapshot>(StringComparer.Ordinal);
+        cache.Clear();
         foreach (var p in inputs)
-            d[p] = _valueCache.TryGetValue(p, out var v) ? v : _upstream.ReadTag(p);
-        return d;
+            cache[p] = _valueCache.TryGetValue(p, out var v) ? v : _upstream.ReadTag(p);
     }
     /// <summary>
@@ -611,6 +660,37 @@ public sealed class ScriptedAlarmEngine : IDisposable
         IReadOnlyList<string> TemplateTokens,
         ILogger Logger,
         AlarmConditionState Condition);
+    /// <summary>
+    /// Per-alarm reusable evaluation scratch. The <see cref="ReadCache"/> dictionary
+    /// is the same instance across every evaluation of the owning alarm; it is
+    /// cleared and refilled in <see cref="ScriptedAlarmEngine.RefillReadCache"/> on
+    /// each call. <see cref="Context"/> wraps that dictionary by reference, so a
+    /// refilled <see cref="ReadCache"/> is what the predicate's
+    /// <c>ctx.GetTag(path)</c> calls observe. (Core.ScriptedAlarms-009)
+    /// </summary>
+    /// <remarks>
+    /// Reuse is safe because <see cref="ScriptedAlarmEngine"/> serialises every
+    /// evaluation under <c>_evalGate</c>: two threads can never observe the same
+    /// scratch in a half-refilled state.
+    /// </remarks>
+    private sealed class AlarmScratch
+    {
+        public Dictionary<string, DataValueSnapshot> ReadCache { get; }
+        public AlarmPredicateContext Context { get; }
+        public AlarmScratch(IReadOnlySet<string> inputs, ILogger logger, Func<DateTime> clock)
+        {
+            // Pre-size to the expected input count so the first refill doesn't pay the
+            // dictionary-grow cost. Under the current contract Inputs is fixed per
+            // LoadAsync generation; if that ever changes, the dictionary simply grows
+            // as usual, so the pre-size is an optimisation, not a correctness assumption.
+            ReadCache = new Dictionary<string, DataValueSnapshot>(inputs.Count, StringComparer.Ordinal);
+            // Context holds the read cache by reference. Refilling the dictionary
+            // updates what the context (and the script) observes.
+            Context = new AlarmPredicateContext(ReadCache, logger, clock);
+        }
+    }
 }
 /// <summary>