fix(scripted-alarms): reuse per-alarm evaluation scratch on the hot path

Core.ScriptedAlarms-009 resolution: replace the per-call Dictionary +
AlarmPredicateContext allocation with a per-alarm reusable AlarmScratch
held in _scratchByAlarmId, refilled in place under _evalGate on each
evaluation. The hot path no longer allocates per upstream tag change.

Why this matters:
  On a busy line where many tags feeding many alarms change frequently,
  the old BuildReadCache allocated a fresh dictionary + context on every
  predicate evaluation — a steady stream of short-lived allocations the
  GC eventually has to reclaim. With the reuse, the dictionary and
  context are allocated once per alarm (on first evaluation) and refilled
  in place across every subsequent re-eval.

Implementation:
  - New private AlarmScratch class holds the reusable
    Dictionary<string, DataValueSnapshot> read cache (pre-sized to the
    alarm's Inputs.Count) and the AlarmPredicateContext that wraps it by
    reference. The context observes refilled values without being
    re-created.
  - ConcurrentDictionary<string, AlarmScratch> _scratchByAlarmId on the
    engine, cleared in LoadAsync alongside _alarms so a config-publish
    drops the prior generation's scratch (Inputs / Logger may change).
  - EvaluatePredicateToStateAsync looks up scratch via GetOrAdd, calls
    the new RefillReadCache(Dictionary, IReadOnlySet) helper to clear +
    repopulate the dictionary in place, then runs the predicate against
    the reused context.
  - BuildReadCache removed.
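
  The reuse pattern above can be sketched roughly as follows. This is a
  minimal illustration, not the engine's actual code: the shapes of
  DataValueSnapshot and AlarmPredicateContext, the ScratchDemo class, the
  Prepare / Reset methods, and the readTag delegate passed to
  RefillReadCache are all assumptions made for the example. The caller is
  assumed to hold _evalGate around Prepare.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical stand-in; the real snapshot type carries more than a value.
public sealed record DataValueSnapshot(object? Value);

// Hypothetical stand-in for the predicate context.
public sealed class AlarmPredicateContext
{
    private readonly IReadOnlyDictionary<string, DataValueSnapshot> _cache;

    public AlarmPredicateContext(IReadOnlyDictionary<string, DataValueSnapshot> cache)
        => _cache = cache;

    // Reads go through the shared dictionary, so a refill is visible
    // without re-creating the context.
    public DataValueSnapshot? GetTag(string path)
        => _cache.TryGetValue(path, out var value) ? value : null;
}

// One scratch per alarm: the dictionary plus the context that wraps it.
public sealed class AlarmScratch
{
    public Dictionary<string, DataValueSnapshot> ReadCache { get; }
    public AlarmPredicateContext Context { get; }

    public AlarmScratch(int inputCount)
    {
        // Pre-size to the number of inputs so refills do not grow the table.
        ReadCache = new Dictionary<string, DataValueSnapshot>(inputCount, StringComparer.Ordinal);
        Context = new AlarmPredicateContext(ReadCache);
    }
}

public sealed class ScratchDemo
{
    private readonly ConcurrentDictionary<string, AlarmScratch> _scratchByAlarmId =
        new(StringComparer.Ordinal);

    // Assumed to be called only while the evaluation gate is held.
    public AlarmScratch Prepare(
        string alarmId,
        IReadOnlySet<string> inputs,
        Func<string, DataValueSnapshot> readTag)
    {
        // The TArg overload with a static lambda avoids a closure per call.
        var scratch = _scratchByAlarmId.GetOrAdd(
            alarmId, static (_, count) => new AlarmScratch(count), inputs.Count);
        RefillReadCache(scratch.ReadCache, inputs, readTag);
        return scratch;
    }

    // Clear + repopulate in place; Clear() keeps the allocated capacity.
    private static void RefillReadCache(
        Dictionary<string, DataValueSnapshot> cache,
        IReadOnlySet<string> inputs,
        Func<string, DataValueSnapshot> readTag)
    {
        cache.Clear();
        foreach (var path in inputs)
        {
            cache[path] = readTag(path);
        }
    }

    // Mirrors the LoadAsync behaviour: drop the prior generation's scratch.
    public void Reset() => _scratchByAlarmId.Clear();
}
```

  In this sketch, two Prepare calls for the same alarm id return the same
  AlarmScratch instance, and the second call's values are what
  Context.GetTag returns afterwards.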

Safety:
  Reuse is serialised under _evalGate, which guarantees that no two
  threads ever observe the same scratch in a half-refilled state. The
  AlarmPredicateContext is bound to the scratch dictionary by reference,
  so the predicate's ctx.GetTag(path) sees the freshly-refilled values
  rather than a stale snapshot.
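
  The serialisation argument can be illustrated with a small sketch. It
  assumes the gate behaves like a single-permit SemaphoreSlim; the commit
  does not state what _evalGate actually is, and GatedEvaluator,
  EvaluateAsync and the int-valued scratch are hypothetical names used
  only for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public sealed class GatedEvaluator
{
    // Assumption: the gate is modeled as a semaphore with one permit.
    private readonly SemaphoreSlim _evalGate = new(1, 1);

    // Single reused scratch dictionary, only touched while the gate is held.
    private readonly Dictionary<string, int> _scratch = new(StringComparer.Ordinal);

    public async Task<bool> EvaluateAsync(
        IReadOnlyDictionary<string, int> latest,
        Func<IReadOnlyDictionary<string, int>, bool> predicate,
        CancellationToken ct = default)
    {
        await _evalGate.WaitAsync(ct).ConfigureAwait(false);
        try
        {
            // Refill and evaluate inside the same critical section, so no
            // other evaluation can see the dictionary between Clear() and
            // the last write.
            _scratch.Clear();
            foreach (var (path, value) in latest)
            {
                _scratch[path] = value;
            }
            return predicate(_scratch);
        }
        finally
        {
            _evalGate.Release();
        }
    }
}
```

  Keeping the refill and the predicate call in one critical section is
  what makes a single shared scratch safe here; releasing the gate between
  the two steps would reintroduce the half-refilled window.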

Verification:
  - All 66 ScriptedAlarms tests pass (up from 63; the three new
    regression tests lock the reuse contract).
  - All 56 VirtualTags tests still pass (unchanged).
  - All 104 Core.Scripting tests still pass (unchanged).

New tests in ScriptedAlarmEngineTests:
  - Reevaluation_reuses_the_same_read_cache_dictionary — asserts
    ReferenceEquals(scratch_before, scratch_after) across two
    evaluations of the same alarm.
  - Reevaluation_reuses_the_same_predicate_context — same, for the
    context.
  - LoadAsync_drops_the_prior_generations_scratch — asserts a config
    publish wipes the prior scratch (so a stale Logger / Inputs can't
    leak into the new generation).

Internal test hooks TryGetScratchReadCacheForTest /
TryGetScratchContextForTest added via the existing
InternalsVisibleTo for the tests project. Kept internal — not part of
the public engine surface.

Docs:
  - docs/v2/Galaxy.Performance.md "Scripted-alarm engine" section
    rewritten as "hot-path allocation reuse" documenting the new
    contract + reuse safety reasoning + the three regression tests.
  - code-reviews/Core.ScriptedAlarms/findings.md: finding
    Core.ScriptedAlarms-009 flipped from Won't Fix to Resolved.
  - code-reviews/README.md regenerated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Joseph Doherty
2026-05-23 16:10:09 -04:00
parent 7b6ab2ec6f
commit 0001cdd579
5 changed files with 202 additions and 13 deletions


@@ -151,8 +151,11 @@ substantive driver change, and revise this table when the data does.
 `SubscriptionRegistry`, or a downstream consumer retaining
 `DataValueSnapshot` references past their useful life.
-## Scripted-alarm engine — known hot-path allocations
+## Scripted-alarm engine — hot-path allocation reuse
-`ScriptedAlarmEngine.BuildReadCache` allocates a fresh `Dictionary<string, DataValueSnapshot>` and `AlarmPredicateContext` on every predicate evaluation — i.e. once per upstream tag change per referencing alarm. On a busy line where many tags feeding many alarms change frequently, this is a steady stream of short-lived dictionary allocations on the hot path. (Core.ScriptedAlarms-009)
+`ScriptedAlarmEngine` keeps a per-alarm reusable evaluation scratch in `_scratchByAlarmId` — the read-cache `Dictionary<string, DataValueSnapshot>` and the `AlarmPredicateContext` are allocated once per alarm (on first evaluation) and refilled in place across every subsequent predicate evaluation. The hot path no longer allocates a fresh dictionary + context per upstream tag change. (Core.ScriptedAlarms-009)
-The allocations are deliberate for now: predicate evaluation is already serialised under `_evalGate`, so a single reused scratch dictionary would be safe, but the per-call dictionary keeps the evaluation surface immutable and trivially safe against future refactors. If a future scripted-alarm soak surfaces allocation pressure on this path, the mitigation is a per-alarm scratch buffer cleared between evaluations — note here before changing the engine.
+Safety: reuse is serialised under `_evalGate`, so two threads can never observe the same scratch in a half-refilled state. The context wraps the read-cache by reference, so refilling the dictionary is what the predicate's `ctx.GetTag(path)` calls observe. `LoadAsync` clears `_scratchByAlarmId` alongside `_alarms` so a config-publish drops the prior generation's scratch (a new generation may carry different `Inputs` / `Logger`). Regression tests in `ScriptedAlarmEngineTests` lock the reuse contract:
+- `Reevaluation_reuses_the_same_read_cache_dictionary` — asserts dictionary instance identity across two evaluations.
+- `Reevaluation_reuses_the_same_predicate_context` — same, for the context.
+- `LoadAsync_drops_the_prior_generations_scratch` — asserts a publish resets the scratch.