diff --git a/code-reviews/Core.ScriptedAlarms/findings.md b/code-reviews/Core.ScriptedAlarms/findings.md index 3c0ec70..8d953af 100644 --- a/code-reviews/Core.ScriptedAlarms/findings.md +++ b/code-reviews/Core.ScriptedAlarms/findings.md @@ -156,13 +156,29 @@ a category produced nothing rather than leaving it blank. | Severity | Low | | Category | Performance & resource management | | Location | `ScriptedAlarmEngine.cs:309-315`, `ScriptedAlarmEngine.cs:271` | -| Status | Won't Fix | +| Status | Resolved | **Description:** `BuildReadCache` allocates a fresh `Dictionary` on every predicate evaluation, i.e. on every upstream tag change for every referencing alarm. On a busy line where many tags feeding many alarms change frequently, this is a steady stream of short-lived dictionary allocations on the hot path. `AlarmPredicateContext` is also newly constructed each evaluation (line 281). **Recommendation:** Minor. If the evaluation path shows up in allocation profiling, the read cache could be a reused per-alarm buffer cleared between evaluations (evaluations are already serialised under `_evalGate`, so a single shared scratch dictionary is safe). Not worth doing speculatively — flag for the perf surface in `docs/v2/Galaxy.Performance.md` if alarm evaluation is ever soak-tested. -**Resolution:** Won't Fix 2026-05-23 — per the recommendation, no code change. Documented the known allocation characteristic in `docs/v2/Galaxy.Performance.md` (new "Scripted-alarm engine — known hot-path allocations" section) so a future soak that surfaces pressure has a noted mitigation (reused per-alarm scratch buffer) and we don't re-find this in a later review. +**Resolution:** Resolved 2026-05-23 — added a per-alarm reusable `AlarmScratch` +(read-cache `Dictionary` + `AlarmPredicateContext`) held in +`_scratchByAlarmId`, populated lazily on first evaluation and refilled in place +by the new `RefillReadCache(Dictionary, IReadOnlySet)` helper on every +subsequent re-eval. `BuildReadCache` is gone. The reuse is safe because every +evaluation runs under `_evalGate`; the context wraps the dictionary by +reference, so the predicate's `ctx.GetTag(path)` sees the freshly-refilled +values. `LoadAsync` clears `_scratchByAlarmId` alongside `_alarms` so a +config-publish drops the prior generation's scratch (Inputs / Logger may +change). Regression tests added in `ScriptedAlarmEngineTests`: +`Reevaluation_reuses_the_same_read_cache_dictionary`, +`Reevaluation_reuses_the_same_predicate_context`, and +`LoadAsync_drops_the_prior_generations_scratch`; internal test hooks +`TryGetScratchReadCacheForTest` / `TryGetScratchContextForTest` exposed via +the existing `InternalsVisibleTo`. `docs/v2/Galaxy.Performance.md` "Scripted-alarm +engine" section rewritten to document the new reuse contract. Suite now 66 +green (was 63). ### Core.ScriptedAlarms-010 diff --git a/code-reviews/README.md b/code-reviews/README.md index dd01f44..00869eb 100644 --- a/code-reviews/README.md +++ b/code-reviews/README.md @@ -281,7 +281,7 @@ Findings with status `Resolved`, `Won't Fix`, or `Deferred`. | Core.ScriptedAlarms-003 | Low | Resolved | Documentation & comments | `ScriptedAlarmEngine.cs:343`, `docs/ScriptedAlarms.md:107` | | Core.ScriptedAlarms-006 | Low | Resolved | Concurrency & thread safety | `ScriptedAlarmEngine.cs:232`, `ScriptedAlarmEngine.cs:369` | | Core.ScriptedAlarms-008 | Low | Resolved | Performance & resource management | `Part9StateMachine.cs:261-268` | -| Core.ScriptedAlarms-009 | Low | Won't Fix | Performance & resource management | `ScriptedAlarmEngine.cs:309-315`, `ScriptedAlarmEngine.cs:271` | +| Core.ScriptedAlarms-009 | Low | Resolved | Performance & resource management | `ScriptedAlarmEngine.cs:309-315`, `ScriptedAlarmEngine.cs:271` | | Core.ScriptedAlarms-010 | Low | Resolved | Design-document adherence | `ScriptedAlarmEngine.cs:325-336`, `AlarmPredicateContext.cs:33-40`, `MessageTemplate.cs:47` | | Core.ScriptedAlarms-011 | Low | Resolved | Code organization & conventions | `Part9StateMachine.cs:275` | | Core.Scripting-005 | Low | Resolved | Correctness & logic bugs | `DependencyExtractor.cs:97` | diff --git a/docs/v2/Galaxy.Performance.md b/docs/v2/Galaxy.Performance.md index 01bbb3e..b54f124 100644 --- a/docs/v2/Galaxy.Performance.md +++ b/docs/v2/Galaxy.Performance.md @@ -151,8 +151,11 @@ substantive driver change, and revise this table when the data does. `SubscriptionRegistry`, or a downstream consumer retaining `DataValueSnapshot` references past their useful life. -## Scripted-alarm engine — known hot-path allocations +## Scripted-alarm engine — hot-path allocation reuse -`ScriptedAlarmEngine.BuildReadCache` allocates a fresh `Dictionary` and `AlarmPredicateContext` on every predicate evaluation — i.e. once per upstream tag change per referencing alarm. On a busy line where many tags feeding many alarms change frequently, this is a steady stream of short-lived dictionary allocations on the hot path. (Core.ScriptedAlarms-009) +`ScriptedAlarmEngine` keeps a per-alarm reusable evaluation scratch in `_scratchByAlarmId` — the read-cache `Dictionary` and the `AlarmPredicateContext` are allocated once per alarm (on first evaluation) and refilled in place across every subsequent predicate evaluation. The hot path no longer allocates a fresh dictionary + context per upstream tag change. (Core.ScriptedAlarms-009) -The allocations are deliberate for now: predicate evaluation is already serialised under `_evalGate`, so a single reused scratch dictionary would be safe, but the per-call dictionary keeps the evaluation surface immutable and trivially safe against future refactors. If a future scripted-alarm soak surfaces allocation pressure on this path, the mitigation is a per-alarm scratch buffer cleared between evaluations — note here before changing the engine. +Safety: reuse is serialised under `_evalGate`, so two threads can never observe the same scratch in a half-refilled state. The context wraps the read-cache by reference, so refilling the dictionary is what the predicate's `ctx.GetTag(path)` calls observe. `LoadAsync` clears `_scratchByAlarmId` alongside `_alarms` so a config-publish drops the prior generation's scratch (a new generation may carry different `Inputs` / `Logger`). Regression tests in `ScriptedAlarmEngineTests` lock the reuse contract: +- `Reevaluation_reuses_the_same_read_cache_dictionary` — asserts dictionary instance identity across two evaluations. +- `Reevaluation_reuses_the_same_predicate_context` — same, for the context. +- `LoadAsync_drops_the_prior_generations_scratch` — asserts a publish resets the scratch. diff --git a/src/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms/ScriptedAlarmEngine.cs b/src/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms/ScriptedAlarmEngine.cs index 32fa082..39aff9a 100644 --- a/src/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms/ScriptedAlarmEngine.cs +++ b/src/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms/ScriptedAlarmEngine.cs @@ -48,6 +48,37 @@ public sealed class ScriptedAlarmEngine : IDisposable // snapshot enumeration safe. The only write shapes are indexer-set and Clear, // both of which ConcurrentDictionary supports atomically. (Core.ScriptedAlarms-001) private readonly ConcurrentDictionary _alarms = new(StringComparer.Ordinal); + + /// + /// Per-alarm reusable evaluation scratch. The read-cache dictionary and the + /// instance are both allocated once per + /// alarm (on first evaluation) and reused across every subsequent re-eval — + /// the hot path no longer allocates a fresh dictionary + context per upstream + /// tag change. Safe because only + /// runs under , which serialises every evaluation: + /// two threads can never observe the same scratch in a half-refilled state. + /// Cleared in alongside . + /// (Core.ScriptedAlarms-009) + /// + private readonly ConcurrentDictionary _scratchByAlarmId = + new(StringComparer.Ordinal); + + /// + /// Test-only diagnostic: returns the per-alarm scratch read-cache dictionary + /// if one has been allocated, else null. Used by Core.ScriptedAlarms-009 + /// regression tests to assert the scratch is reused across evaluations + /// (two reads return the same instance). + /// + internal IReadOnlyDictionary? TryGetScratchReadCacheForTest(string alarmId) + => _scratchByAlarmId.TryGetValue(alarmId, out var s) ? s.ReadCache : null; + + /// + /// Test-only diagnostic: returns the per-alarm + /// if one has been allocated, else null. Companion to + /// . + /// + internal AlarmPredicateContext? TryGetScratchContextForTest(string alarmId) + => _scratchByAlarmId.TryGetValue(alarmId, out var s) ? s.Context : null; private readonly ConcurrentDictionary _valueCache = new(StringComparer.Ordinal); private readonly Dictionary> _alarmsReferencing @@ -108,6 +139,10 @@ public sealed class ScriptedAlarmEngine : IDisposable UnsubscribeFromUpstream(); _alarms.Clear(); _alarmsReferencing.Clear(); + // Drop the prior generation's per-alarm scratch buffers — definitions may + // have changed (different Inputs, different Logger), so any reuse would be + // unsafe. (Core.ScriptedAlarms-009) + _scratchByAlarmId.Clear(); var compileFailures = new List(); foreach (var def in definitions) @@ -354,7 +389,13 @@ public sealed class ScriptedAlarmEngine : IDisposable AlarmState state, AlarmConditionState seed, DateTime nowUtc, CancellationToken ct, List? pendingEmissions = null) { - var inputs = BuildReadCache(state.Inputs); + // Look up (or lazily allocate) the per-alarm scratch and refill its read cache + // in place. The dictionary + context survive across evaluations so the hot path + // no longer allocates per upstream tag change. (Core.ScriptedAlarms-009) + var scratch = _scratchByAlarmId.GetOrAdd( + state.Definition.AlarmId, + _ => new AlarmScratch(state.Inputs, state.Logger, _clock)); + RefillReadCache(scratch.ReadCache, state.Inputs); // Cold-start guard — skip the predicate when any referenced upstream tag has no // cached value yet (the upstream subscription hasn't delivered its first push). @@ -362,9 +403,9 @@ public sealed class ScriptedAlarmEngine : IDisposable // every tick until the cache fills, spamming the log with identical stack traces. // Bad quality is treated the same: the input isn't available at the predicate's // expected type, so the only defensible move is to hold the prior condition state. - if (!AreInputsReady(inputs)) return seed; + if (!AreInputsReady(scratch.ReadCache)) return seed; - var context = new AlarmPredicateContext(inputs, state.Logger, _clock); + var context = scratch.Context; bool predicateTrue; try @@ -399,12 +440,20 @@ public sealed class ScriptedAlarmEngine : IDisposable return result.State; } - private IReadOnlyDictionary BuildReadCache(IReadOnlySet inputs) + /// + /// Refill in place from _valueCache, falling + /// back to a synchronous ITagUpstreamSource.ReadTag for paths whose + /// first upstream push hasn't arrived yet. The dictionary is cleared and + /// repopulated under _evalGate so no concurrent reader can observe + /// a partial state. Replaces the old BuildReadCache which allocated a + /// fresh dictionary every call (Core.ScriptedAlarms-009). + /// + private void RefillReadCache( + Dictionary cache, IReadOnlySet inputs) { - var d = new Dictionary(StringComparer.Ordinal); + cache.Clear(); foreach (var p in inputs) - d[p] = _valueCache.TryGetValue(p, out var v) ? v : _upstream.ReadTag(p); - return d; + cache[p] = _valueCache.TryGetValue(p, out var v) ? v : _upstream.ReadTag(p); } /// @@ -611,6 +660,37 @@ public sealed class ScriptedAlarmEngine : IDisposable IReadOnlyList TemplateTokens, ILogger Logger, AlarmConditionState Condition); + + /// + /// Per-alarm reusable evaluation scratch. The dictionary + /// is the same instance across every evaluation of the owning alarm — it is + /// cleared and refilled in on + /// each call. wraps that dictionary by reference, so a + /// refilled is what the predicate's + /// ctx.GetTag(path) calls observe. (Core.ScriptedAlarms-009) + /// + /// + /// Reuse is safe because serialises every + /// evaluation under _evalGate: two threads can never observe the same + /// scratch in a half-refilled state. + /// + private sealed class AlarmScratch + { + public Dictionary ReadCache { get; } + public AlarmPredicateContext Context { get; } + + public AlarmScratch(IReadOnlySet inputs, ILogger logger, Func clock) + { + // Pre-size to the expected input count so the first refill doesn't pay the + // dictionary-grow cost. The dictionary auto-grows if Inputs changes (it + // cannot under the current contract — Inputs is fixed at LoadAsync — but + // pre-sizing is defensive against future changes). + ReadCache = new Dictionary(inputs.Count, StringComparer.Ordinal); + // Context holds the read cache by reference. Refilling the dictionary + // updates what the context (and the script) observes. + Context = new AlarmPredicateContext(ReadCache, logger, clock); + } + } } /// diff --git a/tests/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms.Tests/ScriptedAlarmEngineTests.cs b/tests/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms.Tests/ScriptedAlarmEngineTests.cs index 5c761cb..d58c87b 100644 --- a/tests/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms.Tests/ScriptedAlarmEngineTests.cs +++ b/tests/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms.Tests/ScriptedAlarmEngineTests.cs @@ -925,4 +925,94 @@ public sealed class ScriptedAlarmEngineTests public Task RemoveAsync(string alarmId, CancellationToken ct) => _inner.RemoveAsync(alarmId, ct); } + + // --- Core.ScriptedAlarms-009: per-alarm evaluation-scratch reuse --- + + [Fact] + public async Task Reevaluation_reuses_the_same_read_cache_dictionary() + { + // Pre-009 the engine allocated a fresh Dictionary + // on every upstream-change tick; on a busy line this was a steady allocation + // stream on the hot path. The fix: one dictionary per alarm, refilled in place + // under _evalGate. Test asserts the dictionary instance is identical across + // two consecutive evaluations of the same alarm. + var up = new FakeUpstream(); + up.Set("Temp", 50); + using var eng = Build(up, out _); + await eng.LoadAsync( + [Alarm("HighTemp", """return (int)ctx.GetTag("Temp").Value > 100;""")], + TestContext.Current.CancellationToken); + + // First evaluation runs during LoadAsync. Capture the scratch reference now. + var scratchAfterLoad = eng.TryGetScratchReadCacheForTest("HighTemp"); + scratchAfterLoad.ShouldNotBeNull( + "the scratch should have been allocated during LoadAsync's initial evaluation"); + + // Trigger a re-evaluation by pushing an upstream change. + up.Push("Temp", 150); + await WaitForAsync(() => + eng.GetState("HighTemp")!.Active == AlarmActiveState.Active); + + var scratchAfterPush = eng.TryGetScratchReadCacheForTest("HighTemp"); + ReferenceEquals(scratchAfterLoad, scratchAfterPush).ShouldBeTrue( + "the read-cache dictionary must be the *same* instance across evaluations " + + "(Core.ScriptedAlarms-009) — a per-call allocation would defeat the fix."); + scratchAfterPush!["Temp"].Value.ShouldBe(150, "refill must update the existing dictionary in place"); + } + + [Fact] + public async Task Reevaluation_reuses_the_same_predicate_context() + { + // The context wraps the read-cache by reference; refilling the dictionary + // updates what the script sees. Reusing the context spares a per-call object + // allocation as well as the dictionary one. + var up = new FakeUpstream(); + up.Set("Temp", 50); + using var eng = Build(up, out _); + await eng.LoadAsync( + [Alarm("HighTemp", """return (int)ctx.GetTag("Temp").Value > 100;""")], + TestContext.Current.CancellationToken); + + var ctxAfterLoad = eng.TryGetScratchContextForTest("HighTemp"); + ctxAfterLoad.ShouldNotBeNull(); + + up.Push("Temp", 150); + await WaitForAsync(() => + eng.GetState("HighTemp")!.Active == AlarmActiveState.Active); + + var ctxAfterPush = eng.TryGetScratchContextForTest("HighTemp"); + ReferenceEquals(ctxAfterLoad, ctxAfterPush).ShouldBeTrue( + "the AlarmPredicateContext must be reused across evaluations (Core.ScriptedAlarms-009)."); + } + + [Fact] + public async Task LoadAsync_drops_the_prior_generations_scratch() + { + // A config-publish recreates AlarmStates with potentially different Inputs + + // Loggers; reusing the prior generation's scratch would attach an outdated + // logger to the new alarm. LoadAsync must clear _scratchByAlarmId so the + // next evaluation lazily re-allocates against the fresh AlarmState. + var up = new FakeUpstream(); + up.Set("Temp", 50); + using var eng = Build(up, out _); + await eng.LoadAsync( + [Alarm("HighTemp", """return (int)ctx.GetTag("Temp").Value > 100;""")], + TestContext.Current.CancellationToken); + + var scratchAfterFirstLoad = eng.TryGetScratchReadCacheForTest("HighTemp"); + scratchAfterFirstLoad.ShouldNotBeNull(); + + // Second LoadAsync — same alarm id, same predicate, but the scratch should be + // wiped and re-allocated on the next evaluation. (LoadAsync itself triggers a + // first evaluation, so the scratch is reborn before we look.) + await eng.LoadAsync( + [Alarm("HighTemp", """return (int)ctx.GetTag("Temp").Value > 100;""")], + TestContext.Current.CancellationToken); + + var scratchAfterSecondLoad = eng.TryGetScratchReadCacheForTest("HighTemp"); + scratchAfterSecondLoad.ShouldNotBeNull(); + ReferenceEquals(scratchAfterFirstLoad, scratchAfterSecondLoad).ShouldBeFalse( + "LoadAsync must drop the prior generation's scratch — reuse across a publish " + + "would attach a stale Logger / Inputs to the new alarm definition."); + } }