fix(core-scripted-alarms): resolve Low code-review findings (Core.ScriptedAlarms-003,006,008,010,011; -009 documented)

- Core.ScriptedAlarms-003: emit OnEvent OUTSIDE _evalGate by collecting pending emissions while the gate is held and flushing them after release; this eliminates the re-entrancy deadlock, as the docs already promised.
- Core.ScriptedAlarms-006: track every fire-and-forget Reevaluate / ShelvingCheck task in _inFlight; Dispose drains the set, so the engine no longer races store writes against teardown.
- Core.ScriptedAlarms-008: store comments as ImmutableList<AlarmComment> so AppendComment is O(log n) instead of O(n).
- Core.ScriptedAlarms-010: document the deliberate input-quality asymmetry (Uncertain drives the predicate but renders {?} in the message) in docs/ScriptedAlarms.md and in the MessageTemplate.Resolve remarks.
- Core.ScriptedAlarms-011: propagate the no-op reason through TransitionResult.NoOp(state, reason) and log it from ScriptedAlarmEngine.ApplyAsync.
- Core.ScriptedAlarms-009 (Won't Fix per recommendation): document the per-evaluation dictionary allocation in docs/v2/Galaxy.Performance.md, with a mitigation path if a future soak surfaces allocation pressure.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
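Core.ScriptedAlarms-003 in miniature: a C# sketch of the collect-then-flush pattern. `DeferredEmitEngine`, `AlarmEvent` and `ChangeAsync` are hypothetical names. `_evalGate` is assumed to be a `SemaphoreSlim(1, 1)`, because the commit does not show the real `ScriptedAlarmEngine` code.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical event payload; the real engine's event type is not shown in this commit.
public sealed record AlarmEvent(string AlarmId, string Kind);

// Hypothetical engine illustrating collect-then-flush emission (Core.ScriptedAlarms-003).
public sealed class DeferredEmitEngine
{
    // Assumption: _evalGate is a non-reentrant async lock with a single slot.
    private readonly SemaphoreSlim _evalGate = new(1, 1);

    public event Action<AlarmEvent>? OnEvent;

    public async Task ChangeAsync(string alarmId, string kind)
    {
        // Emissions produced while the gate is held are buffered here, not raised.
        var pending = new List<AlarmEvent>();

        await _evalGate.WaitAsync().ConfigureAwait(false);
        try
        {
            // State-machine transition and persistence would run here.
            pending.Add(new AlarmEvent(alarmId, kind));
        }
        finally
        {
            _evalGate.Release();
        }

        // Gate released: a subscriber may call back into the engine without deadlocking.
        foreach (var evt in pending)
        {
            OnEvent?.Invoke(evt);
        }
    }
}
```

Consider a subscriber that calls back into `ChangeAsync` from inside `OnEvent`. Without the buffer, it would wait on a gate that its own call stack already holds. `SemaphoreSlim` is not reentrant, so that wait never completes.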
2026-05-23 07:23:31 -04:00
parent e74e8f7b31
commit 99354bfaf2
8 changed files with 491 additions and 42 deletions
@@ -35,7 +35,7 @@ new ScriptedAlarmDefinition(

 ## Predicate evaluation

-Alarm predicates reuse the same Roslyn sandbox as virtual tags — `ScriptEvaluator<AlarmPredicateContext, bool>` compiles the source, `TimedScriptEvaluator` wraps it with the configured timeout (default from `TimedScriptEvaluator.DefaultTimeout`), and `DependencyExtractor` statically harvests the tag paths the script reads. The sandbox rules (forbidden types, cancellation, logging sinks) are documented in [VirtualTags.md](VirtualTags.md); ScriptedAlarms does not redefine them. The known memory / CPU resource limits are documented there as well.
+Alarm predicates reuse the same Roslyn sandbox as virtual tags — `ScriptEvaluator<AlarmPredicateContext, bool>` compiles the source, `TimedScriptEvaluator` wraps it with the configured timeout (default from `TimedScriptEvaluator.DefaultTimeout`), and `DependencyExtractor` statically harvests the tag paths the script reads. The sandbox rules (forbidden types, cancellation, logging sinks) are documented in [VirtualTags.md](VirtualTags.md); ScriptedAlarms does not redefine them. The known resource limits — unbounded script-side memory, the per-publish accretion of dynamically emitted script assemblies (Core.Scripting-008), and the orphan-thread CPU-budget caveat — are documented in that file as well.

 `AlarmPredicateContext` (`AlarmPredicateContext.cs`) is the script's `ScriptContext` subclass:

@@ -79,6 +79,17 @@ Two invariants the machine enforces:

 Fallback rules: a resolved `DataValueSnapshot` with a non-zero `StatusCode`, a `null` `Value`, or an unknown path becomes `{?}`. The event still fires — the operator sees where the reference broke rather than having the alarm swallowed.

+## Input-quality policy
+
+Predicate evaluation and message-template resolution deliberately treat tag-input quality differently:
+
+| Surface | Quality bar | Rationale |
+|---|---|---|
+| `ScriptedAlarmEngine.AreInputsReady` (predicate gate) | **Bad rejected** (`StatusCode` bit 31 set). `Good` and `Uncertain` are both accepted. | Uncertain quality still carries a value the predicate can inspect; rejecting it would mask a transitional alarm condition. Predicate evaluation is a state-machine input — operators want it to track reality as closely as the quality allows. |
+| `MessageTemplate.Resolve` (operator-facing message) | **Any non-zero `StatusCode` rejected** — only `Good` substitutes; `Uncertain` / Bad / unknown all render as `{?}`. | The message is a human-readable signal; substituting an Uncertain value would let operators act on a questionable reading without seeing the qualifier. Rendering `{?}` makes the doubt explicit. |
+
+`AlarmPredicateContext.GetTag` returns a `BadNodeIdUnknown` (`0x80340000`) snapshot for missing or empty paths, so a mistyped tag path flows through `AreInputsReady` (Bad → predicate skipped, prior state held) and `MessageTemplate.Resolve` (non-Good → `{?}`) without crashing the engine. (Core.ScriptedAlarms-010)
+
 ## State persistence

 `IAlarmStateStore` (`IAlarmStateStore.cs`) is the persistence contract: `LoadAsync(alarmId)`, `LoadAllAsync`, `SaveAsync(state)`, `RemoveAsync(alarmId)`. `InMemoryAlarmStateStore` in the same file is the default for tests and dev deployments without a SQL backend. Stream E wires the production implementation against the `ScriptedAlarmState` config-DB table with audit logging through `Core.Abstractions.IAuditLogger`.
@@ -150,3 +150,9 @@ substantive driver change, and revise this table when the data does.
   leak guard. Likely culprits: lingering subscription handles in
   `SubscriptionRegistry`, or a downstream consumer retaining
   `DataValueSnapshot` references past their useful life.
+
+## Scripted-alarm engine — known hot-path allocations
+
+`ScriptedAlarmEngine.BuildReadCache` allocates a fresh `Dictionary<string, DataValueSnapshot>` and a fresh `AlarmPredicateContext` on every predicate evaluation, that is, once per upstream tag change per referencing alarm. On a busy line, where many frequently changing tags feed many alarms, this produces a steady stream of short-lived allocations on the hot path. (Core.ScriptedAlarms-009)
+
+The allocations are deliberate for now. Predicate evaluation is already serialised under `_evalGate`, so a single reused scratch dictionary would be safe. The per-call dictionary, however, keeps the evaluation surface immutable and trivially safe against future refactors. If a future scripted-alarm soak surfaces allocation pressure on this path, the planned mitigation is a per-alarm scratch buffer cleared between evaluations; record the soak findings here before changing the engine.