CompiledScriptCache<TContext, TResult>: source-hash-keyed cache of compiled evaluators. Roslyn compilation is the most expensive step in the evaluator pipeline (5-20ms per script depending on size); re-compiling on every value-change event would starve the engine. ConcurrentDictionary of Lazy<ScriptEvaluator> with ExecutionAndPublication mode ensures concurrent callers never double-compile even on a cold cache race. Failed compiles evict the cache entry so an Admin UI retry with corrected source actually recompiles (otherwise the cached exception would persist). Whitespace-sensitive hash: reformatting a script misses the cache on purpose, which is simpler than AST canonicalization and happens rarely. No capacity bound because virtual-tag + alarm scripts are config-DB bounded (thousands, not millions); if scale pushes past that in v3 an LRU eviction slots in behind the same API.

TimedScriptEvaluator<TContext, TResult>: wraps a ScriptEvaluator with a per-evaluation wall-clock timeout (default 250ms per Phase 7 plan Stream A.4, configurable per tag so slower backends can widen). Critical implementation detail: the underlying Roslyn ScriptRunner executes synchronously on the calling thread for CPU-bound user scripts, returning an already-completed Task before the caller can register a timeout. Naive `Task.WaitAsync(timeout)` would see the completed task and never fire. Fix: push evaluation to a thread-pool thread via Task.Run, so the caller's thread is free to wait and the timeout reliably fires after the configured budget.

Known trade-off (documented in the class summary): when a script times out, the underlying evaluation task continues running on the thread-pool thread until Roslyn returns; in the CPU-bound-infinite-loop case it's effectively leaked until the runtime decides to unwind. Tighter CPU budgeting would require an out-of-process script runner (v3 concern). In practice the timeout + structured warning log surfaces the offending script so the operator fixes it, and the orphan thread is rare.

Caller-supplied CancellationToken is honored and takes precedence over the timeout, so driver-shutdown paths see a clean OperationCanceledException rather than a misclassified ScriptTimeoutException. ScriptTimeoutException carries the configured Timeout and a diagnostic message pointing the operator at ctx.Logger output around the failure plus suggesting widening the timeout, simplifying the script, or moving heavy work out of the evaluation path. The virtual-tag engine (Stream B) will catch this and map the owning tag's quality to BadInternalError per Phase 7 decision #11, logging a structured warning with the offending script name.

Tests:
- CompiledScriptCacheTests (10): first-call compile, identical-source dedupe to same instance, different-source produces different evaluator, whitespace-sensitivity documented, cached evaluator still runs correctly, failed compile evicted for retry, Clear drops entries, concurrent GetOrCompile of the same source deduplicates to one instance, different TContext/TResult use separate cache instances, null source rejected.
- TimedScriptEvaluatorTests (9): fast script completes under timeout, CPU-bound script throws ScriptTimeoutException, caller cancellation takes precedence over timeout (shutdown path correctness), default 250ms per plan, zero/negative timeout rejected at construction, null inner rejected, null context rejected, user-thrown exceptions propagate unwrapped (not conflated with timeout), timeout exception message contains diagnostic guidance.

Full suite: 48/48 green (29 from A.1 + 19 new).

Next: Stream A.3 wires the dedicated scripts-*.log Serilog rolling sink + structured-property filtering + companion-WARN enricher to the main log, closing out Stream A.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
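The caching scheme described in the commit message (a ConcurrentDictionary of Lazy values in ExecutionAndPublication mode, with failed entries evicted) can be sketched roughly as follows. This is a minimal illustration, not the actual CompiledScriptCache: the class name `SourceKeyedCache`, the `compile` delegate standing in for Roslyn compilation, and the choice of SHA-256 over the raw UTF-8 source text are all assumptions made for the example.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;
using System.Threading;

// Hypothetical stand-in for CompiledScriptCache; "compile" replaces the Roslyn step.
public sealed class SourceKeyedCache<TValue>
{
    private readonly ConcurrentDictionary<string, Lazy<TValue>> _entries = new();
    private readonly Func<string, TValue> _compile;

    public SourceKeyedCache(Func<string, TValue> compile)
        => _compile = compile ?? throw new ArgumentNullException(nameof(compile));

    public TValue GetOrCompile(string source)
    {
        if (source is null) throw new ArgumentNullException(nameof(source));

        // Assumption: hash the raw text, so any whitespace change yields a new key.
        var key = Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(source)));

        // GetOrAdd may build several Lazy wrappers under contention, but only one is
        // published, and ExecutionAndPublication runs its factory at most once.
        var lazy = _entries.GetOrAdd(
            key,
            _ => new Lazy<TValue>(() => _compile(source), LazyThreadSafetyMode.ExecutionAndPublication));

        try
        {
            return lazy.Value;
        }
        catch
        {
            // A Lazy in this mode caches the factory's exception. Remove exactly this
            // entry (not a newer one under the same key) so a retry compiles again.
            _entries.TryRemove(new KeyValuePair<string, Lazy<TValue>>(key, lazy));
            throw;
        }
    }
}
```

Storing the Lazy wrapper rather than the compiled value is what keeps the expensive work out of the dictionary's value factory, which ConcurrentDictionary does not guarantee to run only once.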
103 lines
5.0 KiB
C#
namespace ZB.MOM.WW.OtOpcUa.Core.Scripting;

/// <summary>
/// Wraps a <see cref="ScriptEvaluator{TContext, TResult}"/> with a per-evaluation
/// wall-clock timeout. Default is 250ms per Phase 7 plan Stream A.4; configurable
/// per tag so deployments with slower backends can widen it.
/// </summary>
/// <remarks>
/// <para>
/// Implemented with <see cref="Task.WaitAsync(TimeSpan, CancellationToken)"/>
/// rather than a cancellation-token-only approach because Roslyn-compiled
/// scripts don't internally poll the cancellation token unless the user code
/// does async work. A CPU-bound infinite loop in a script won't honor a
/// cooperative cancel; <c>WaitAsync</c> returns control when the timeout fires
/// regardless of whether the inner task completes.
/// </para>
/// <para>
/// <b>Known limitation:</b> when a script times out, the underlying ScriptRunner
/// task continues running on a thread-pool thread until the Roslyn runtime
/// returns. In the CPU-bound-infinite-loop case that's effectively "leaked":
/// the thread is tied up until the runtime decides to return, which it may
/// never do. Phase 7 plan Stream A.4 accepts this as a known trade-off; tighter
/// CPU budgeting would require an out-of-process script runner, which is a v3
/// concern. In practice, the timeout + structured warning log surfaces the
/// offending script so the operator can fix it; the orphan thread is rare.
/// </para>
/// <para>
/// Caller-supplied <see cref="CancellationToken"/> is honored: if the caller
/// cancels before the timeout fires, the caller's cancel wins and the
/// <see cref="OperationCanceledException"/> propagates (not wrapped as
/// <see cref="ScriptTimeoutException"/>). That distinction matters: the
/// virtual-tag engine's shutdown path cancels scripts on dispose; it shouldn't
/// see those as timeouts.
/// </para>
/// </remarks>
public sealed class TimedScriptEvaluator<TContext, TResult>
    where TContext : ScriptContext
{
    /// <summary>Default timeout per Phase 7 plan Stream A.4: 250ms.</summary>
    public static readonly TimeSpan DefaultTimeout = TimeSpan.FromMilliseconds(250);

    private readonly ScriptEvaluator<TContext, TResult> _inner;

    /// <summary>Wall-clock budget per evaluation. Script exceeding this throws <see cref="ScriptTimeoutException"/>.</summary>
    public TimeSpan Timeout { get; }

    public TimedScriptEvaluator(ScriptEvaluator<TContext, TResult> inner)
        : this(inner, DefaultTimeout)
    {
    }

    public TimedScriptEvaluator(ScriptEvaluator<TContext, TResult> inner, TimeSpan timeout)
    {
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));
        if (timeout <= TimeSpan.Zero)
            throw new ArgumentOutOfRangeException(nameof(timeout), "Timeout must be positive.");
        Timeout = timeout;
    }

    public async Task<TResult> RunAsync(TContext context, CancellationToken ct = default)
    {
        if (context is null) throw new ArgumentNullException(nameof(context));

        // Push evaluation to a thread-pool thread so a CPU-bound script (e.g. a tight
        // loop with no async work) doesn't hog the caller's thread before WaitAsync
        // gets to register its timeout. Without this, Roslyn's ScriptRunner executes
        // synchronously on the calling thread and returns an already-completed Task,
        // so WaitAsync sees a completed task and never fires the timeout.
        var runTask = Task.Run(() => _inner.RunAsync(context, ct), ct);
        try
        {
            return await runTask.WaitAsync(Timeout, ct).ConfigureAwait(false);
        }
        catch (TimeoutException)
        {
            // WaitAsync's synthesized timeout; the inner task may still be running
            // on its thread-pool thread (known leak documented in the class summary).
            // Wrap so callers can distinguish from user-written timeout logic.
            throw new ScriptTimeoutException(Timeout);
        }
    }
}

/// <summary>
/// Thrown when a script evaluation exceeds its configured timeout. The virtual-tag
/// engine (Stream B) catches this + maps the owning tag's quality to
/// <c>BadInternalError</c> per Phase 7 plan decision #11, logging a structured
/// warning with the offending script name so operators can locate + fix it.
/// </summary>
public sealed class ScriptTimeoutException : Exception
{
    public TimeSpan Timeout { get; }

    public ScriptTimeoutException(TimeSpan timeout)
        : base($"Script evaluation exceeded the configured timeout of {timeout.TotalMilliseconds:F1} ms. " +
               "The script was either CPU-bound or blocked on a slow operation; check ctx.Logger output " +
               "around the timeout and consider widening the timeout per tag, simplifying the script, or " +
               "moving heavy work out of the evaluation path.")
    {
        Timeout = timeout;
    }
}
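
The reason RunAsync wraps the inner call in Task.Run can be reproduced without Roslyn. The sketch below is illustrative only: `TimeoutDemo`, `SpinAsync`, and `RunWithBudgetAsync` are hypothetical names, and `SpinAsync` merely imitates a script that does all of its CPU-bound work before handing back an already-completed Task. Unlike the class above, the sketch returns a string instead of throwing ScriptTimeoutException, to keep it self-contained.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public static class TimeoutDemo
{
    // Hypothetical stand-in for a CPU-bound script: it burns the calling thread
    // for the given duration and only then returns a Task that is already complete.
    public static Task<int> SpinAsync(TimeSpan duration)
    {
        var sw = Stopwatch.StartNew();
        while (sw.Elapsed < duration) { /* busy loop, no awaits */ }
        return Task.FromResult(42);
    }

    // Same shape as the evaluator's RunAsync: offload first, then wait with a budget.
    public static async Task<string> RunWithBudgetAsync(
        Func<Task<int>> script, TimeSpan budget, CancellationToken ct = default)
    {
        // Task.Run returns immediately, so the WaitAsync timer starts before the
        // script has had a chance to monopolize the current thread.
        var runTask = Task.Run(script, ct);
        try
        {
            var value = await runTask.WaitAsync(budget, ct).ConfigureAwait(false);
            return $"completed: {value}";
        }
        catch (TimeoutException)
        {
            // The offloaded work keeps running until it returns on its own.
            return "timed out";
        }
    }
}
```

Calling `TimeoutDemo.SpinAsync(...)` directly blocks the caller for the full duration and yields a Task whose `IsCompleted` is already true, so a `WaitAsync` applied afterwards has nothing left to time out. Routing the same delegate through `RunWithBudgetAsync` with a budget shorter than the spin yields the timeout path instead, and a cancelled `ct` surfaces as an OperationCanceledException because the sketch does not catch it.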