Files

Joseph Doherty 3a53d03d23 fix(scripting): block ThreadPool/Timer/AssemblyLoadContext in sandbox

Core.Scripting-012 (High, Security) resolution.

The Core.Scripting-008 rewrite broadened the BCL references list from a
narrow allow-list to the full System.* + netstandard +
Microsoft.Win32.Registry set, delegating the security gate entirely to
ForbiddenTypeAnalyzer. Three categories of dangerous BCL types were
reachable from script source without a deny-list entry:

  - System.Threading.ThreadPool — QueueUserWorkItem re-introduces the
    background-fanout threat Core.Scripting-003 closed against
    System.Threading.Tasks.
  - System.Threading.Timer — schedules unbounded callback work that
    outlives the per-evaluation timeout.
  - System.Runtime.Loader.AssemblyLoadContext — loads arbitrary DLLs.
    Defense-in-depth gap; invocation needs reflection (already denied)
    but the load itself was reachable.

Fix:
  - Added 'System.Runtime.Loader' to ForbiddenNamespacePrefixes
    (preferred over type-granular per the recommendation so future BCL
    additions to that namespace are denied by default).
  - Added 'System.Threading.ThreadPool' and 'System.Threading.Timer'
    to ForbiddenFullTypeNames — both live in System.Threading shared
    with allowed primitives so they must be type-granular.

Regression tests added to ScriptSandboxTests:
  Rejects_ThreadPool_QueueUserWorkItem_at_compile
  Rejects_Timer_new_at_compile
  Rejects_AssemblyLoadContext_at_compile

Docs:
  docs/v2/implementation/phase-7-scripting-and-alarming.md decision #6
  and the Sandbox-escape compliance-check row both updated to enumerate
  the new entries per the Core.Scripting-009 doc-sync convention.

Two lower-impact suggestions from the finding's recommendation
(System.Console, CultureInfo.DefaultThreadCurrentCulture) were
intentionally not addressed and are recorded as accepted minor risks
in the resolution.

Verification: Core.Scripting.Tests 107/107 (was 104 + 3 new rejection
tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-23 17:39:20 -04:00

43 KiB

Raw Blame History

Code Review — Core.Scripting

Field	Value
Module	`src/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting`
Reviewer	Claude Code
Review date	2026-05-23
Commit reviewed	`a9be809`
Status	Reviewed
Open findings	3

Checklist coverage

A comprehensive review completes every category, recording "No issues found" where a category produced nothing rather than leaving it blank.

The 2026-05-23 re-review only covers code touched between commits 76d35d1 and a9be809 (primarily the Core.Scripting-008 ALC rewrite + the broadened BCL references). Categories where the new code surface produced no issues are recorded as "No new issues" for that pass.

#	Category	Result (`76d35d1`)	Result (`a9be809`, new code only)
1	Correctness & logic bugs	Core.Scripting-004, Core.Scripting-005	Core.Scripting-015
2	OtOpcUa conventions	No issues found	No new issues
3	Concurrency & thread safety	Core.Scripting-006	Core.Scripting-014
4	Error handling & resilience	Core.Scripting-007	No new issues
5	Security	Core.Scripting-001, Core.Scripting-002, Core.Scripting-003	Core.Scripting-012, Core.Scripting-013
6	Performance & resource management	Core.Scripting-008	Core.Scripting-016
7	Design-document adherence	Core.Scripting-009	No new issues
8	Code organization & conventions	No issues found	No new issues
9	Testing coverage	Core.Scripting-010, Core.Scripting-011	No new issues
10	Documentation & comments	No issues found	No new issues

Findings

Core.Scripting-001

Field	Value
Severity	Critical
Category	Security
Location	`ForbiddenTypeAnalyzer.cs:45`, `ScriptSandbox.cs:54`
Status	Resolved

Description: System.Environment lives in the allowed System namespace (it is in System.Private.CoreLib, which is allow-listed for primitives) and is not on the forbidden-namespace deny-list. Nothing prevents an operator-authored script from calling System.Environment.Exit(0) or System.Environment.FailFast("..."). Both terminate the host process immediately. Because scripted-alarm predicates and virtual-tag scripts run in-process in the main OPC UA server (decision: "Scripting engine runs in the main .NET 10 server process"), a single malicious or buggy predicate brings down the entire server — an outage affecting every connected client and every driver. ScriptSandboxTests only pins the read path (Environment.GetEnvironmentVariable) as an accepted compromise; the process-killing members are not considered. The whole-process kill far exceeds the "read-only process state" justification the test comments rely on.

Recommendation: The forbidden surface must be member-granular, not namespace-granular, for types in allowed namespaces. Add an explicit forbidden-member deny-list to ForbiddenTypeAnalyzer covering at minimum System.Environment.Exit, System.Environment.FailFast, System.AppDomain, System.GC (e.g. GC.Collect, GC.AddMemoryPressure), and System.Activator.CreateInstance (a reflection-equivalent escape). Reject these in CheckSymbol by resolved method symbol, with a test for each.

Resolution: Resolved 2026-05-22 — added a type-granular ForbiddenFullTypeNames deny-list (System.Environment, System.AppDomain, System.GC, System.Activator) to ForbiddenTypeAnalyzer; CheckSymbol now rejects any resolved type symbol whose fully-qualified name matches, alongside the existing namespace-prefix check, so dangerous System-namespace process-control types are blocked at compile while legitimate System types (Math, String, …) stay usable. Regression tests added in ScriptSandboxTests for Environment.Exit / Environment.FailFast / Environment.GetEnvironmentVariable / AppDomain / GC.Collect / Activator.CreateInstance.

Core.Scripting-002

Field	Value
Severity	High
Category	Security
Location	`ForbiddenTypeAnalyzer.cs:70`
Status	Resolved

Description: The syntax walker only inspects four node kinds: ObjectCreationExpressionSyntax, InvocationExpressionSyntax with a member-access target, MemberAccessExpressionSyntax, and bare IdentifierNameSyntax. It never visits TypeOfExpressionSyntax, generic type-argument lists (GenericNameSyntax / TypeArgumentListSyntax), cast expressions (CastExpressionSyntax), is/as type patterns, default(T) expressions, array-creation element types, or using/local declared types. A script such as typeof(System.IO.File), new System.Collections.Generic.List<System.IO.FileInfo>(), (System.IO.Stream)null, or default(System.Reflection.Assembly) references a forbidden type without ever producing a node the walker examines, so the forbidden-type check is bypassed. The Phase 7 plan A.6 explicitly calls out typeof as a sandbox-escape attempt that "must fail at compile" — it currently does not.

Recommendation: Walk every TypeSyntax node (handle TypeOfExpressionSyntax, CastExpressionSyntax, generic argument lists, and the type operand of IsPatternExpression / binary as). The simplest robust fix is to enumerate all DescendantNodes() and, for any node, resolve both GetSymbolInfo and GetTypeInfo, then check the resolved type plus every type argument. Add tests covering typeof, generic arguments, casts, and default(T) with forbidden types.

Resolution: Resolved 2026-05-22 — ForbiddenTypeAnalyzer.Analyze now runs a second pass that resolves GetTypeInfo on every TypeSyntax node and recursively unwraps array element types and generic type arguments, so forbidden types named via typeof, generic arguments (List<FileInfo>), casts, is/as patterns, default(T), array-creation element types, and explicitly-typed local declarations are all rejected at compile. The original member/call node-kind switch is kept (deliberately narrow to avoid flagging inherited members such as typeof(int).Name), and a span+type dedupe prevents duplicate rejections from the two passes. Regression tests added in ScriptSandboxTests for each node form plus over-block guards for allowed generics/typeof.

Core.Scripting-003

Field	Value
Severity	Medium
Category	Security
Location	`TimedScriptEvaluator.cs:9`, `ScriptSandbox.cs:30`
Status	Resolved

Description: There is no bound on memory a script may allocate or on the number of threads/tasks a script may spawn. The class docs acknowledge unbounded memory as "a budget concern" deferred to v3, but in-process execution means a script doing new byte[int.MaxValue] repeatedly (or Enumerable.Range(0,int.MaxValue).ToList() — LINQ is allow-listed) can drive the whole server to OutOfMemoryException, an outage. The timeout does not help: the allocation can exhaust memory well before 250ms elapses, and the orphaned thread-pool thread documented in TimedScriptEvaluator keeps the allocation rooted. System.Threading.Tasks is not on the deny-list, so a script can also Task.Run an unbounded fan-out of background work that outlives the timeout entirely.

Recommendation: At minimum, document this as a known accepted risk in docs/ScriptedAlarms.md / docs/VirtualTags.md rather than only in a code comment, and add the Task/Parallel namespaces to the forbidden list (scripts are synchronous predicates — they have no legitimate need to start background tasks). For memory, gate script authoring behind an Admin permission and treat the test-harness preview as the control point, or track an explicit v3 issue for out-of-process execution. Record the decision so it is not silently lost.

Resolution: Resolved 2026-05-22 — added System.Threading.Tasks to ForbiddenNamespacePrefixes (blocking Task.Run / Parallel fan-out); documented the unbounded-memory accepted risk and the Task denial rationale in docs/VirtualTags.md (new "Known resource limits" subsection) and cross-referenced from docs/ScriptedAlarms.md.

Core.Scripting-004

Field	Value
Severity	Medium
Category	Correctness & logic bugs
Location	`DependencyExtractor.cs:73`
Status	Resolved

Description: The walker matches tag-access calls purely by spelling — any InvocationExpressionSyntax whose member name is GetTag or SetVirtualTag is treated as a ScriptContext tag access, regardless of the receiver. A script that defines a local type with a GetTag(string) method and calls other.GetTag("X"), or calls this.GetTag(...) on a script-defined helper, has spurious dependencies harvested (or, if the literal arg is non-literal, spurious rejections raised). The XML remarks claim "as long as it's not on the ctx instance, the extractor doesn't pick it up", but the code does not check that the receiver is the ctx identifier — it accepts any member access with the matching name. The DependencyExtractorTests.Ignores_non_ctx_method_named_GetTag test passes only because the helper there is a free function (not member-access form); a member-access call to a non-ctx GetTag is untested and would be misattributed.

Recommendation: In VisitInvocationExpression, additionally require that member.Expression is an IdentifierNameSyntax with Identifier.ValueText == "ctx" (matching the ScriptGlobals<TContext>.ctx field name). Add a test for someOtherObject.GetTag("X") asserting it is ignored.

Resolution: Resolved 2026-05-22 — VisitInvocationExpression now additionally checks that member.Expression is an IdentifierNameSyntax with ValueText == "ctx" before treating the call as a dependency; test Ignores_member_access_GetTag_on_non_ctx_receiver added to DependencyExtractorTests.

Core.Scripting-005

Field	Value
Severity	Low
Category	Correctness & logic bugs
Location	`DependencyExtractor.cs:97`
Status	Resolved

Description: A raw string literal token passed as the tag path (a raw triple-quote literal) tokenizes as SingleLineRawStringLiteralToken / MultiLineRawStringLiteralToken, not StringLiteralToken. The check literal.Token.IsKind(SyntaxKind.StringLiteralToken) therefore rejects an otherwise-static raw-string path as a non-literal "dynamic path", producing a misleading rejection message. This is an edge case (operators rarely write raw strings for tag paths) but the error text would confuse anyone who does.

Recommendation: Accept all string-literal token kinds — check literal.IsKind(SyntaxKind.StringLiteralExpression) on the expression node, or include the raw-string token kinds, so a static raw string is harvested rather than rejected.

Resolution: Resolved 2026-05-23 — HandleTagCall now checks literal.IsKind(SyntaxKind.StringLiteralExpression) on the expression node, which covers regular string literals, single-line raw strings, and multi-line raw strings uniformly. Regression tests Accepts_single_line_raw_string_literal_path and Accepts_multi_line_raw_string_literal_path added to DependencyExtractorTests.

Core.Scripting-006

Field	Value
Severity	Low
Category	Concurrency & thread safety
Location	`CompiledScriptCache.cs:55`
Status	Resolved

Description: On a failed compile the catch block calls _cache.TryRemove(key, out _) without a value comparison. If two threads race a miss for the same bad source, both observe the same faulted Lazy and throw, and both call TryRemove(key). If a concurrent retry re-adds a new Lazy for that key between the two removals, the second unconditional TryRemove could evict the in-flight retry entry. The window is small and the consequence is only a redundant recompile, so severity is Low — but the removal should be key+value scoped for correctness.

Recommendation: Use the ConcurrentDictionary.TryRemove(KeyValuePair<,>) overload to remove only the specific faulted Lazy instance, so a concurrently re-added entry is not evicted.

Resolution: Resolved 2026-05-23 — GetOrCompile's catch block now evicts via _cache.TryRemove(new KeyValuePair<string, Lazy<…>>(key, lazy)), comparing the value reference so only the faulted Lazy is removed; a concurrent retry that re-inserted a fresh Lazy under the same key is preserved. Regression test Failed_compile_eviction_does_not_remove_a_concurrent_retry_entry added to CompiledScriptCacheTests (reflection-driven deterministic race: the faulted Lazy's factory swaps the dictionary entry to a fresh Lazy as a side effect of its throw, modelling the precise race window).

Core.Scripting-007

Field	Value
Severity	Medium
Category	Error handling & resilience
Location	`TimedScriptEvaluator.cs:60`
Status	Resolved

Description: RunAsync wraps the inner run in Task.Run(...) and then awaits WaitAsync(Timeout, ct). If the caller-supplied ct cancels at roughly the same time the timeout elapses, the order in which WaitAsync observes the timeout vs. the cancellation is non-deterministic, so the same shutdown can sometimes surface as ScriptTimeoutException and sometimes as OperationCanceledException. The class docs assert "the caller's cancel wins" as a hard guarantee that the virtual-tag engine shutdown path depends on to avoid misclassifying shutdown as a script fault — but the implementation does not guarantee it when both fire close together.

Recommendation: After catching TimeoutException, check ct.IsCancellationRequested and throw OperationCanceledException(ct) instead of ScriptTimeoutException when the caller's token is cancelled, so caller cancellation deterministically wins regardless of race ordering.

Resolution: Resolved 2026-05-22 — in the catch (TimeoutException) handler, ct.IsCancellationRequested is now checked and OperationCanceledException(ct) thrown before ScriptTimeoutException, so caller cancellation deterministically wins regardless of race ordering; regression test Caller_cancellation_wins_even_when_timeout_fires_first added to TimedScriptEvaluatorTests.

Core.Scripting-008

Field	Value
Severity	Low
Category	Performance & resource management
Location	`CompiledScriptCache.cs:34`, `ScriptEvaluator.cs:34`
Status	Resolved

Description: CompiledScriptCache has no capacity bound (acknowledged in the class remarks) and no eviction. Each cached ScriptEvaluator holds a Roslyn ScriptRunner<T> delegate, which keeps the dynamically emitted script assembly loaded for the process lifetime — emitted assemblies in the default AssemblyLoadContext cannot be unloaded. Clear() drops the dictionary entries but does not unload the emitted assemblies; they leak. Across many config-generation publishes (each Clear() followed by recompiling every script), the process accumulates dead script assemblies. For the expected "low thousands" of scripts this is benign, but a long-running server with frequent publishes will see steady managed-memory growth that never returns.

Recommendation: Document the per-publish assembly accretion as a known limitation, or compile scripts into a collectible AssemblyLoadContext so Clear() can unload prior generations. At minimum add a note to docs/ScriptedAlarms.md so operators with high-publish-frequency deployments are aware.

Resolution: Resolved 2026-05-23 — switched the compile pipeline off the legacy CSharpScript.CreateDelegate path (which emits into the default, non-collectible AssemblyLoadContext) and onto a hand-rolled CSharpCompilation → Compilation.Emit(MemoryStream) → ScriptAssemblyLoadContext.LoadFromStream chain, with the new ScriptAssemblyLoadContext constructed isCollectible: true. Each compiled script lives in its own ALC; ScriptEvaluator now implements IDisposable and calls AssemblyLoadContext.Unload() on dispose. CompiledScriptCache.Clear() disposes every materialised evaluator before dropping its dictionary entry, and CompiledScriptCache itself is now IDisposable for graceful server shutdown. After a publish-replace cycle the prior generation's emitted assemblies become eligible for GC; the reclaim is GC-timing-sensitive (Unload is eligible-for-collection, not synchronous) and the next collection cycle reclaims them. The references list is now BCL-wide (System.* + netstandard + Microsoft.Win32.Registry via the TRUSTED_PLATFORM_ASSEMBLIES set) so forbidden BCL types resolve at compile and ForbiddenTypeAnalyzer is the sole security gate (consistent with the Core.Scripting-001 / -002 model). docs/VirtualTags.md "Compile cache" section rewritten; docs/ScriptedAlarms.md cross-reference updated to drop the obsolete restart guidance. Regression tests added in CompiledScriptCacheTests: Dispose_unloads_compiled_script_assembly_load_context, Clear_disposes_every_materialised_evaluator, and GetOrCompile_after_Dispose_throws_ObjectDisposedException; the first two prove ALC unload via WeakReference + bounded GC.Collect() loops. Suite now 104 green (was 101). Authoring convention: the synthesized wrapper is an ordinary C# static method, so scripts must end with explicit return …; per ordinary C# rules (the legacy CSharpScript "last expression yields result" shorthand no longer applies); every script in the existing corpus already uses explicit return so this is a doc-only change for new authors.

Core.Scripting-009

Field	Value
Severity	Low
Category	Design-document adherence
Location	`ForbiddenTypeAnalyzer.cs:45`
Status	Resolved

Description: The Phase 7 plan decision #6 (docs/v2/implementation/phase-7-scripting-and-alarming.md) enumerates the forbidden surface as "No HttpClient / File / Process / reflection". ForbiddenTypeAnalyzer actually denies a broader set — System.Threading.Thread, System.Runtime.InteropServices, and Microsoft.Win32 (registry) — which is sensible hardening but is undocumented in the plan and in docs/ScriptedAlarms.md (which defers sandbox rules to VirtualTags.md). An operator reading the design docs cannot predict that a registry or interop reference will be rejected. Conversely the plan does not record the System.Environment / System.Diagnostics decisions. The code and the design document have drifted.

Recommendation: Update the plan's decision #6 (or docs/VirtualTags.md) to list the authoritative deny-list exactly as ForbiddenTypeAnalyzer.ForbiddenNamespacePrefixes defines it, including the System.Environment allowed-compromise, so the docs match the code.

Resolution: Resolved 2026-05-23 — docs/v2/implementation/phase-7-scripting-and-alarming.md decision #6 row + the "Sandbox escape" compliance-check row now enumerate the authoritative deny-list exactly as ForbiddenTypeAnalyzer defines it (namespace-prefix denies for System.IO / System.Net / System.Diagnostics / System.Reflection / System.Threading.Tasks / System.Runtime.InteropServices / Microsoft.Win32; type-granular denies for System.Environment / System.AppDomain / System.GC / System.Activator / System.Threading.Thread), and the compliance-check row lists the syntactic vectors (typeof / generic arg / cast / is/as / default(T) / array element / declared local) the broadened analyzer covers. docs/VirtualTags.md already documents the same list and is unchanged.

Core.Scripting-010

Field	Value
Severity	Medium
Category	Testing coverage
Location	`tests/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting.Tests/ScriptSandboxTests.cs:54`
Status	Resolved

Description: The sandbox-escape test suite covers only the four obvious vectors (File / Http / Process / Reflection) as direct member-access calls. It does not test: typeof(forbidden), generic type arguments (List<FileInfo>), cast expressions to forbidden types, System.Environment.Exit / FailFast, System.Threading.Thread, System.Runtime.InteropServices, Microsoft.Win32 registry access, Activator, or System.AppDomain. Given that the analyzer is the sole security boundary for in-process untrusted-script execution, the gaps in Core.Scripting-001 and Core.Scripting-002 went undetected precisely because no test exercises those forms. The Phase 7 plan A.6 mandated "sandbox escape tests" but the implemented set is materially narrower than the threat surface.

Recommendation: Add a parameterised escape-test covering every node form in Core.Scripting-002 and every forbidden namespace/member in Core.Scripting-001. Each must assert a ScriptSandboxViolationException (or CompilationErrorException) at compile.

Resolution: Resolved 2026-05-22 — added ScriptSandboxTests cases for System.Threading.Thread, System.Threading.Tasks.Task.Run, System.Runtime.InteropServices.Marshal, and Microsoft.Win32.Registry (the four namespace-deny-list vectors that had no test); the 001/002 vectors (Environment.Exit/FailFast/AppDomain/GC/Activator, typeof, generics, cast, default(T), is/as, array element, declared variable) were already covered by the -001/-002 resolution commits. All 79 tests pass.

Core.Scripting-011

Field	Value
Severity	Low
Category	Testing coverage
Location	`tests/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting.Tests/`
Status	Resolved

Description: Two source files have no direct test coverage: ScriptContext (Deadband static helper is exercised only indirectly through ScriptSandboxTests, and not for its boundary tolerance behaviour) and ScriptSandbox.Build itself (the ArgumentNullException / ArgumentException guards on contextType at ScriptSandbox.cs:45-48 are never asserted). ScriptLogCompanionSink and ScriptLoggerFactory have tests, but there is no test that a script's ctx.Logger Error emission surfaces via the companion sink end-to-end (factory + sink integration is untested). These are minor gaps but leave guard clauses and the logging integration unverified.

Recommendation: Add unit tests for ScriptSandbox.Build argument validation, for ScriptContext.Deadband at and around the tolerance boundary, and an end-to-end test that a script logging at Error level produces both a scripts-*.log event and a companion Warning event.

Resolution: Resolved 2026-05-23 — added three new test files: ScriptSandboxBuildTests covers the Build null / non-ScriptContext / base-class / concrete-subclass paths; ScriptContextTests locks Deadband boundary semantics (equal-to-tolerance returns false; just-over returns true; symmetric in direction; zero-tolerance returns true only on non-equal; negative tolerance trips on any non-equal); the new Factory_plus_companion_sink_integration_surfaces_script_error_in_both_logs test in ScriptLogCompanionSinkTests wires ScriptLoggerFactory + the companion sink together end-to-end and asserts an Error emission lands in both the scripts sink (at Error) and the main sink (at Warning), each tagged with ScriptName. Suite now 101 green (was 85 before).

Core.Scripting-012

Field	Value
Severity	High
Category	Security
Location	`ForbiddenTypeAnalyzer.cs:60-76`, `ScriptSandbox.cs:96-126`
Status	Resolved

Description: The Core.Scripting-008 rewrite broadened the BCL references list from a narrow allow-list (System.Private.CoreLib + System.Linq only) to the full TRUSTED_PLATFORM_ASSEMBLIES set filtered to System.* + netstandard + Microsoft.Win32.Registry. This change correctly delegates the security gate to ForbiddenTypeAnalyzer (the new comment in ScriptSandbox calls this out explicitly), but the analyzer's deny-list has not been expanded to match the new attack surface, and three categories of dangerous BCL types in the System.* allow-listed assemblies are now reachable from script source:

System.Threading.ThreadPool (in namespace System.Threading). The Core.Scripting-003 fix added System.Threading.Tasks to deny Task.Run / Parallel fan-out because background work that outlives the per-evaluation timeout is the explicit threat. ThreadPool.QueueUserWorkItem, ThreadPool.UnsafeQueueUserWorkItem, and ThreadPool.RegisterWaitForSingleObject are exactly the same threat — they schedule background work that outlives the WaitAsync(Timeout) budget and tie up worker threads — but System.Threading itself is allowed (because CancellationToken / SemaphoreSlim / Volatile live there). The Core.Scripting-003 resolution is incomplete on the new reference surface.
System.Threading.Timer (same namespace). Schedules a background callback; the script returns control to the engine but the timer keeps firing past the evaluation budget. Same threat as Task.Run.
System.Runtime.Loader.AssemblyLoadContext (in namespace System.Runtime.Loader, which is not denied — only System.Runtime.InteropServices is). The constructor + LoadFromAssemblyPath / LoadFromStream / LoadFromAssemblyName let a script load an arbitrary DLL into the host process. Pass (1) of the analyzer resolves the receiver type (AssemblyLoadContext, allowed) + the invocation symbol's containing type (also AssemblyLoadContext, allowed) and lets the call through. Pass (2) only inspects TypeSyntax nodes — if the script discards the returned Assembly (e.g. alc.LoadFromAssemblyPath(@"C:\evil.dll");) there is no TypeSyntax for the analyzer to walk and the call is accepted. Triggering execution of the loaded code from inside the sandbox is hard (most of Assembly's surface is in System.Reflection, which is denied) but the defense-in-depth gap is real: an attacker who can author a script also typically controls a file path on the server (Admin UI uploads, share mounts) and loading an assembly is the prerequisite to every chained escape — module initializers, type-resolve handlers, and a future analyzer slip would all become exploitable.

In addition, two lower-impact System.* types are reachable that arguably shouldn't be: System.Console.SetOut / Console.SetError could redirect the host's console streams (requires constructing a System.IO.TextWriter, which is blocked, so the practical exploit is Console.WriteLine log-spam only), and System.Globalization.CultureInfo.DefaultThreadCurrentCulture could perturb the entire process's formatting behavior (subtle but real cross-script side effect).

The original Core.Scripting-001 finding called out the model: when an allow-listed namespace contains dangerous types, those types must be denied type-granularly. The new reference surface introduces several more such types and the deny-list has not been kept in sync.

Recommendation: Add System.Threading.ThreadPool and System.Threading.Timer to ForbiddenFullTypeNames. Add System.Runtime.Loader as a namespace prefix to ForbiddenNamespacePrefixes (every type in System.Runtime.Loader — AssemblyLoadContext, AssemblyDependencyResolver, AssemblyLoadEventArgs — is out of script scope). Consider adding System.Console to ForbiddenFullTypeNames to stop log-spam through the host's console streams, and at minimum document CultureInfo.DefaultThreadCurrentCulture as an accepted cross-script side effect. Each addition must have a regression test in ScriptSandboxTests mirroring the Core.Scripting-010 vector style. Update docs/v2/implementation/phase-7-scripting-and-alarming.md decision #6 + the "Sandbox escape" compliance-check row to enumerate the additions, per the Core.Scripting-009 doc-sync convention.

Resolution: Resolved 2026-05-23 — added System.Runtime.Loader to ForbiddenNamespacePrefixes (the namespace-prefix form preferred over type-granular per the recommendation; future BCL additions to that namespace are denied by default). Added System.Threading.ThreadPool and System.Threading.Timer to ForbiddenFullTypeNames — both live in System.Threading shared with allowed sync primitives so they must be type-granular. Regression tests added to ScriptSandboxTests: Rejects_ThreadPool_QueueUserWorkItem_at_compile, Rejects_Timer_new_at_compile, Rejects_AssemblyLoadContext_at_compile. docs/v2/implementation/phase-7-scripting-and-alarming.md decision #6 + the Sandbox-escape compliance-check row both updated per the Core.Scripting-009 doc-sync convention. The two lower-impact suggestions from the recommendation (System.Console, CultureInfo.DefaultThreadCurrentCulture) were intentionally not addressed: Console.SetOut requires constructing a System.IO.TextWriter which is already blocked, leaving only Console.WriteLine log-spam (annoyance, not a security threat); and CultureInfo.DefaultThreadCurrentCulture is a cross-script side-effect worth knowing about but doesn't escape the sandbox. Recording both as accepted minor risks. Test totals after fix: Core.Scripting 107 green (was 104 — +3 new rejection tests).

Core.Scripting-013

Field	Value
Severity	Medium
Category	Security
Location	`ScriptEvaluator.cs:202-225` (`BuildWrapperSource`)
Status	Open

Description: The synthesized wrapper pastes the user's source verbatim between { and } braces inside a static method body, with a #line 1 directive and no escaping. The legacy CSharpScript.CreateDelegate path was robust to this because Roslyn's scripting compiler parses script source as a top-level statement sequence; the new hand-rolled path is parsing ordinary C# in a method body, so a script that injects matching { / } braces can extend the synthesized compilation unit with additional methods, classes, or #line directives. For example, a script body of return 0; } public static int Evil() { return 0; }} public static class CompiledScript2 { public static void M() { ends the Run method early, declares a sibling Evil method (and even a sibling CompiledScript2 class) inside the synthesized namespace, then opens an unclosed method that consumes the wrapper's trailing }\n}. With matching brace counts the script parses cleanly and compiles.

ForbiddenTypeAnalyzer walks every descendant of every syntax tree, so any forbidden BCL types named inside the injected methods are still caught — the finding is not a direct sandbox escape. However:

It silently relaxes the operator-visible authoring contract documented in docs/VirtualTags.md ("scripts are statement bodies that end with an explicit return …;") to "scripts can be any compilable C# inside the CompiledScript namespace" — operators have access to features the design did not intend to expose (local types defined as siblings of Run, custom module initializers via attributes, etc.).
A script can embed its own #line directives that override the #line 1 we emit just above the user source, producing misleading error locations in compiler diagnostics surfaced to the operator.
Future hardening that relies on syntactic-shape assumptions (e.g. "every script has exactly one method") would silently fail.
It widens the analyzer's surface: the analyzer's correctness now depends on Pass (2) correctly walking every conceivable C# construct that can name a type, including ones a normal script body would never contain (UnmanagedCallersOnly attribute, function pointer types delegate*<...>, pattern types, switch arm types, …).

Recommendation: Either (a) reject scripts whose parsed body contains declarations other than statements — walk the wrapper's syntax tree after parse and require that the only members of CompiledScript are the single Run method, raising a CompilationErrorException if anything else appears — or (b) parse the user source independently as a BlockSyntax and inject the parsed block as the method body via the Roslyn syntax API, which makes brace-mismatched / class-injecting source unparseable. Add a regression test covering at least the brace-injection vector (return 0; } public static int Evil() { return 0;).

Resolution: (empty until closed; on close, record the fixing commit SHA, the date, and a one-line description of the fix)

Core.Scripting-014

Field	Value
Severity	Medium
Category	Concurrency & thread safety
Location	`CompiledScriptCache.cs:91-103` (`Clear`)
Status	Open

Description: Clear() snapshots _cache.Keys.ToArray() then iterates, calling TryRemove(key, out var lazy) on each — the key-only overload, not the value-scoped one used in GetOrCompile's catch block. Between the snapshot and a given TryRemove, a concurrent GetOrCompile(scriptSource) call that hashes to the same key can re-insert a fresh Lazy whose .Value the caller already retained. The unconditional TryRemove then removes that fresh Lazy and DisposeLazyIfMaterialised(lazy) calls Dispose() on its evaluator — unloading the ALC while the concurrent caller still holds a reference to the evaluator and intends to invoke it.

This is exactly the race-window pattern the Core.Scripting-006 resolution fixed in GetOrCompile's catch block (the test Failed_compile_eviction_does_not_remove_a_concurrent_retry_entry locks it there). Clear() carries the same shape but uses the older, value-blind overload, so the same race that finding-006 addresses is still latent on the publish-replace path.

In current production wiring Clear() is intended for config-publish + tests — neither overlaps steady-state evaluation under the documented design — so the in-practice impact is low. But the cache is checked in as the forward-looking compile cache for the engines (per Script.SourceHash's docs and the cache's own remarks); a future wiring that calls Clear() from publish while evaluations are in flight would dispose live evaluators.

Recommendation: Replace the snapshot + TryRemove(key, out var lazy) sequence with an enumeration that captures the Lazy reference at snapshot time and uses the value-scoped TryRemove(KeyValuePair<,>) overload, mirroring the Core.Scripting-006 fix:

foreach (var entry in _cache.ToArray())
{
    if (_cache.TryRemove(entry))
        DisposeLazyIfMaterialised(entry.Value);
}

Add a regression test that races GetOrCompile against Clear and asserts the caller's evaluator is still usable.

Resolution: (empty until closed; on close, record the fixing commit SHA, the date, and a one-line description of the fix)

Core.Scripting-015

Field	Value
Severity	Low
Category	Correctness & logic bugs
Location	`ScriptEvaluator.cs:234-270` (`ToCSharpTypeName`)
Status	Open

Description: ToCSharpTypeName is documented to handle nested types (Outer+Inner → Outer.Inner) via Replace('+', '.') for the non-generic path (line 269) but the generic path (line 263-266) constructs the name from def.FullName! then takes a substring up to the backtick. For a nested generic type — e.g. Outer.Inner<T> whose FullName is Outer+Inner1—Replace('+', '.')is applied first, thenSubstring(0, IndexOf('')) on "Outer.Inner1"produces"Outer.Inner"`, which is correct. Good.

However, the generic branch does NOT handle the case where the OPEN generic type itself is nested with + inside the parent's name when the parent is also generic (Outer<TOuter>.Inner<TInner> — FullName is Outer1+Inner1[[TOuter,TInner]]). For that shape Substring(0, IndexOf(''))truncates at the first backtick — yielding"Outer.Inner"— silently dropping the closed type arguments ofOuter. The resulting source string is syntactically valid but semantically wrong: global::Outer.Innerdoes not nameOuter.Inner`.

The production code never hits this shape — TResult is always one of object?, bool, int, double, string?, DateTime across the virtual-tag engine, the alarm engine, the test-harness, and the test suite, and ScriptGlobals<TContext> is always a top-level generic over a top-level ScriptContext subclass. The bug is latent. But it is a foot-gun for a future caller (e.g. a Phase-8 driver that wires a context type defined as a nested generic for grouping reasons) and the XML-doc comment claims "handles nested types" without qualifying it.

A second smaller correctness gap on the same path: the comment claims global::-qualified FQNs prevent accidental capture by the wrapper's using directives, which is true for the generic / non-generic branches, but the primitive aliases (bool, int, string, object, …) are emitted unqualified. A script that defines a local class bool (now possible per Core.Scripting-013) would shadow the alias. Probably benign, but worth a comment.

Recommendation: Add a check in the generic branch that walks the FullName backtick-by-backtick — or use INamedTypeSymbol-style name composition from def.DeclaringType recursively — so multi-arity-nested generics emit correctly. At minimum update the XML doc to qualify "handles nested types" as "handles single-level nesting; nested generics whose parent is itself generic are not supported". Add a ToCSharpTypeName unit test (currently nothing exercises this method directly — coverage relies on the end-to-end compile path, so the bug surfaces only as a misleading Roslyn diagnostic).

Resolution: (empty until closed; on close, record the fixing commit SHA, the date, and a one-line description of the fix)

Core.Scripting-016

Field	Value
Severity	Medium
Category	Performance & resource management
Location	`src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagEngine.cs:74-117`, `src/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms/ScriptedAlarmEngine.cs:139-182`
Status	Resolved

Description: The Core.Scripting-008 resolution introduced ScriptEvaluator.IDisposable + CompiledScriptCache.Clear() that disposes each materialised evaluator before dropping its dictionary entry, so per-publish ALC accretion is no longer process-lifetime rooted inside the cache. But neither production consumer of ScriptEvaluator uses the cache — both VirtualTagEngine.Load and ScriptedAlarmEngine.LoadAsync call ScriptEvaluator<TContext, TResult>.Compile(...) directly (lines 105 / 160 respectively), store the evaluator inside an internal VirtualTagState / AlarmState record, and on the next Load simply call _tags.Clear() / _alarms.Clear(). The dropped ScriptEvaluator references never have Dispose() called on them, so the underlying ScriptAssemblyLoadContext instances are never Unload()-ed. The .NET runtime guarantees that a collectible ALC stays alive until Unload() is called explicitly — having "no strong references" is necessary but not sufficient. So the publish-replace cycle leaks every prior generation's emitted assembly exactly as before the fix, even though the fix's infrastructure is in place.

The Core.Scripting-008 regression tests in CompiledScriptCacheTests (Dispose_unloads_compiled_script_assembly_load_context / Clear_disposes_every_materialised_evaluator) prove the contract on CompiledScriptCache, but neither engine uses that class. There is no integration test exercising the actual publish path — i.e. that calling VirtualTagEngine.Load(...) twice with different definitions makes the prior generation's ALC eligible for GC. As a result the fix's headline guarantee ("Server restarts are no longer required to reclaim compiled-script memory" — docs/VirtualTags.md) is not actually delivered to the production engines.

This is the same observable behavior the original Core.Scripting-008 finding described, surfacing on a different code path that the resolution did not touch.

Recommendation: Either route the engines' compile path through CompiledScriptCache<TContext, TResult> (the documented design — the cache already returns the same evaluator instance for identical source, and its Clear() now performs the right disposal — and Script.SourceHash's doc-comment already names this as the cache key), or make the engines' Load methods dispose the previous ScriptEvaluator instances before reassigning. The former is the cleaner change because it also collapses redundant compiles across publishes for unchanged scripts. Add an integration test along the lines of CompiledScriptCacheTests.Clear_disposes_every_materialised_evaluator for each engine: snapshot the per-evaluator emitted assembly via WeakReference, call Load(...) with a different definition set, and assert the prior generation's assemblies become collectable.

Resolution: Resolved 2026-05-23 — took the cleaner route from the recommendation: routed both engines' compile paths through CompiledScriptCache<TContext, TResult>. VirtualTagEngine and ScriptedAlarmEngine each gained a private _compileCache instance field, their Load/LoadAsync methods now call _compileCache.GetOrCompile(source) instead of ScriptEvaluator.Compile(source) directly, and the cache is cleared on publish-replace alongside the existing _tags / _alarms clears so the prior generation's ALCs are disposed before recompile. Engine Dispose now also calls _compileCache.Dispose() so the engine-shutdown path actually releases the emitted assemblies. Side-fix: discovered + fixed an adjacent bug in CompiledScriptCache.Dispose() itself — it set _disposed = true before calling Clear(), but Clear()'s pre-existing if (_disposed) return guard then aborted the drain unconditionally, so the Dispose-triggered cleanup was a silent no-op. Removed the disposed-guard on Clear() (the operation is idempotent — clearing an empty/cleared cache is safe). Without this side-fix the engine-Dispose path would have left the cached evaluators rooted forever even though the call chain looked correct. Side-fix for ScriptedAlarmEngine.Dispose: moved the pre-existing "do NOT clear _alarms here" comment to "clear _alarms AFTER the drain" because the AlarmState records hold the TimedScriptEvaluator/ScriptEvaluator delegates that root the emitted assembly — leaving them in _alarms after Dispose was the same root-the-script-forever pattern this finding is about, just on the engine side rather than the cache side. The _alarms clear is safe after the Task.WhenAll drain because that drain guarantees no background callback is mid-flight. Regression tests added: VirtualTagEngineTests.Dispose_unloads_compiled_script_assembly and ScriptedAlarmEngineTests.Dispose_unloads_compiled_predicate_assembly — each uses WeakReference + bounded GC.Collect() to prove the emitted assembly is reclaimable after engine.Dispose(). Important test pattern detail: the alarms test originally failed because its helper was async Task<WeakReference> — async state machines capture locals as state-struct fields and can keep them alive past the method's apparent end. Rewrote as a synchronous helper using LoadAsync(...).GetAwaiter().GetResult() inside two cooperating [MethodImpl(MethodImplOptions.NoInlining)] helpers (CompileAlarmAndCaptureWeak + ExtractEmittedAssemblyWeakRef) so the intermediate reflection locals die when each helper returns. Test totals after fix: Core.Scripting 104 green (unchanged); VirtualTags 57 green (was 56 — +1 unload test); ScriptedAlarms 67 green (was 66 — +1 unload test).

43 KiB Raw Blame History

Code Review — Core.Scripting

Checklist coverage

Findings

Core.Scripting-001

Core.Scripting-002

Core.Scripting-003

Core.Scripting-004

Core.Scripting-005

Core.Scripting-006

Core.Scripting-007

Core.Scripting-008

Core.Scripting-009

Core.Scripting-010

Core.Scripting-011

Core.Scripting-012

Core.Scripting-013

Core.Scripting-014

Core.Scripting-015

Core.Scripting-016

43 KiB

Raw Blame History