Files
lmxopcua/code-reviews/Core.Scripting/findings.md
Joseph Doherty 3a53d03d23 fix(scripting): block ThreadPool/Timer/AssemblyLoadContext in sandbox
Core.Scripting-012 (High, Security) resolution.

The Core.Scripting-008 rewrite broadened the BCL references list from a
narrow allow-list to the full System.* + netstandard +
Microsoft.Win32.Registry set, delegating the security gate entirely to
ForbiddenTypeAnalyzer. Three categories of dangerous BCL types were
reachable from script source without a deny-list entry:

  - System.Threading.ThreadPool — QueueUserWorkItem re-introduces the
    background-fanout threat Core.Scripting-003 closed against
    System.Threading.Tasks.
  - System.Threading.Timer — schedules unbounded callback work that
    outlives the per-evaluation timeout.
  - System.Runtime.Loader.AssemblyLoadContext — loads arbitrary DLLs.
    Defense-in-depth gap; invocation needs reflection (already denied)
    but the load itself was reachable.

Fix:
  - Added 'System.Runtime.Loader' to ForbiddenNamespacePrefixes
    (preferred over type-granular per the recommendation so future BCL
    additions to that namespace are denied by default).
  - Added 'System.Threading.ThreadPool' and 'System.Threading.Timer'
    to ForbiddenFullTypeNames — both live in System.Threading shared
    with allowed primitives so they must be type-granular.

Regression tests added to ScriptSandboxTests:
  Rejects_ThreadPool_QueueUserWorkItem_at_compile
  Rejects_Timer_new_at_compile
  Rejects_AssemblyLoadContext_at_compile

Docs:
  docs/v2/implementation/phase-7-scripting-and-alarming.md decision #6
  and the Sandbox-escape compliance-check row both updated to enumerate
  the new entries per the Core.Scripting-009 doc-sync convention.

Two lower-impact suggestions from the finding's recommendation
(System.Console, CultureInfo.DefaultThreadCurrentCulture) were
intentionally not addressed and are recorded as accepted minor risks
in the resolution.

Verification: Core.Scripting.Tests 107/107 (was 104 + 3 new rejection
tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 17:39:20 -04:00

717 lines
43 KiB
Markdown

# Code Review — Core.Scripting
| Field | Value |
|---|---|
| Module | `src/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting` |
| Reviewer | Claude Code |
| Review date | 2026-05-23 |
| Commit reviewed | `a9be809` |
| Status | Reviewed |
| Open findings | 3 |
## Checklist coverage
A comprehensive review completes every category, recording "No issues found" where
a category produced nothing rather than leaving it blank.
The 2026-05-23 re-review only covers code touched between commits `76d35d1` and
`a9be809` (primarily the Core.Scripting-008 ALC rewrite + the broadened BCL
references). Categories where the new code surface produced no issues are
recorded as "No new issues" for that pass.
| # | Category | Result (76d35d1) | Result (a9be809, new code only) |
|---|---|---|---|
| 1 | Correctness & logic bugs | Core.Scripting-004, Core.Scripting-005 | Core.Scripting-015 |
| 2 | OtOpcUa conventions | No issues found | No new issues |
| 3 | Concurrency & thread safety | Core.Scripting-006 | Core.Scripting-014 |
| 4 | Error handling & resilience | Core.Scripting-007 | No new issues |
| 5 | Security | Core.Scripting-001, Core.Scripting-002, Core.Scripting-003 | Core.Scripting-012, Core.Scripting-013 |
| 6 | Performance & resource management | Core.Scripting-008 | Core.Scripting-016 |
| 7 | Design-document adherence | Core.Scripting-009 | No new issues |
| 8 | Code organization & conventions | No issues found | No new issues |
| 9 | Testing coverage | Core.Scripting-010, Core.Scripting-011 | No new issues |
| 10 | Documentation & comments | No issues found | No new issues |
## Findings
### Core.Scripting-001
| Field | Value |
|---|---|
| Severity | Critical |
| Category | Security |
| Location | `ForbiddenTypeAnalyzer.cs:45`, `ScriptSandbox.cs:54` |
| Status | Resolved |
**Description:** `System.Environment` lives in the allowed `System` namespace (it is in
`System.Private.CoreLib`, which is allow-listed for primitives) and is not on the
forbidden-namespace deny-list. Nothing prevents an operator-authored script from calling
`System.Environment.Exit(0)` or `System.Environment.FailFast("...")`. Both terminate the
host process immediately. Because scripted-alarm predicates and virtual-tag scripts run
in-process in the main OPC UA server (decision: "Scripting engine runs in the main .NET 10
server process"), a single malicious or buggy predicate brings down the entire server —
an outage affecting every connected client and every driver. `ScriptSandboxTests` only
pins the *read* path (`Environment.GetEnvironmentVariable`) as an accepted compromise; the
process-killing members are not considered. The whole-process kill far exceeds the
"read-only process state" justification the test comments rely on.
**Recommendation:** The forbidden surface must be member-granular, not namespace-granular,
for types in allowed namespaces. Add an explicit forbidden-member deny-list to
`ForbiddenTypeAnalyzer` covering at minimum `System.Environment.Exit`,
`System.Environment.FailFast`, `System.AppDomain`, `System.GC` (e.g. `GC.Collect`,
`GC.AddMemoryPressure`), and `System.Activator.CreateInstance` (a reflection-equivalent
escape). Reject these in `CheckSymbol` by resolved method symbol, with a test for each.
**Resolution:** Resolved 2026-05-22 — added a type-granular `ForbiddenFullTypeNames`
deny-list (`System.Environment`, `System.AppDomain`, `System.GC`, `System.Activator`) to
`ForbiddenTypeAnalyzer`; `CheckSymbol` now rejects any resolved type symbol whose
fully-qualified name matches, alongside the existing namespace-prefix check, so dangerous
`System`-namespace process-control types are blocked at compile while legitimate `System`
types (Math, String, …) stay usable. Regression tests added in `ScriptSandboxTests` for
`Environment.Exit` / `Environment.FailFast` / `Environment.GetEnvironmentVariable` /
`AppDomain` / `GC.Collect` / `Activator.CreateInstance`.
### Core.Scripting-002
| Field | Value |
|---|---|
| Severity | High |
| Category | Security |
| Location | `ForbiddenTypeAnalyzer.cs:70` |
| Status | Resolved |
**Description:** The syntax walker only inspects four node kinds:
`ObjectCreationExpressionSyntax`, `InvocationExpressionSyntax` with a member-access target,
`MemberAccessExpressionSyntax`, and bare `IdentifierNameSyntax`. It never visits
`TypeOfExpressionSyntax`, generic type-argument lists (`GenericNameSyntax` /
`TypeArgumentListSyntax`), cast expressions (`CastExpressionSyntax`), `is`/`as` type
patterns, `default(T)` expressions, array-creation element types, or `using`/local
declared types. A script such as `typeof(System.IO.File)`,
`new System.Collections.Generic.List<System.IO.FileInfo>()`,
`(System.IO.Stream)null`, or `default(System.Reflection.Assembly)` references a forbidden
type without ever producing a node the walker examines, so the forbidden-type check is
bypassed. The Phase 7 plan A.6 explicitly calls out `typeof` as a sandbox-escape attempt
that "must fail at compile" — it currently does not.
**Recommendation:** Walk every `TypeSyntax` node (handle `TypeOfExpressionSyntax`,
`CastExpressionSyntax`, generic argument lists, and the type operand of
`IsPatternExpression` / binary `as`). The simplest robust fix is to enumerate all
`DescendantNodes()` and, for any node, resolve both `GetSymbolInfo` and `GetTypeInfo`,
then check the resolved type plus every type argument. Add tests covering `typeof`,
generic arguments, casts, and `default(T)` with forbidden types.
**Resolution:** Resolved 2026-05-22 — `ForbiddenTypeAnalyzer.Analyze` now runs a second
pass that resolves `GetTypeInfo` on every `TypeSyntax` node and recursively unwraps array
element types and generic type arguments, so forbidden types named via `typeof`, generic
arguments (`List<FileInfo>`), casts, `is`/`as` patterns, `default(T)`, array-creation
element types, and explicitly-typed local declarations are all rejected at compile. The
original member/call node-kind switch is kept (deliberately narrow to avoid flagging
inherited members such as `typeof(int).Name`), and a span+type dedupe prevents duplicate
rejections from the two passes. Regression tests added in `ScriptSandboxTests` for each
node form plus over-block guards for allowed generics/`typeof`.
### Core.Scripting-003
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Security |
| Location | `TimedScriptEvaluator.cs:9`, `ScriptSandbox.cs:30` |
| Status | Resolved |
**Description:** There is no bound on memory a script may allocate or on the number of
threads/tasks a script may spawn. The class docs acknowledge unbounded memory as "a budget
concern" deferred to v3, but in-process execution means a script doing
`new byte[int.MaxValue]` repeatedly (or `Enumerable.Range(0,int.MaxValue).ToList()` — LINQ
is allow-listed) can drive the whole server to `OutOfMemoryException`, an outage. The
timeout does not help: the allocation can exhaust memory well before 250ms elapses, and
the orphaned thread-pool thread documented in `TimedScriptEvaluator` keeps the allocation
rooted. `System.Threading.Tasks` is not on the deny-list, so a script can also
`Task.Run` an unbounded fan-out of background work that outlives the timeout entirely.
**Recommendation:** At minimum, document this as a known accepted risk in
`docs/ScriptedAlarms.md` / `docs/VirtualTags.md` rather than only in a code comment, and
add the `Task`/`Parallel` namespaces to the forbidden list (scripts are synchronous
predicates — they have no legitimate need to start background tasks). For memory, gate
script authoring behind an Admin permission and treat the test-harness preview as the
control point, or track an explicit v3 issue for out-of-process execution. Record the
decision so it is not silently lost.
**Resolution:** Resolved 2026-05-22 — added `System.Threading.Tasks` to `ForbiddenNamespacePrefixes` (blocking `Task.Run` / `Parallel` fan-out); documented the unbounded-memory accepted risk and the `Task` denial rationale in `docs/VirtualTags.md` (new "Known resource limits" subsection) and cross-referenced from `docs/ScriptedAlarms.md`.
### Core.Scripting-004
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `DependencyExtractor.cs:73` |
| Status | Resolved |
**Description:** The walker matches tag-access calls purely by spelling — any
`InvocationExpressionSyntax` whose member name is `GetTag` or `SetVirtualTag` is treated as
a `ScriptContext` tag access, regardless of the receiver. A script that defines a local
type with a `GetTag(string)` method and calls `other.GetTag("X")`, or calls
`this.GetTag(...)` on a script-defined helper, has spurious dependencies harvested (or, if
the literal arg is non-literal, spurious rejections raised). The XML remarks claim "as long
as it's not on the ctx instance, the extractor doesn't pick it up", but the code does not
check that the receiver is the `ctx` identifier — it accepts any member access with the
matching name. The `DependencyExtractorTests.Ignores_non_ctx_method_named_GetTag` test
passes only because the helper there is a *free* function (not member-access form); a
member-access call to a non-ctx `GetTag` is untested and would be misattributed.
**Recommendation:** In `VisitInvocationExpression`, additionally require that
`member.Expression` is an `IdentifierNameSyntax` with `Identifier.ValueText == "ctx"`
(matching the `ScriptGlobals<TContext>.ctx` field name). Add a test for
`someOtherObject.GetTag("X")` asserting it is ignored.
**Resolution:** Resolved 2026-05-22 — `VisitInvocationExpression` now additionally checks that `member.Expression` is an `IdentifierNameSyntax` with `ValueText == "ctx"` before treating the call as a dependency; test `Ignores_member_access_GetTag_on_non_ctx_receiver` added to `DependencyExtractorTests`.
### Core.Scripting-005
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `DependencyExtractor.cs:97` |
| Status | Resolved |
**Description:** A raw string literal token passed as the tag path (a raw triple-quote
literal) tokenizes as `SingleLineRawStringLiteralToken` /
`MultiLineRawStringLiteralToken`, not `StringLiteralToken`. The check
`literal.Token.IsKind(SyntaxKind.StringLiteralToken)` therefore rejects an
otherwise-static raw-string path as a non-literal "dynamic path", producing a misleading
rejection message. This is an edge case (operators rarely write raw strings for tag
paths) but the error text would confuse anyone who does.
**Recommendation:** Accept all string-literal token kinds — check
`literal.IsKind(SyntaxKind.StringLiteralExpression)` on the expression node, or include
the raw-string token kinds, so a static raw string is harvested rather than rejected.
**Resolution:** Resolved 2026-05-23 — `HandleTagCall` now checks `literal.IsKind(SyntaxKind.StringLiteralExpression)` on the expression node, which covers regular string literals, single-line raw strings, and multi-line raw strings uniformly. Regression tests `Accepts_single_line_raw_string_literal_path` and `Accepts_multi_line_raw_string_literal_path` added to `DependencyExtractorTests`.
### Core.Scripting-006
| Field | Value |
|---|---|
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | `CompiledScriptCache.cs:55` |
| Status | Resolved |
**Description:** On a failed compile the `catch` block calls
`_cache.TryRemove(key, out _)` without a value comparison. If two threads race a miss for
the same bad source, both observe the same faulted `Lazy` and throw, and both call
`TryRemove(key)`. If a concurrent retry re-adds a new `Lazy` for that key between the two
removals, the second unconditional `TryRemove` could evict the in-flight retry entry. The
window is small and the consequence is only a redundant recompile, so severity is Low —
but the removal should be key+value scoped for correctness.
**Recommendation:** Use the `ConcurrentDictionary.TryRemove(KeyValuePair<,>)` overload to
remove only the specific faulted `Lazy` instance, so a concurrently re-added entry is not
evicted.
**Resolution:** Resolved 2026-05-23 — `GetOrCompile`'s catch block now evicts via `_cache.TryRemove(new KeyValuePair<string, Lazy<…>>(key, lazy))`, comparing the value reference so only the faulted Lazy is removed; a concurrent retry that re-inserted a fresh Lazy under the same key is preserved. Regression test `Failed_compile_eviction_does_not_remove_a_concurrent_retry_entry` added to `CompiledScriptCacheTests` (reflection-driven deterministic race: the faulted Lazy's factory swaps the dictionary entry to a fresh Lazy as a side effect of its throw, modelling the precise race window).
### Core.Scripting-007
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `TimedScriptEvaluator.cs:60` |
| Status | Resolved |
**Description:** `RunAsync` wraps the inner run in `Task.Run(...)` and then awaits
`WaitAsync(Timeout, ct)`. If the caller-supplied `ct` cancels at roughly the same time the
timeout elapses, the order in which `WaitAsync` observes the timeout vs. the cancellation
is non-deterministic, so the same shutdown can sometimes surface as
`ScriptTimeoutException` and sometimes as `OperationCanceledException`. The class docs
assert "the caller's cancel wins" as a hard guarantee that the virtual-tag engine shutdown
path depends on to avoid misclassifying shutdown as a script fault — but the
implementation does not guarantee it when both fire close together.
**Recommendation:** After catching `TimeoutException`, check `ct.IsCancellationRequested`
and throw `OperationCanceledException(ct)` instead of `ScriptTimeoutException` when the
caller's token is cancelled, so caller cancellation deterministically wins regardless of
race ordering.
**Resolution:** Resolved 2026-05-22 — in the `catch (TimeoutException)` handler, `ct.IsCancellationRequested` is now checked and `OperationCanceledException(ct)` thrown before `ScriptTimeoutException`, so caller cancellation deterministically wins regardless of race ordering; regression test `Caller_cancellation_wins_even_when_timeout_fires_first` added to `TimedScriptEvaluatorTests`.
### Core.Scripting-008
| Field | Value |
|---|---|
| Severity | Low |
| Category | Performance & resource management |
| Location | `CompiledScriptCache.cs:34`, `ScriptEvaluator.cs:34` |
| Status | Resolved |
**Description:** `CompiledScriptCache` has no capacity bound (acknowledged in the class
remarks) and no eviction. Each cached `ScriptEvaluator` holds a Roslyn `ScriptRunner<T>`
delegate, which keeps the dynamically emitted script assembly loaded for the process
lifetime — emitted assemblies in the default `AssemblyLoadContext` cannot be unloaded.
`Clear()` drops the dictionary entries but does **not** unload the emitted assemblies;
they leak. Across many config-generation publishes (each `Clear()` followed by recompiling
every script), the process accumulates dead script assemblies. For the expected "low
thousands" of scripts this is benign, but a long-running server with frequent publishes
will see steady managed-memory growth that never returns.
**Recommendation:** Document the per-publish assembly accretion as a known limitation, or
compile scripts into a collectible `AssemblyLoadContext` so `Clear()` can unload prior
generations. At minimum add a note to `docs/ScriptedAlarms.md` so operators with
high-publish-frequency deployments are aware.
**Resolution:** Resolved 2026-05-23 — switched the compile pipeline off the legacy
`CSharpScript.CreateDelegate` path (which emits into the default, non-collectible
`AssemblyLoadContext`) and onto a hand-rolled `CSharpCompilation`
`Compilation.Emit(MemoryStream)``ScriptAssemblyLoadContext.LoadFromStream` chain,
with the new `ScriptAssemblyLoadContext` constructed `isCollectible: true`. Each
compiled script lives in its own ALC; `ScriptEvaluator` now implements `IDisposable`
and calls `AssemblyLoadContext.Unload()` on dispose. `CompiledScriptCache.Clear()`
disposes every materialised evaluator before dropping its dictionary entry, and
`CompiledScriptCache` itself is now `IDisposable` for graceful server shutdown.
After a publish-replace cycle the prior generation's emitted assemblies become
eligible for GC; the reclaim is GC-timing-sensitive (Unload is
*eligible-for-collection*, not synchronous) and the next collection cycle reclaims
them. The references list is now BCL-wide (System.* + netstandard + Microsoft.Win32.Registry
via the TRUSTED_PLATFORM_ASSEMBLIES set) so forbidden BCL types resolve at compile and
`ForbiddenTypeAnalyzer` is the sole security gate (consistent with the
Core.Scripting-001 / -002 model). `docs/VirtualTags.md` "Compile cache" section rewritten;
`docs/ScriptedAlarms.md` cross-reference updated to drop the obsolete restart guidance.
Regression tests added in `CompiledScriptCacheTests`:
`Dispose_unloads_compiled_script_assembly_load_context`,
`Clear_disposes_every_materialised_evaluator`, and
`GetOrCompile_after_Dispose_throws_ObjectDisposedException`; the first two
prove ALC unload via `WeakReference` + bounded `GC.Collect()` loops. Suite now 104
green (was 101). Authoring convention: the synthesized wrapper is an ordinary
C# static method, so scripts must end with explicit `return …;` per ordinary C# rules
(the legacy `CSharpScript` "last expression yields result" shorthand no longer applies);
every script in the existing corpus already uses explicit `return` so this is a doc-only
change for new authors.
### Core.Scripting-009
| Field | Value |
|---|---|
| Severity | Low |
| Category | Design-document adherence |
| Location | `ForbiddenTypeAnalyzer.cs:45` |
| Status | Resolved |
**Description:** The Phase 7 plan decision #6
(`docs/v2/implementation/phase-7-scripting-and-alarming.md`) enumerates the forbidden
surface as "No HttpClient / File / Process / reflection". `ForbiddenTypeAnalyzer` actually
denies a broader set — `System.Threading.Thread`, `System.Runtime.InteropServices`, and
`Microsoft.Win32` (registry) — which is sensible hardening but is undocumented in the plan
and in `docs/ScriptedAlarms.md` (which defers sandbox rules to `VirtualTags.md`). An
operator reading the design docs cannot predict that a registry or interop reference will
be rejected. Conversely the plan does not record the `System.Environment` /
`System.Diagnostics` decisions. The code and the design document have drifted.
**Recommendation:** Update the plan's decision #6 (or `docs/VirtualTags.md`) to list the
authoritative deny-list exactly as `ForbiddenTypeAnalyzer.ForbiddenNamespacePrefixes`
defines it, including the `System.Environment` allowed-compromise, so the docs match the
code.
**Resolution:** Resolved 2026-05-23 — `docs/v2/implementation/phase-7-scripting-and-alarming.md` decision #6 row + the "Sandbox escape" compliance-check row now enumerate the authoritative deny-list exactly as `ForbiddenTypeAnalyzer` defines it (namespace-prefix denies for `System.IO` / `System.Net` / `System.Diagnostics` / `System.Reflection` / `System.Threading.Tasks` / `System.Runtime.InteropServices` / `Microsoft.Win32`; type-granular denies for `System.Environment` / `System.AppDomain` / `System.GC` / `System.Activator` / `System.Threading.Thread`), and the compliance-check row lists the syntactic vectors (`typeof` / generic arg / cast / `is`/`as` / `default(T)` / array element / declared local) the broadened analyzer covers. `docs/VirtualTags.md` already documents the same list and is unchanged.
### Core.Scripting-010
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Testing coverage |
| Location | `tests/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting.Tests/ScriptSandboxTests.cs:54` |
| Status | Resolved |
**Description:** The sandbox-escape test suite covers only the four obvious vectors
(File / Http / Process / Reflection) as direct member-access calls. It does not test:
`typeof(forbidden)`, generic type arguments (`List<FileInfo>`), cast expressions to
forbidden types, `System.Environment.Exit` / `FailFast`, `System.Threading.Thread`,
`System.Runtime.InteropServices`, `Microsoft.Win32` registry access, `Activator`, or
`System.AppDomain`. Given that the analyzer is the sole security boundary for in-process
untrusted-script execution, the gaps in Core.Scripting-001 and Core.Scripting-002 went
undetected precisely because no test exercises those forms. The Phase 7 plan A.6 mandated
"sandbox escape tests" but the implemented set is materially narrower than the threat
surface.
**Recommendation:** Add a parameterised escape-test covering every node form in
Core.Scripting-002 and every forbidden namespace/member in Core.Scripting-001. Each must
assert a `ScriptSandboxViolationException` (or `CompilationErrorException`) at compile.
**Resolution:** Resolved 2026-05-22 — added `ScriptSandboxTests` cases for `System.Threading.Thread`, `System.Threading.Tasks.Task.Run`, `System.Runtime.InteropServices.Marshal`, and `Microsoft.Win32.Registry` (the four namespace-deny-list vectors that had no test); the 001/002 vectors (Environment.Exit/FailFast/AppDomain/GC/Activator, typeof, generics, cast, default(T), is/as, array element, declared variable) were already covered by the -001/-002 resolution commits. All 79 tests pass.
### Core.Scripting-011
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `tests/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting.Tests/` |
| Status | Resolved |
**Description:** Two source files have no direct test coverage: `ScriptContext`
(`Deadband` static helper is exercised only indirectly through `ScriptSandboxTests`, and
not for its boundary `tolerance` behaviour) and `ScriptSandbox.Build` itself (the
`ArgumentNullException` / `ArgumentException` guards on `contextType` at
`ScriptSandbox.cs:45-48` are never asserted). `ScriptLogCompanionSink` and
`ScriptLoggerFactory` have tests, but there is no test that a script's `ctx.Logger` Error
emission surfaces via the companion sink end-to-end (factory + sink integration is
untested). These are minor gaps but leave guard clauses and the logging integration
unverified.
**Recommendation:** Add unit tests for `ScriptSandbox.Build` argument validation, for
`ScriptContext.Deadband` at and around the tolerance boundary, and an end-to-end test that
a script logging at Error level produces both a `scripts-*.log` event and a companion
Warning event.
**Resolution:** Resolved 2026-05-23 — added three new test files: `ScriptSandboxBuildTests` covers the `Build` null / non-`ScriptContext` / base-class / concrete-subclass paths; `ScriptContextTests` locks `Deadband` boundary semantics (equal-to-tolerance returns false; just-over returns true; symmetric in direction; zero-tolerance returns true only on non-equal; negative tolerance trips on any non-equal); the new `Factory_plus_companion_sink_integration_surfaces_script_error_in_both_logs` test in `ScriptLogCompanionSinkTests` wires `ScriptLoggerFactory` + the companion sink together end-to-end and asserts an Error emission lands in both the scripts sink (at Error) and the main sink (at Warning), each tagged with `ScriptName`. Suite now 101 green (was 85 before).
### Core.Scripting-012
| Field | Value |
|---|---|
| Severity | High |
| Category | Security |
| Location | `ForbiddenTypeAnalyzer.cs:60-76`, `ScriptSandbox.cs:96-126` |
| Status | Resolved |
**Description:** The Core.Scripting-008 rewrite broadened the BCL references list
from a narrow allow-list (`System.Private.CoreLib` + `System.Linq` only) to the
full `TRUSTED_PLATFORM_ASSEMBLIES` set filtered to `System.*` + `netstandard` +
`Microsoft.Win32.Registry`. This change correctly delegates the security gate to
`ForbiddenTypeAnalyzer` (the new comment in `ScriptSandbox` calls this out
explicitly), but the analyzer's deny-list has not been expanded to match the new
attack surface, and three categories of dangerous BCL types in the `System.*`
allow-listed assemblies are now reachable from script source:
1. **`System.Threading.ThreadPool`** (in namespace `System.Threading`). The
Core.Scripting-003 fix added `System.Threading.Tasks` to deny `Task.Run` /
`Parallel` fan-out because background work that outlives the per-evaluation
timeout is the explicit threat. `ThreadPool.QueueUserWorkItem`,
`ThreadPool.UnsafeQueueUserWorkItem`, and `ThreadPool.RegisterWaitForSingleObject`
are exactly the same threat — they schedule background work that outlives the
`WaitAsync(Timeout)` budget and tie up worker threads — but `System.Threading`
itself is allowed (because `CancellationToken` / `SemaphoreSlim` / `Volatile`
live there). The Core.Scripting-003 resolution is incomplete on the new
reference surface.
2. **`System.Threading.Timer`** (same namespace). Schedules a background
callback; the script returns control to the engine but the timer keeps
firing past the evaluation budget. Same threat as `Task.Run`.
3. **`System.Runtime.Loader.AssemblyLoadContext`** (in namespace
`System.Runtime.Loader`, which is not denied — only `System.Runtime.InteropServices`
is). The constructor + `LoadFromAssemblyPath` / `LoadFromStream` /
`LoadFromAssemblyName` let a script load an arbitrary DLL into the host
process. Pass (1) of the analyzer resolves the receiver type
(`AssemblyLoadContext`, allowed) + the invocation symbol's containing type
(also `AssemblyLoadContext`, allowed) and lets the call through. Pass (2)
only inspects `TypeSyntax` nodes — if the script discards the returned
`Assembly` (e.g. `alc.LoadFromAssemblyPath(@"C:\evil.dll");`) there is no
`TypeSyntax` for the analyzer to walk and the call is accepted. Triggering
execution of the loaded code from inside the sandbox is hard (most of
`Assembly`'s surface is in `System.Reflection`, which is denied) but the
defense-in-depth gap is real: an attacker who can author a script also
typically controls a file path on the server (Admin UI uploads, share
mounts) and loading an assembly is the prerequisite to every chained
escape — module initializers, type-resolve handlers, and a future analyzer
slip would all become exploitable.
In addition, two lower-impact `System.*` types are reachable that arguably
shouldn't be: **`System.Console.SetOut`** / **`Console.SetError`** could
redirect the host's console streams (requires constructing a
`System.IO.TextWriter`, which is blocked, so the practical exploit is
`Console.WriteLine` log-spam only), and **`System.Globalization.CultureInfo.DefaultThreadCurrentCulture`**
could perturb the entire process's formatting behavior (subtle but real cross-script
side effect).
The original Core.Scripting-001 finding called out the model: when an allow-listed
namespace contains dangerous types, those types must be denied type-granularly.
The new reference surface introduces several more such types and the deny-list
has not been kept in sync.
**Recommendation:** Add `System.Threading.ThreadPool` and `System.Threading.Timer`
to `ForbiddenFullTypeNames`. Add `System.Runtime.Loader` as a namespace prefix
to `ForbiddenNamespacePrefixes` (every type in `System.Runtime.Loader`
`AssemblyLoadContext`, `AssemblyDependencyResolver`, `AssemblyLoadEventArgs` — is
out of script scope). Consider adding `System.Console` to `ForbiddenFullTypeNames`
to stop log-spam through the host's console streams, and at minimum document
`CultureInfo.DefaultThreadCurrentCulture` as an accepted cross-script side
effect. Each addition must have a regression test in `ScriptSandboxTests`
mirroring the Core.Scripting-010 vector style. Update
`docs/v2/implementation/phase-7-scripting-and-alarming.md` decision #6 + the
"Sandbox escape" compliance-check row to enumerate the additions, per the
Core.Scripting-009 doc-sync convention.
**Resolution:** Resolved 2026-05-23 — added `System.Runtime.Loader` to
`ForbiddenNamespacePrefixes` (the namespace-prefix form preferred over
type-granular per the recommendation; future BCL additions to that namespace
are denied by default). Added `System.Threading.ThreadPool` and
`System.Threading.Timer` to `ForbiddenFullTypeNames` — both live in
`System.Threading` shared with allowed sync primitives so they must be
type-granular. Regression tests added to `ScriptSandboxTests`:
`Rejects_ThreadPool_QueueUserWorkItem_at_compile`,
`Rejects_Timer_new_at_compile`, `Rejects_AssemblyLoadContext_at_compile`.
`docs/v2/implementation/phase-7-scripting-and-alarming.md` decision #6 +
the Sandbox-escape compliance-check row both updated per the
Core.Scripting-009 doc-sync convention. The two lower-impact suggestions
from the recommendation (`System.Console`, `CultureInfo.DefaultThreadCurrentCulture`)
were intentionally not addressed: `Console.SetOut` requires constructing
a `System.IO.TextWriter` which is already blocked, leaving only
`Console.WriteLine` log-spam (annoyance, not a security threat); and
`CultureInfo.DefaultThreadCurrentCulture` is a cross-script side-effect
worth knowing about but doesn't escape the sandbox. Recording both as
accepted minor risks. Test totals after fix: Core.Scripting 107 green
(was 104 — +3 new rejection tests).
### Core.Scripting-013
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Security |
| Location | `ScriptEvaluator.cs:202-225` (`BuildWrapperSource`) |
| Status | Open |
**Description:** The synthesized wrapper pastes the user's source verbatim
between `{` and `}` braces inside a static method body, with a `#line 1`
directive and no escaping. The legacy `CSharpScript.CreateDelegate` path was
robust to this because Roslyn's scripting compiler parses script source as a
top-level statement sequence; the new hand-rolled path is parsing ordinary C# in
a method body, so a script that injects matching `{` / `}` braces can extend the
synthesized compilation unit with additional methods, classes, or `#line`
directives. For example, a script body of
`return 0; } public static int Evil() { return 0; }} public static class CompiledScript2 { public static void M() {`
ends the `Run` method early, declares a sibling `Evil` method (and even a
sibling `CompiledScript2` class) inside the synthesized namespace, then opens an
unclosed method that consumes the wrapper's trailing `}\n}`. With matching brace
counts the script parses cleanly and compiles.
`ForbiddenTypeAnalyzer` walks every descendant of every syntax tree, so any
forbidden BCL types named inside the injected methods are still caught — the
finding is **not** a direct sandbox escape. However:
- It silently relaxes the operator-visible authoring contract documented in
`docs/VirtualTags.md` ("scripts are statement bodies that end with an
explicit `return …;`") to "scripts can be any compilable C# inside the
`CompiledScript` namespace" — operators have access to features the design
did not intend to expose (local types defined as siblings of `Run`, custom
module initializers via attributes, etc.).
- A script can embed its own `#line` directives that override the
`#line 1` we emit just above the user source, producing misleading error
locations in compiler diagnostics surfaced to the operator.
- Future hardening that relies on syntactic-shape assumptions (e.g.
"every script has exactly one method") would silently fail.
- It widens the analyzer's surface: the analyzer's correctness now depends on
Pass (2) correctly walking every conceivable C# construct that can name a
type, including ones a normal script body would never contain
(`UnmanagedCallersOnly` attribute, function pointer types `delegate*<...>`,
pattern types, switch arm types, …).
**Recommendation:** Either (a) reject scripts whose parsed body contains
declarations other than statements — walk the wrapper's syntax tree after parse
and require that the only members of `CompiledScript` are the single `Run`
method, raising a `CompilationErrorException` if anything else appears — or
(b) parse the user source independently as a `BlockSyntax` and inject the
parsed block as the method body via the Roslyn syntax API, which makes
brace-mismatched / class-injecting source unparseable. Add a regression test
covering at least the brace-injection vector
(`return 0; } public static int Evil() { return 0;`).
**Resolution:** _(empty until closed; on close, record the fixing commit SHA, the date, and a one-line description of the fix)_
### Core.Scripting-014
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `CompiledScriptCache.cs:91-103` (`Clear`) |
| Status | Open |
**Description:** `Clear()` snapshots `_cache.Keys.ToArray()` then iterates,
calling `TryRemove(key, out var lazy)` on each — the key-only overload, not
the value-scoped one used in `GetOrCompile`'s catch block. Between the
snapshot and a given `TryRemove`, a concurrent `GetOrCompile(scriptSource)`
call that hashes to the same key can re-insert a fresh `Lazy` whose `.Value`
the caller already retained. The unconditional `TryRemove` then removes that
fresh `Lazy` and `DisposeLazyIfMaterialised(lazy)` calls `Dispose()` on its
evaluator — unloading the ALC while the concurrent caller still holds a
reference to the evaluator and intends to invoke it.
This is exactly the race-window pattern the Core.Scripting-006 resolution
fixed in `GetOrCompile`'s catch block (the test
`Failed_compile_eviction_does_not_remove_a_concurrent_retry_entry` locks it
there). `Clear()` carries the same shape but uses the older, value-blind
overload, so the same race that finding-006 addresses is still latent on the
publish-replace path.
In current production wiring `Clear()` is intended for config-publish + tests
— neither overlaps steady-state evaluation under the documented design — so
the in-practice impact is low. But the cache is checked in as the
forward-looking compile cache for the engines (per `Script.SourceHash`'s docs
and the cache's own remarks); a future wiring that calls `Clear()` from
publish while evaluations are in flight would dispose live evaluators.
**Recommendation:** Replace the snapshot + `TryRemove(key, out var lazy)`
sequence with an enumeration that captures the `Lazy` reference at snapshot
time and uses the value-scoped `TryRemove(KeyValuePair<,>)` overload, mirroring
the Core.Scripting-006 fix:
```csharp
foreach (var entry in _cache.ToArray())
{
if (_cache.TryRemove(entry))
DisposeLazyIfMaterialised(entry.Value);
}
```
Add a regression test that races `GetOrCompile` against `Clear` and asserts
the caller's evaluator is still usable.
**Resolution:** _(empty until closed; on close, record the fixing commit SHA, the date, and a one-line description of the fix)_
### Core.Scripting-015
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `ScriptEvaluator.cs:234-270` (`ToCSharpTypeName`) |
| Status | Open |
**Description:** `ToCSharpTypeName` is documented to handle nested types
(`Outer+Inner``Outer.Inner`) via `Replace('+', '.')` for the
non-generic path (line 269) but the generic path (line 263-266) constructs the
name from `def.FullName!` then takes a substring up to the backtick. For a
**nested generic** type — e.g. `Outer.Inner<T>` whose `FullName` is
`Outer+Inner`1` — `Replace('+', '.')` is applied first, then `Substring(0, IndexOf('`'))`
on `"Outer.Inner`1"` produces `"Outer.Inner"`, which is correct. Good.
However, the generic branch does NOT handle the case where the OPEN generic
type itself is nested with `+` inside the parent's name when the parent is
also generic (`Outer<TOuter>.Inner<TInner>` — `FullName` is
`Outer`1+Inner`1[[TOuter,TInner]]`). For that shape `Substring(0, IndexOf('`'))`
truncates at the first backtick — yielding `"Outer.Inner"` — silently dropping
the closed type arguments of `Outer<TOuter>`. The resulting source string is
syntactically valid but semantically wrong: `global::Outer.Inner<TInner>` does
not name `Outer<TOuter>.Inner<TInner>`.
The production code never hits this shape — `TResult` is always one of
`object?`, `bool`, `int`, `double`, `string?`, `DateTime` across the
virtual-tag engine, the alarm engine, the test-harness, and the test suite,
and `ScriptGlobals<TContext>` is always a top-level generic over a top-level
`ScriptContext` subclass. The bug is latent. But it is a foot-gun for a
future caller (e.g. a Phase-8 driver that wires a context type defined as a
nested generic for grouping reasons) and the XML-doc comment claims
"handles nested types" without qualifying it.
A second smaller correctness gap on the same path: the comment claims
`global::`-qualified FQNs prevent accidental capture by the wrapper's `using`
directives, which is true for the generic / non-generic branches, but the
primitive aliases (`bool`, `int`, `string`, `object`, …) are emitted unqualified.
A script that defines a local `class bool` (now possible per Core.Scripting-013)
would shadow the alias. Probably benign, but worth a comment.
**Recommendation:** Add a check in the generic branch that walks the FullName
backtick-by-backtick — or use `INamedTypeSymbol`-style name composition from
`def.DeclaringType` recursively — so multi-arity-nested generics emit
correctly. At minimum update the XML doc to qualify "handles nested types" as
"handles single-level nesting; nested generics whose parent is itself generic
are not supported". Add a `ToCSharpTypeName` unit test (currently nothing
exercises this method directly — coverage relies on the end-to-end compile path,
so the bug surfaces only as a misleading Roslyn diagnostic).
**Resolution:** _(empty until closed; on close, record the fixing commit SHA, the date, and a one-line description of the fix)_
### Core.Scripting-016
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Performance & resource management |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagEngine.cs:74-117`, `src/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms/ScriptedAlarmEngine.cs:139-182` |
| Status | Resolved |
**Description:** The Core.Scripting-008 resolution introduced
`ScriptEvaluator.IDisposable` + `CompiledScriptCache.Clear()` that disposes
each materialised evaluator before dropping its dictionary entry, so per-publish
ALC accretion is no longer process-lifetime rooted **inside the cache**. But
neither production consumer of `ScriptEvaluator` uses the cache — both
`VirtualTagEngine.Load` and `ScriptedAlarmEngine.LoadAsync` call
`ScriptEvaluator<TContext, TResult>.Compile(...)` directly (lines 105 / 160
respectively), store the evaluator inside an internal `VirtualTagState` /
`AlarmState` record, and on the next `Load` simply call `_tags.Clear()` /
`_alarms.Clear()`. The dropped `ScriptEvaluator` references never have
`Dispose()` called on them, so the underlying `ScriptAssemblyLoadContext`
instances are never `Unload()`-ed. The .NET runtime guarantees that a
collectible ALC stays alive until `Unload()` is called explicitly — having
"no strong references" is necessary but not sufficient. So the publish-replace
cycle leaks every prior generation's emitted assembly exactly as before the
fix, even though the fix's infrastructure is in place.
The Core.Scripting-008 regression tests in `CompiledScriptCacheTests`
(`Dispose_unloads_compiled_script_assembly_load_context` /
`Clear_disposes_every_materialised_evaluator`) prove the contract on
`CompiledScriptCache`, but neither engine uses that class. There is no
integration test exercising the actual publish path — i.e. that calling
`VirtualTagEngine.Load(...)` twice with different definitions makes the prior
generation's ALC eligible for GC. As a result the fix's headline guarantee
("Server restarts are no longer required to reclaim compiled-script memory" —
`docs/VirtualTags.md`) is not actually delivered to the production engines.
This is the same observable behavior the original Core.Scripting-008 finding
described, surfacing on a different code path that the resolution did not touch.
**Recommendation:** Either route the engines' compile path through
`CompiledScriptCache<TContext, TResult>` (the documented design — the cache
already returns the same evaluator instance for identical source, and its
`Clear()` now performs the right disposal — and `Script.SourceHash`'s doc-comment
already names this as the cache key), or make the engines' `Load` methods
dispose the previous `ScriptEvaluator` instances before reassigning. The
former is the cleaner change because it also collapses redundant compiles
across publishes for unchanged scripts. Add an integration test along the
lines of `CompiledScriptCacheTests.Clear_disposes_every_materialised_evaluator`
for each engine: snapshot the per-evaluator emitted assembly via
`WeakReference`, call `Load(...)` with a different definition set, and assert
the prior generation's assemblies become collectable.
**Resolution:** Resolved 2026-05-23 — took the cleaner route from the
recommendation: routed both engines' compile paths through
`CompiledScriptCache<TContext, TResult>`. `VirtualTagEngine` and
`ScriptedAlarmEngine` each gained a private `_compileCache` instance field,
their `Load`/`LoadAsync` methods now call `_compileCache.GetOrCompile(source)`
instead of `ScriptEvaluator.Compile(source)` directly, and the cache is cleared
on publish-replace alongside the existing `_tags` / `_alarms` clears so the
prior generation's ALCs are disposed before recompile. Engine `Dispose` now
also calls `_compileCache.Dispose()` so the engine-shutdown path actually
releases the emitted assemblies. **Side-fix:** discovered + fixed an
adjacent bug in `CompiledScriptCache.Dispose()` itself — it set
`_disposed = true` before calling `Clear()`, but `Clear()`'s pre-existing
`if (_disposed) return` guard then aborted the drain unconditionally, so
the Dispose-triggered cleanup was a silent no-op. Removed the disposed-guard
on `Clear()` (the operation is idempotent — clearing an empty/cleared cache
is safe). Without this side-fix the engine-Dispose path would have left
the cached evaluators rooted forever even though the call chain looked
correct. **Side-fix for ScriptedAlarmEngine.Dispose:** moved the pre-existing
"do NOT clear `_alarms` here" comment to "clear `_alarms` AFTER the drain"
because the AlarmState records hold the `TimedScriptEvaluator`/`ScriptEvaluator`
delegates that root the emitted assembly — leaving them in `_alarms` after
Dispose was the same root-the-script-forever pattern this finding is about,
just on the engine side rather than the cache side. The `_alarms` clear is
safe after the `Task.WhenAll` drain because that drain guarantees no
background callback is mid-flight. Regression tests added:
`VirtualTagEngineTests.Dispose_unloads_compiled_script_assembly` and
`ScriptedAlarmEngineTests.Dispose_unloads_compiled_predicate_assembly`
each uses `WeakReference` + bounded `GC.Collect()` to prove the emitted
assembly is reclaimable after `engine.Dispose()`. **Important test pattern
detail:** the alarms test originally failed because its helper was
`async Task<WeakReference>` — async state machines capture locals as
state-struct fields and can keep them alive past the method's apparent end.
Rewrote as a synchronous helper using `LoadAsync(...).GetAwaiter().GetResult()`
inside two cooperating `[MethodImpl(MethodImplOptions.NoInlining)]` helpers
(`CompileAlarmAndCaptureWeak` + `ExtractEmittedAssemblyWeakRef`) so the
intermediate reflection locals die when each helper returns. Test totals
after fix: Core.Scripting 104 green (unchanged); VirtualTags 57 green (was
56 — +1 unload test); ScriptedAlarms 67 green (was 66 — +1 unload test).