Single commit covering the four small/medium fixes from the updated code review. Core.Scripting-014 (Medium, Concurrency): CompiledScriptCache.Clear() used the key-only TryRemove(key, out var lazy) overload — same race shape Core.Scripting-006 closed in GetOrCompile's catch block. A concurrent re-add between snapshot and TryRemove was evicted + disposed while the new caller still held it. Replaced with the value-scoped TryRemove(KeyValuePair<,>) overload. Regression test Clear_uses_value_scoped_TryRemove_so_a_race_inserted_entry_survives added. Core.Scripting-013 (Medium, Security): Hand-rolled BuildWrapperSource pastes user source between literal braces; brace-balanced source could inject sibling methods/classes alongside CompiledScript.Run. Analyzer still walked the injected members so it wasn't a direct escape, but it relaxed the documented 'method body' authoring contract. Added EnforceSingleRunMember: after ParseText, the compilation unit must hold exactly one type (CompiledScript) and that type must hold exactly one member (the Run method). Any deviation throws CompilationErrorException with LMX001/ LMX002 diagnostic IDs and a Core.Scripting-013 reference in the message. Two regression tests added covering the sibling-method and sibling-class injection vectors. Core.Scripting-015 (Low, Correctness, latent): ToCSharpTypeName's generic branch truncated at the first backtick via IndexOf, silently dropping closed args of nested-generic shapes (Outer<T>.Inner<U>). No production caller exercises this shape today (all TContext/TResult are top-level non-nested), so the bug was latent. Rewrote the generic branch to walk the FullName segment-by- segment, consuming generic args per segment so nested shapes emit valid C# (global::Ns.Outer<T>.Inner<U> rather than the broken Outer<T,U>). Core.ScriptedAlarms-013 (Low, Documentation): The internal test accessors TryGetScratchReadCacheForTest / TryGetScratchContextForTest return live mutable scratch refilled in place under _evalGate. XML docs didn't warn future test authors about the synchronization contract. Added a <remarks> block to each documenting the only-safe-on-quiesced-engine + identity-or-single-key contract. Verification (suites green): Core.Scripting.Tests: 110/110 (was 107 — +3 new rejection/race tests) Core.ScriptedAlarms.Tests: 67/67 (unchanged — doc-only fix) Core.VirtualTags.Tests: 57/57 (unchanged) After this commit, all 12 findings from the updated re-review are closed (10 Resolved, 1 Won't Fix none, 1 Deferred — Driver.Galaxy-017). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
757 lines
45 KiB
Markdown
757 lines
45 KiB
Markdown
# Code Review — Core.Scripting
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Module | `src/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting` |
|
|
| Reviewer | Claude Code |
|
|
| Review date | 2026-05-23 |
|
|
| Commit reviewed | `a9be809` |
|
|
| Status | Reviewed |
|
|
| Open findings | 0 |
|
|
|
|
## Checklist coverage
|
|
|
|
A comprehensive review completes every category, recording "No issues found" where
|
|
a category produced nothing rather than leaving it blank.
|
|
|
|
The 2026-05-23 re-review only covers code touched between commits `76d35d1` and
|
|
`a9be809` (primarily the Core.Scripting-008 ALC rewrite + the broadened BCL
|
|
references). Categories where the new code surface produced no issues are
|
|
recorded as "No new issues" for that pass.
|
|
|
|
| # | Category | Result (76d35d1) | Result (a9be809, new code only) |
|
|
|---|---|---|---|
|
|
| 1 | Correctness & logic bugs | Core.Scripting-004, Core.Scripting-005 | Core.Scripting-015 |
|
|
| 2 | OtOpcUa conventions | No issues found | No new issues |
|
|
| 3 | Concurrency & thread safety | Core.Scripting-006 | Core.Scripting-014 |
|
|
| 4 | Error handling & resilience | Core.Scripting-007 | No new issues |
|
|
| 5 | Security | Core.Scripting-001, Core.Scripting-002, Core.Scripting-003 | Core.Scripting-012, Core.Scripting-013 |
|
|
| 6 | Performance & resource management | Core.Scripting-008 | Core.Scripting-016 |
|
|
| 7 | Design-document adherence | Core.Scripting-009 | No new issues |
|
|
| 8 | Code organization & conventions | No issues found | No new issues |
|
|
| 9 | Testing coverage | Core.Scripting-010, Core.Scripting-011 | No new issues |
|
|
| 10 | Documentation & comments | No issues found | No new issues |
|
|
|
|
## Findings
|
|
|
|
### Core.Scripting-001
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Critical |
|
|
| Category | Security |
|
|
| Location | `ForbiddenTypeAnalyzer.cs:45`, `ScriptSandbox.cs:54` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `System.Environment` lives in the allowed `System` namespace (it is in
|
|
`System.Private.CoreLib`, which is allow-listed for primitives) and is not on the
|
|
forbidden-namespace deny-list. Nothing prevents an operator-authored script from calling
|
|
`System.Environment.Exit(0)` or `System.Environment.FailFast("...")`. Both terminate the
|
|
host process immediately. Because scripted-alarm predicates and virtual-tag scripts run
|
|
in-process in the main OPC UA server (decision: "Scripting engine runs in the main .NET 10
|
|
server process"), a single malicious or buggy predicate brings down the entire server —
|
|
an outage affecting every connected client and every driver. `ScriptSandboxTests` only
|
|
pins the *read* path (`Environment.GetEnvironmentVariable`) as an accepted compromise; the
|
|
process-killing members are not considered. The whole-process kill far exceeds the
|
|
"read-only process state" justification the test comments rely on.
|
|
|
|
**Recommendation:** The forbidden surface must be member-granular, not namespace-granular,
|
|
for types in allowed namespaces. Add an explicit forbidden-member deny-list to
|
|
`ForbiddenTypeAnalyzer` covering at minimum `System.Environment.Exit`,
|
|
`System.Environment.FailFast`, `System.AppDomain`, `System.GC` (e.g. `GC.Collect`,
|
|
`GC.AddMemoryPressure`), and `System.Activator.CreateInstance` (a reflection-equivalent
|
|
escape). Reject these in `CheckSymbol` by resolved method symbol, with a test for each.
|
|
|
|
**Resolution:** Resolved 2026-05-22 — added a type-granular `ForbiddenFullTypeNames`
|
|
deny-list (`System.Environment`, `System.AppDomain`, `System.GC`, `System.Activator`) to
|
|
`ForbiddenTypeAnalyzer`; `CheckSymbol` now rejects any resolved type symbol whose
|
|
fully-qualified name matches, alongside the existing namespace-prefix check, so dangerous
|
|
`System`-namespace process-control types are blocked at compile while legitimate `System`
|
|
types (Math, String, …) stay usable. Regression tests added in `ScriptSandboxTests` for
|
|
`Environment.Exit` / `Environment.FailFast` / `Environment.GetEnvironmentVariable` /
|
|
`AppDomain` / `GC.Collect` / `Activator.CreateInstance`.
|
|
|
|
### Core.Scripting-002
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | High |
|
|
| Category | Security |
|
|
| Location | `ForbiddenTypeAnalyzer.cs:70` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** The syntax walker only inspects four node kinds:
|
|
`ObjectCreationExpressionSyntax`, `InvocationExpressionSyntax` with a member-access target,
|
|
`MemberAccessExpressionSyntax`, and bare `IdentifierNameSyntax`. It never visits
|
|
`TypeOfExpressionSyntax`, generic type-argument lists (`GenericNameSyntax` /
|
|
`TypeArgumentListSyntax`), cast expressions (`CastExpressionSyntax`), `is`/`as` type
|
|
patterns, `default(T)` expressions, array-creation element types, or `using`/local
|
|
declared types. A script such as `typeof(System.IO.File)`,
|
|
`new System.Collections.Generic.List<System.IO.FileInfo>()`,
|
|
`(System.IO.Stream)null`, or `default(System.Reflection.Assembly)` references a forbidden
|
|
type without ever producing a node the walker examines, so the forbidden-type check is
|
|
bypassed. The Phase 7 plan A.6 explicitly calls out `typeof` as a sandbox-escape attempt
|
|
that "must fail at compile" — it currently does not.
|
|
|
|
**Recommendation:** Walk every `TypeSyntax` node (handle `TypeOfExpressionSyntax`,
|
|
`CastExpressionSyntax`, generic argument lists, and the type operand of
|
|
`IsPatternExpression` / binary `as`). The simplest robust fix is to enumerate all
|
|
`DescendantNodes()` and, for any node, resolve both `GetSymbolInfo` and `GetTypeInfo`,
|
|
then check the resolved type plus every type argument. Add tests covering `typeof`,
|
|
generic arguments, casts, and `default(T)` with forbidden types.
|
|
|
|
**Resolution:** Resolved 2026-05-22 — `ForbiddenTypeAnalyzer.Analyze` now runs a second
|
|
pass that resolves `GetTypeInfo` on every `TypeSyntax` node and recursively unwraps array
|
|
element types and generic type arguments, so forbidden types named via `typeof`, generic
|
|
arguments (`List<FileInfo>`), casts, `is`/`as` patterns, `default(T)`, array-creation
|
|
element types, and explicitly-typed local declarations are all rejected at compile. The
|
|
original member/call node-kind switch is kept (deliberately narrow to avoid flagging
|
|
inherited members such as `typeof(int).Name`), and a span+type dedupe prevents duplicate
|
|
rejections from the two passes. Regression tests added in `ScriptSandboxTests` for each
|
|
node form plus over-block guards for allowed generics/`typeof`.
|
|
|
|
### Core.Scripting-003
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Medium |
|
|
| Category | Security |
|
|
| Location | `TimedScriptEvaluator.cs:9`, `ScriptSandbox.cs:30` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** There is no bound on memory a script may allocate or on the number of
|
|
threads/tasks a script may spawn. The class docs acknowledge unbounded memory as "a budget
|
|
concern" deferred to v3, but in-process execution means a script doing
|
|
`new byte[int.MaxValue]` repeatedly (or `Enumerable.Range(0,int.MaxValue).ToList()` — LINQ
|
|
is allow-listed) can drive the whole server to `OutOfMemoryException`, an outage. The
|
|
timeout does not help: the allocation can exhaust memory well before 250ms elapses, and
|
|
the orphaned thread-pool thread documented in `TimedScriptEvaluator` keeps the allocation
|
|
rooted. `System.Threading.Tasks` is not on the deny-list, so a script can also
|
|
`Task.Run` an unbounded fan-out of background work that outlives the timeout entirely.
|
|
|
|
**Recommendation:** At minimum, document this as a known accepted risk in
|
|
`docs/ScriptedAlarms.md` / `docs/VirtualTags.md` rather than only in a code comment, and
|
|
add the `Task`/`Parallel` namespaces to the forbidden list (scripts are synchronous
|
|
predicates — they have no legitimate need to start background tasks). For memory, gate
|
|
script authoring behind an Admin permission and treat the test-harness preview as the
|
|
control point, or track an explicit v3 issue for out-of-process execution. Record the
|
|
decision so it is not silently lost.
|
|
|
|
**Resolution:** Resolved 2026-05-22 — added `System.Threading.Tasks` to `ForbiddenNamespacePrefixes` (blocking `Task.Run` / `Parallel` fan-out); documented the unbounded-memory accepted risk and the `Task` denial rationale in `docs/VirtualTags.md` (new "Known resource limits" subsection) and cross-referenced from `docs/ScriptedAlarms.md`.
|
|
|
|
### Core.Scripting-004
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Medium |
|
|
| Category | Correctness & logic bugs |
|
|
| Location | `DependencyExtractor.cs:73` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** The walker matches tag-access calls purely by spelling — any
|
|
`InvocationExpressionSyntax` whose member name is `GetTag` or `SetVirtualTag` is treated as
|
|
a `ScriptContext` tag access, regardless of the receiver. A script that defines a local
|
|
type with a `GetTag(string)` method and calls `other.GetTag("X")`, or calls
|
|
`this.GetTag(...)` on a script-defined helper, has spurious dependencies harvested (or, if
|
|
the literal arg is non-literal, spurious rejections raised). The XML remarks claim "as long
|
|
as it's not on the ctx instance, the extractor doesn't pick it up", but the code does not
|
|
check that the receiver is the `ctx` identifier — it accepts any member access with the
|
|
matching name. The `DependencyExtractorTests.Ignores_non_ctx_method_named_GetTag` test
|
|
passes only because the helper there is a *free* function (not member-access form); a
|
|
member-access call to a non-ctx `GetTag` is untested and would be misattributed.
|
|
|
|
**Recommendation:** In `VisitInvocationExpression`, additionally require that
|
|
`member.Expression` is an `IdentifierNameSyntax` with `Identifier.ValueText == "ctx"`
|
|
(matching the `ScriptGlobals<TContext>.ctx` field name). Add a test for
|
|
`someOtherObject.GetTag("X")` asserting it is ignored.
|
|
|
|
**Resolution:** Resolved 2026-05-22 — `VisitInvocationExpression` now additionally checks that `member.Expression` is an `IdentifierNameSyntax` with `ValueText == "ctx"` before treating the call as a dependency; test `Ignores_member_access_GetTag_on_non_ctx_receiver` added to `DependencyExtractorTests`.
|
|
|
|
### Core.Scripting-005
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Correctness & logic bugs |
|
|
| Location | `DependencyExtractor.cs:97` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** A raw string literal token passed as the tag path (a raw triple-quote
|
|
literal) tokenizes as `SingleLineRawStringLiteralToken` /
|
|
`MultiLineRawStringLiteralToken`, not `StringLiteralToken`. The check
|
|
`literal.Token.IsKind(SyntaxKind.StringLiteralToken)` therefore rejects an
|
|
otherwise-static raw-string path as a non-literal "dynamic path", producing a misleading
|
|
rejection message. This is an edge case (operators rarely write raw strings for tag
|
|
paths) but the error text would confuse anyone who does.
|
|
|
|
**Recommendation:** Accept all string-literal token kinds — check
|
|
`literal.IsKind(SyntaxKind.StringLiteralExpression)` on the expression node, or include
|
|
the raw-string token kinds, so a static raw string is harvested rather than rejected.
|
|
|
|
**Resolution:** Resolved 2026-05-23 — `HandleTagCall` now checks `literal.IsKind(SyntaxKind.StringLiteralExpression)` on the expression node, which covers regular string literals, single-line raw strings, and multi-line raw strings uniformly. Regression tests `Accepts_single_line_raw_string_literal_path` and `Accepts_multi_line_raw_string_literal_path` added to `DependencyExtractorTests`.
|
|
|
|
### Core.Scripting-006
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Concurrency & thread safety |
|
|
| Location | `CompiledScriptCache.cs:55` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** On a failed compile the `catch` block calls
|
|
`_cache.TryRemove(key, out _)` without a value comparison. If two threads race a miss for
|
|
the same bad source, both observe the same faulted `Lazy` and throw, and both call
|
|
`TryRemove(key)`. If a concurrent retry re-adds a new `Lazy` for that key between the two
|
|
removals, the second unconditional `TryRemove` could evict the in-flight retry entry. The
|
|
window is small and the consequence is only a redundant recompile, so severity is Low —
|
|
but the removal should be key+value scoped for correctness.
|
|
|
|
**Recommendation:** Use the `ConcurrentDictionary.TryRemove(KeyValuePair<,>)` overload to
|
|
remove only the specific faulted `Lazy` instance, so a concurrently re-added entry is not
|
|
evicted.
|
|
|
|
**Resolution:** Resolved 2026-05-23 — `GetOrCompile`'s catch block now evicts via `_cache.TryRemove(new KeyValuePair<string, Lazy<…>>(key, lazy))`, comparing the value reference so only the faulted Lazy is removed; a concurrent retry that re-inserted a fresh Lazy under the same key is preserved. Regression test `Failed_compile_eviction_does_not_remove_a_concurrent_retry_entry` added to `CompiledScriptCacheTests` (reflection-driven deterministic race: the faulted Lazy's factory swaps the dictionary entry to a fresh Lazy as a side effect of its throw, modelling the precise race window).
|
|
|
|
### Core.Scripting-007
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Medium |
|
|
| Category | Error handling & resilience |
|
|
| Location | `TimedScriptEvaluator.cs:60` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `RunAsync` wraps the inner run in `Task.Run(...)` and then awaits
|
|
`WaitAsync(Timeout, ct)`. If the caller-supplied `ct` cancels at roughly the same time the
|
|
timeout elapses, the order in which `WaitAsync` observes the timeout vs. the cancellation
|
|
is non-deterministic, so the same shutdown can sometimes surface as
|
|
`ScriptTimeoutException` and sometimes as `OperationCanceledException`. The class docs
|
|
assert "the caller's cancel wins" as a hard guarantee that the virtual-tag engine shutdown
|
|
path depends on to avoid misclassifying shutdown as a script fault — but the
|
|
implementation does not guarantee it when both fire close together.
|
|
|
|
**Recommendation:** After catching `TimeoutException`, check `ct.IsCancellationRequested`
|
|
and throw `OperationCanceledException(ct)` instead of `ScriptTimeoutException` when the
|
|
caller's token is cancelled, so caller cancellation deterministically wins regardless of
|
|
race ordering.
|
|
|
|
**Resolution:** Resolved 2026-05-22 — in the `catch (TimeoutException)` handler, `ct.IsCancellationRequested` is now checked and `OperationCanceledException(ct)` thrown before `ScriptTimeoutException`, so caller cancellation deterministically wins regardless of race ordering; regression test `Caller_cancellation_wins_even_when_timeout_fires_first` added to `TimedScriptEvaluatorTests`.
|
|
|
|
### Core.Scripting-008
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Performance & resource management |
|
|
| Location | `CompiledScriptCache.cs:34`, `ScriptEvaluator.cs:34` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `CompiledScriptCache` has no capacity bound (acknowledged in the class
|
|
remarks) and no eviction. Each cached `ScriptEvaluator` holds a Roslyn `ScriptRunner<T>`
|
|
delegate, which keeps the dynamically emitted script assembly loaded for the process
|
|
lifetime — emitted assemblies in the default `AssemblyLoadContext` cannot be unloaded.
|
|
`Clear()` drops the dictionary entries but does **not** unload the emitted assemblies;
|
|
they leak. Across many config-generation publishes (each `Clear()` followed by recompiling
|
|
every script), the process accumulates dead script assemblies. For the expected "low
|
|
thousands" of scripts this is benign, but a long-running server with frequent publishes
|
|
will see steady managed-memory growth that never returns.
|
|
|
|
**Recommendation:** Document the per-publish assembly accretion as a known limitation, or
|
|
compile scripts into a collectible `AssemblyLoadContext` so `Clear()` can unload prior
|
|
generations. At minimum add a note to `docs/ScriptedAlarms.md` so operators with
|
|
high-publish-frequency deployments are aware.
|
|
|
|
**Resolution:** Resolved 2026-05-23 — switched the compile pipeline off the legacy
|
|
`CSharpScript.CreateDelegate` path (which emits into the default, non-collectible
|
|
`AssemblyLoadContext`) and onto a hand-rolled `CSharpCompilation` →
|
|
`Compilation.Emit(MemoryStream)` → `ScriptAssemblyLoadContext.LoadFromStream` chain,
|
|
with the new `ScriptAssemblyLoadContext` constructed `isCollectible: true`. Each
|
|
compiled script lives in its own ALC; `ScriptEvaluator` now implements `IDisposable`
|
|
and calls `AssemblyLoadContext.Unload()` on dispose. `CompiledScriptCache.Clear()`
|
|
disposes every materialised evaluator before dropping its dictionary entry, and
|
|
`CompiledScriptCache` itself is now `IDisposable` for graceful server shutdown.
|
|
After a publish-replace cycle the prior generation's emitted assemblies become
|
|
eligible for GC; the reclaim is GC-timing-sensitive (Unload is
|
|
*eligible-for-collection*, not synchronous) and the next collection cycle reclaims
|
|
them. The references list is now BCL-wide (System.* + netstandard + Microsoft.Win32.Registry
|
|
via the TRUSTED_PLATFORM_ASSEMBLIES set) so forbidden BCL types resolve at compile and
|
|
`ForbiddenTypeAnalyzer` is the sole security gate (consistent with the
|
|
Core.Scripting-001 / -002 model). `docs/VirtualTags.md` "Compile cache" section rewritten;
|
|
`docs/ScriptedAlarms.md` cross-reference updated to drop the obsolete restart guidance.
|
|
Regression tests added in `CompiledScriptCacheTests`:
|
|
`Dispose_unloads_compiled_script_assembly_load_context`,
|
|
`Clear_disposes_every_materialised_evaluator`, and
|
|
`GetOrCompile_after_Dispose_throws_ObjectDisposedException`; the first two
|
|
prove ALC unload via `WeakReference` + bounded `GC.Collect()` loops. Suite now 104
|
|
green (was 101). Authoring convention: the synthesized wrapper is an ordinary
|
|
C# static method, so scripts must end with explicit `return …;` per ordinary C# rules
|
|
(the legacy `CSharpScript` "last expression yields result" shorthand no longer applies);
|
|
every script in the existing corpus already uses explicit `return` so this is a doc-only
|
|
change for new authors.
|
|
|
|
### Core.Scripting-009
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Design-document adherence |
|
|
| Location | `ForbiddenTypeAnalyzer.cs:45` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** The Phase 7 plan decision #6
|
|
(`docs/v2/implementation/phase-7-scripting-and-alarming.md`) enumerates the forbidden
|
|
surface as "No HttpClient / File / Process / reflection". `ForbiddenTypeAnalyzer` actually
|
|
denies a broader set — `System.Threading.Thread`, `System.Runtime.InteropServices`, and
|
|
`Microsoft.Win32` (registry) — which is sensible hardening but is undocumented in the plan
|
|
and in `docs/ScriptedAlarms.md` (which defers sandbox rules to `VirtualTags.md`). An
|
|
operator reading the design docs cannot predict that a registry or interop reference will
|
|
be rejected. Conversely the plan does not record the `System.Environment` /
|
|
`System.Diagnostics` decisions. The code and the design document have drifted.
|
|
|
|
**Recommendation:** Update the plan's decision #6 (or `docs/VirtualTags.md`) to list the
|
|
authoritative deny-list exactly as `ForbiddenTypeAnalyzer.ForbiddenNamespacePrefixes`
|
|
defines it, including the `System.Environment` allowed-compromise, so the docs match the
|
|
code.
|
|
|
|
**Resolution:** Resolved 2026-05-23 — `docs/v2/implementation/phase-7-scripting-and-alarming.md` decision #6 row + the "Sandbox escape" compliance-check row now enumerate the authoritative deny-list exactly as `ForbiddenTypeAnalyzer` defines it (namespace-prefix denies for `System.IO` / `System.Net` / `System.Diagnostics` / `System.Reflection` / `System.Threading.Tasks` / `System.Runtime.InteropServices` / `Microsoft.Win32`; type-granular denies for `System.Environment` / `System.AppDomain` / `System.GC` / `System.Activator` / `System.Threading.Thread`), and the compliance-check row lists the syntactic vectors (`typeof` / generic arg / cast / `is`/`as` / `default(T)` / array element / declared local) the broadened analyzer covers. `docs/VirtualTags.md` already documents the same list and is unchanged.
|
|
|
|
### Core.Scripting-010
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Medium |
|
|
| Category | Testing coverage |
|
|
| Location | `tests/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting.Tests/ScriptSandboxTests.cs:54` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** The sandbox-escape test suite covers only the four obvious vectors
|
|
(File / Http / Process / Reflection) as direct member-access calls. It does not test:
|
|
`typeof(forbidden)`, generic type arguments (`List<FileInfo>`), cast expressions to
|
|
forbidden types, `System.Environment.Exit` / `FailFast`, `System.Threading.Thread`,
|
|
`System.Runtime.InteropServices`, `Microsoft.Win32` registry access, `Activator`, or
|
|
`System.AppDomain`. Given that the analyzer is the sole security boundary for in-process
|
|
untrusted-script execution, the gaps in Core.Scripting-001 and Core.Scripting-002 went
|
|
undetected precisely because no test exercises those forms. The Phase 7 plan A.6 mandated
|
|
"sandbox escape tests" but the implemented set is materially narrower than the threat
|
|
surface.
|
|
|
|
**Recommendation:** Add a parameterised escape-test covering every node form in
|
|
Core.Scripting-002 and every forbidden namespace/member in Core.Scripting-001. Each must
|
|
assert a `ScriptSandboxViolationException` (or `CompilationErrorException`) at compile.
|
|
|
|
**Resolution:** Resolved 2026-05-22 — added `ScriptSandboxTests` cases for `System.Threading.Thread`, `System.Threading.Tasks.Task.Run`, `System.Runtime.InteropServices.Marshal`, and `Microsoft.Win32.Registry` (the four namespace-deny-list vectors that had no test); the 001/002 vectors (Environment.Exit/FailFast/AppDomain/GC/Activator, typeof, generics, cast, default(T), is/as, array element, declared variable) were already covered by the -001/-002 resolution commits. All 79 tests pass.
|
|
|
|
### Core.Scripting-011
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Testing coverage |
|
|
| Location | `tests/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting.Tests/` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** Two source files have no direct test coverage: `ScriptContext`
|
|
(`Deadband` static helper is exercised only indirectly through `ScriptSandboxTests`, and
|
|
not for its boundary `tolerance` behaviour) and `ScriptSandbox.Build` itself (the
|
|
`ArgumentNullException` / `ArgumentException` guards on `contextType` at
|
|
`ScriptSandbox.cs:45-48` are never asserted). `ScriptLogCompanionSink` and
|
|
`ScriptLoggerFactory` have tests, but there is no test that a script's `ctx.Logger` Error
|
|
emission surfaces via the companion sink end-to-end (factory + sink integration is
|
|
untested). These are minor gaps but leave guard clauses and the logging integration
|
|
unverified.
|
|
|
|
**Recommendation:** Add unit tests for `ScriptSandbox.Build` argument validation, for
|
|
`ScriptContext.Deadband` at and around the tolerance boundary, and an end-to-end test that
|
|
a script logging at Error level produces both a `scripts-*.log` event and a companion
|
|
Warning event.
|
|
|
|
**Resolution:** Resolved 2026-05-23 — added three new test files: `ScriptSandboxBuildTests` covers the `Build` null / non-`ScriptContext` / base-class / concrete-subclass paths; `ScriptContextTests` locks `Deadband` boundary semantics (equal-to-tolerance returns false; just-over returns true; symmetric in direction; zero-tolerance returns true only on non-equal; negative tolerance trips on any non-equal); the new `Factory_plus_companion_sink_integration_surfaces_script_error_in_both_logs` test in `ScriptLogCompanionSinkTests` wires `ScriptLoggerFactory` + the companion sink together end-to-end and asserts an Error emission lands in both the scripts sink (at Error) and the main sink (at Warning), each tagged with `ScriptName`. Suite now 101 green (was 85 before).
|
|
|
|
### Core.Scripting-012
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | High |
|
|
| Category | Security |
|
|
| Location | `ForbiddenTypeAnalyzer.cs:60-76`, `ScriptSandbox.cs:96-126` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** The Core.Scripting-008 rewrite broadened the BCL references list
|
|
from a narrow allow-list (`System.Private.CoreLib` + `System.Linq` only) to the
|
|
full `TRUSTED_PLATFORM_ASSEMBLIES` set filtered to `System.*` + `netstandard` +
|
|
`Microsoft.Win32.Registry`. This change correctly delegates the security gate to
|
|
`ForbiddenTypeAnalyzer` (the new comment in `ScriptSandbox` calls this out
|
|
explicitly), but the analyzer's deny-list has not been expanded to match the new
|
|
attack surface, and three categories of dangerous BCL types in the `System.*`
|
|
allow-listed assemblies are now reachable from script source:
|
|
|
|
1. **`System.Threading.ThreadPool`** (in namespace `System.Threading`). The
|
|
Core.Scripting-003 fix added `System.Threading.Tasks` to deny `Task.Run` /
|
|
`Parallel` fan-out because background work that outlives the per-evaluation
|
|
timeout is the explicit threat. `ThreadPool.QueueUserWorkItem`,
|
|
`ThreadPool.UnsafeQueueUserWorkItem`, and `ThreadPool.RegisterWaitForSingleObject`
|
|
are exactly the same threat — they schedule background work that outlives the
|
|
`WaitAsync(Timeout)` budget and tie up worker threads — but `System.Threading`
|
|
itself is allowed (because `CancellationToken` / `SemaphoreSlim` / `Volatile`
|
|
live there). The Core.Scripting-003 resolution is incomplete on the new
|
|
reference surface.
|
|
2. **`System.Threading.Timer`** (same namespace). Schedules a background
|
|
callback; the script returns control to the engine but the timer keeps
|
|
firing past the evaluation budget. Same threat as `Task.Run`.
|
|
3. **`System.Runtime.Loader.AssemblyLoadContext`** (in namespace
|
|
`System.Runtime.Loader`, which is not denied — only `System.Runtime.InteropServices`
|
|
is). The constructor + `LoadFromAssemblyPath` / `LoadFromStream` /
|
|
`LoadFromAssemblyName` let a script load an arbitrary DLL into the host
|
|
process. Pass (1) of the analyzer resolves the receiver type
|
|
(`AssemblyLoadContext`, allowed) + the invocation symbol's containing type
|
|
(also `AssemblyLoadContext`, allowed) and lets the call through. Pass (2)
|
|
only inspects `TypeSyntax` nodes — if the script discards the returned
|
|
`Assembly` (e.g. `alc.LoadFromAssemblyPath(@"C:\evil.dll");`) there is no
|
|
`TypeSyntax` for the analyzer to walk and the call is accepted. Triggering
|
|
execution of the loaded code from inside the sandbox is hard (most of
|
|
`Assembly`'s surface is in `System.Reflection`, which is denied) but the
|
|
defense-in-depth gap is real: an attacker who can author a script also
|
|
typically controls a file path on the server (Admin UI uploads, share
|
|
mounts) and loading an assembly is the prerequisite to every chained
|
|
escape — module initializers, type-resolve handlers, and a future analyzer
|
|
slip would all become exploitable.
|
|
|
|
In addition, two lower-impact `System.*` types are reachable that arguably
|
|
shouldn't be: **`System.Console.SetOut`** / **`Console.SetError`** could
|
|
redirect the host's console streams (requires constructing a
|
|
`System.IO.TextWriter`, which is blocked, so the practical exploit is
|
|
`Console.WriteLine` log-spam only), and **`System.Globalization.CultureInfo.DefaultThreadCurrentCulture`**
|
|
could perturb the entire process's formatting behavior (subtle but real cross-script
|
|
side effect).
|
|
|
|
The original Core.Scripting-001 finding called out the model: when an allow-listed
|
|
namespace contains dangerous types, those types must be denied type-granularly.
|
|
The new reference surface introduces several more such types and the deny-list
|
|
has not been kept in sync.
|
|
|
|
**Recommendation:** Add `System.Threading.ThreadPool` and `System.Threading.Timer`
|
|
to `ForbiddenFullTypeNames`. Add `System.Runtime.Loader` as a namespace prefix
|
|
to `ForbiddenNamespacePrefixes` (every type in `System.Runtime.Loader` —
|
|
`AssemblyLoadContext`, `AssemblyDependencyResolver`, `AssemblyLoadEventArgs` — is
|
|
out of script scope). Consider adding `System.Console` to `ForbiddenFullTypeNames`
|
|
to stop log-spam through the host's console streams, and at minimum document
|
|
`CultureInfo.DefaultThreadCurrentCulture` as an accepted cross-script side
|
|
effect. Each addition must have a regression test in `ScriptSandboxTests`
|
|
mirroring the Core.Scripting-010 vector style. Update
|
|
`docs/v2/implementation/phase-7-scripting-and-alarming.md` decision #6 + the
|
|
"Sandbox escape" compliance-check row to enumerate the additions, per the
|
|
Core.Scripting-009 doc-sync convention.
|
|
|
|
**Resolution:** Resolved 2026-05-23 — added `System.Runtime.Loader` to
|
|
`ForbiddenNamespacePrefixes` (the namespace-prefix form preferred over
|
|
type-granular per the recommendation; future BCL additions to that namespace
|
|
are denied by default). Added `System.Threading.ThreadPool` and
|
|
`System.Threading.Timer` to `ForbiddenFullTypeNames` — both live in
|
|
`System.Threading` shared with allowed sync primitives so they must be
|
|
type-granular. Regression tests added to `ScriptSandboxTests`:
|
|
`Rejects_ThreadPool_QueueUserWorkItem_at_compile`,
|
|
`Rejects_Timer_new_at_compile`, `Rejects_AssemblyLoadContext_at_compile`.
|
|
`docs/v2/implementation/phase-7-scripting-and-alarming.md` decision #6 +
|
|
the Sandbox-escape compliance-check row both updated per the
|
|
Core.Scripting-009 doc-sync convention. The two lower-impact suggestions
|
|
from the recommendation (`System.Console`, `CultureInfo.DefaultThreadCurrentCulture`)
|
|
were intentionally not addressed: `Console.SetOut` requires constructing
|
|
a `System.IO.TextWriter` which is already blocked, leaving only
|
|
`Console.WriteLine` log-spam (annoyance, not a security threat); and
|
|
`CultureInfo.DefaultThreadCurrentCulture` is a cross-script side-effect
|
|
worth knowing about but doesn't escape the sandbox. Recording both as
|
|
accepted minor risks. Test totals after fix: Core.Scripting 107 green
|
|
(was 104 — +3 new rejection tests).
|
|
|
|
### Core.Scripting-013
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Medium |
|
|
| Category | Security |
|
|
| Location | `ScriptEvaluator.cs:202-225` (`BuildWrapperSource`) |
|
|
| Status | Resolved |
|
|
|
|
**Description:** The synthesized wrapper pastes the user's source verbatim
|
|
between `{` and `}` braces inside a static method body, with a `#line 1`
|
|
directive and no escaping. The legacy `CSharpScript.CreateDelegate` path was
|
|
robust to this because Roslyn's scripting compiler parses script source as a
|
|
top-level statement sequence; the new hand-rolled path is parsing ordinary C# in
|
|
a method body, so a script that injects matching `{` / `}` braces can extend the
|
|
synthesized compilation unit with additional methods, classes, or `#line`
|
|
directives. For example, a script body of
|
|
`return 0; } public static int Evil() { return 0; }} public static class CompiledScript2 { public static void M() {`
|
|
ends the `Run` method early, declares a sibling `Evil` method (and even a
|
|
sibling `CompiledScript2` class) inside the synthesized namespace, then opens an
|
|
unclosed method that consumes the wrapper's trailing `}\n}`. With matching brace
|
|
counts the script parses cleanly and compiles.
|
|
|
|
`ForbiddenTypeAnalyzer` walks every descendant of every syntax tree, so any
|
|
forbidden BCL types named inside the injected methods are still caught — the
|
|
finding is **not** a direct sandbox escape. However:
|
|
|
|
- It silently relaxes the operator-visible authoring contract documented in
|
|
`docs/VirtualTags.md` ("scripts are statement bodies that end with an
|
|
explicit `return …;`") to "scripts can be any compilable C# inside the
|
|
`CompiledScript` namespace" — operators have access to features the design
|
|
did not intend to expose (local types defined as siblings of `Run`, custom
|
|
module initializers via attributes, etc.).
|
|
- A script can embed its own `#line` directives that override the
|
|
`#line 1` we emit just above the user source, producing misleading error
|
|
locations in compiler diagnostics surfaced to the operator.
|
|
- Future hardening that relies on syntactic-shape assumptions (e.g.
|
|
"every script has exactly one method") would silently fail.
|
|
- It widens the analyzer's surface: the analyzer's correctness now depends on
|
|
Pass (2) correctly walking every conceivable C# construct that can name a
|
|
type, including ones a normal script body would never contain
|
|
(`UnmanagedCallersOnly` attribute, function pointer types `delegate*<...>`,
|
|
pattern types, switch arm types, …).
|
|
|
|
**Recommendation:** Either (a) reject scripts whose parsed body contains
|
|
declarations other than statements — walk the wrapper's syntax tree after parse
|
|
and require that the only members of `CompiledScript` are the single `Run`
|
|
method, raising a `CompilationErrorException` if anything else appears — or
|
|
(b) parse the user source independently as a `BlockSyntax` and inject the
|
|
parsed block as the method body via the Roslyn syntax API, which makes
|
|
brace-mismatched / class-injecting source unparseable. Add a regression test
|
|
covering at least the brace-injection vector
|
|
(`return 0; } public static int Evil() { return 0;`).
|
|
|
|
**Resolution:** Resolved 2026-05-23 — took option (a) from the recommendation:
|
|
added an `EnforceSingleRunMember` step to `ScriptEvaluator.Compile` (runs after
|
|
`CSharpSyntaxTree.ParseText` of the synthesized wrapper, before Roslyn
|
|
compile). The check requires exactly one type declaration in the compilation
|
|
unit (the `CompiledScript` class) AND exactly one member on that class (the
|
|
`Run` method). Any deviation — a sibling class, an additional namespace, a
|
|
sibling method or nested type alongside `Run` — throws
|
|
`CompilationErrorException` with diagnostic IDs `LMX001` / `LMX002` and a
|
|
message that names Core.Scripting-013 and points at the offending span. Two
|
|
regression tests added: `Rejects_sibling_method_injection_via_balanced_braces`
|
|
(injects a sibling method via `} public static int Evil() { …`) and
|
|
`Rejects_sibling_class_injection_via_balanced_braces` (injects an entire
|
|
sibling namespace + class). Option (b) (parse the user source independently
|
|
as a `BlockSyntax` and inject via Roslyn syntax API) was considered but the
|
|
parse-and-validate approach is more readable, gives clearer error messages,
|
|
and keeps the wrapper-source generation textual.
|
|
|
|
### Core.Scripting-014
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Medium |
|
|
| Category | Concurrency & thread safety |
|
|
| Location | `CompiledScriptCache.cs:91-103` (`Clear`) |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `Clear()` snapshots `_cache.Keys.ToArray()` then iterates,
|
|
calling `TryRemove(key, out var lazy)` on each — the key-only overload, not
|
|
the value-scoped one used in `GetOrCompile`'s catch block. Between the
|
|
snapshot and a given `TryRemove`, a concurrent `GetOrCompile(scriptSource)`
|
|
call that hashes to the same key can re-insert a fresh `Lazy` whose `.Value`
|
|
the caller already retained. The unconditional `TryRemove` then removes that
|
|
fresh `Lazy` and `DisposeLazyIfMaterialised(lazy)` calls `Dispose()` on its
|
|
evaluator — unloading the ALC while the concurrent caller still holds a
|
|
reference to the evaluator and intends to invoke it.
|
|
|
|
This is exactly the race-window pattern the Core.Scripting-006 resolution
|
|
fixed in `GetOrCompile`'s catch block (the test
|
|
`Failed_compile_eviction_does_not_remove_a_concurrent_retry_entry` locks it
|
|
there). `Clear()` carries the same shape but uses the older, value-blind
|
|
overload, so the same race that finding-006 addresses is still latent on the
|
|
publish-replace path.
|
|
|
|
In current production wiring `Clear()` is intended for config-publish + tests
|
|
— neither overlaps steady-state evaluation under the documented design — so
|
|
the in-practice impact is low. But the cache is checked in as the
|
|
forward-looking compile cache for the engines (per `Script.SourceHash`'s docs
|
|
and the cache's own remarks); a future wiring that calls `Clear()` from
|
|
publish while evaluations are in flight would dispose live evaluators.
|
|
|
|
**Recommendation:** Replace the snapshot + `TryRemove(key, out var lazy)`
|
|
sequence with an enumeration that captures the `Lazy` reference at snapshot
|
|
time and uses the value-scoped `TryRemove(KeyValuePair<,>)` overload, mirroring
|
|
the Core.Scripting-006 fix:
|
|
|
|
```csharp
|
|
foreach (var entry in _cache.ToArray())
|
|
{
|
|
if (_cache.TryRemove(entry))
|
|
DisposeLazyIfMaterialised(entry.Value);
|
|
}
|
|
```
|
|
|
|
Add a regression test that races `GetOrCompile` against `Clear` and asserts
|
|
the caller's evaluator is still usable.
|
|
|
|
**Resolution:** Resolved 2026-05-23 — applied the recommendation verbatim:
|
|
replaced `foreach (var key in _cache.Keys.ToArray())` + key-only
|
|
`TryRemove(key, out var lazy)` with `foreach (var entry in _cache.ToArray())` +
|
|
value-scoped `TryRemove(entry)` (the `KeyValuePair<,>` overload). A concurrent
|
|
GetOrCompile re-add between the snapshot and the remove inserts a fresh Lazy
|
|
under the same key; the value-scoped comparison sees the mismatch and leaves
|
|
the fresh entry intact (instead of evicting + disposing the live evaluator
|
|
the concurrent caller still holds). Regression test
|
|
`Clear_uses_value_scoped_TryRemove_so_a_race_inserted_entry_survives` added
|
|
to `CompiledScriptCacheTests` — single-threaded simulation that snapshots
|
|
the dict, mutates the entry to a fresh Lazy mid-flight, drives the same
|
|
value-scoped TryRemove overload Clear now uses, and asserts the fresh entry
|
|
survives. The two-thread race would be flaky to model directly; the
|
|
single-threaded semantic test is sufficient because the fix is the
|
|
overload-selection itself.
|
|
|
|
### Core.Scripting-015
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Correctness & logic bugs |
|
|
| Location | `ScriptEvaluator.cs:234-270` (`ToCSharpTypeName`) |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `ToCSharpTypeName` is documented to handle nested types
|
|
(`Outer+Inner` → `Outer.Inner`) via `Replace('+', '.')` for the
|
|
non-generic path (line 269) but the generic path (line 263-266) constructs the
|
|
name from `def.FullName!` then takes a substring up to the backtick. For a
|
|
**nested generic** type — e.g. `Outer.Inner<T>` whose `FullName` is
|
|
`Outer+Inner`1` — `Replace('+', '.')` is applied first, then `Substring(0, IndexOf('`'))`
|
|
on `"Outer.Inner`1"` produces `"Outer.Inner"`, which is correct. Good.
|
|
|
|
However, the generic branch does NOT handle the case where the OPEN generic
|
|
type itself is nested with `+` inside the parent's name when the parent is
|
|
also generic (`Outer<TOuter>.Inner<TInner>` — `FullName` is
|
|
`Outer`1+Inner`1[[TOuter,TInner]]`). For that shape `Substring(0, IndexOf('`'))`
|
|
truncates at the first backtick — yielding `"Outer.Inner"` — silently dropping
|
|
the closed type arguments of `Outer<TOuter>`. The resulting source string is
|
|
syntactically valid but semantically wrong: `global::Outer.Inner<TInner>` does
|
|
not name `Outer<TOuter>.Inner<TInner>`.
|
|
|
|
The production code never hits this shape — `TResult` is always one of
|
|
`object?`, `bool`, `int`, `double`, `string?`, `DateTime` across the
|
|
virtual-tag engine, the alarm engine, the test-harness, and the test suite,
|
|
and `ScriptGlobals<TContext>` is always a top-level generic over a top-level
|
|
`ScriptContext` subclass. The bug is latent. But it is a foot-gun for a
|
|
future caller (e.g. a Phase-8 driver that wires a context type defined as a
|
|
nested generic for grouping reasons) and the XML-doc comment claims
|
|
"handles nested types" without qualifying it.
|
|
|
|
A second smaller correctness gap on the same path: the comment claims
|
|
`global::`-qualified FQNs prevent accidental capture by the wrapper's `using`
|
|
directives, which is true for the generic / non-generic branches, but the
|
|
primitive aliases (`bool`, `int`, `string`, `object`, …) are emitted unqualified.
|
|
A script that defines a local `class bool` (now possible per Core.Scripting-013)
|
|
would shadow the alias. Probably benign, but worth a comment.
|
|
|
|
**Recommendation:** Add a check in the generic branch that walks the FullName
|
|
backtick-by-backtick — or use `INamedTypeSymbol`-style name composition from
|
|
`def.DeclaringType` recursively — so multi-arity-nested generics emit
|
|
correctly. At minimum update the XML doc to qualify "handles nested types" as
|
|
"handles single-level nesting; nested generics whose parent is itself generic
|
|
are not supported". Add a `ToCSharpTypeName` unit test (currently nothing
|
|
exercises this method directly — coverage relies on the end-to-end compile path,
|
|
so the bug surfaces only as a misleading Roslyn diagnostic).
|
|
|
|
**Resolution:** Resolved 2026-05-23 — rewrote the generic-type branch of
|
|
`ToCSharpTypeName` to walk the `FullName` segment-by-segment (split on `.`
|
|
after `+ → .` substitution). For each segment ending in `Name\`N`, the
|
|
algorithm consumes N generic arguments from `t.GetGenericArguments()` in
|
|
order and emits them as `<…>` on that segment. Nested generic-in-generic
|
|
shapes (`Outer<T>.Inner<U>`) now emit as
|
|
`global::Ns.Outer<T>.Inner<U>` (valid C#) rather than the pre-fix
|
|
`global::Ns.Outer<T, U>` (which dropped the segment boundary entirely
|
|
because `IndexOf('`')` truncated at the first backtick). No production
|
|
caller exercises this shape today (all `TContext` / `TResult` types in
|
|
the codebase are top-level non-nested), so the fix is preemptive — but
|
|
the algorithm is now correct for any future nested-generic context type.
|
|
|
|
### Core.Scripting-016
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Medium |
|
|
| Category | Performance & resource management |
|
|
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagEngine.cs:74-117`, `src/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms/ScriptedAlarmEngine.cs:139-182` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** The Core.Scripting-008 resolution introduced
|
|
`ScriptEvaluator.IDisposable` + `CompiledScriptCache.Clear()` that disposes
|
|
each materialised evaluator before dropping its dictionary entry, so per-publish
|
|
ALC accretion is no longer process-lifetime rooted **inside the cache**. But
|
|
neither production consumer of `ScriptEvaluator` uses the cache — both
|
|
`VirtualTagEngine.Load` and `ScriptedAlarmEngine.LoadAsync` call
|
|
`ScriptEvaluator<TContext, TResult>.Compile(...)` directly (lines 105 / 160
|
|
respectively), store the evaluator inside an internal `VirtualTagState` /
|
|
`AlarmState` record, and on the next `Load` simply call `_tags.Clear()` /
|
|
`_alarms.Clear()`. The dropped `ScriptEvaluator` references never have
|
|
`Dispose()` called on them, so the underlying `ScriptAssemblyLoadContext`
|
|
instances are never `Unload()`-ed. The .NET runtime guarantees that a
|
|
collectible ALC stays alive until `Unload()` is called explicitly — having
|
|
"no strong references" is necessary but not sufficient. So the publish-replace
|
|
cycle leaks every prior generation's emitted assembly exactly as before the
|
|
fix, even though the fix's infrastructure is in place.
|
|
|
|
The Core.Scripting-008 regression tests in `CompiledScriptCacheTests`
|
|
(`Dispose_unloads_compiled_script_assembly_load_context` /
|
|
`Clear_disposes_every_materialised_evaluator`) prove the contract on
|
|
`CompiledScriptCache`, but neither engine uses that class. There is no
|
|
integration test exercising the actual publish path — i.e. that calling
|
|
`VirtualTagEngine.Load(...)` twice with different definitions makes the prior
|
|
generation's ALC eligible for GC. As a result the fix's headline guarantee
|
|
("Server restarts are no longer required to reclaim compiled-script memory" —
|
|
`docs/VirtualTags.md`) is not actually delivered to the production engines.
|
|
|
|
This is the same observable behavior the original Core.Scripting-008 finding
|
|
described, surfacing on a different code path that the resolution did not touch.
|
|
|
|
**Recommendation:** Either route the engines' compile path through
|
|
`CompiledScriptCache<TContext, TResult>` (the documented design — the cache
|
|
already returns the same evaluator instance for identical source, and its
|
|
`Clear()` now performs the right disposal — and `Script.SourceHash`'s doc-comment
|
|
already names this as the cache key), or make the engines' `Load` methods
|
|
dispose the previous `ScriptEvaluator` instances before reassigning. The
|
|
former is the cleaner change because it also collapses redundant compiles
|
|
across publishes for unchanged scripts. Add an integration test along the
|
|
lines of `CompiledScriptCacheTests.Clear_disposes_every_materialised_evaluator`
|
|
for each engine: snapshot the per-evaluator emitted assembly via
|
|
`WeakReference`, call `Load(...)` with a different definition set, and assert
|
|
the prior generation's assemblies become collectable.
|
|
|
|
**Resolution:** Resolved 2026-05-23 — took the cleaner route from the
|
|
recommendation: routed both engines' compile paths through
|
|
`CompiledScriptCache<TContext, TResult>`. `VirtualTagEngine` and
|
|
`ScriptedAlarmEngine` each gained a private `_compileCache` instance field,
|
|
their `Load`/`LoadAsync` methods now call `_compileCache.GetOrCompile(source)`
|
|
instead of `ScriptEvaluator.Compile(source)` directly, and the cache is cleared
|
|
on publish-replace alongside the existing `_tags` / `_alarms` clears so the
|
|
prior generation's ALCs are disposed before recompile. Engine `Dispose` now
|
|
also calls `_compileCache.Dispose()` so the engine-shutdown path actually
|
|
releases the emitted assemblies. **Side-fix:** discovered + fixed an
|
|
adjacent bug in `CompiledScriptCache.Dispose()` itself — it set
|
|
`_disposed = true` before calling `Clear()`, but `Clear()`'s pre-existing
|
|
`if (_disposed) return` guard then aborted the drain unconditionally, so
|
|
the Dispose-triggered cleanup was a silent no-op. Removed the disposed-guard
|
|
on `Clear()` (the operation is idempotent — clearing an empty/cleared cache
|
|
is safe). Without this side-fix the engine-Dispose path would have left
|
|
the cached evaluators rooted forever even though the call chain looked
|
|
correct. **Side-fix for ScriptedAlarmEngine.Dispose:** moved the pre-existing
|
|
"do NOT clear `_alarms` here" comment to "clear `_alarms` AFTER the drain"
|
|
because the AlarmState records hold the `TimedScriptEvaluator`/`ScriptEvaluator`
|
|
delegates that root the emitted assembly — leaving them in `_alarms` after
|
|
Dispose was the same root-the-script-forever pattern this finding is about,
|
|
just on the engine side rather than the cache side. The `_alarms` clear is
|
|
safe after the `Task.WhenAll` drain because that drain guarantees no
|
|
background callback is mid-flight. Regression tests added:
|
|
`VirtualTagEngineTests.Dispose_unloads_compiled_script_assembly` and
|
|
`ScriptedAlarmEngineTests.Dispose_unloads_compiled_predicate_assembly` —
|
|
each uses `WeakReference` + bounded `GC.Collect()` to prove the emitted
|
|
assembly is reclaimable after `engine.Dispose()`. **Important test pattern
|
|
detail:** the alarms test originally failed because its helper was
|
|
`async Task<WeakReference>` — async state machines capture locals as
|
|
state-struct fields and can keep them alive past the method's apparent end.
|
|
Rewrote as a synchronous helper using `LoadAsync(...).GetAwaiter().GetResult()`
|
|
inside two cooperating `[MethodImpl(MethodImplOptions.NoInlining)]` helpers
|
|
(`CompileAlarmAndCaptureWeak` + `ExtractEmittedAssemblyWeakRef`) so the
|
|
intermediate reflection locals die when each helper returns. Test totals
|
|
after fix: Core.Scripting 104 green (unchanged); VirtualTags 57 green (was
|
|
56 — +1 unload test); ScriptedAlarms 67 green (was 66 — +1 unload test).
|