Re-reviewed the four modules with source changes since the previous review commit76d35d1, per REVIEW-PROCESS.md section 6. Updated each findings.md header (date 2026-05-23, commita9be809) and appended new findings under continued numbering. Regenerated README.md. ## New findings — 12 total across 4 modules ### Core.Scripting (5 new, IDs -012 to -016) - **-012 High Security** — broadened BCL references (System.* + netstandard) re-expose System.Threading.ThreadPool / Timer / AssemblyLoadContext, which the analyzer's deny-list doesn't cover. Re-introduces the background-work threat Core.Scripting-003 closed via System.Threading.Tasks deny. - **-013 Medium Security** — hand-rolled wrapper-source generation lets brace-balanced user source inject sibling methods/classes alongside CompiledScript.Run. Analyzer still gates forbidden types, but the documented 'method body' authoring contract is silently relaxed. - **-014 Medium Concurrency** — CompiledScriptCache.Clear() uses key-only TryRemove(key, out _) — the same race the -006 resolution fixed in GetOrCompile's catch is latent here on publish-replace. - **-015 Low Correctness** — ToCSharpTypeName truncates at first backtick; silently drops closed type arguments of nested-generic shapes (Outer<>.Inner<>). Latent — no production caller uses this shape today. - **-016 Medium Performance** — VirtualTagEngine + ScriptedAlarmEngine call ScriptEvaluator.Compile directly without going through CompiledScriptCache, so the headline -008 collectible-ALC fix doesn't run on the actual production path — the per-publish leak is still in effect. ### Core.ScriptedAlarms (1 new, ID -013) - **-013 Low Documentation** — new internal test accessors return the live mutable scratch dictionary; XML docs don't warn future test authors about the synchronisation contract. ### Driver.Cli.Common (2 new, IDs -007, -008) - **-007 High Correctness** — 0x80550000 was added as BadDeviceFailure but the real OPC UA spec value for BadDeviceFailure is 0x808B0000 (verified against Driver.Galaxy.Runtime.StatusCodeMap and HistorianQualityMapper, both of which use the correct 0x808B0000). 0x80550000 is actually BadSecurityPolicyRejected. The native mappers (FOCAS / AbCip / AbLegacy) all use the wrong 0x80550000; this session's SnapshotFormatter extension propagated the wrong name and the test asserts against the same wrong value so CI is blind — same shape of bug as Driver.Cli.Common-001. - **-008 Low Testing** — new FormatStatus_names_native_driver_emitted_codes Theory is redundant with the existing well-known Theory (same five InlineData rows added to both) and uses weaker ShouldContain assertion than the well-known Theory's ShouldBe. ### Driver.Galaxy (4 new, IDs -015 to -018) - **-015 Medium Security** — vendored DLLs (libs/) have no recorded provenance: no source-commit SHA from the mxaccessgw repo, no SHA-256 checksum in libs/README.md. Tampering / accidental swap undetectable. - **-016 Medium Performance** — version skew between declared PackageReferences (Polly 8.5.2 / Grpc.Net.Client 2.71.0 / Microsoft.Extensions.Logging.Abstractions 10.0.0) and what the vendored DLL was actually built against (Polly.Core 8.6.6 / Grpc.Net.Client 2.76.0 / Microsoft.Extensions.Logging.Abstractions 10.0.7). Latent now (assembly-version refs are loose) but precise shape that produces a runtime MissingMethodException. - **-017 Low Design** — no contract-version handshake between the driver and the gateway; proto could evolve under the gateway without the driver noticing. - **-018 Low Documentation** — libs/README.md points at the wrong sibling csproj as the version source-of-truth; missing SpecificVersion=false on the Reference items; missing mxaccessgw source-commit SHA. ## Particularly notable Two findings undercut commits from this session: - Driver.Cli.Common-007 invalidates commit5a9c459(which named 0x80550000 as BadDeviceFailure across the cross-CLI shortlist). - Core.Scripting-016 invalidates the production effect of commit7b6ab2e(the collectible-ALC fix wired Dispose only via CompiledScriptCache, which the engines don't use). The wider native-mapper miscoding behind -007 also affects three driver modules outside this session's edit scope (FocasStatusMapper, AbCipStatusMapper, AbLegacyStatusMapper all carry the wrong code). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
39 KiB
Code Review — Core.Scripting
| Field | Value |
|---|---|
| Module | src/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting |
| Reviewer | Claude Code |
| Review date | 2026-05-23 |
| Commit reviewed | a9be809 |
| Status | Reviewed |
| Open findings | 5 |
Checklist coverage
A comprehensive review completes every category, recording "No issues found" where a category produced nothing rather than leaving it blank.
The 2026-05-23 re-review only covers code touched between commits 76d35d1 and
a9be809 (primarily the Core.Scripting-008 ALC rewrite + the broadened BCL
references). Categories where the new code surface produced no issues are
recorded as "No new issues" for that pass.
| # | Category | Result (76d35d1) |
Result (a9be809, new code only) |
|---|---|---|---|
| 1 | Correctness & logic bugs | Core.Scripting-004, Core.Scripting-005 | Core.Scripting-015 |
| 2 | OtOpcUa conventions | No issues found | No new issues |
| 3 | Concurrency & thread safety | Core.Scripting-006 | Core.Scripting-014 |
| 4 | Error handling & resilience | Core.Scripting-007 | No new issues |
| 5 | Security | Core.Scripting-001, Core.Scripting-002, Core.Scripting-003 | Core.Scripting-012, Core.Scripting-013 |
| 6 | Performance & resource management | Core.Scripting-008 | Core.Scripting-016 |
| 7 | Design-document adherence | Core.Scripting-009 | No new issues |
| 8 | Code organization & conventions | No issues found | No new issues |
| 9 | Testing coverage | Core.Scripting-010, Core.Scripting-011 | No new issues |
| 10 | Documentation & comments | No issues found | No new issues |
Findings
Core.Scripting-001
| Field | Value |
|---|---|
| Severity | Critical |
| Category | Security |
| Location | ForbiddenTypeAnalyzer.cs:45, ScriptSandbox.cs:54 |
| Status | Resolved |
Description: System.Environment lives in the allowed System namespace (it is in
System.Private.CoreLib, which is allow-listed for primitives) and is not on the
forbidden-namespace deny-list. Nothing prevents an operator-authored script from calling
System.Environment.Exit(0) or System.Environment.FailFast("..."). Both terminate the
host process immediately. Because scripted-alarm predicates and virtual-tag scripts run
in-process in the main OPC UA server (decision: "Scripting engine runs in the main .NET 10
server process"), a single malicious or buggy predicate brings down the entire server —
an outage affecting every connected client and every driver. ScriptSandboxTests only
pins the read path (Environment.GetEnvironmentVariable) as an accepted compromise; the
process-killing members are not considered. The whole-process kill far exceeds the
"read-only process state" justification the test comments rely on.
Recommendation: The forbidden surface must be member-granular, not namespace-granular,
for types in allowed namespaces. Add an explicit forbidden-member deny-list to
ForbiddenTypeAnalyzer covering at minimum System.Environment.Exit,
System.Environment.FailFast, System.AppDomain, System.GC (e.g. GC.Collect,
GC.AddMemoryPressure), and System.Activator.CreateInstance (a reflection-equivalent
escape). Reject these in CheckSymbol by resolved method symbol, with a test for each.
Resolution: Resolved 2026-05-22 — added a type-granular ForbiddenFullTypeNames
deny-list (System.Environment, System.AppDomain, System.GC, System.Activator) to
ForbiddenTypeAnalyzer; CheckSymbol now rejects any resolved type symbol whose
fully-qualified name matches, alongside the existing namespace-prefix check, so dangerous
System-namespace process-control types are blocked at compile while legitimate System
types (Math, String, …) stay usable. Regression tests added in ScriptSandboxTests for
Environment.Exit / Environment.FailFast / Environment.GetEnvironmentVariable /
AppDomain / GC.Collect / Activator.CreateInstance.
Core.Scripting-002
| Field | Value |
|---|---|
| Severity | High |
| Category | Security |
| Location | ForbiddenTypeAnalyzer.cs:70 |
| Status | Resolved |
Description: The syntax walker only inspects four node kinds:
ObjectCreationExpressionSyntax, InvocationExpressionSyntax with a member-access target,
MemberAccessExpressionSyntax, and bare IdentifierNameSyntax. It never visits
TypeOfExpressionSyntax, generic type-argument lists (GenericNameSyntax /
TypeArgumentListSyntax), cast expressions (CastExpressionSyntax), is/as type
patterns, default(T) expressions, array-creation element types, or using/local
declared types. A script such as typeof(System.IO.File),
new System.Collections.Generic.List<System.IO.FileInfo>(),
(System.IO.Stream)null, or default(System.Reflection.Assembly) references a forbidden
type without ever producing a node the walker examines, so the forbidden-type check is
bypassed. The Phase 7 plan A.6 explicitly calls out typeof as a sandbox-escape attempt
that "must fail at compile" — it currently does not.
Recommendation: Walk every TypeSyntax node (handle TypeOfExpressionSyntax,
CastExpressionSyntax, generic argument lists, and the type operand of
IsPatternExpression / binary as). The simplest robust fix is to enumerate all
DescendantNodes() and, for any node, resolve both GetSymbolInfo and GetTypeInfo,
then check the resolved type plus every type argument. Add tests covering typeof,
generic arguments, casts, and default(T) with forbidden types.
Resolution: Resolved 2026-05-22 — ForbiddenTypeAnalyzer.Analyze now runs a second
pass that resolves GetTypeInfo on every TypeSyntax node and recursively unwraps array
element types and generic type arguments, so forbidden types named via typeof, generic
arguments (List<FileInfo>), casts, is/as patterns, default(T), array-creation
element types, and explicitly-typed local declarations are all rejected at compile. The
original member/call node-kind switch is kept (deliberately narrow to avoid flagging
inherited members such as typeof(int).Name), and a span+type dedupe prevents duplicate
rejections from the two passes. Regression tests added in ScriptSandboxTests for each
node form plus over-block guards for allowed generics/typeof.
Core.Scripting-003
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Security |
| Location | TimedScriptEvaluator.cs:9, ScriptSandbox.cs:30 |
| Status | Resolved |
Description: There is no bound on memory a script may allocate or on the number of
threads/tasks a script may spawn. The class docs acknowledge unbounded memory as "a budget
concern" deferred to v3, but in-process execution means a script doing
new byte[int.MaxValue] repeatedly (or Enumerable.Range(0,int.MaxValue).ToList() — LINQ
is allow-listed) can drive the whole server to OutOfMemoryException, an outage. The
timeout does not help: the allocation can exhaust memory well before 250ms elapses, and
the orphaned thread-pool thread documented in TimedScriptEvaluator keeps the allocation
rooted. System.Threading.Tasks is not on the deny-list, so a script can also
Task.Run an unbounded fan-out of background work that outlives the timeout entirely.
Recommendation: At minimum, document this as a known accepted risk in
docs/ScriptedAlarms.md / docs/VirtualTags.md rather than only in a code comment, and
add the Task/Parallel namespaces to the forbidden list (scripts are synchronous
predicates — they have no legitimate need to start background tasks). For memory, gate
script authoring behind an Admin permission and treat the test-harness preview as the
control point, or track an explicit v3 issue for out-of-process execution. Record the
decision so it is not silently lost.
Resolution: Resolved 2026-05-22 — added System.Threading.Tasks to ForbiddenNamespacePrefixes (blocking Task.Run / Parallel fan-out); documented the unbounded-memory accepted risk and the Task denial rationale in docs/VirtualTags.md (new "Known resource limits" subsection) and cross-referenced from docs/ScriptedAlarms.md.
Core.Scripting-004
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | DependencyExtractor.cs:73 |
| Status | Resolved |
Description: The walker matches tag-access calls purely by spelling — any
InvocationExpressionSyntax whose member name is GetTag or SetVirtualTag is treated as
a ScriptContext tag access, regardless of the receiver. A script that defines a local
type with a GetTag(string) method and calls other.GetTag("X"), or calls
this.GetTag(...) on a script-defined helper, has spurious dependencies harvested (or, if
the literal arg is non-literal, spurious rejections raised). The XML remarks claim "as long
as it's not on the ctx instance, the extractor doesn't pick it up", but the code does not
check that the receiver is the ctx identifier — it accepts any member access with the
matching name. The DependencyExtractorTests.Ignores_non_ctx_method_named_GetTag test
passes only because the helper there is a free function (not member-access form); a
member-access call to a non-ctx GetTag is untested and would be misattributed.
Recommendation: In VisitInvocationExpression, additionally require that
member.Expression is an IdentifierNameSyntax with Identifier.ValueText == "ctx"
(matching the ScriptGlobals<TContext>.ctx field name). Add a test for
someOtherObject.GetTag("X") asserting it is ignored.
Resolution: Resolved 2026-05-22 — VisitInvocationExpression now additionally checks that member.Expression is an IdentifierNameSyntax with ValueText == "ctx" before treating the call as a dependency; test Ignores_member_access_GetTag_on_non_ctx_receiver added to DependencyExtractorTests.
Core.Scripting-005
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | DependencyExtractor.cs:97 |
| Status | Resolved |
Description: A raw string literal token passed as the tag path (a raw triple-quote
literal) tokenizes as SingleLineRawStringLiteralToken /
MultiLineRawStringLiteralToken, not StringLiteralToken. The check
literal.Token.IsKind(SyntaxKind.StringLiteralToken) therefore rejects an
otherwise-static raw-string path as a non-literal "dynamic path", producing a misleading
rejection message. This is an edge case (operators rarely write raw strings for tag
paths) but the error text would confuse anyone who does.
Recommendation: Accept all string-literal token kinds — check
literal.IsKind(SyntaxKind.StringLiteralExpression) on the expression node, or include
the raw-string token kinds, so a static raw string is harvested rather than rejected.
Resolution: Resolved 2026-05-23 — HandleTagCall now checks literal.IsKind(SyntaxKind.StringLiteralExpression) on the expression node, which covers regular string literals, single-line raw strings, and multi-line raw strings uniformly. Regression tests Accepts_single_line_raw_string_literal_path and Accepts_multi_line_raw_string_literal_path added to DependencyExtractorTests.
Core.Scripting-006
| Field | Value |
|---|---|
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | CompiledScriptCache.cs:55 |
| Status | Resolved |
Description: On a failed compile the catch block calls
_cache.TryRemove(key, out _) without a value comparison. If two threads race a miss for
the same bad source, both observe the same faulted Lazy and throw, and both call
TryRemove(key). If a concurrent retry re-adds a new Lazy for that key between the two
removals, the second unconditional TryRemove could evict the in-flight retry entry. The
window is small and the consequence is only a redundant recompile, so severity is Low —
but the removal should be key+value scoped for correctness.
Recommendation: Use the ConcurrentDictionary.TryRemove(KeyValuePair<,>) overload to
remove only the specific faulted Lazy instance, so a concurrently re-added entry is not
evicted.
Resolution: Resolved 2026-05-23 — GetOrCompile's catch block now evicts via _cache.TryRemove(new KeyValuePair<string, Lazy<…>>(key, lazy)), comparing the value reference so only the faulted Lazy is removed; a concurrent retry that re-inserted a fresh Lazy under the same key is preserved. Regression test Failed_compile_eviction_does_not_remove_a_concurrent_retry_entry added to CompiledScriptCacheTests (reflection-driven deterministic race: the faulted Lazy's factory swaps the dictionary entry to a fresh Lazy as a side effect of its throw, modelling the precise race window).
Core.Scripting-007
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | TimedScriptEvaluator.cs:60 |
| Status | Resolved |
Description: RunAsync wraps the inner run in Task.Run(...) and then awaits
WaitAsync(Timeout, ct). If the caller-supplied ct cancels at roughly the same time the
timeout elapses, the order in which WaitAsync observes the timeout vs. the cancellation
is non-deterministic, so the same shutdown can sometimes surface as
ScriptTimeoutException and sometimes as OperationCanceledException. The class docs
assert "the caller's cancel wins" as a hard guarantee that the virtual-tag engine shutdown
path depends on to avoid misclassifying shutdown as a script fault — but the
implementation does not guarantee it when both fire close together.
Recommendation: After catching TimeoutException, check ct.IsCancellationRequested
and throw OperationCanceledException(ct) instead of ScriptTimeoutException when the
caller's token is cancelled, so caller cancellation deterministically wins regardless of
race ordering.
Resolution: Resolved 2026-05-22 — in the catch (TimeoutException) handler, ct.IsCancellationRequested is now checked and OperationCanceledException(ct) thrown before ScriptTimeoutException, so caller cancellation deterministically wins regardless of race ordering; regression test Caller_cancellation_wins_even_when_timeout_fires_first added to TimedScriptEvaluatorTests.
Core.Scripting-008
| Field | Value |
|---|---|
| Severity | Low |
| Category | Performance & resource management |
| Location | CompiledScriptCache.cs:34, ScriptEvaluator.cs:34 |
| Status | Resolved |
Description: CompiledScriptCache has no capacity bound (acknowledged in the class
remarks) and no eviction. Each cached ScriptEvaluator holds a Roslyn ScriptRunner<T>
delegate, which keeps the dynamically emitted script assembly loaded for the process
lifetime — emitted assemblies in the default AssemblyLoadContext cannot be unloaded.
Clear() drops the dictionary entries but does not unload the emitted assemblies;
they leak. Across many config-generation publishes (each Clear() followed by recompiling
every script), the process accumulates dead script assemblies. For the expected "low
thousands" of scripts this is benign, but a long-running server with frequent publishes
will see steady managed-memory growth that never returns.
Recommendation: Document the per-publish assembly accretion as a known limitation, or
compile scripts into a collectible AssemblyLoadContext so Clear() can unload prior
generations. At minimum add a note to docs/ScriptedAlarms.md so operators with
high-publish-frequency deployments are aware.
Resolution: Resolved 2026-05-23 — switched the compile pipeline off the legacy
CSharpScript.CreateDelegate path (which emits into the default, non-collectible
AssemblyLoadContext) and onto a hand-rolled CSharpCompilation →
Compilation.Emit(MemoryStream) → ScriptAssemblyLoadContext.LoadFromStream chain,
with the new ScriptAssemblyLoadContext constructed isCollectible: true. Each
compiled script lives in its own ALC; ScriptEvaluator now implements IDisposable
and calls AssemblyLoadContext.Unload() on dispose. CompiledScriptCache.Clear()
disposes every materialised evaluator before dropping its dictionary entry, and
CompiledScriptCache itself is now IDisposable for graceful server shutdown.
After a publish-replace cycle the prior generation's emitted assemblies become
eligible for GC; the reclaim is GC-timing-sensitive (Unload is
eligible-for-collection, not synchronous) and the next collection cycle reclaims
them. The references list is now BCL-wide (System.* + netstandard + Microsoft.Win32.Registry
via the TRUSTED_PLATFORM_ASSEMBLIES set) so forbidden BCL types resolve at compile and
ForbiddenTypeAnalyzer is the sole security gate (consistent with the
Core.Scripting-001 / -002 model). docs/VirtualTags.md "Compile cache" section rewritten;
docs/ScriptedAlarms.md cross-reference updated to drop the obsolete restart guidance.
Regression tests added in CompiledScriptCacheTests:
Dispose_unloads_compiled_script_assembly_load_context,
Clear_disposes_every_materialised_evaluator, and
GetOrCompile_after_Dispose_throws_ObjectDisposedException; the first two
prove ALC unload via WeakReference + bounded GC.Collect() loops. Suite now 104
green (was 101). Authoring convention: the synthesized wrapper is an ordinary
C# static method, so scripts must end with explicit return …; per ordinary C# rules
(the legacy CSharpScript "last expression yields result" shorthand no longer applies);
every script in the existing corpus already uses explicit return so this is a doc-only
change for new authors.
Core.Scripting-009
| Field | Value |
|---|---|
| Severity | Low |
| Category | Design-document adherence |
| Location | ForbiddenTypeAnalyzer.cs:45 |
| Status | Resolved |
Description: The Phase 7 plan decision #6
(docs/v2/implementation/phase-7-scripting-and-alarming.md) enumerates the forbidden
surface as "No HttpClient / File / Process / reflection". ForbiddenTypeAnalyzer actually
denies a broader set — System.Threading.Thread, System.Runtime.InteropServices, and
Microsoft.Win32 (registry) — which is sensible hardening but is undocumented in the plan
and in docs/ScriptedAlarms.md (which defers sandbox rules to VirtualTags.md). An
operator reading the design docs cannot predict that a registry or interop reference will
be rejected. Conversely the plan does not record the System.Environment /
System.Diagnostics decisions. The code and the design document have drifted.
Recommendation: Update the plan's decision #6 (or docs/VirtualTags.md) to list the
authoritative deny-list exactly as ForbiddenTypeAnalyzer.ForbiddenNamespacePrefixes
defines it, including the System.Environment allowed-compromise, so the docs match the
code.
Resolution: Resolved 2026-05-23 — docs/v2/implementation/phase-7-scripting-and-alarming.md decision #6 row + the "Sandbox escape" compliance-check row now enumerate the authoritative deny-list exactly as ForbiddenTypeAnalyzer defines it (namespace-prefix denies for System.IO / System.Net / System.Diagnostics / System.Reflection / System.Threading.Tasks / System.Runtime.InteropServices / Microsoft.Win32; type-granular denies for System.Environment / System.AppDomain / System.GC / System.Activator / System.Threading.Thread), and the compliance-check row lists the syntactic vectors (typeof / generic arg / cast / is/as / default(T) / array element / declared local) the broadened analyzer covers. docs/VirtualTags.md already documents the same list and is unchanged.
Core.Scripting-010
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Testing coverage |
| Location | tests/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting.Tests/ScriptSandboxTests.cs:54 |
| Status | Resolved |
Description: The sandbox-escape test suite covers only the four obvious vectors
(File / Http / Process / Reflection) as direct member-access calls. It does not test:
typeof(forbidden), generic type arguments (List<FileInfo>), cast expressions to
forbidden types, System.Environment.Exit / FailFast, System.Threading.Thread,
System.Runtime.InteropServices, Microsoft.Win32 registry access, Activator, or
System.AppDomain. Given that the analyzer is the sole security boundary for in-process
untrusted-script execution, the gaps in Core.Scripting-001 and Core.Scripting-002 went
undetected precisely because no test exercises those forms. The Phase 7 plan A.6 mandated
"sandbox escape tests" but the implemented set is materially narrower than the threat
surface.
Recommendation: Add a parameterised escape-test covering every node form in
Core.Scripting-002 and every forbidden namespace/member in Core.Scripting-001. Each must
assert a ScriptSandboxViolationException (or CompilationErrorException) at compile.
Resolution: Resolved 2026-05-22 — added ScriptSandboxTests cases for System.Threading.Thread, System.Threading.Tasks.Task.Run, System.Runtime.InteropServices.Marshal, and Microsoft.Win32.Registry (the four namespace-deny-list vectors that had no test); the 001/002 vectors (Environment.Exit/FailFast/AppDomain/GC/Activator, typeof, generics, cast, default(T), is/as, array element, declared variable) were already covered by the -001/-002 resolution commits. All 79 tests pass.
Core.Scripting-011
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | tests/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting.Tests/ |
| Status | Resolved |
Description: Two source files have no direct test coverage: ScriptContext
(Deadband static helper is exercised only indirectly through ScriptSandboxTests, and
not for its boundary tolerance behaviour) and ScriptSandbox.Build itself (the
ArgumentNullException / ArgumentException guards on contextType at
ScriptSandbox.cs:45-48 are never asserted). ScriptLogCompanionSink and
ScriptLoggerFactory have tests, but there is no test that a script's ctx.Logger Error
emission surfaces via the companion sink end-to-end (factory + sink integration is
untested). These are minor gaps but leave guard clauses and the logging integration
unverified.
Recommendation: Add unit tests for ScriptSandbox.Build argument validation, for
ScriptContext.Deadband at and around the tolerance boundary, and an end-to-end test that
a script logging at Error level produces both a scripts-*.log event and a companion
Warning event.
Resolution: Resolved 2026-05-23 — added three new test files: ScriptSandboxBuildTests covers the Build null / non-ScriptContext / base-class / concrete-subclass paths; ScriptContextTests locks Deadband boundary semantics (equal-to-tolerance returns false; just-over returns true; symmetric in direction; zero-tolerance returns true only on non-equal; negative tolerance trips on any non-equal); the new Factory_plus_companion_sink_integration_surfaces_script_error_in_both_logs test in ScriptLogCompanionSinkTests wires ScriptLoggerFactory + the companion sink together end-to-end and asserts an Error emission lands in both the scripts sink (at Error) and the main sink (at Warning), each tagged with ScriptName. Suite now 101 green (was 85 before).
Core.Scripting-012
| Field | Value |
|---|---|
| Severity | High |
| Category | Security |
| Location | ForbiddenTypeAnalyzer.cs:60-76, ScriptSandbox.cs:96-126 |
| Status | Open |
Description: The Core.Scripting-008 rewrite broadened the BCL references list
from a narrow allow-list (System.Private.CoreLib + System.Linq only) to the
full TRUSTED_PLATFORM_ASSEMBLIES set filtered to System.* + netstandard +
Microsoft.Win32.Registry. This change correctly delegates the security gate to
ForbiddenTypeAnalyzer (the new comment in ScriptSandbox calls this out
explicitly), but the analyzer's deny-list has not been expanded to match the new
attack surface, and three categories of dangerous BCL types in the System.*
allow-listed assemblies are now reachable from script source:
System.Threading.ThreadPool(in namespaceSystem.Threading). The Core.Scripting-003 fix addedSystem.Threading.Tasksto denyTask.Run/Parallelfan-out because background work that outlives the per-evaluation timeout is the explicit threat.ThreadPool.QueueUserWorkItem,ThreadPool.UnsafeQueueUserWorkItem, andThreadPool.RegisterWaitForSingleObjectare exactly the same threat — they schedule background work that outlives theWaitAsync(Timeout)budget and tie up worker threads — butSystem.Threadingitself is allowed (becauseCancellationToken/SemaphoreSlim/Volatilelive there). The Core.Scripting-003 resolution is incomplete on the new reference surface.System.Threading.Timer(same namespace). Schedules a background callback; the script returns control to the engine but the timer keeps firing past the evaluation budget. Same threat asTask.Run.System.Runtime.Loader.AssemblyLoadContext(in namespaceSystem.Runtime.Loader, which is not denied — onlySystem.Runtime.InteropServicesis). The constructor +LoadFromAssemblyPath/LoadFromStream/LoadFromAssemblyNamelet a script load an arbitrary DLL into the host process. Pass (1) of the analyzer resolves the receiver type (AssemblyLoadContext, allowed) + the invocation symbol's containing type (alsoAssemblyLoadContext, allowed) and lets the call through. Pass (2) only inspectsTypeSyntaxnodes — if the script discards the returnedAssembly(e.g.alc.LoadFromAssemblyPath(@"C:\evil.dll");) there is noTypeSyntaxfor the analyzer to walk and the call is accepted. Triggering execution of the loaded code from inside the sandbox is hard (most ofAssembly's surface is inSystem.Reflection, which is denied) but the defense-in-depth gap is real: an attacker who can author a script also typically controls a file path on the server (Admin UI uploads, share mounts) and loading an assembly is the prerequisite to every chained escape — module initializers, type-resolve handlers, and a future analyzer slip would all become exploitable.
In addition, two lower-impact System.* types are reachable that arguably
shouldn't be: System.Console.SetOut / Console.SetError could
redirect the host's console streams (requires constructing a
System.IO.TextWriter, which is blocked, so the practical exploit is
Console.WriteLine log-spam only), and System.Globalization.CultureInfo.DefaultThreadCurrentCulture
could perturb the entire process's formatting behavior (subtle but real cross-script
side effect).
The original Core.Scripting-001 finding called out the model: when an allow-listed namespace contains dangerous types, those types must be denied type-granularly. The new reference surface introduces several more such types and the deny-list has not been kept in sync.
Recommendation: Add System.Threading.ThreadPool and System.Threading.Timer
to ForbiddenFullTypeNames. Add System.Runtime.Loader as a namespace prefix
to ForbiddenNamespacePrefixes (every type in System.Runtime.Loader —
AssemblyLoadContext, AssemblyDependencyResolver, AssemblyLoadEventArgs — is
out of script scope). Consider adding System.Console to ForbiddenFullTypeNames
to stop log-spam through the host's console streams, and at minimum document
CultureInfo.DefaultThreadCurrentCulture as an accepted cross-script side
effect. Each addition must have a regression test in ScriptSandboxTests
mirroring the Core.Scripting-010 vector style. Update
docs/v2/implementation/phase-7-scripting-and-alarming.md decision #6 + the
"Sandbox escape" compliance-check row to enumerate the additions, per the
Core.Scripting-009 doc-sync convention.
Resolution: (empty until closed; on close, record the fixing commit SHA, the date, and a one-line description of the fix)
Core.Scripting-013
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Security |
| Location | ScriptEvaluator.cs:202-225 (BuildWrapperSource) |
| Status | Open |
Description: The synthesized wrapper pastes the user's source verbatim
between { and } braces inside a static method body, with a #line 1
directive and no escaping. The legacy CSharpScript.CreateDelegate path was
robust to this because Roslyn's scripting compiler parses script source as a
top-level statement sequence; the new hand-rolled path is parsing ordinary C# in
a method body, so a script that injects matching { / } braces can extend the
synthesized compilation unit with additional methods, classes, or #line
directives. For example, a script body of
return 0; } public static int Evil() { return 0; }} public static class CompiledScript2 { public static void M() {
ends the Run method early, declares a sibling Evil method (and even a
sibling CompiledScript2 class) inside the synthesized namespace, then opens an
unclosed method that consumes the wrapper's trailing }\n}. With matching brace
counts the script parses cleanly and compiles.
ForbiddenTypeAnalyzer walks every descendant of every syntax tree, so any
forbidden BCL types named inside the injected methods are still caught — the
finding is not a direct sandbox escape. However:
- It silently relaxes the operator-visible authoring contract documented in
docs/VirtualTags.md("scripts are statement bodies that end with an explicitreturn …;") to "scripts can be any compilable C# inside theCompiledScriptnamespace" — operators have access to features the design did not intend to expose (local types defined as siblings ofRun, custom module initializers via attributes, etc.). - A script can embed its own
#linedirectives that override the#line 1we emit just above the user source, producing misleading error locations in compiler diagnostics surfaced to the operator. - Future hardening that relies on syntactic-shape assumptions (e.g. "every script has exactly one method") would silently fail.
- It widens the analyzer's surface: the analyzer's correctness now depends on
Pass (2) correctly walking every conceivable C# construct that can name a
type, including ones a normal script body would never contain
(
UnmanagedCallersOnlyattribute, function pointer typesdelegate*<...>, pattern types, switch arm types, …).
Recommendation: Either (a) reject scripts whose parsed body contains
declarations other than statements — walk the wrapper's syntax tree after parse
and require that the only members of CompiledScript are the single Run
method, raising a CompilationErrorException if anything else appears — or
(b) parse the user source independently as a BlockSyntax and inject the
parsed block as the method body via the Roslyn syntax API, which makes
brace-mismatched / class-injecting source unparseable. Add a regression test
covering at least the brace-injection vector
(return 0; } public static int Evil() { return 0;).
Resolution: (empty until closed; on close, record the fixing commit SHA, the date, and a one-line description of the fix)
Core.Scripting-014
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | CompiledScriptCache.cs:91-103 (Clear) |
| Status | Open |
Description: Clear() snapshots _cache.Keys.ToArray() then iterates,
calling TryRemove(key, out var lazy) on each — the key-only overload, not
the value-scoped one used in GetOrCompile's catch block. Between the
snapshot and a given TryRemove, a concurrent GetOrCompile(scriptSource)
call that hashes to the same key can re-insert a fresh Lazy whose .Value
the caller already retained. The unconditional TryRemove then removes that
fresh Lazy and DisposeLazyIfMaterialised(lazy) calls Dispose() on its
evaluator — unloading the ALC while the concurrent caller still holds a
reference to the evaluator and intends to invoke it.
This is exactly the race-window pattern the Core.Scripting-006 resolution
fixed in GetOrCompile's catch block (the test
Failed_compile_eviction_does_not_remove_a_concurrent_retry_entry locks it
there). Clear() carries the same shape but uses the older, value-blind
overload, so the same race that finding-006 addresses is still latent on the
publish-replace path.
In current production wiring Clear() is intended for config-publish + tests
— neither overlaps steady-state evaluation under the documented design — so
the in-practice impact is low. But the cache is checked in as the
forward-looking compile cache for the engines (per Script.SourceHash's docs
and the cache's own remarks); a future wiring that calls Clear() from
publish while evaluations are in flight would dispose live evaluators.
Recommendation: Replace the snapshot + TryRemove(key, out var lazy)
sequence with an enumeration that captures the Lazy reference at snapshot
time and uses the value-scoped TryRemove(KeyValuePair<,>) overload, mirroring
the Core.Scripting-006 fix:
foreach (var entry in _cache.ToArray())
{
if (_cache.TryRemove(entry))
DisposeLazyIfMaterialised(entry.Value);
}
Add a regression test that races GetOrCompile against Clear and asserts
the caller's evaluator is still usable.
Resolution: (empty until closed; on close, record the fixing commit SHA, the date, and a one-line description of the fix)
Core.Scripting-015
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | ScriptEvaluator.cs:234-270 (ToCSharpTypeName) |
| Status | Open |
Description: ToCSharpTypeName is documented to handle nested types
(Outer+Inner → Outer.Inner) via Replace('+', '.') for the
non-generic path (line 269) but the generic path (line 263-266) constructs the
name from def.FullName! then takes a substring up to the backtick. For a
nested generic type — e.g. Outer.Inner<T> whose FullName is
Outer+Inner1—Replace('+', '.')is applied first, thenSubstring(0, IndexOf(''))
on "Outer.Inner1"produces"Outer.Inner"`, which is correct. Good.
However, the generic branch does NOT handle the case where the OPEN generic
type itself is nested with + inside the parent's name when the parent is
also generic (Outer<TOuter>.Inner<TInner> — FullName is
Outer1+Inner1[[TOuter,TInner]]). For that shape Substring(0, IndexOf(''))truncates at the first backtick — yielding"Outer.Inner"— silently dropping the closed type arguments ofOuter. The resulting source string is syntactically valid but semantically wrong: global::Outer.Innerdoes not nameOuter.Inner`.
The production code never hits this shape — TResult is always one of
object?, bool, int, double, string?, DateTime across the
virtual-tag engine, the alarm engine, the test-harness, and the test suite,
and ScriptGlobals<TContext> is always a top-level generic over a top-level
ScriptContext subclass. The bug is latent. But it is a foot-gun for a
future caller (e.g. a Phase-8 driver that wires a context type defined as a
nested generic for grouping reasons) and the XML-doc comment claims
"handles nested types" without qualifying it.
A second smaller correctness gap on the same path: the comment claims
global::-qualified FQNs prevent accidental capture by the wrapper's using
directives, which is true for the generic / non-generic branches, but the
primitive aliases (bool, int, string, object, …) are emitted unqualified.
A script that defines a local class bool (now possible per Core.Scripting-013)
would shadow the alias. Probably benign, but worth a comment.
Recommendation: Add a check in the generic branch that walks the FullName
backtick-by-backtick — or use INamedTypeSymbol-style name composition from
def.DeclaringType recursively — so multi-arity-nested generics emit
correctly. At minimum update the XML doc to qualify "handles nested types" as
"handles single-level nesting; nested generics whose parent is itself generic
are not supported". Add a ToCSharpTypeName unit test (currently nothing
exercises this method directly — coverage relies on the end-to-end compile path,
so the bug surfaces only as a misleading Roslyn diagnostic).
Resolution: (empty until closed; on close, record the fixing commit SHA, the date, and a one-line description of the fix)
Core.Scripting-016
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Performance & resource management |
| Location | src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagEngine.cs:74-117, src/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms/ScriptedAlarmEngine.cs:139-182 |
| Status | Open |
Description: The Core.Scripting-008 resolution introduced
ScriptEvaluator.IDisposable + CompiledScriptCache.Clear() that disposes
each materialised evaluator before dropping its dictionary entry, so per-publish
ALC accretion is no longer process-lifetime rooted inside the cache. But
neither production consumer of ScriptEvaluator uses the cache — both
VirtualTagEngine.Load and ScriptedAlarmEngine.LoadAsync call
ScriptEvaluator<TContext, TResult>.Compile(...) directly (lines 105 / 160
respectively), store the evaluator inside an internal VirtualTagState /
AlarmState record, and on the next Load simply call _tags.Clear() /
_alarms.Clear(). The dropped ScriptEvaluator references never have
Dispose() called on them, so the underlying ScriptAssemblyLoadContext
instances are never Unload()-ed. The .NET runtime guarantees that a
collectible ALC stays alive until Unload() is called explicitly — having
"no strong references" is necessary but not sufficient. So the publish-replace
cycle leaks every prior generation's emitted assembly exactly as before the
fix, even though the fix's infrastructure is in place.
The Core.Scripting-008 regression tests in CompiledScriptCacheTests
(Dispose_unloads_compiled_script_assembly_load_context /
Clear_disposes_every_materialised_evaluator) prove the contract on
CompiledScriptCache, but neither engine uses that class. There is no
integration test exercising the actual publish path — i.e. that calling
VirtualTagEngine.Load(...) twice with different definitions makes the prior
generation's ALC eligible for GC. As a result the fix's headline guarantee
("Server restarts are no longer required to reclaim compiled-script memory" —
docs/VirtualTags.md) is not actually delivered to the production engines.
This is the same observable behavior the original Core.Scripting-008 finding described, surfacing on a different code path that the resolution did not touch.
Recommendation: Either route the engines' compile path through
CompiledScriptCache<TContext, TResult> (the documented design — the cache
already returns the same evaluator instance for identical source, and its
Clear() now performs the right disposal — and Script.SourceHash's doc-comment
already names this as the cache key), or make the engines' Load methods
dispose the previous ScriptEvaluator instances before reassigning. The
former is the cleaner change because it also collapses redundant compiles
across publishes for unchanged scripts. Add an integration test along the
lines of CompiledScriptCacheTests.Clear_disposes_every_materialised_evaluator
for each engine: snapshot the per-evaluator emitted assembly via
WeakReference, call Load(...) with a different definition set, and assert
the prior generation's assemblies become collectable.
Resolution: (empty until closed; on close, record the fixing commit SHA, the date, and a one-line description of the fix)