10 KiB
Shared Script-Analysis Consolidation (M3) — Design
Status: Proposed (awaiting approval)
Supersedes: the M3.1–M3.4 tasks in docs/plans/2026-06-15-stillpending-phase1-implementation.md (those assumed a contained "wire Roslyn into TemplateEngine.ScriptCompiler" change; this design replaces them with a consolidation, per the chosen approach).
Worktree/branch: worktree-m3-script-trust-boundary.
Goal
Make the design-time script trust gate authoritative and eliminate the four-way divergence in script trust enforcement by extracting one shared, authoritative analyzer that every script call site delegates to.
Background — the problem
Script trust enforcement (forbidden-API deny-list + compile) exists in four places, each implemented differently, and they disagree:
| Site | File | Mechanism | Forbidden-API gaps |
|---|---|---|---|
| TemplateEngine (the deploy gate) | Validation/ScriptCompiler.cs, ValidationService.cs |
Fake: substring scan + brace balancing | Bypassable by alias / using static / global::; no real compile. This is the authoritative deploy-blocking gate (DeploymentManager.FlatteningPipeline, ManagementActor). |
| SiteRuntime | Scripts/ScriptCompilationService.cs |
Semantic symbol analysis + CSharpScript.Compile against real ScriptGlobals |
No reflection-gateway hardening (typeof(x).Assembly.GetType("System.IO.File")). |
| InboundAPI | ForbiddenApiChecker.cs |
Syntactic walker + reflection-gateway + dynamic/Activator hardening |
Syntactic only (no symbol resolution); forbids all System.Diagnostics. |
| CentralUI | ScriptAnalysis/ScriptAnalysisService.cs (+ SandboxScriptHost) |
Semantic + full CSharpScript.Compile against a globals stub |
Lenient threading list (allows most System.Threading). |
The deploy gate — the one that actually blocks a bad script from shipping — is the weakest. The strong machinery exists only in higher layers (SiteRuntime/InboundAPI/CentralUI), which TemplateEngine cannot reference.
Approach (chosen): extract a shared analyzer project
New project ZB.MOM.WW.ScadaBridge.ScriptAnalysis sitting just above Commons
(references only Commons + Microsoft.CodeAnalysis.CSharp.Scripting +
Microsoft.CodeAnalysis.CSharp.Workspaces). The four call sites delegate to it.
Public surface
ScriptTrustPolicy— the single source of truth for the trust model:ForbiddenScopes(namespaces/types),AllowedExceptions,ReflectionGatewayMembers,ForbiddenIdentifiers(dynamic,Activator),DefaultReferences(canonical script metadata references) +DefaultImports.
ScriptTrustValidator.FindViolations(code, extraReferences?)— the authoritative forbidden-API gate. Fuses SiteRuntime's semantic symbol resolution (resolves aliases,using static,global::, transitive imports — inspects the resolved symbol, not the spelling) with InboundAPI's syntactic reflection-gateway /dynamic/Activatorhardening. Needs no globals type. Returns violation messages; empty == clean.RoslynScriptCompiler—Compile(code, globalsType?, extraReferences?, extraImports?)→ realCSharpScript.Create(...).Compile(), returns error diagnostics. Each caller passes its ownglobalsType.ParseDiagnostics(code)→ syntax-only diagnostics (for callers without globals).
ScriptCompileSurface/TriggerCompileSurface— lightweight compile-only globals stubs mirroringScriptGlobals/TriggerExpressionGlobalsmember-for-member, depending only onCommons.Types(.Scripts). Never executed — they exist purely so the deploy gate can bind real API-using scripts (Attributes[...],Notify.To(...).Send(...),ExternalSystem.Call(...), …) and thus catch undefined-symbol / type errors at central validation time. Drift from the real globals is caught by a reflection parity test (below).
Per-consumer migration (behavior-preserving delegation)
- TemplateEngine (
ScriptCompiler.TryCompile):FindViolations+RoslynScriptCompiler.Compile(code, typeof(ScriptCompileSurface))→ the deploy gate now does authoritative forbidden-API and a real type-checking compile.ValidationService.CheckExpressionSyntax→FindViolations+ compile againstTriggerCompileSurface. DeleteForbiddenPatterns+ the substring scan. Add project ref toScriptAnalysis. - SiteRuntime (
ScriptCompilationService.ValidateTrustModel): delegate toFindViolations; keepCSharpScript.Compileagainst the realScriptGlobalsfor execution. - InboundAPI (
ForbiddenApiChecker.FindViolations): delegate to shared (reflection-gateway behavior preserved, now centralized). - CentralUI (
ScriptAnalysisService.FindForbiddenApiUsages/ValidateTrustModel): delegate to shared; keep the editor diagnostic-marker mapping and the Test-Run executing host (SandboxScriptHost, which fires real central services).
Unified forbidden-API policy (strictest-safe union)
- Forbid:
System.IO,System.Diagnostics.Process,System.Threading,System.Reflection,System.Net,System.Runtime.InteropServices,Microsoft.Win32. - Allow exceptions:
System.Threading.Tasks,System.Threading.CancellationToken,System.Threading.CancellationTokenSource. - Reflection-gateway members (blocked on any receiver):
GetType,GetTypeInfo,Assembly,Module,CreateInstance,InvokeMember,GetMethod(s),GetConstructor(s),GetField(s),GetProperty(ies),GetMember(s),GetRuntimeMethod(s),MethodHandle,TypeHandle. - Forbidden identifiers:
dynamic,Activator.
Three genuine conflicts between the current lists, resolved toward the actual runtime gate:
System.Diagnostics— InboundAPI forbids the whole namespace; SiteRuntime forbids only.Process. → Forbid.Processonly (allowStopwatch,Debug,Activity). Slightly loosens InboundAPI on harmless types; matches the real runtime.System.Net— SiteRuntime forbids onlySockets+Http; others forbid all. → Forbid allSystem.Net(scripts use theExternalSystemhelper for HTTP). Slightly tightens SiteRuntime; safe.System.Threading— CentralUI allows most of it. → Forbid all except Tasks + CancellationToken(Source) (matches the real runtime). Tightens the CentralUI editor.
Net effect: the unified policy ≈ SiteRuntime's runtime policy (the gate that
actually executes) + InboundAPI's reflection hardening + all-System.Net +
InteropServices/Win32. Because SiteRuntime already enforces ~this at execution
time, no script that currently deploys-and-runs newly breaks; several real gaps close
(the deploy gate becomes real; the editor stops under-warning; reflection escapes are
blocked everywhere).
Layering / cycle check
ScriptAnalysis → { Commons, Microsoft.CodeAnalysis.* }. Referenced by TemplateEngine,
SiteRuntime, InboundAPI, CentralUI. None of those are referenced by Commons, and
ScriptAnalysis references none of them → no cycles.
Test strategy
- New
ScriptAnalysis.Tests— adversarial corpus that must all yield violations: namespace alias,using static,global::,typeof(x).Assembly.GetType("System.IO.File"),dynamic,Activator,InteropServices,Microsoft.Win32. Legit scripts must be clean:await Task,Stopwatch, LINQ,Math,CancellationToken.ParseDiagnosticscatches syntax errors;Compile(..., typeof(ScriptCompileSurface))catches undefined-symbol/type errors and accepts real API-using scripts. - Drift parity test — reflect over
ScriptGlobalsvsScriptCompileSurface(andTriggerExpressionGlobalsvsTriggerCompileSurface); assert public member-name parity so the stub can't silently drift. - Preserve & re-point existing tests —
ScriptCompilerTests,ForbiddenApiCheckerTests,ScriptAnalysisServiceTests(+ async-resolve, console),ScriptCompilationServiceTests,TrustModelSemanticTests,SandboxTestsmust stay green against the delegating implementations (adjust expected wording where messages change). - Adversarial deploy test (TemplateEngine integration) — a bypass script fails to deploy.
Migration sequencing
- Wave 0 — spike (M3.0): confirm
ScriptCompileSurfacecan mirrorScriptGlobalswith Commons-only deps; confirm package refs resolve in the new project. - Wave 1 — foundation (M3.1): build
ScriptAnalysis(policy + validator + compiler + surfaces) with its full test suite. Everything else depends on this. - Wave 2 — consumers (M3.2–M3.5, parallelizable, disjoint projects): TemplateEngine ∥ SiteRuntime ∥ InboundAPI ∥ CentralUI each delegate to the shared analyzer; each keeps its own tests green. Pathspec commits to avoid shared-index contamination.
- Wave 3 — integration + docs (M3.6): full-solution build; targeted tests across all five projects + adversarial deploy test; docs (Component-TemplateEngine / -SiteRuntime / -InboundAPI / -Commons, README + CLAUDE.md component list, remove the "SECURITY LIMITATION / advisory" notes now that the gate is real); fixture cleanup for any latent-invalid scripts the real compile surfaces.
Risks
- Policy change surfaces latent-invalid fixtures/scripts — budgeted in Wave 3 (same risk the original plan flagged for M3.4).
ScriptCompileSurfacedrift vs realScriptGlobals— mitigated by the parity test.- Touching four working, security-sensitive components — mitigated by behavior-preserving delegation + keeping each consumer's existing tests + the classification-driven review chain (high-risk = serial spec→code review).
Out of scope
- Physically unifying the executing globals types (
ScriptGlobals,SandboxScriptHost) — they stay with their owners; only the compile-surface stub + trust policy are shared. - A true runtime sandbox (restricted
AssemblyLoadContext/ out-of-process) — still future work; the static gate is defence-in-depth, as already noted in InboundAPI-015.