Files
ScadaBridge/docs/plans/2026-06-16-script-analysis-consolidation-design.md
T

10 KiB
Raw Blame History

Shared Script-Analysis Consolidation (M3) — Design

Status: Proposed (awaiting approval) Supersedes: the M3.1M3.4 tasks in docs/plans/2026-06-15-stillpending-phase1-implementation.md (those assumed a contained "wire Roslyn into TemplateEngine.ScriptCompiler" change; this design replaces them with a consolidation, per the chosen approach). Worktree/branch: worktree-m3-script-trust-boundary.

Goal

Make the design-time script trust gate authoritative and eliminate the four-way divergence in script trust enforcement by extracting one shared, authoritative analyzer that every script call site delegates to.

Background — the problem

Script trust enforcement (forbidden-API deny-list + compile) exists in four places, each implemented differently, and they disagree:

Site File Mechanism Forbidden-API gaps
TemplateEngine (the deploy gate) Validation/ScriptCompiler.cs, ValidationService.cs Fake: substring scan + brace balancing Bypassable by alias / using static / global::; no real compile. This is the authoritative deploy-blocking gate (DeploymentManager.FlatteningPipeline, ManagementActor).
SiteRuntime Scripts/ScriptCompilationService.cs Semantic symbol analysis + CSharpScript.Compile against real ScriptGlobals No reflection-gateway hardening (typeof(x).Assembly.GetType("System.IO.File")).
InboundAPI ForbiddenApiChecker.cs Syntactic walker + reflection-gateway + dynamic/Activator hardening Syntactic only (no symbol resolution); forbids all System.Diagnostics.
CentralUI ScriptAnalysis/ScriptAnalysisService.cs (+ SandboxScriptHost) Semantic + full CSharpScript.Compile against a globals stub Lenient threading list (allows most System.Threading).

The deploy gate — the one that actually blocks a bad script from shipping — is the weakest. The strong machinery exists only in higher layers (SiteRuntime/InboundAPI/CentralUI), which TemplateEngine cannot reference.

Approach (chosen): extract a shared analyzer project

New project ZB.MOM.WW.ScadaBridge.ScriptAnalysis sitting just above Commons (references only Commons + Microsoft.CodeAnalysis.CSharp.Scripting + Microsoft.CodeAnalysis.CSharp.Workspaces). The four call sites delegate to it.

Public surface

  1. ScriptTrustPolicy — the single source of truth for the trust model:
    • ForbiddenScopes (namespaces/types), AllowedExceptions, ReflectionGatewayMembers, ForbiddenIdentifiers (dynamic, Activator),
    • DefaultReferences (canonical script metadata references) + DefaultImports.
  2. ScriptTrustValidator.FindViolations(code, extraReferences?) — the authoritative forbidden-API gate. Fuses SiteRuntime's semantic symbol resolution (resolves aliases, using static, global::, transitive imports — inspects the resolved symbol, not the spelling) with InboundAPI's syntactic reflection-gateway / dynamic / Activator hardening. Needs no globals type. Returns violation messages; empty == clean.
  3. RoslynScriptCompiler
    • Compile(code, globalsType?, extraReferences?, extraImports?) → real CSharpScript.Create(...).Compile(), returns error diagnostics. Each caller passes its own globalsType.
    • ParseDiagnostics(code) → syntax-only diagnostics (for callers without globals).
  4. ScriptCompileSurface / TriggerCompileSurface — lightweight compile-only globals stubs mirroring ScriptGlobals / TriggerExpressionGlobals member-for-member, depending only on Commons.Types(.Scripts). Never executed — they exist purely so the deploy gate can bind real API-using scripts (Attributes[...], Notify.To(...).Send(...), ExternalSystem.Call(...), …) and thus catch undefined-symbol / type errors at central validation time. Drift from the real globals is caught by a reflection parity test (below).

Per-consumer migration (behavior-preserving delegation)

  • TemplateEngine (ScriptCompiler.TryCompile): FindViolations + RoslynScriptCompiler.Compile(code, typeof(ScriptCompileSurface)) → the deploy gate now does authoritative forbidden-API and a real type-checking compile. ValidationService.CheckExpressionSyntaxFindViolations + compile against TriggerCompileSurface. Delete ForbiddenPatterns + the substring scan. Add project ref to ScriptAnalysis.
  • SiteRuntime (ScriptCompilationService.ValidateTrustModel): delegate to FindViolations; keep CSharpScript.Compile against the real ScriptGlobals for execution.
  • InboundAPI (ForbiddenApiChecker.FindViolations): delegate to shared (reflection-gateway behavior preserved, now centralized).
  • CentralUI (ScriptAnalysisService.FindForbiddenApiUsages / ValidateTrustModel): delegate to shared; keep the editor diagnostic-marker mapping and the Test-Run executing host (SandboxScriptHost, which fires real central services).

Unified forbidden-API policy (strictest-safe union)

  • Forbid: System.IO, System.Diagnostics.Process, System.Threading, System.Reflection, System.Net, System.Runtime.InteropServices, Microsoft.Win32.
  • Allow exceptions: System.Threading.Tasks, System.Threading.CancellationToken, System.Threading.CancellationTokenSource.
  • Reflection-gateway members (blocked on any receiver): GetType, GetTypeInfo, Assembly, Module, CreateInstance, InvokeMember, GetMethod(s), GetConstructor(s), GetField(s), GetProperty(ies), GetMember(s), GetRuntimeMethod(s), MethodHandle, TypeHandle.
  • Forbidden identifiers: dynamic, Activator.

Three genuine conflicts between the current lists, resolved toward the actual runtime gate:

  1. System.Diagnostics — InboundAPI forbids the whole namespace; SiteRuntime forbids only .Process. → Forbid .Process only (allow Stopwatch, Debug, Activity). Slightly loosens InboundAPI on harmless types; matches the real runtime.
  2. System.Net — SiteRuntime forbids only Sockets+Http; others forbid all. → Forbid all System.Net (scripts use the ExternalSystem helper for HTTP). Slightly tightens SiteRuntime; safe.
  3. System.Threading — CentralUI allows most of it. → Forbid all except Tasks + CancellationToken(Source) (matches the real runtime). Tightens the CentralUI editor.

Net effect: the unified policy ≈ SiteRuntime's runtime policy (the gate that actually executes) + InboundAPI's reflection hardening + all-System.Net + InteropServices/Win32. Because SiteRuntime already enforces ~this at execution time, no script that currently deploys-and-runs newly breaks; several real gaps close (the deploy gate becomes real; the editor stops under-warning; reflection escapes are blocked everywhere).

Layering / cycle check

ScriptAnalysis → { Commons, Microsoft.CodeAnalysis.* }. Referenced by TemplateEngine, SiteRuntime, InboundAPI, CentralUI. None of those are referenced by Commons, and ScriptAnalysis references none of them → no cycles.

Test strategy

  • New ScriptAnalysis.Tests — adversarial corpus that must all yield violations: namespace alias, using static, global::, typeof(x).Assembly.GetType("System.IO.File"), dynamic, Activator, InteropServices, Microsoft.Win32. Legit scripts must be clean: await Task, Stopwatch, LINQ, Math, CancellationToken. ParseDiagnostics catches syntax errors; Compile(..., typeof(ScriptCompileSurface)) catches undefined-symbol/type errors and accepts real API-using scripts.
  • Drift parity test — reflect over ScriptGlobals vs ScriptCompileSurface (and TriggerExpressionGlobals vs TriggerCompileSurface); assert public member-name parity so the stub can't silently drift.
  • Preserve & re-point existing testsScriptCompilerTests, ForbiddenApiCheckerTests, ScriptAnalysisServiceTests (+ async-resolve, console), ScriptCompilationServiceTests, TrustModelSemanticTests, SandboxTests must stay green against the delegating implementations (adjust expected wording where messages change).
  • Adversarial deploy test (TemplateEngine integration) — a bypass script fails to deploy.

Migration sequencing

  • Wave 0 — spike (M3.0): confirm ScriptCompileSurface can mirror ScriptGlobals with Commons-only deps; confirm package refs resolve in the new project.
  • Wave 1 — foundation (M3.1): build ScriptAnalysis (policy + validator + compiler + surfaces) with its full test suite. Everything else depends on this.
  • Wave 2 — consumers (M3.2M3.5, parallelizable, disjoint projects): TemplateEngine ∥ SiteRuntime ∥ InboundAPI ∥ CentralUI each delegate to the shared analyzer; each keeps its own tests green. Pathspec commits to avoid shared-index contamination.
  • Wave 3 — integration + docs (M3.6): full-solution build; targeted tests across all five projects + adversarial deploy test; docs (Component-TemplateEngine / -SiteRuntime / -InboundAPI / -Commons, README + CLAUDE.md component list, remove the "SECURITY LIMITATION / advisory" notes now that the gate is real); fixture cleanup for any latent-invalid scripts the real compile surfaces.

Risks

  • Policy change surfaces latent-invalid fixtures/scripts — budgeted in Wave 3 (same risk the original plan flagged for M3.4).
  • ScriptCompileSurface drift vs real ScriptGlobals — mitigated by the parity test.
  • Touching four working, security-sensitive components — mitigated by behavior-preserving delegation + keeping each consumer's existing tests + the classification-driven review chain (high-risk = serial spec→code review).

Out of scope

  • Physically unifying the executing globals types (ScriptGlobals, SandboxScriptHost) — they stay with their owners; only the compile-surface stub + trust policy are shared.
  • A true runtime sandbox (restricted AssemblyLoadContext / out-of-process) — still future work; the static gate is defence-in-depth, as already noted in InboundAPI-015.