From 8e99f22b2432f4c0ff3a94b461b562723c558b3e Mon Sep 17 00:00:00 2001 From: Joseph Doherty Date: Tue, 16 Jun 2026 19:07:32 -0400 Subject: [PATCH] =?UTF-8?q?docs(m3):=20design=20=E2=80=94=20shared=20Scrip?= =?UTF-8?q?tAnalysis=20project=20consolidating=20the=204=20trust-model=20a?= =?UTF-8?q?nalyzers?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- ...16-script-analysis-consolidation-design.md | 159 ++++++++++++++++++ 1 file changed, 159 insertions(+) create mode 100644 docs/plans/2026-06-16-script-analysis-consolidation-design.md diff --git a/docs/plans/2026-06-16-script-analysis-consolidation-design.md b/docs/plans/2026-06-16-script-analysis-consolidation-design.md new file mode 100644 index 00000000..de8fa28b --- /dev/null +++ b/docs/plans/2026-06-16-script-analysis-consolidation-design.md @@ -0,0 +1,159 @@ +# Shared Script-Analysis Consolidation (M3) — Design + +**Status:** Proposed (awaiting approval) +**Supersedes:** the M3.1–M3.4 tasks in `docs/plans/2026-06-15-stillpending-phase1-implementation.md` (those assumed a contained "wire Roslyn into TemplateEngine.ScriptCompiler" change; this design replaces them with a consolidation, per the chosen approach). +**Worktree/branch:** `worktree-m3-script-trust-boundary`. + +## Goal + +Make the **design-time script trust gate authoritative** and eliminate the four-way +divergence in script trust enforcement by extracting one shared, authoritative +analyzer that every script call site delegates to. + +## Background — the problem + +Script trust enforcement (forbidden-API deny-list + compile) exists in **four** +places, each implemented differently, and they disagree: + +| Site | File | Mechanism | Forbidden-API gaps | +|---|---|---|---| +| **TemplateEngine** (the deploy gate) | `Validation/ScriptCompiler.cs`, `ValidationService.cs` | **Fake**: substring scan + brace balancing | Bypassable by alias / `using static` / `global::`; no real compile. **This is the authoritative deploy-blocking gate** (`DeploymentManager.FlatteningPipeline`, `ManagementActor`). | +| **SiteRuntime** | `Scripts/ScriptCompilationService.cs` | Semantic symbol analysis + `CSharpScript.Compile` against real `ScriptGlobals` | No reflection-gateway hardening (`typeof(x).Assembly.GetType("System.IO.File")`). | +| **InboundAPI** | `ForbiddenApiChecker.cs` | Syntactic walker + reflection-gateway + `dynamic`/`Activator` hardening | Syntactic only (no symbol resolution); forbids all `System.Diagnostics`. | +| **CentralUI** | `ScriptAnalysis/ScriptAnalysisService.cs` (+ `SandboxScriptHost`) | Semantic + full `CSharpScript.Compile` against a globals stub | Lenient threading list (allows most `System.Threading`). | + +The deploy gate — the one that actually blocks a bad script from shipping — is the +**weakest**. The strong machinery exists only in higher layers +(SiteRuntime/InboundAPI/CentralUI), which TemplateEngine cannot reference. + +## Approach (chosen): extract a shared analyzer project + +New project **`ZB.MOM.WW.ScadaBridge.ScriptAnalysis`** sitting just above Commons +(references only `Commons` + `Microsoft.CodeAnalysis.CSharp.Scripting` + +`Microsoft.CodeAnalysis.CSharp.Workspaces`). The four call sites delegate to it. + +### Public surface + +1. **`ScriptTrustPolicy`** — the single source of truth for the trust model: + - `ForbiddenScopes` (namespaces/types), `AllowedExceptions`, + `ReflectionGatewayMembers`, `ForbiddenIdentifiers` (`dynamic`, `Activator`), + - `DefaultReferences` (canonical script metadata references) + `DefaultImports`. +2. **`ScriptTrustValidator.FindViolations(code, extraReferences?)`** — the + **authoritative forbidden-API gate**. Fuses SiteRuntime's *semantic* symbol + resolution (resolves aliases, `using static`, `global::`, transitive imports — + inspects the resolved symbol, not the spelling) with InboundAPI's *syntactic* + reflection-gateway / `dynamic` / `Activator` hardening. Needs **no** globals type. + Returns violation messages; empty == clean. +3. **`RoslynScriptCompiler`** — + - `Compile(code, globalsType?, extraReferences?, extraImports?)` → real + `CSharpScript.Create(...).Compile()`, returns error diagnostics. Each caller + passes **its own** `globalsType`. + - `ParseDiagnostics(code)` → syntax-only diagnostics (for callers without globals). +4. **`ScriptCompileSurface`** / **`TriggerCompileSurface`** — lightweight + **compile-only** globals stubs mirroring `ScriptGlobals` / `TriggerExpressionGlobals` + member-for-member, depending only on `Commons.Types(.Scripts)`. **Never executed** — + they exist purely so the deploy gate can bind real API-using scripts + (`Attributes[...]`, `Notify.To(...).Send(...)`, `ExternalSystem.Call(...)`, …) and + thus catch undefined-symbol / type errors at central validation time. Drift from the + real globals is caught by a reflection parity test (below). + +### Per-consumer migration (behavior-preserving delegation) + +- **TemplateEngine** (`ScriptCompiler.TryCompile`): `FindViolations` + + `RoslynScriptCompiler.Compile(code, typeof(ScriptCompileSurface))` → the deploy + gate now does authoritative forbidden-API **and** a real type-checking compile. + `ValidationService.CheckExpressionSyntax` → `FindViolations` + compile against + `TriggerCompileSurface`. Delete `ForbiddenPatterns` + the substring scan. Add + project ref to `ScriptAnalysis`. +- **SiteRuntime** (`ScriptCompilationService.ValidateTrustModel`): delegate to + `FindViolations`; keep `CSharpScript.Compile` against the **real** `ScriptGlobals` + for execution. +- **InboundAPI** (`ForbiddenApiChecker.FindViolations`): delegate to shared + (reflection-gateway behavior preserved, now centralized). +- **CentralUI** (`ScriptAnalysisService.FindForbiddenApiUsages` / `ValidateTrustModel`): + delegate to shared; keep the editor diagnostic-marker mapping and the Test-Run + executing host (`SandboxScriptHost`, which fires real central services). + +## Unified forbidden-API policy (strictest-safe union) + +- **Forbid:** `System.IO`, `System.Diagnostics.Process`, `System.Threading`, + `System.Reflection`, `System.Net`, `System.Runtime.InteropServices`, `Microsoft.Win32`. +- **Allow exceptions:** `System.Threading.Tasks`, `System.Threading.CancellationToken`, + `System.Threading.CancellationTokenSource`. +- **Reflection-gateway members** (blocked on any receiver): `GetType`, `GetTypeInfo`, + `Assembly`, `Module`, `CreateInstance`, `InvokeMember`, `GetMethod(s)`, + `GetConstructor(s)`, `GetField(s)`, `GetProperty(ies)`, `GetMember(s)`, + `GetRuntimeMethod(s)`, `MethodHandle`, `TypeHandle`. +- **Forbidden identifiers:** `dynamic`, `Activator`. + +Three genuine conflicts between the current lists, resolved toward the actual runtime gate: + +1. **`System.Diagnostics`** — InboundAPI forbids the whole namespace; SiteRuntime + forbids only `.Process`. → **Forbid `.Process` only** (allow `Stopwatch`, `Debug`, + `Activity`). *Slightly loosens InboundAPI on harmless types; matches the real runtime.* +2. **`System.Net`** — SiteRuntime forbids only `Sockets`+`Http`; others forbid all. → + **Forbid all `System.Net`** (scripts use the `ExternalSystem` helper for HTTP). + *Slightly tightens SiteRuntime; safe.* +3. **`System.Threading`** — CentralUI allows most of it. → **Forbid all except Tasks + + CancellationToken(Source)** (matches the real runtime). *Tightens the CentralUI editor.* + +**Net effect:** the unified policy ≈ SiteRuntime's runtime policy (the gate that +actually executes) + InboundAPI's reflection hardening + all-`System.Net` + +`InteropServices`/`Win32`. Because SiteRuntime already enforces ~this at execution +time, no script that currently deploys-and-runs newly breaks; several real gaps close +(the deploy gate becomes real; the editor stops under-warning; reflection escapes are +blocked everywhere). + +## Layering / cycle check + +`ScriptAnalysis → { Commons, Microsoft.CodeAnalysis.* }`. Referenced by TemplateEngine, +SiteRuntime, InboundAPI, CentralUI. None of those are referenced *by* Commons, and +ScriptAnalysis references none of them → **no cycles**. + +## Test strategy + +- **New `ScriptAnalysis.Tests`** — adversarial corpus that must all yield violations: + namespace alias, `using static`, `global::`, `typeof(x).Assembly.GetType("System.IO.File")`, + `dynamic`, `Activator`, `InteropServices`, `Microsoft.Win32`. Legit scripts must be + clean: `await Task`, `Stopwatch`, LINQ, `Math`, `CancellationToken`. `ParseDiagnostics` + catches syntax errors; `Compile(..., typeof(ScriptCompileSurface))` catches + undefined-symbol/type errors **and** accepts real API-using scripts. +- **Drift parity test** — reflect over `ScriptGlobals` vs `ScriptCompileSurface` (and + `TriggerExpressionGlobals` vs `TriggerCompileSurface`); assert public member-name parity + so the stub can't silently drift. +- **Preserve & re-point existing tests** — `ScriptCompilerTests`, + `ForbiddenApiCheckerTests`, `ScriptAnalysisServiceTests` (+ async-resolve, console), + `ScriptCompilationServiceTests`, `TrustModelSemanticTests`, `SandboxTests` must stay + green against the delegating implementations (adjust expected wording where messages change). +- **Adversarial deploy test** (TemplateEngine integration) — a bypass script **fails to deploy**. + +## Migration sequencing + +- **Wave 0 — spike (M3.0):** confirm `ScriptCompileSurface` can mirror `ScriptGlobals` + with Commons-only deps; confirm package refs resolve in the new project. +- **Wave 1 — foundation (M3.1):** build `ScriptAnalysis` (policy + validator + compiler + + surfaces) with its full test suite. Everything else depends on this. +- **Wave 2 — consumers (M3.2–M3.5, parallelizable, disjoint projects):** TemplateEngine ∥ + SiteRuntime ∥ InboundAPI ∥ CentralUI each delegate to the shared analyzer; each keeps + its own tests green. *Pathspec commits to avoid shared-index contamination.* +- **Wave 3 — integration + docs (M3.6):** full-solution build; targeted tests across all + five projects + adversarial deploy test; docs (Component-TemplateEngine / -SiteRuntime / + -InboundAPI / -Commons, README + CLAUDE.md component list, remove the "SECURITY + LIMITATION / advisory" notes now that the gate is real); fixture cleanup for any + latent-invalid scripts the real compile surfaces. + +## Risks + +- **Policy change surfaces latent-invalid fixtures/scripts** — budgeted in Wave 3 + (same risk the original plan flagged for M3.4). +- **`ScriptCompileSurface` drift** vs real `ScriptGlobals` — mitigated by the parity test. +- **Touching four working, security-sensitive components** — mitigated by + behavior-preserving delegation + keeping each consumer's existing tests + the + classification-driven review chain (high-risk = serial spec→code review). + +## Out of scope + +- Physically unifying the **executing** globals types (`ScriptGlobals`, `SandboxScriptHost`) + — they stay with their owners; only the compile-surface stub + trust policy are shared. +- A true runtime sandbox (restricted `AssemblyLoadContext` / out-of-process) — still + future work; the static gate is defence-in-depth, as already noted in InboundAPI-015.