Files
ScadaBridge/docs/plans/2026-06-16-script-analysis-consolidation-design.md
T

160 lines
10 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Shared Script-Analysis Consolidation (M3) — Design
**Status:** Proposed (awaiting approval)
**Supersedes:** the M3.1M3.4 tasks in `docs/plans/2026-06-15-stillpending-phase1-implementation.md` (those assumed a contained "wire Roslyn into TemplateEngine.ScriptCompiler" change; this design replaces them with a consolidation, per the chosen approach).
**Worktree/branch:** `worktree-m3-script-trust-boundary`.
## Goal
Make the **design-time script trust gate authoritative** and eliminate the four-way
divergence in script trust enforcement by extracting one shared, authoritative
analyzer that every script call site delegates to.
## Background — the problem
Script trust enforcement (forbidden-API deny-list + compile) exists in **four**
places, each implemented differently, and they disagree:
| Site | File | Mechanism | Forbidden-API gaps |
|---|---|---|---|
| **TemplateEngine** (the deploy gate) | `Validation/ScriptCompiler.cs`, `ValidationService.cs` | **Fake**: substring scan + brace balancing | Bypassable by alias / `using static` / `global::`; no real compile. **This is the authoritative deploy-blocking gate** (`DeploymentManager.FlatteningPipeline`, `ManagementActor`). |
| **SiteRuntime** | `Scripts/ScriptCompilationService.cs` | Semantic symbol analysis + `CSharpScript.Compile` against real `ScriptGlobals` | No reflection-gateway hardening (`typeof(x).Assembly.GetType("System.IO.File")`). |
| **InboundAPI** | `ForbiddenApiChecker.cs` | Syntactic walker + reflection-gateway + `dynamic`/`Activator` hardening | Syntactic only (no symbol resolution); forbids all `System.Diagnostics`. |
| **CentralUI** | `ScriptAnalysis/ScriptAnalysisService.cs` (+ `SandboxScriptHost`) | Semantic + full `CSharpScript.Compile` against a globals stub | Lenient threading list (allows most `System.Threading`). |
The deploy gate — the one that actually blocks a bad script from shipping — is the
**weakest**. The strong machinery exists only in higher layers
(SiteRuntime/InboundAPI/CentralUI), which TemplateEngine cannot reference.
## Approach (chosen): extract a shared analyzer project
New project **`ZB.MOM.WW.ScadaBridge.ScriptAnalysis`** sitting just above Commons
(references only `Commons` + `Microsoft.CodeAnalysis.CSharp.Scripting` +
`Microsoft.CodeAnalysis.CSharp.Workspaces`). The four call sites delegate to it.
### Public surface
1. **`ScriptTrustPolicy`** — the single source of truth for the trust model:
- `ForbiddenScopes` (namespaces/types), `AllowedExceptions`,
`ReflectionGatewayMembers`, `ForbiddenIdentifiers` (`dynamic`, `Activator`),
- `DefaultReferences` (canonical script metadata references) + `DefaultImports`.
2. **`ScriptTrustValidator.FindViolations(code, extraReferences?)`** — the
**authoritative forbidden-API gate**. Fuses SiteRuntime's *semantic* symbol
resolution (resolves aliases, `using static`, `global::`, transitive imports —
inspects the resolved symbol, not the spelling) with InboundAPI's *syntactic*
reflection-gateway / `dynamic` / `Activator` hardening. Needs **no** globals type.
Returns violation messages; empty == clean.
3. **`RoslynScriptCompiler`** —
- `Compile(code, globalsType?, extraReferences?, extraImports?)` → real
`CSharpScript.Create(...).Compile()`, returns error diagnostics. Each caller
passes **its own** `globalsType`.
- `ParseDiagnostics(code)` → syntax-only diagnostics (for callers without globals).
4. **`ScriptCompileSurface`** / **`TriggerCompileSurface`** — lightweight
**compile-only** globals stubs mirroring `ScriptGlobals` / `TriggerExpressionGlobals`
member-for-member, depending only on `Commons.Types(.Scripts)`. **Never executed**
they exist purely so the deploy gate can bind real API-using scripts
(`Attributes[...]`, `Notify.To(...).Send(...)`, `ExternalSystem.Call(...)`, …) and
thus catch undefined-symbol / type errors at central validation time. Drift from the
real globals is caught by a reflection parity test (below).
### Per-consumer migration (behavior-preserving delegation)
- **TemplateEngine** (`ScriptCompiler.TryCompile`): `FindViolations` +
`RoslynScriptCompiler.Compile(code, typeof(ScriptCompileSurface))` → the deploy
gate now does authoritative forbidden-API **and** a real type-checking compile.
`ValidationService.CheckExpressionSyntax``FindViolations` + compile against
`TriggerCompileSurface`. Delete `ForbiddenPatterns` + the substring scan. Add
project ref to `ScriptAnalysis`.
- **SiteRuntime** (`ScriptCompilationService.ValidateTrustModel`): delegate to
`FindViolations`; keep `CSharpScript.Compile` against the **real** `ScriptGlobals`
for execution.
- **InboundAPI** (`ForbiddenApiChecker.FindViolations`): delegate to shared
(reflection-gateway behavior preserved, now centralized).
- **CentralUI** (`ScriptAnalysisService.FindForbiddenApiUsages` / `ValidateTrustModel`):
delegate to shared; keep the editor diagnostic-marker mapping and the Test-Run
executing host (`SandboxScriptHost`, which fires real central services).
## Unified forbidden-API policy (strictest-safe union)
- **Forbid:** `System.IO`, `System.Diagnostics.Process`, `System.Threading`,
`System.Reflection`, `System.Net`, `System.Runtime.InteropServices`, `Microsoft.Win32`.
- **Allow exceptions:** `System.Threading.Tasks`, `System.Threading.CancellationToken`,
`System.Threading.CancellationTokenSource`.
- **Reflection-gateway members** (blocked on any receiver): `GetType`, `GetTypeInfo`,
`Assembly`, `Module`, `CreateInstance`, `InvokeMember`, `GetMethod(s)`,
`GetConstructor(s)`, `GetField(s)`, `GetProperty(ies)`, `GetMember(s)`,
`GetRuntimeMethod(s)`, `MethodHandle`, `TypeHandle`.
- **Forbidden identifiers:** `dynamic`, `Activator`.
Three genuine conflicts between the current lists, resolved toward the actual runtime gate:
1. **`System.Diagnostics`** — InboundAPI forbids the whole namespace; SiteRuntime
forbids only `.Process`. → **Forbid `.Process` only** (allow `Stopwatch`, `Debug`,
`Activity`). *Slightly loosens InboundAPI on harmless types; matches the real runtime.*
2. **`System.Net`** — SiteRuntime forbids only `Sockets`+`Http`; others forbid all. →
**Forbid all `System.Net`** (scripts use the `ExternalSystem` helper for HTTP).
*Slightly tightens SiteRuntime; safe.*
3. **`System.Threading`** — CentralUI allows most of it. → **Forbid all except Tasks +
CancellationToken(Source)** (matches the real runtime). *Tightens the CentralUI editor.*
**Net effect:** the unified policy ≈ SiteRuntime's runtime policy (the gate that
actually executes) + InboundAPI's reflection hardening + all-`System.Net` +
`InteropServices`/`Win32`. Because SiteRuntime already enforces ~this at execution
time, no script that currently deploys-and-runs newly breaks; several real gaps close
(the deploy gate becomes real; the editor stops under-warning; reflection escapes are
blocked everywhere).
## Layering / cycle check
`ScriptAnalysis → { Commons, Microsoft.CodeAnalysis.* }`. Referenced by TemplateEngine,
SiteRuntime, InboundAPI, CentralUI. None of those are referenced *by* Commons, and
ScriptAnalysis references none of them → **no cycles**.
## Test strategy
- **New `ScriptAnalysis.Tests`** — adversarial corpus that must all yield violations:
namespace alias, `using static`, `global::`, `typeof(x).Assembly.GetType("System.IO.File")`,
`dynamic`, `Activator`, `InteropServices`, `Microsoft.Win32`. Legit scripts must be
clean: `await Task`, `Stopwatch`, LINQ, `Math`, `CancellationToken`. `ParseDiagnostics`
catches syntax errors; `Compile(..., typeof(ScriptCompileSurface))` catches
undefined-symbol/type errors **and** accepts real API-using scripts.
- **Drift parity test** — reflect over `ScriptGlobals` vs `ScriptCompileSurface` (and
`TriggerExpressionGlobals` vs `TriggerCompileSurface`); assert public member-name parity
so the stub can't silently drift.
- **Preserve & re-point existing tests** — `ScriptCompilerTests`,
`ForbiddenApiCheckerTests`, `ScriptAnalysisServiceTests` (+ async-resolve, console),
`ScriptCompilationServiceTests`, `TrustModelSemanticTests`, `SandboxTests` must stay
green against the delegating implementations (adjust expected wording where messages change).
- **Adversarial deploy test** (TemplateEngine integration) — a bypass script **fails to deploy**.
## Migration sequencing
- **Wave 0 — spike (M3.0):** confirm `ScriptCompileSurface` can mirror `ScriptGlobals`
with Commons-only deps; confirm package refs resolve in the new project.
- **Wave 1 — foundation (M3.1):** build `ScriptAnalysis` (policy + validator + compiler +
surfaces) with its full test suite. Everything else depends on this.
- **Wave 2 — consumers (M3.2M3.5, parallelizable, disjoint projects):** TemplateEngine ∥
SiteRuntime ∥ InboundAPI ∥ CentralUI each delegate to the shared analyzer; each keeps
its own tests green. *Pathspec commits to avoid shared-index contamination.*
- **Wave 3 — integration + docs (M3.6):** full-solution build; targeted tests across all
five projects + adversarial deploy test; docs (Component-TemplateEngine / -SiteRuntime /
-InboundAPI / -Commons, README + CLAUDE.md component list, remove the "SECURITY
LIMITATION / advisory" notes now that the gate is real); fixture cleanup for any
latent-invalid scripts the real compile surfaces.
## Risks
- **Policy change surfaces latent-invalid fixtures/scripts** — budgeted in Wave 3
(same risk the original plan flagged for M3.4).
- **`ScriptCompileSurface` drift** vs real `ScriptGlobals` — mitigated by the parity test.
- **Touching four working, security-sensitive components** — mitigated by
behavior-preserving delegation + keeping each consumer's existing tests + the
classification-driven review chain (high-risk = serial spec→code review).
## Out of scope
- Physically unifying the **executing** globals types (`ScriptGlobals`, `SandboxScriptHost`)
— they stay with their owners; only the compile-surface stub + trust policy are shared.
- A true runtime sandbox (restricted `AssemblyLoadContext` / out-of-process) — still
future work; the static gate is defence-in-depth, as already noted in InboundAPI-015.