docs(plan): VirtualTag/script memory Phase-1 implementation plan (A0+A+guardrail)

This commit is contained in:
Joseph Doherty
2026-06-07 15:00:11 -04:00
parent 321d57938f
commit df772dd09a
2 changed files with 216 additions and 0 deletions
@@ -0,0 +1,205 @@
# VirtualTag/Script Memory — Phase 1 Implementation Plan
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers-extended-cc:executing-plans (or subagent-driven-development) to implement this plan task-by-task.
**Goal:** Make VirtualTag scripts scale in memory — cut Roslyn's ~18 MiB/script (mostly unmanaged) to ~1.66 MiB by isolating the script globals type from Roslyn (A0), skip Roslyn entirely for the mirror passthrough shape (A), and warn on deploy when a config will compile many scripts (guardrail). Phase 2 (the C2 interpreter) is spec-only in the design doc — NOT in this plan.
**Architecture:** A0 extracts the script-callable types (`ScriptContext`, `ScriptGlobals<T>`, `VirtualTagContext`, `AlarmPredicateContext`) into a new lean assembly that never references Roslyn, so Roslyn's reference manager stops materialising the Roslyn metadata closure per compile. A adds a passthrough short-circuit in the evaluator. The guardrail is a non-blocking warning in the deploy handler.
**Tech Stack:** .NET 10, Microsoft.CodeAnalysis.CSharp.Scripting (Roslyn 4.12), Akka.NET, xUnit + Shouldly. Solution: `ZB.MOM.WW.OtOpcUa.slnx`, Central Package Management (`Directory.Packages.props`). Branch: `exp/vtag-script-memory` (off `master`). Design: `docs/plans/2026-06-07-virtualtag-script-memory-design.md`.
**Sequencing:** T1 (A0 split) first — it changes assembly layout the rest builds on. Then T2 (A0 measurement gate), T3 (A passthrough), T4 (guardrail), T5 (live verify). Run sequentially (shared working tree + build).
---
### Task 1: A0 — extract script-callable types into a lean, Roslyn-free assembly
**Classification:** high-risk
**Estimated implement time:** ~6 min (assembly split; can't be atomically subdivided — must build green as a unit)
**Parallelizable with:** none
**Files:**
- Create: `src/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting.Abstractions/ZB.MOM.WW.OtOpcUa.Core.Scripting.Abstractions.csproj`
- Move (git mv, keep namespaces unchanged): `src/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting/ScriptContext.cs` and `.../ScriptGlobals.cs` → the new project dir; `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagContext.cs` → new project dir; `src/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms/AlarmPredicateContext.cs` → new project dir
- Modify: `.../Core.Scripting/ZB.MOM.WW.OtOpcUa.Core.Scripting.csproj`, `.../Core.VirtualTags/...csproj`, `.../Core.ScriptedAlarms/...csproj` (add ProjectReference to the new assembly)
- Modify: `ZB.MOM.WW.OtOpcUa.slnx` (add the new project — use `dotnet sln add`)
**Why these four:** the production script globals type is `ScriptGlobals<VirtualTagContext>`. Roslyn's reference manager loads the transitive reference closure of *every assembly in the globalsType's type graph*. Today `ScriptGlobals`+`ScriptContext` are in `Core.Scripting` (→ Roslyn) and `VirtualTagContext`/`AlarmPredicateContext` are in assemblies that reference `Core.Scripting` (→ Roslyn). Moving all four into one lean assembly that references only `Core.Abstractions` (+ Serilog) makes the whole closure Roslyn-free. (Confirmed: all four files are Roslyn-free — `ScriptContext` uses only `Serilog` + `Core.Abstractions`; `ScriptGlobals` has no usings; the two contexts use `Serilog`, `Core.Abstractions`, `Core.Scripting`(only for the `ScriptContext` base, which is moving with them).) **Keep each file's existing namespace** (`Core.Scripting`, `Core.VirtualTags`, `Core.ScriptedAlarms`) so NO caller `using` statements change — a namespace may span assemblies. Only project references change.
**Step 1 — Create the lean project.** `src/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting.Abstractions/ZB.MOM.WW.OtOpcUa.Core.Scripting.Abstractions.csproj`, mirroring `Core.Abstractions.csproj`'s style (net10.0, CPM — version-less PackageReference):
```xml
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<TargetFramework>net10.0</TargetFramework>
</PropertyGroup>
<ItemGroup>
<PackageReference Include="Serilog"/>
<ProjectReference Include="..\ZB.MOM.WW.OtOpcUa.Core.Abstractions\ZB.MOM.WW.OtOpcUa.Core.Abstractions.csproj"/>
</ItemGroup>
</Project>
```
(Match the exact `<PropertyGroup>`/Nullable/ImplicitUsings settings used by the sibling Core projects — copy from `Core.Abstractions.csproj`.)
**Step 2 — Move the four files** (preserve git history):
```bash
git mv src/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting/ScriptContext.cs src/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting.Abstractions/
git mv src/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting/ScriptGlobals.cs src/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting.Abstractions/
git mv src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagContext.cs src/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting.Abstractions/
git mv src/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms/AlarmPredicateContext.cs src/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting.Abstractions/
```
Do NOT change the namespaces inside the moved files.
**Step 3 — Wire references.** Add to the solution + the three consumer csprojs:
```bash
dotnet sln ZB.MOM.WW.OtOpcUa.slnx add src/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting.Abstractions/ZB.MOM.WW.OtOpcUa.Core.Scripting.Abstractions.csproj
```
In `Core.Scripting.csproj`, `Core.VirtualTags.csproj`, `Core.ScriptedAlarms.csproj` add (keep all their existing references — they still need `Core.Scripting`/Roslyn for the engines/evaluators):
```xml
<ProjectReference Include="..\ZB.MOM.WW.OtOpcUa.Core.Scripting.Abstractions\ZB.MOM.WW.OtOpcUa.Core.Scripting.Abstractions.csproj"/>
```
(Watch for a reference cycle: the lean assembly must NOT reference `Core.Scripting`. `Core.Scripting` → lean is fine; lean → `Core.Scripting` is forbidden. The moved files don't need anything from `Core.Scripting`, so there's no cycle.)
**Step 4 — Build the whole solution + run the existing script/alarm suites (behaviour-preserving).**
Run: `dotnet build ZB.MOM.WW.OtOpcUa.slnx`
Expected: build succeeds (the moved types resolve through the new project).
Run: `dotnet test tests/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting.Tests/ ; dotnet test tests/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags.Tests/ ; dotnet test tests/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms.Tests/`
Expected: all green (no behaviour change). If a test project references the moved types and the build complains, add the lean project reference to that test csproj too.
**Step 5 — Commit.** `git add -A && git commit -m "refactor(scripting): extract script-callable types into Roslyn-free Core.Scripting.Abstractions (A0)"`
**Acceptance:** solution builds; existing script/alarm/VirtualTag suites green; the lean assembly has no `Microsoft.CodeAnalysis` in its (transitive) references (`dotnet list src/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting.Abstractions/...csproj reference` shows only Abstractions; no Roslyn).
---
### Task 2: A0 measurement gate — re-run the probe, confirm ~11× drop
**Classification:** small
**Estimated implement time:** ~4 min
**Parallelizable with:** none (needs Task 1)
**Files:**
- Modify: `tools/mem-probe/MemProbe/MemProbe.csproj` (its references to `VirtualTagContext`/`ScriptGlobals` now resolve via the lean assembly — update the ProjectReference to `Core.Scripting.Abstractions`, drop the now-unnecessary heavy refs if any)
- (No production code.)
**Step 1 — Point the probe at the post-A0 types.** Update `MemProbe.csproj` so `ScriptGlobals<VirtualTagContext>` resolves from `Core.Scripting.Abstractions` (add that ProjectReference; the probe should no longer need `Core.VirtualTags`/`Core.Scripting` directly for the globals type). Keep the `Microsoft.CodeAnalysis.CSharp.Scripting` package ref (the probe still compiles scripts).
**Step 2 — Run heavy + lean, capture numbers.**
Run: `dotnet run -c Release --project tools/mem-probe/MemProbe -- heavy` (twice) and `... -- lean` (twice).
Expected: **heavy** (production `ScriptGlobals<VirtualTagContext>`) per-script RSS is now in the **~1.66 MiB** regime (was ~18 MiB before A0) — i.e. heavy ≈ lean, both ~1.66 MiB. That is the proof A0 worked on the production types.
**Step 3 — Record the result** in a short comment block at the top of `tools/mem-probe/MemProbe/Program.cs` (before/after numbers) and commit.
**Step 4 — Commit.** `git add tools/mem-probe && git commit -m "test(mem-probe): confirm A0 drops production per-script RSS ~11x (18->~1.66 MiB)"`
**Acceptance:** post-A0 heavy-mode per-script RSS ≤ ~3 MiB (down from ~18). If it did NOT drop, A0 is incomplete (a type in the globalsType graph still transitively reaches Roslyn) — STOP and report which assembly still pulls in `Microsoft.CodeAnalysis`.
---
### Task 3: A — passthrough fast-path in the evaluator
**Classification:** small
**Estimated implement time:** ~4 min
**Parallelizable with:** none (sequential build)
**Files:**
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.Host/Engines/RoslynVirtualTagEvaluator.cs` (`Evaluate`, before the `_cache.GetOrAdd(expression, …)` at line ~56)
- Test: `tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/RoslynVirtualTagEvaluatorTests.cs`
**Step 1 — Write failing tests.** (Read the existing tests for the harness + how `VirtualTagEvalResult` success/value is asserted.)
- `Passthrough_returns_dependency_value_without_compiling`: `Evaluate("vt", "return ctx.GetTag(\"a\").Value;", new Dictionary{["a"]=42})` returns success with value `42`, and the evaluator's compile cache stays empty (expose/inspect `_cache.Count` via a test hook or assert no compilation occurred — if the cache field isn't observable, assert behaviour: a syntactically-broken-but-passthrough-shaped script still returns the value, proving Roslyn never ran).
- `Passthrough_with_whitespace_variants_still_matches`: leading/trailing whitespace + spaces inside `GetTag( "a" )` still passthrough.
- `Non_passthrough_falls_through_to_Roslyn`: `"return (int)ctx.GetTag(\"a\").Value + 1;"` still compiles + returns `43`.
- `Passthrough_missing_dependency`: returns the same no-value/Bad result the Roslyn path produces when the dep is absent.
**Step 2 — Run, confirm fail.** `dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/ --filter "FullyQualifiedName~RoslynVirtualTagEvaluator"`
**Step 3 — Implement.** In `Evaluate`, before the cache lookup:
```csharp
// A — passthrough fast-path: the mirror shape `return ctx.GetTag("X").Value;` is ~bytes; skip Roslyn
// entirely (no compile, no cached assembly). Narrow exact pattern so near-misses fall through.
if (TryMatchPassthrough(expression, out var passthroughRef))
{
if (!dependencies.TryGetValue(passthroughRef, out var value))
return /* the same no-value result the Roslyn path returns when GetTag finds nothing */;
return /* VirtualTagEvalResult success wrapping `value` — mirror the existing success wrap */;
}
```
Add the matcher (compiled static regex):
```csharp
private static readonly System.Text.RegularExpressions.Regex PassthroughRegex =
new(@"^\s*return\s+ctx\s*\.\s*GetTag\s*\(\s*""([^""]+)""\s*\)\s*\.\s*Value\s*;\s*$",
System.Text.RegularExpressions.RegexOptions.Compiled);
private static bool TryMatchPassthrough(string expression, out string tagRef)
{
var m = PassthroughRegex.Match(expression);
if (m.Success) { tagRef = m.Groups[1].Value; return true; }
tagRef = string.Empty; return false;
}
```
Match the EXACT `VirtualTagEvalResult` construction the Roslyn path uses for success and for not-found (read lines ~7498 of the file and reuse those code paths — do not invent a new result shape).
**Step 4 — Run, confirm pass** + the broader evaluator suite is green.
**Step 5 — Commit.** `git add … && git commit -m "feat(vtag): passthrough fast-path skips Roslyn for mirror scripts (A)"`
---
### Task 4: Warn-only deploy guardrail
**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** none (sequential build)
**Files:**
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.ControlPlane/AdminOperations/AdminOperationsActor.cs` (`HandleStartDeploymentAsync`, after the `DraftValidator` gate, before the seal — non-blocking)
- Test: `tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests/AdminOperationsActorTests.cs`
**Step 1 — Write failing tests.**
- `StartDeployment_warns_when_many_scripts_will_compile`: seed N (e.g. 10) distinct NON-passthrough `Script` rows; assert the deploy is still `Accepted` AND the `StartDeploymentResult.Message` contains a compile-cost advisory (e.g. `"scripts will compile"` / the estimated MiB). (If the actor test can't easily read the log, assert via the Message string.)
- `StartDeployment_passthrough_scripts_do_not_count`: seed only passthrough `Script` rows → no/zero compile advisory (Accepted, no warning text).
**Step 2 — Run, confirm fail.** `dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests/ --filter "FullyQualifiedName~AdminOperations"`
**Step 3 — Implement.** After the `DraftValidator` gate (the `if (errors.Count > 0)` block from the deploy gate) and before `ConfigComposer.SnapshotAndFlattenAsync`, add a NON-blocking estimate. Reuse the SAME passthrough regex as Task 3 (extract it to a small shared internal helper, e.g. `ScriptCompileEstimator` in `Core.Scripting.Abstractions` or a static in ControlPlane — or replicate it with a sync-note, matching the `ExtractFullName` replication convention used elsewhere):
```csharp
const double PerScriptMiB = 1.66; // measured post-A0 per-script RSS
var scripts = await db.Scripts.AsNoTracking().Select(s => s.SourceCode).ToListAsync();
var compiled = scripts.Where(src => !IsPassthrough(src)).Distinct(StringComparer.Ordinal).Count();
if (compiled > 0)
{
var estMiB = compiled * PerScriptMiB;
_log.Warning("StartDeployment: {Compiled} script(s) will compile (~{EstMiB:F0} MiB RSS per node); ensure node mem_limit covers it", compiled, estMiB);
// append advisory to the Accepted message below
}
```
On the `Accepted` `StartDeploymentResult`, include the advisory in `Message` (e.g. `Message = compiled > 0 ? $"{compiled} scripts will compile (~{estMiB:F0} MiB/node)" : null`). **Never reject.** Keep the existing Accepted/seal/dispatch flow otherwise unchanged.
**Step 4 — Run, confirm pass** + the AdminOperations suite green + Host builds.
**Step 5 — Commit.** `git add … && git commit -m "feat(deploy): warn-only script-compile-cost advisory on deploy"`
---
### Task 5: Live verification — the real acceptance gate
**Classification:** standard
**Estimated implement time:** ~5 min (+ deploy/settle)
**Parallelizable with:** none (needs T1, T3, T4)
**Steps (no new code — the proof the outage is gone):**
1. Build the docker-dev image off this branch: `cd docker-dev && docker compose build central-1`.
2. Ensure the canonical company overlay is loaded (galaxy mirror 396 + 40 canonical equipment + 1036 vtags) — from `scadaproj/otopcua-uns-loader`: `populate` + `populate-equipment`. (The DB may already have it; verify `SELECT COUNT(*) FROM VirtualTag` = 1036.)
3. Recreate the admin nodes on the new image: `docker compose up -d --no-deps --force-recreate central-1 central-2` (they have a 2g `mem_limit` from `master` `89c07fc`).
4. Headless deploy: `curl -s -X POST http://localhost:9200/api/deployments -H 'X-Api-Key: docker-dev-deploy-key'` → expect **202 Accepted** with the compile-cost advisory (passthrough mirrors → ~0 compiled, so the advisory should be ~0/small).
5. **The gate:** watch the central node through materialise + value-stream and confirm it **stays under its 2g `mem_limit`** and is NOT `OOMKilled`: `docker inspect otopcua-dev-central-1-1 --format 'OOM={{.State.OOMKilled}}'` = false; `docker stats --no-stream otopcua-dev-central-1-1` well under 2 GiB; `:9200` healthy, `:4840` serving.
6. (Optional) `otopcua_uns.py verify-equipment --require-good 396 --wait` to confirm live values still flow.
**Acceptance:** the full 1036-vtag overlay deploys Accepted AND the central node survives materialisation under its 2g limit (no OOM) — the outage is gone. If it still OOMs, report `docker stats` peak + the central-1 log around the death (A may not be wired into the live evaluator path, or A0 didn't take in the built image).
**This task changes the running docker-dev stack (the user's active env).** Coordinate: recreate only the admin nodes; don't disrupt the site nodes.
---
## After all tasks
Run the affected suites (`Core.Scripting`, `Core.VirtualTags`, `Core.ScriptedAlarms`, Host-integration evaluator, ControlPlane) + build the solution, then use **superpowers-extended-cc:finishing-a-development-branch**: verify green, present merge/PR/keep/discard for `exp/vtag-script-memory`. Merge/push only on the user's explicit go (the user manages this repo's integration).
@@ -0,0 +1,11 @@
{
"planPath": "docs/plans/2026-06-07-virtualtag-script-memory.md",
"tasks": [
{"id": 1, "subject": "Task 1: A0 — extract script types into lean Roslyn-free assembly", "status": "pending", "classification": "high-risk"},
{"id": 2, "subject": "Task 2: A0 measurement gate — re-run probe, confirm ~11x drop", "status": "pending", "classification": "small", "blockedBy": [1]},
{"id": 3, "subject": "Task 3: A — passthrough fast-path in evaluator", "status": "pending", "classification": "small", "blockedBy": [1]},
{"id": 4, "subject": "Task 4: warn-only deploy guardrail", "status": "pending", "classification": "standard", "blockedBy": [1]},
{"id": 5, "subject": "Task 5: live docker-dev verification (1036-vtag overlay, no OOM)", "status": "pending", "classification": "standard", "blockedBy": [1, 3, 4]}
],
"lastUpdated": "2026-06-07"
}