docs(plan): VirtualTag/script memory Phase-1 implementation plan (A0+A+guardrail)
This commit is contained in:
@@ -0,0 +1,205 @@
|
||||
# VirtualTag/Script Memory — Phase 1 Implementation Plan
|
||||
|
||||
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers-extended-cc:executing-plans (or subagent-driven-development) to implement this plan task-by-task.
|
||||
|
||||
**Goal:** Make VirtualTag scripts scale in memory — cut Roslyn's ~18 MiB/script (mostly unmanaged) to ~1.66 MiB by isolating the script globals type from Roslyn (A0), skip Roslyn entirely for the mirror passthrough shape (A), and warn on deploy when a config will compile many scripts (guardrail). Phase 2 (the C2 interpreter) is spec-only in the design doc — NOT in this plan.
|
||||
|
||||
**Architecture:** A0 extracts the script-callable types (`ScriptContext`, `ScriptGlobals<T>`, `VirtualTagContext`, `AlarmPredicateContext`) into a new lean assembly that never references Roslyn, so Roslyn's reference manager stops materialising the Roslyn metadata closure per compile. A adds a passthrough short-circuit in the evaluator. The guardrail is a non-blocking warning in the deploy handler.
|
||||
|
||||
**Tech Stack:** .NET 10, Microsoft.CodeAnalysis.CSharp.Scripting (Roslyn 4.12), Akka.NET, xUnit + Shouldly. Solution: `ZB.MOM.WW.OtOpcUa.slnx`, Central Package Management (`Directory.Packages.props`). Branch: `exp/vtag-script-memory` (off `master`). Design: `docs/plans/2026-06-07-virtualtag-script-memory-design.md`.
|
||||
|
||||
**Sequencing:** T1 (A0 split) first — it changes assembly layout the rest builds on. Then T2 (A0 measurement gate), T3 (A passthrough), T4 (guardrail), T5 (live verify). Run sequentially (shared working tree + build).
|
||||
|
||||
---
|
||||
|
||||
### Task 1: A0 — extract script-callable types into a lean, Roslyn-free assembly
|
||||
|
||||
**Classification:** high-risk
|
||||
**Estimated implement time:** ~6 min (assembly split; can't be atomically subdivided — must build green as a unit)
|
||||
**Parallelizable with:** none
|
||||
|
||||
**Files:**
|
||||
- Create: `src/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting.Abstractions/ZB.MOM.WW.OtOpcUa.Core.Scripting.Abstractions.csproj`
|
||||
- Move (git mv, keep namespaces unchanged): `src/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting/ScriptContext.cs` and `.../ScriptGlobals.cs` → the new project dir; `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagContext.cs` → new project dir; `src/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms/AlarmPredicateContext.cs` → new project dir
|
||||
- Modify: `.../Core.Scripting/ZB.MOM.WW.OtOpcUa.Core.Scripting.csproj`, `.../Core.VirtualTags/...csproj`, `.../Core.ScriptedAlarms/...csproj` (add ProjectReference to the new assembly)
|
||||
- Modify: `ZB.MOM.WW.OtOpcUa.slnx` (add the new project — use `dotnet sln add`)
|
||||
|
||||
**Why these four:** the production script globals type is `ScriptGlobals<VirtualTagContext>`. Roslyn's reference manager loads the transitive reference closure of *every assembly in the globalsType's type graph*. Today `ScriptGlobals`+`ScriptContext` are in `Core.Scripting` (→ Roslyn) and `VirtualTagContext`/`AlarmPredicateContext` are in assemblies that reference `Core.Scripting` (→ Roslyn). Moving all four into one lean assembly that references only `Core.Abstractions` (+ Serilog) makes the whole closure Roslyn-free. (Confirmed: all four files are Roslyn-free — `ScriptContext` uses only `Serilog` + `Core.Abstractions`; `ScriptGlobals` has no usings; the two contexts use `Serilog`, `Core.Abstractions`, `Core.Scripting`(only for the `ScriptContext` base, which is moving with them).) **Keep each file's existing namespace** (`Core.Scripting`, `Core.VirtualTags`, `Core.ScriptedAlarms`) so NO caller `using` statements change — a namespace may span assemblies. Only project references change.
|
||||
|
||||
**Step 1 — Create the lean project.** `src/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting.Abstractions/ZB.MOM.WW.OtOpcUa.Core.Scripting.Abstractions.csproj`, mirroring `Core.Abstractions.csproj`'s style (net10.0, CPM — version-less PackageReference):
|
||||
```xml
|
||||
<Project Sdk="Microsoft.NET.Sdk">
|
||||
<PropertyGroup>
|
||||
<TargetFramework>net10.0</TargetFramework>
|
||||
</PropertyGroup>
|
||||
<ItemGroup>
|
||||
<PackageReference Include="Serilog"/>
|
||||
<ProjectReference Include="..\ZB.MOM.WW.OtOpcUa.Core.Abstractions\ZB.MOM.WW.OtOpcUa.Core.Abstractions.csproj"/>
|
||||
</ItemGroup>
|
||||
</Project>
|
||||
```
|
||||
(Match the exact `<PropertyGroup>`/Nullable/ImplicitUsings settings used by the sibling Core projects — copy from `Core.Abstractions.csproj`.)
|
||||
|
||||
**Step 2 — Move the four files** (preserve git history):
|
||||
```bash
|
||||
git mv src/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting/ScriptContext.cs src/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting.Abstractions/
|
||||
git mv src/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting/ScriptGlobals.cs src/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting.Abstractions/
|
||||
git mv src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagContext.cs src/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting.Abstractions/
|
||||
git mv src/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms/AlarmPredicateContext.cs src/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting.Abstractions/
|
||||
```
|
||||
Do NOT change the namespaces inside the moved files.
|
||||
|
||||
**Step 3 — Wire references.** Add to the solution + the three consumer csprojs:
|
||||
```bash
|
||||
dotnet sln ZB.MOM.WW.OtOpcUa.slnx add src/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting.Abstractions/ZB.MOM.WW.OtOpcUa.Core.Scripting.Abstractions.csproj
|
||||
```
|
||||
In `Core.Scripting.csproj`, `Core.VirtualTags.csproj`, `Core.ScriptedAlarms.csproj` add (keep all their existing references — they still need `Core.Scripting`/Roslyn for the engines/evaluators):
|
||||
```xml
|
||||
<ProjectReference Include="..\ZB.MOM.WW.OtOpcUa.Core.Scripting.Abstractions\ZB.MOM.WW.OtOpcUa.Core.Scripting.Abstractions.csproj"/>
|
||||
```
|
||||
(Watch for a reference cycle: the lean assembly must NOT reference `Core.Scripting`. `Core.Scripting` → lean is fine; lean → `Core.Scripting` is forbidden. The moved files don't need anything from `Core.Scripting`, so there's no cycle.)
|
||||
|
||||
**Step 4 — Build the whole solution + run the existing script/alarm suites (behaviour-preserving).**
|
||||
Run: `dotnet build ZB.MOM.WW.OtOpcUa.slnx`
|
||||
Expected: build succeeds (the moved types resolve through the new project).
|
||||
Run: `dotnet test tests/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting.Tests/ ; dotnet test tests/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags.Tests/ ; dotnet test tests/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms.Tests/`
|
||||
Expected: all green (no behaviour change). If a test project references the moved types and the build complains, add the lean project reference to that test csproj too.
|
||||
|
||||
**Step 5 — Commit.** `git add -A && git commit -m "refactor(scripting): extract script-callable types into Roslyn-free Core.Scripting.Abstractions (A0)"`
|
||||
|
||||
**Acceptance:** solution builds; existing script/alarm/VirtualTag suites green; the lean assembly has no `Microsoft.CodeAnalysis` in its (transitive) references (`dotnet list src/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting.Abstractions/...csproj reference` shows only Abstractions; no Roslyn).
|
||||
|
||||
---
|
||||
|
||||
### Task 2: A0 measurement gate — re-run the probe, confirm ~11× drop
|
||||
|
||||
**Classification:** small
|
||||
**Estimated implement time:** ~4 min
|
||||
**Parallelizable with:** none (needs Task 1)
|
||||
|
||||
**Files:**
|
||||
- Modify: `tools/mem-probe/MemProbe/MemProbe.csproj` (its references to `VirtualTagContext`/`ScriptGlobals` now resolve via the lean assembly — update the ProjectReference to `Core.Scripting.Abstractions`, drop the now-unnecessary heavy refs if any)
|
||||
- (No production code.)
|
||||
|
||||
**Step 1 — Point the probe at the post-A0 types.** Update `MemProbe.csproj` so `ScriptGlobals<VirtualTagContext>` resolves from `Core.Scripting.Abstractions` (add that ProjectReference; the probe should no longer need `Core.VirtualTags`/`Core.Scripting` directly for the globals type). Keep the `Microsoft.CodeAnalysis.CSharp.Scripting` package ref (the probe still compiles scripts).
|
||||
|
||||
**Step 2 — Run heavy + lean, capture numbers.**
|
||||
Run: `dotnet run -c Release --project tools/mem-probe/MemProbe -- heavy` (twice) and `... -- lean` (twice).
|
||||
Expected: **heavy** (production `ScriptGlobals<VirtualTagContext>`) per-script RSS is now in the **~1.66 MiB** regime (was ~18 MiB before A0) — i.e. heavy ≈ lean, both ~1.66 MiB. That is the proof A0 worked on the production types.
|
||||
|
||||
**Step 3 — Record the result** in a short comment block at the top of `tools/mem-probe/MemProbe/Program.cs` (before/after numbers) and commit.
|
||||
|
||||
**Step 4 — Commit.** `git add tools/mem-probe && git commit -m "test(mem-probe): confirm A0 drops production per-script RSS ~11x (18->~1.66 MiB)"`
|
||||
|
||||
**Acceptance:** post-A0 heavy-mode per-script RSS ≤ ~3 MiB (down from ~18). If it did NOT drop, A0 is incomplete (a type in the globalsType graph still transitively reaches Roslyn) — STOP and report which assembly still pulls in `Microsoft.CodeAnalysis`.
|
||||
|
||||
---
|
||||
|
||||
### Task 3: A — passthrough fast-path in the evaluator
|
||||
|
||||
**Classification:** small
|
||||
**Estimated implement time:** ~4 min
|
||||
**Parallelizable with:** none (sequential build)
|
||||
|
||||
**Files:**
|
||||
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.Host/Engines/RoslynVirtualTagEvaluator.cs` (`Evaluate`, before the `_cache.GetOrAdd(expression, …)` at line ~56)
|
||||
- Test: `tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/RoslynVirtualTagEvaluatorTests.cs`
|
||||
|
||||
**Step 1 — Write failing tests.** (Read the existing tests for the harness + how `VirtualTagEvalResult` success/value is asserted.)
|
||||
- `Passthrough_returns_dependency_value_without_compiling`: `Evaluate("vt", "return ctx.GetTag(\"a\").Value;", new Dictionary{["a"]=42})` returns success with value `42`, and the evaluator's compile cache stays empty (expose/inspect `_cache.Count` via a test hook or assert no compilation occurred — if the cache field isn't observable, assert behaviour: a syntactically-broken-but-passthrough-shaped script still returns the value, proving Roslyn never ran).
|
||||
- `Passthrough_with_whitespace_variants_still_matches`: leading/trailing whitespace + spaces inside `GetTag( "a" )` still passthrough.
|
||||
- `Non_passthrough_falls_through_to_Roslyn`: `"return (int)ctx.GetTag(\"a\").Value + 1;"` still compiles + returns `43`.
|
||||
- `Passthrough_missing_dependency`: returns the same no-value/Bad result the Roslyn path produces when the dep is absent.
|
||||
|
||||
**Step 2 — Run, confirm fail.** `dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/ --filter "FullyQualifiedName~RoslynVirtualTagEvaluator"`
|
||||
|
||||
**Step 3 — Implement.** In `Evaluate`, before the cache lookup:
|
||||
```csharp
|
||||
// A — passthrough fast-path: the mirror shape `return ctx.GetTag("X").Value;` is ~bytes; skip Roslyn
|
||||
// entirely (no compile, no cached assembly). Narrow exact pattern so near-misses fall through.
|
||||
if (TryMatchPassthrough(expression, out var passthroughRef))
|
||||
{
|
||||
if (!dependencies.TryGetValue(passthroughRef, out var value))
|
||||
return /* the same no-value result the Roslyn path returns when GetTag finds nothing */;
|
||||
return /* VirtualTagEvalResult success wrapping `value` — mirror the existing success wrap */;
|
||||
}
|
||||
```
|
||||
Add the matcher (compiled static regex):
|
||||
```csharp
|
||||
private static readonly System.Text.RegularExpressions.Regex PassthroughRegex =
|
||||
new(@"^\s*return\s+ctx\s*\.\s*GetTag\s*\(\s*""([^""]+)""\s*\)\s*\.\s*Value\s*;\s*$",
|
||||
System.Text.RegularExpressions.RegexOptions.Compiled);
|
||||
private static bool TryMatchPassthrough(string expression, out string tagRef)
|
||||
{
|
||||
var m = PassthroughRegex.Match(expression);
|
||||
if (m.Success) { tagRef = m.Groups[1].Value; return true; }
|
||||
tagRef = string.Empty; return false;
|
||||
}
|
||||
```
|
||||
Match the EXACT `VirtualTagEvalResult` construction the Roslyn path uses for success and for not-found (read lines ~74–98 of the file and reuse those code paths — do not invent a new result shape).
|
||||
|
||||
**Step 4 — Run, confirm pass** + the broader evaluator suite is green.
|
||||
|
||||
**Step 5 — Commit.** `git add … && git commit -m "feat(vtag): passthrough fast-path skips Roslyn for mirror scripts (A)"`
|
||||
|
||||
---
|
||||
|
||||
### Task 4: Warn-only deploy guardrail
|
||||
|
||||
**Classification:** standard
|
||||
**Estimated implement time:** ~5 min
|
||||
**Parallelizable with:** none (sequential build)
|
||||
|
||||
**Files:**
|
||||
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.ControlPlane/AdminOperations/AdminOperationsActor.cs` (`HandleStartDeploymentAsync`, after the `DraftValidator` gate, before the seal — non-blocking)
|
||||
- Test: `tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests/AdminOperationsActorTests.cs`
|
||||
|
||||
**Step 1 — Write failing tests.**
|
||||
- `StartDeployment_warns_when_many_scripts_will_compile`: seed N (e.g. 10) distinct NON-passthrough `Script` rows; assert the deploy is still `Accepted` AND the `StartDeploymentResult.Message` contains a compile-cost advisory (e.g. `"scripts will compile"` / the estimated MiB). (If the actor test can't easily read the log, assert via the Message string.)
|
||||
- `StartDeployment_passthrough_scripts_do_not_count`: seed only passthrough `Script` rows → no/zero compile advisory (Accepted, no warning text).
|
||||
|
||||
**Step 2 — Run, confirm fail.** `dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests/ --filter "FullyQualifiedName~AdminOperations"`
|
||||
|
||||
**Step 3 — Implement.** After the `DraftValidator` gate (the `if (errors.Count > 0)` block from the deploy gate) and before `ConfigComposer.SnapshotAndFlattenAsync`, add a NON-blocking estimate. Reuse the SAME passthrough regex as Task 3 (extract it to a small shared internal helper, e.g. `ScriptCompileEstimator` in `Core.Scripting.Abstractions` or a static in ControlPlane — or replicate it with a sync-note, matching the `ExtractFullName` replication convention used elsewhere):
|
||||
```csharp
|
||||
const double PerScriptMiB = 1.66; // measured post-A0 per-script RSS
|
||||
var scripts = await db.Scripts.AsNoTracking().Select(s => s.SourceCode).ToListAsync();
|
||||
var compiled = scripts.Where(src => !IsPassthrough(src)).Distinct(StringComparer.Ordinal).Count();
|
||||
if (compiled > 0)
|
||||
{
|
||||
var estMiB = compiled * PerScriptMiB;
|
||||
_log.Warning("StartDeployment: {Compiled} script(s) will compile (~{EstMiB:F0} MiB RSS per node); ensure node mem_limit covers it", compiled, estMiB);
|
||||
// append advisory to the Accepted message below
|
||||
}
|
||||
```
|
||||
On the `Accepted` `StartDeploymentResult`, include the advisory in `Message` (e.g. `Message = compiled > 0 ? $"{compiled} scripts will compile (~{estMiB:F0} MiB/node)" : null`). **Never reject.** Keep the existing Accepted/seal/dispatch flow otherwise unchanged.
|
||||
|
||||
**Step 4 — Run, confirm pass** + the AdminOperations suite green + Host builds.
|
||||
|
||||
**Step 5 — Commit.** `git add … && git commit -m "feat(deploy): warn-only script-compile-cost advisory on deploy"`
|
||||
|
||||
---
|
||||
|
||||
### Task 5: Live verification — the real acceptance gate
|
||||
|
||||
**Classification:** standard
|
||||
**Estimated implement time:** ~5 min (+ deploy/settle)
|
||||
**Parallelizable with:** none (needs T1, T3, T4)
|
||||
|
||||
**Steps (no new code — the proof the outage is gone):**
|
||||
1. Build the docker-dev image off this branch: `cd docker-dev && docker compose build central-1`.
|
||||
2. Ensure the canonical company overlay is loaded (galaxy mirror 396 + 40 canonical equipment + 1036 vtags) — from `scadaproj/otopcua-uns-loader`: `populate` + `populate-equipment`. (The DB may already have it; verify `SELECT COUNT(*) FROM VirtualTag` = 1036.)
|
||||
3. Recreate the admin nodes on the new image: `docker compose up -d --no-deps --force-recreate central-1 central-2` (they have a 2g `mem_limit` from `master` `89c07fc`).
|
||||
4. Headless deploy: `curl -s -X POST http://localhost:9200/api/deployments -H 'X-Api-Key: docker-dev-deploy-key'` → expect **202 Accepted** with the compile-cost advisory (passthrough mirrors → ~0 compiled, so the advisory should be ~0/small).
|
||||
5. **The gate:** watch the central node through materialise + value-stream and confirm it **stays under its 2g `mem_limit`** and is NOT `OOMKilled`: `docker inspect otopcua-dev-central-1-1 --format 'OOM={{.State.OOMKilled}}'` = false; `docker stats --no-stream otopcua-dev-central-1-1` well under 2 GiB; `:9200` healthy, `:4840` serving.
|
||||
6. (Optional) `otopcua_uns.py verify-equipment --require-good 396 --wait` to confirm live values still flow.
|
||||
|
||||
**Acceptance:** the full 1036-vtag overlay deploys Accepted AND the central node survives materialisation under its 2g limit (no OOM) — the outage is gone. If it still OOMs, report `docker stats` peak + the central-1 log around the death (A may not be wired into the live evaluator path, or A0 didn't take in the built image).
|
||||
|
||||
**This task changes the running docker-dev stack (the user's active env).** Coordinate: recreate only the admin nodes; don't disrupt the site nodes.
|
||||
|
||||
---
|
||||
|
||||
## After all tasks
|
||||
|
||||
Run the affected suites (`Core.Scripting`, `Core.VirtualTags`, `Core.ScriptedAlarms`, Host-integration evaluator, ControlPlane) + build the solution, then use **superpowers-extended-cc:finishing-a-development-branch**: verify green, present merge/PR/keep/discard for `exp/vtag-script-memory`. Merge/push only on the user's explicit go (the user manages this repo's integration).
|
||||
@@ -0,0 +1,11 @@
|
||||
{
|
||||
"planPath": "docs/plans/2026-06-07-virtualtag-script-memory.md",
|
||||
"tasks": [
|
||||
{"id": 1, "subject": "Task 1: A0 — extract script types into lean Roslyn-free assembly", "status": "pending", "classification": "high-risk"},
|
||||
{"id": 2, "subject": "Task 2: A0 measurement gate — re-run probe, confirm ~11x drop", "status": "pending", "classification": "small", "blockedBy": [1]},
|
||||
{"id": 3, "subject": "Task 3: A — passthrough fast-path in evaluator", "status": "pending", "classification": "small", "blockedBy": [1]},
|
||||
{"id": 4, "subject": "Task 4: warn-only deploy guardrail", "status": "pending", "classification": "standard", "blockedBy": [1]},
|
||||
{"id": 5, "subject": "Task 5: live docker-dev verification (1036-vtag overlay, no OOM)", "status": "pending", "classification": "standard", "blockedBy": [1, 3, 4]}
|
||||
],
|
||||
"lastUpdated": "2026-06-07"
|
||||
}
|
||||
Reference in New Issue
Block a user