diff --git a/docs/plans/2026-06-18-m9-templates-authoring-design.md b/docs/plans/2026-06-18-m9-templates-authoring-design.md new file mode 100644 index 00000000..fc669e1f --- /dev/null +++ b/docs/plans/2026-06-18-m9-templates-authoring-design.md @@ -0,0 +1,160 @@ +# Design: M9 — Templates & Authoring (T22–T26, T28, T30–T32 + CLI cached-call Retry/Discard) + +**Date:** 2026-06-18 +**Status:** Approved (brainstorming session) — ready for writing-plans +**Milestone:** M9 of the system-completion roadmap (`docs/plans/2026-06-15-stillpending-completion-design.md` line 105) +**Branch:** `worktree-m9-templates-authoring` off `origin/main` @ `72aec3b4` +**Source backlog:** `stillpending.md` Tier 3 — "Templates / Data Connections / Triggers UI" + "Cached-call tracking" + +## Goal + +Deliver the in-scope authoring and templates backlog: make the template tree searchable and reorderable, let operators move and live-monitor data connections, surface multi-level inheritance + base-change staleness in the template editor, add an opt-in strict trigger-analysis mode, build schema-driven value entry (nested forms + Monaco hover/completion + a reusable `$ref` schema library), and put cached-call Retry/Discard on the CLI. One unified-outbox page item is explicitly deferred. + +## Scope + +### In scope (10 deliverables) +- **T22** — Template tree search/filter. +- **T23** — Folder sibling reorder + root-level context menu (menu-based; **no drag-drop**). +- **T24** — Move a data connection between sites. +- **T25** — Connection live-status indicators on the design page. +- **T26** — Base-template versioning *authoring*: multi-level inherited-member resolution in the editor + a read-only staleness banner. +- **T28** — Strict expression-trigger analysis *kind* (opt-in escalation). +- **T30** — Schema-driven nested value-entry forms. +- **T31** — Monaco JSON-Schema hover/completion on value-entry. +- **T32** — JSON Schema `$ref` resolver + a template-level schema library. 
+- **CLI** — `cached-call retry|discard` for site-local cached calls. + +### Deferred (logged follow-ups, not in M9) +- **Unified notifications + site-calls outbox page** — the two data models diverge hard (enum vs string status lifecycles, offset vs keyset pagination, `string` vs strong-typed GUID ids, asymmetric provenance). A true union view-model is high-risk for marginal operator gain. Deferred; the CLI Retry/Discard ships instead. +- **Folder drag-drop** (HTML5 DnD via JS interop) — `[PERM]`-tagged; menu-based reorder delivers the capability without the Blazor-Server interop fragility. +- **T27** (promote-derived-to-base, cross-tenant libraries) and **T29** (WhileTrue alarm trigger) remain excluded per the roadmap. + +## Locked decisions + +- **D1 — T26 is authoring-only.** Instance flattening already re-walks the full inheritance chain fresh on every deploy (`TemplateResolver.BuildInheritanceChain` is arbitrary-depth; `CycleDetector` covers inheritance/composition/cross-graph). The only real gap is in the *editor*: it loads the immediate base only, derived templates carry stale `IsInherited` placeholder rows, and there is no staleness signal. M9 closes that with a **read-only** resolve + staleness banner. **No stored-row mutation, no `RefreshDerivedTemplate` command.** +- **D2 — T32 is fully in scope.** Build a template-level schema library (new entity + idempotent migration + repo) and a custom `$ref` resolver. **No new NuGet package** — `Directory.Packages.props` has no JSON-schema library and CLAUDE.md forbids adding one; everything uses `System.Text.Json` and extends the existing `InboundApiSchema` parser. +- **D3 — Unified outbox page deferred;** ship the CLI Retry/Discard from this cluster. +- **D4 — T23 is menu-based:** sibling Move-up/Move-down (uses the existing `TemplateFolder.SortOrder`) + a root-level context menu (New Folder / New Template at root) + completing the folder context menu. No drag-drop. 
+- **D5 — Move-connection is guarded.** A `MoveDataConnectionCommand` succeeds only when: (a) the target site exists; (b) no name collision with an existing connection at the target site; (c) **no `InstanceConnectionBinding` references the connection** (instances are site-scoped — a bound connection cannot leave its site without orphaning the binding). On block, return a clear error naming the blocking instances. Also re-point/validate name-based references (`TemplateNativeAlarmSource.ConnectionName`, `InstanceNativeAlarmSourceOverride.ConnectionNameOverride`) for collisions. Every move emits an audit-log row. +- **D6 — T25 reuses existing health transport.** Health already flows DCL → `ISiteHealthCollector.UpdateConnectionHealth` → `SiteHealthReport.DataConnectionStatuses` (name→`ConnectionHealth`) → `ICentralHealthAggregator` → the Health page renders badges. M9 surfaces the same data on the *design* `DataConnections` page (per-node badge + ~10s poll). No new transport, no SignalR. +- **D7 — T28 is an opt-in escalation layer.** Expression triggers already get a real Roslyn semantic compile + forbidden-API + undefined-attribute analysis that **blocks deploy** (delivered in M2/M3). T28 adds a per-trigger `AnalysisKind` (default **Advisory** = today's behavior; **Strict** escalates the currently-advisory findings — blank expression, ambiguous coercion — to deploy-blocking errors). The increment is the toggle + the escalation branch; implementation right-sizes after confirming exact current behavior. + +## Current-state map (reconnaissance evidence) + +### Cluster A — Template tree UI +- `CentralUI/Components/Shared/TreeView.razor` — generic tree; external-filter model (R8), `ContextMenu` render-fragment (R15); **no built-in DnD**. +- `CentralUI/Components/Shared/TemplateFolderTree.razor:68` — already exposes a `Filter` parameter with recursive substring match + ancestor auto-expand (`ApplyFilter`/`CopyMatching`). 
+- `CentralUI/Components/Pages/Design/Templates.razor` — uses `TemplateFolderTree` but **wires no search box**; folder context menu present (New Folder/Template, Rename, Move…, Delete); `MoveFolderDialog.razor` exists. +- `Commons/Entities/Templates/TemplateFolder.cs:12` — has `SortOrder`. `TemplateEngine/Services/TemplateFolderService.cs` — Create/Rename/Move (cycle + collision checks)/Delete; **no sort-order update method**. +- `Commons/Messages/Management/TemplateFolderCommands.cs` — Create/Move/Rename/Delete commands; **no reorder command**. + +### Cluster B — Data connections +- `Commons/Entities/Sites/DataConnection.cs:8` — `SiteId` FK. `Commons/Messages/Management/DataConnectionCommands.cs` — Create/Update/Delete; `Update` does **not** change `SiteId`; **no move command**. +- `CentralUI/Components/Pages/Design/DataConnectionForm.razor` — site locked after creation. `DataConnections.razor` — site→connection tree, search box present, Edit/Delete actions; **no move, no health badge**. +- FK/blockers for move: `Instances/InstanceConnectionBinding.cs:12` (`DataConnectionId` FK), `Templates/TemplateNativeAlarmSource.cs:21` (`ConnectionName`, name-based), `Instances/InstanceNativeAlarmSourceOverride.cs:22` (`ConnectionNameOverride`, name-based). +- Health: `HealthMonitoring/ISiteHealthCollector.cs:58`, `Commons/Messages/Health/SiteHealthReport.cs:10` (`DataConnectionStatuses`), `ICentralHealthAggregator` (`GetSiteState`), `CentralUI/.../Monitoring/Health.razor` (existing badge render + `GetConnectionHealthBadge`). + +### Cluster C — Inheritance authoring +- `Commons/Entities/Templates/Template.cs` — `ParentTemplateId` (inheritance), `IsDerived` + `OwnerCompositionId` (composition-materialized slots). Members carry `IsInherited` + `LockedInDerived` flags (`TemplateAttribute`/`TemplateAlarm`/`TemplateScript`/`TemplateNativeAlarmSource`). +- `TemplateEngine/TemplateResolver.cs:119` — `BuildInheritanceChain` walks arbitrary depth (root-first), cycle-guarded. 
`TemplateEngine/Flattening/FlatteningService.cs` — derived wins, `IsInherited` placeholders skip in favor of the live base value. `CycleDetector.cs` — inheritance/composition/cross-graph checks on save. +- `TemplateEngine/Flattening/RevisionHashService.cs` — deterministic SHA-256 of flattened config (already used for staleness in Transport/M8 via `IStaleInstanceProbe`). +- `CentralUI/Components/Pages/Design/TemplateEdit.razor:58` — loads only the **immediate** base (`_baseTemplate`, `_baseAttributesByName`, …); no multi-level resolution, no staleness banner. +- `ManagementService/ManagementActor.cs:178` — template command block; **no resolve/update-derived command**. + +### Cluster D — Triggers + schema entry +- `TemplateEngine/Validation/ValidationService.cs:263` (`CheckExpressionTrigger`) — real Roslyn compile + forbidden-API + undefined-attribute checks; blank expression = warning; errors block deploy. Separate error/warning lists make selective escalation a clean seam. +- JSON Schema is canonical storage (migration `20260512211204_MigrateParametersToJsonSchema`); `Commons/Types/InboundApi/InboundApiSchema.cs` (`Parse`/`ParseSchema` recursive, depth-capped; `Validate`). `CentralUI/Components/Shared/SchemaBuilder.razor` authors schemas; `ParameterValueForm.razor:52` renders scalars but falls back to a **JSON textarea** for object/list. Monaco already integrated (`MonacoEditor.razor`). **No `$ref` resolution anywhere; no schema-library entity.** `Directory.Packages.props` — no JSON-schema package (System.Text.Json only). + +### Cluster E — CLI cached-call Retry/Discard +- Backend relay fully exists: `ManagementActor.cs:220,380` (`RetryParkedMessageCommand`/`DiscardParkedMessageCommand`, Deployer-gated) → `SiteCallAuditActor.cs:877,909` (`HandleRetrySiteCall`/`HandleDiscardSiteCall` → `RetryParkedOperation`/`DiscardParkedOperation` relay) → site. Central UI Site Calls page already uses it. 
+- CLI pattern: `CLI/Commands/CommandHelpers.cs:34` (`ExecuteCommandAsync` → `ManagementHttpClient.SendCommandAsync`), command-name via `ManagementCommandRegistry.GetCommandName`. Model on `NotificationCommands.cs`. **No cached-call command group today** — must verify the registry maps the two commands. + +## Design by feature + +### T22 — Template tree search (small) +Add a search `<input>` to `Templates.razor`, bound to a local field, passed to `TemplateFolderTree.Filter`. The recursive filter + auto-expand already exist. UI-only — no service/entity/command change. Clear-filter restores the full tree and prior expansion state. + +### T23 — Folder reorder + context menus (standard) +- `TemplateFolderService.ReorderFolderAsync(folderId, direction, user)` (or `MoveUp`/`MoveDown`) — swap `SortOrder` with the adjacent sibling under the same parent; no-op at the ends. New `ReorderTemplateFolderCommand` + ManagementActor handler (Designer-gated, matching the other folder commands). +- Sibling loads ordered by `SortOrder` (then Name) everywhere the folder tree is built. +- `Templates.razor` — Move-up/Move-down items in the folder context menu; a **root-level** context menu (right-click empty/root → New Folder, New Template at root). Complete any missing folder-menu items. + +### T24 — Move connection between sites (high-risk) +- `MoveDataConnectionCommand(DataConnectionId, TargetSiteId)` + ManagementActor `HandleMoveDataConnection` (Designer-gated). +- Guards (D5), all server-side: target site exists; no name collision at target; **reject if any `InstanceConnectionBinding` references the connection** with an error naming blockers; validate name-based native-alarm-source references won't collide/orphan at the target. +- Persist via the existing `ISiteRepository.UpdateDataConnectionAsync` (sets `SiteId`); emit an audit row. +- UI: a "Move to Site…" action + `MoveDataConnectionDialog` (target-site picker, error surface) on `DataConnections.razor`.
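The T24 guard order (D5) can be sketched as one pure check that runs before anything is persisted. This is illustrative Python, not the project's C# handler: `Connection`, `check_move`, and the in-memory collections are hypothetical stand-ins for the repository lookups. The exact-match name comparison is an assumption, and validation of name-based native-alarm-source references is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    """Hypothetical stand-in for a data connection row."""
    id: int
    name: str
    site_id: int


def check_move(conn, target_site_id, site_ids, connections, bindings):
    """Return blocking errors for a move; an empty list means the move may proceed.

    site_ids:    set of existing site ids
    connections: all Connection rows
    bindings:    (instance_name, connection_id) pairs
    """
    # Guard (a): the target site must exist. Nothing else is meaningful if it does not.
    if target_site_id not in site_ids:
        return [f"target site {target_site_id} does not exist"]

    errors = []

    # Guard (b): no other connection at the target site may share the name.
    # Exact, case-sensitive comparison is an assumption of this sketch.
    if any(c.site_id == target_site_id and c.name == conn.name and c.id != conn.id
           for c in connections):
        errors.append(f"site {target_site_id} already has a connection named '{conn.name}'")

    # Guard (c): any instance binding blocks the move; name the blockers.
    blockers = sorted(inst for inst, conn_id in bindings if conn_id == conn.id)
    if blockers:
        errors.append("bound by instances: " + ", ".join(blockers))

    return errors
```

Collecting every failure instead of stopping at the first lets the move dialog show the name collision and the blocking instances together.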
+ +### T25 — Connection live-status (standard) +- A central-side query (extend the health query service or inject `ICentralHealthAggregator`) returning a `connectionId → ConnectionHealth` map for a site: read the latest `SiteHealthReport.DataConnectionStatuses`, resolve names→ids via the repo. +- `DataConnections.razor` — render a health badge per connection node (reuse `GetConnectionHealthBadge`/`AlarmStateBadges`-style classes), refresh on a ~10s poll timer (mirror the Health page). Register the injected service in the existing `DataConnections` bUnit fixtures. + +### T26 — Inheritance authoring resolve + staleness banner (high-risk) +- A resolve service/method that, given a derived or child template, walks the full inheritance chain (`BuildInheritanceChain`) and returns the **effective inherited member set** — including base members added *after* the derived template was created, across ≥2 inheritance levels — annotated per member with origin (own override / inherited-from-X / locked). +- A new read-only query command (e.g. `GetResolvedTemplateMembersCommand`) + ManagementActor handler returning that set (plus a staleness summary). +- `TemplateEdit.razor` renders the **full** resolved inherited set (not just the immediate base) and a read-only banner when the stored derived rows differ from the freshly-resolved chain ("Base changed — N inherited members differ"). **No mutation** — flattening at deploy is already correct; the banner is informational and the editor's own override actions are unchanged. + +### T28 — Strict expression-trigger kind (small) +- Add `AnalysisKind` (Advisory default / Strict) to the trigger config (carried in the existing `TriggerConfiguration` JSON or a small dedicated field — chosen to stay additive to the flattened model and avoid a migration if feasible). +- `CheckExpressionTrigger` — when Strict, promote the currently-advisory findings to errors (deploy-blocking); Advisory preserves today's behavior exactly. 
+- Trigger editor selector (alarm/script trigger UI) + CLI flag (`--trigger-kind`/`--strict`). Right-size after confirming exact current advisory set. + +### T30 — Schema-driven nested forms (standard) +- Extend `ParameterValueForm.razor` to recursively render object fields and list items as typed inputs (replacing the JSON textarea for object/list), driven by the parsed `InboundApiSchema` (including `$ref`-resolved schemas from T32). Per-field validation via `InboundApiSchema.Validate`; collect to canonical JSON. Re-register in existing fixtures. + +### T31 — Monaco hover/completion (standard) +- Feed the resolved JSON Schema to the existing Monaco editor's JSON language config so the value-entry JSON surface gets schema-driven hover + completion. Reuses `MonacoEditor.razor`; no new package (Monaco's built-in JSON schema support). + +### T32 — `$ref` resolver + template-level schema library (high-risk; build first) +- New `SharedSchema` entity (Id, Name unique, optional scope, `SchemaJson`) + EF config + **idempotent** migration + repository. +- Custom `$ref` resolver in `InboundApiSchema.Parse` (resolve `{"$ref":"lib:Name"}`-style pointers to library entries; depth/cycle-guarded, System.Text.Json only). +- ManagementActor CRUD commands (Designer-gated) + a Central UI schema-library page (reuse `SchemaBuilder`). +- Deploy-time validation that every `$ref` target exists (block on dangling ref), wired into the existing validation pipeline. + +### CLI — cached-call Retry/Discard (small) +- New `CachedCallCommands.cs` (`cached-call retry|discard --site-id … --tracked-operation-id …`) calling the existing Deployer-gated `RetryParkedMessageCommand`/`DiscardParkedMessageCommand` via `CommandHelpers.ExecuteCommandAsync`. **Verify `ManagementCommandRegistry` maps both command names** (the CLI's `GetCommandName` depends on it). Update `CLI/README.md` + `Component-CLI.md`. 
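The depth- and cycle-guarded `$ref` resolution described under T32 above can be sketched as follows. This is illustrative Python rather than the planned `System.Text.Json` code in `InboundApiSchema`. The `lib:` prefix follows the example pointer in the T32 text, while `resolve_refs`, `RefError`, and the `MAX_DEPTH` value are assumptions of this sketch. The library is modelled as a plain name-to-JSON-string mapping.

```python
import json

MAX_DEPTH = 16       # assumed cap; the real parser's depth limit is not stated here
LIB_PREFIX = "lib:"  # pointer style taken from the {"$ref":"lib:Name"} example


class RefError(ValueError):
    """Raised for dangling or cyclic references, or excessive nesting."""


def resolve_refs(node, library, _stack=(), _depth=0):
    """Return a copy of `node` with each {"$ref": "lib:Name"} replaced by the library schema.

    library: mapping of schema name -> schema JSON text
    _stack:  library names currently being expanded (used for cycle detection)
    """
    if _depth > MAX_DEPTH:
        raise RefError("schema nesting exceeds depth cap")
    if isinstance(node, list):
        return [resolve_refs(item, library, _stack, _depth + 1) for item in node]
    if not isinstance(node, dict):
        return node  # scalars pass through unchanged

    ref = node.get("$ref")
    if isinstance(ref, str) and ref.startswith(LIB_PREFIX):
        name = ref[len(LIB_PREFIX):]
        if name in _stack:
            raise RefError("cyclic $ref: " + " -> ".join(_stack + (name,)))
        if name not in library:
            raise RefError(f"dangling $ref: {name}")
        target = json.loads(library[name])
        # Expand the target too, remembering that `name` is now in progress.
        return resolve_refs(target, library, _stack + (name,), _depth + 1)

    # Ordinary object: resolve each member; non-library $ref values are left as-is.
    return {key: resolve_refs(value, library, _stack, _depth + 1)
            for key, value in node.items()}
```

The same `RefError` cases map onto the deploy-time check: a dangling reference is the condition that should block a deploy.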
+ +## Dependencies & wave plan + +Execute **subagent-driven** in the `worktree-m9-templates-authoring` worktree. Implementers do **not** create worktrees; commit **pathspec** form (`-m` before `--`, never `git add -A`); keep ≤2–3 concurrent committers with a post-wave HEAD-presence check; targeted builds/tests per task; full-solution build + docker rebuild only at integration. + +- **Wave 1 (low-risk, parallel — disjoint files):** T22 (`Templates.razor`) ‖ CLI Retry/Discard (CLI) ‖ T28 (Template Engine validation + trigger editor). +- **Wave 2:** T23 (`Templates.razor`, after T22 — same file) ‖ T25 (`DataConnections.razor`, additive). +- **Wave 3:** T24 (`DataConnections.razor`, after T25 — same file) ‖ T32 foundation (entity + migration + `$ref` resolver — Commons/ConfigDB/ManagementActor). +- **Wave 4:** T30 ‖ T31 (consume the resolver) ‖ T26 (`TemplateEdit.razor` + resolve service). +- **Wave 5 — integration.** + +Classifications: T24, T26, T32, integration = **high-risk**; T23, T25, T30, T31 = **standard**; T22, T28, CLI = **small**. + +## Integration (first-class verification phase) + +Per `integration-catches-cross-cutting-gaps`: +- Full-solution `dotnet build ZB.MOM.WW.ScadaBridge.slnx`; **EF model-drift check** for the new `SharedSchema` entity (the M2-pre `PendingModelChangesWarning` lesson — idempotent migration, no pending changes). +- **Trace every new ManagementActor command end-to-end** through the registry + handler routing: `ReorderTemplateFolderCommand`, `MoveDataConnectionCommand`, the `GetResolvedTemplateMembersCommand`, the schema-library CRUD commands, and the CLI's two cached-call commands (confirm `ManagementCommandRegistry` mappings so the CLI resolves names). +- **Re-run the full bUnit suites of every shared component touched** (TreeView, `TemplateFolderTree`, `TemplateEdit`, `DataConnections`, `ParameterValueForm`, `SchemaBuilder`) — register substitutes for any newly-injected service in their existing fixtures. 
+- `bash docker/deploy.sh` rebuild + `/health/ready` smoke on central-a/central-b/LB; Playwright coverage for the new UI surfaces (search, reorder menu, move dialog, connection health badge, schema library, schema-driven form). + +## Testing strategy + +- **T22:** filter unit/bUnit (match, auto-expand, clear). +- **T23:** reorder swap (ends no-op, ordering persists); root-menu render. +- **T24:** guard tests — binding-blocks (error names instances), name-collision-blocks, success path; audit row asserted. +- **T25:** health map query (name→id resolution, missing report), badge render. +- **T26:** multi-level chain (A→B→C), base member added after derive shows in editor, locked member display, staleness banner true/false; adversarial chain/composition-derived cases. +- **T28:** Advisory preserves current pass/fail; Strict escalates each advisory finding to a deploy-block. +- **T30:** nested object/list render + per-field validation (incl. `$ref`-resolved schema). +- **T31:** schema fed to Monaco (smoke/bUnit where feasible). +- **T32:** `$ref` resolution (valid, dangling→deploy-block, depth/cycle guard); migration idempotency; CRUD round-trip. +- **CLI:** command-name registry mapping; retry/discard happy-path + not-parked/unreachable mapping. + +## Risks + +- **T32 migration ↔ EF model drift** — idempotent migration + a model-drift assertion in integration. +- **T26 resolution semantics** on multi-level + locked + composition-derived templates — adversarial chain tests; keep it strictly read-only to avoid any deploy-path regression. +- **T28 may be near-complete** — confirm the exact current advisory set before sizing; the deliverable is the toggle + escalation, not re-building analysis. +- **Shared-component injection regressions** (T25/T26/T30 inject into reused components) — the integration wave re-runs each touched component's full fixture suite. 
+- **CLI registry gap** — if `RetryParkedMessageCommand`/`DiscardParkedMessageCommand` aren't registered for name resolution, the CLI call fails; verified in Wave 1 and re-asserted at integration. + +## Next step + +Hand off to the writing-plans skill to produce the bite-sized, per-task implementation plan and `.tasks.json`, then execute subagent-driven wave-by-wave. Finish via finishing-a-development-branch (FF-merge to main + push, no force; docker rebuild to match main).