ScadaBridge/docs/plans/2026-06-18-m9-templates-authoring-design.md
Joseph Doherty 4b152958df docs(m9): approved design — Templates & authoring (T22-T26, T28, T30-T32 + CLI cached-call retry/discard)
Locked decisions: T26 authoring-only (resolve + staleness banner, no stored-row
mutation); T32 full ($ref resolver + template-level schema library, no new package);
unified-outbox page deferred (CLI retry/discard ships instead); T23 menu-based
reorder + root context menu (no drag-drop); guarded move-connection; reuse existing
health transport for live status; T28 opt-in strict escalation layer.
2026-06-18 10:01:38 -04:00

# Design: M9 Templates & Authoring (T22-T26, T28, T30-T32 + CLI cached-call Retry/Discard)
**Date:** 2026-06-18
**Status:** Approved (brainstorming session) — ready for writing-plans
**Milestone:** M9 of the system-completion roadmap (`docs/plans/2026-06-15-stillpending-completion-design.md` line 105)
**Branch:** `worktree-m9-templates-authoring` off `origin/main` @ `72aec3b4`
**Source backlog:** `stillpending.md` Tier 3 — "Templates / Data Connections / Triggers UI" + "Cached-call tracking"
## Goal
Deliver the in-scope authoring and templates backlog: make the template tree searchable and reorderable, let operators move and live-monitor data connections, surface multi-level inheritance + base-change staleness in the template editor, add an opt-in strict trigger-analysis mode, build schema-driven value entry (nested forms + Monaco hover/completion + a reusable `$ref` schema library), and put cached-call Retry/Discard on the CLI. One unified-outbox page item is explicitly deferred.
## Scope
### In scope (10 deliverables)
- **T22** — Template tree search/filter.
- **T23** — Folder sibling reorder + root-level context menu (menu-based; **no drag-drop**).
- **T24** — Move a data connection between sites.
- **T25** — Connection live-status indicators on the design page.
- **T26** — Base-template versioning *authoring*: multi-level inherited-member resolution in the editor + a read-only staleness banner.
- **T28** — Strict expression-trigger analysis *kind* (opt-in escalation).
- **T30** — Schema-driven nested value-entry forms.
- **T31** — Monaco JSON-Schema hover/completion on value-entry.
- **T32** — JSON Schema `$ref` resolver + a template-level schema library.
- **CLI** — `cached-call retry|discard` for site-local cached calls.
### Deferred (logged follow-ups, not in M9)
- **Unified notifications + site-calls outbox page** — the two data models diverge hard (enum vs string status lifecycles, offset vs keyset pagination, `string` vs strong-typed GUID ids, asymmetric provenance). A true union view-model is high-risk for marginal operator gain. Deferred; the CLI Retry/Discard ships instead.
- **Folder drag-drop** (HTML5 DnD via JS interop) — `[PERM]`-tagged; menu-based reorder delivers the capability without the Blazor-Server interop fragility.
- **T27** (promote-derived-to-base, cross-tenant libraries) and **T29** (WhileTrue alarm trigger) remain excluded per the roadmap.
## Locked decisions
- **D1 — T26 is authoring-only.** Instance flattening already re-walks the full inheritance chain fresh on every deploy (`TemplateResolver.BuildInheritanceChain` is arbitrary-depth; `CycleDetector` covers inheritance/composition/cross-graph). The only real gap is in the *editor*: it loads the immediate base only, derived templates carry stale `IsInherited` placeholder rows, and there is no staleness signal. M9 closes that with a **read-only** resolve + staleness banner. **No stored-row mutation, no `RefreshDerivedTemplate` command.**
- **D2: T32 is fully in scope.** Build a template-level schema library (new entity + idempotent migration + repo) and a custom `$ref` resolver. **No new NuGet package:** `Directory.Packages.props` has no JSON-schema library and CLAUDE.md forbids adding one; everything uses `System.Text.Json` and extends the existing `InboundApiSchema` parser.
- **D3 — Unified outbox page deferred;** ship the CLI Retry/Discard from this cluster.
- **D4 — T23 is menu-based:** sibling Move-up/Move-down (uses the existing `TemplateFolder.SortOrder`) + a root-level context menu (New Folder / New Template at root) + completing the folder context menu. No drag-drop.
- **D5 — Move-connection is guarded.** A `MoveDataConnectionCommand` succeeds only when: (a) the target site exists; (b) no name collision with an existing connection at the target site; (c) **no `InstanceConnectionBinding` references the connection** (instances are site-scoped — a bound connection cannot leave its site without orphaning the binding). On block, return a clear error naming the blocking instances. Also re-point/validate name-based references (`TemplateNativeAlarmSource.ConnectionName`, `InstanceNativeAlarmSourceOverride.ConnectionNameOverride`) for collisions. Every move emits an audit-log row.
- **D6: T25 reuses existing health transport.** Health already flows DCL → `ISiteHealthCollector.UpdateConnectionHealth` → `SiteHealthReport.DataConnectionStatuses` (name→`ConnectionHealth`) → `ICentralHealthAggregator` → the Health page renders badges. M9 surfaces the same data on the *design* `DataConnections` page (per-node badge + ~10s poll). No new transport, no SignalR.
- **D7 — T28 is an opt-in escalation layer.** Expression triggers already get a real Roslyn semantic compile + forbidden-API + undefined-attribute analysis that **blocks deploy** (delivered in M2/M3). T28 adds a per-trigger `AnalysisKind` (default **Advisory** = today's behavior; **Strict** escalates the currently-advisory findings — blank expression, ambiguous coercion — to deploy-blocking errors). The increment is the toggle + the escalation branch; implementation right-sizes after confirming exact current behavior.
## Current-state map (reconnaissance evidence)
### Cluster A — Template tree UI
- `CentralUI/Components/Shared/TreeView.razor` — generic tree; external-filter model (R8), `ContextMenu` render-fragment (R15); **no built-in DnD**.
- `CentralUI/Components/Shared/TemplateFolderTree.razor:68` — already exposes a `Filter` parameter with recursive substring match + ancestor auto-expand (`ApplyFilter`/`CopyMatching`).
- `CentralUI/Components/Pages/Design/Templates.razor` — uses `TemplateFolderTree` but **wires no search box**; folder context menu present (New Folder/Template, Rename, Move…, Delete); `MoveFolderDialog.razor` exists.
- `Commons/Entities/Templates/TemplateFolder.cs:12` — has `SortOrder`. `TemplateEngine/Services/TemplateFolderService.cs` — Create/Rename/Move (cycle + collision checks)/Delete; **no sort-order update method**.
- `Commons/Messages/Management/TemplateFolderCommands.cs` — Create/Move/Rename/Delete commands; **no reorder command**.
### Cluster B — Data connections
- `Commons/Entities/Sites/DataConnection.cs:8`: `SiteId` FK. `Commons/Messages/Management/DataConnectionCommands.cs`: Create/Update/Delete; `Update` does **not** change `SiteId`; **no move command**.
- `CentralUI/Components/Pages/Design/DataConnectionForm.razor` — site locked after creation. `DataConnections.razor` — site→connection tree, search box present, Edit/Delete actions; **no move, no health badge**.
- FK/blockers for move: `Instances/InstanceConnectionBinding.cs:12` (`DataConnectionId` FK), `Templates/TemplateNativeAlarmSource.cs:21` (`ConnectionName`, name-based), `Instances/InstanceNativeAlarmSourceOverride.cs:22` (`ConnectionNameOverride`, name-based).
- Health: `HealthMonitoring/ISiteHealthCollector.cs:58`, `Commons/Messages/Health/SiteHealthReport.cs:10` (`DataConnectionStatuses`), `ICentralHealthAggregator` (`GetSiteState`), `CentralUI/.../Monitoring/Health.razor` (existing badge render + `GetConnectionHealthBadge`).
### Cluster C — Inheritance authoring
- `Commons/Entities/Templates/Template.cs`: `ParentTemplateId` (inheritance), `IsDerived` + `OwnerCompositionId` (composition-materialized slots). Members carry `IsInherited` + `LockedInDerived` flags (`TemplateAttribute`/`TemplateAlarm`/`TemplateScript`/`TemplateNativeAlarmSource`).
- `TemplateEngine/TemplateResolver.cs:119`: `BuildInheritanceChain` walks arbitrary depth (root-first), cycle-guarded. `TemplateEngine/Flattening/FlatteningService.cs`: derived wins, `IsInherited` placeholders are skipped in favor of the live base value. `CycleDetector.cs`: inheritance/composition/cross-graph checks on save.
- `TemplateEngine/Flattening/RevisionHashService.cs` — deterministic SHA-256 of flattened config (already used for staleness in Transport/M8 via `IStaleInstanceProbe`).
- `CentralUI/Components/Pages/Design/TemplateEdit.razor:58` — loads only the **immediate** base (`_baseTemplate`, `_baseAttributesByName`, …); no multi-level resolution, no staleness banner.
- `ManagementService/ManagementActor.cs:178` — template command block; **no resolve/update-derived command**.
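
The "deterministic SHA-256 of flattened config" idea noted for `RevisionHashService` can be illustrated with a minimal sketch. This is Python for illustration only, not the project's C# code; the canonicalization shown (sorted keys, compact separators) is an assumption, and `revision_hash` is a hypothetical name.

```python
import hashlib
import json


def revision_hash(flattened_config: dict) -> str:
    """Hash a flattened config so that equal content always yields the same digest.

    Assumption: canonical form = JSON with sorted keys and no extra whitespace.
    The real service may canonicalize differently.
    """
    canonical = json.dumps(flattened_config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Because the keys are sorted before hashing, two configs with the same content but different insertion order produce the same digest, which is what makes the hash usable as a staleness signal.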
### Cluster D — Triggers + schema entry
- `TemplateEngine/Validation/ValidationService.cs:263` (`CheckExpressionTrigger`) — real Roslyn compile + forbidden-API + undefined-attribute checks; blank expression = warning; errors block deploy. Separate error/warning lists make selective escalation a clean seam.
- JSON Schema is canonical storage (migration `20260512211204_MigrateParametersToJsonSchema`); `Commons/Types/InboundApi/InboundApiSchema.cs` (`Parse`/`ParseSchema` recursive, depth-capped; `Validate`). `CentralUI/Components/Shared/SchemaBuilder.razor` authors schemas; `ParameterValueForm.razor:52` renders scalars but falls back to a **JSON textarea** for object/list. Monaco already integrated (`MonacoEditor.razor`). **No `$ref` resolution anywhere; no schema-library entity.** `Directory.Packages.props` — no JSON-schema package (System.Text.Json only).
### Cluster E — CLI cached-call Retry/Discard
- Backend relay fully exists: `ManagementActor.cs:220,380` (`RetryParkedMessageCommand`/`DiscardParkedMessageCommand`, Deployer-gated) → `SiteCallAuditActor.cs:877,909` (`HandleRetrySiteCall`/`HandleDiscardSiteCall` → `RetryParkedOperation`/`DiscardParkedOperation` relay) → site. Central UI Site Calls page already uses it.
- CLI pattern: `CLI/Commands/CommandHelpers.cs:34` (`ExecuteCommandAsync` → `ManagementHttpClient.SendCommandAsync`), command-name via `ManagementCommandRegistry.GetCommandName`. Model on `NotificationCommands.cs`. **No cached-call command group today**; the registry mapping for the two commands must be verified.
## Design by feature
### T22 — Template tree search (small)
Add a search `<input>` to `Templates.razor`, bound to a local field, passed to `TemplateFolderTree.Filter`. The recursive filter + auto-expand already exist. UI-only — no service/entity/command change. Clear-filter restores the full tree and prior expansion state.
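
As a rough illustration of the recursive substring match with ancestor auto-expand described above, here is a minimal sketch in Python (not the Razor component itself). `Node`, `filter_tree`, and the `expanded` set are hypothetical names, and the choice to show a matching folder without its non-matching children is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: list[Node] = field(default_factory=list)


def filter_tree(node: Node, term: str, expanded: set[str]) -> Node | None:
    """Return a pruned copy of `node`, or None when nothing under it matches."""
    needle = term.casefold()
    kept = []
    for child in node.children:
        pruned = filter_tree(child, term, expanded)
        if pruned is not None:
            kept.append(pruned)
    if kept:
        # This node is an ancestor of a match, so it should be auto-expanded.
        expanded.add(node.name)
    if kept or needle in node.name.casefold():
        return Node(node.name, kept)
    return None
```

The function returns a copy and records auto-expansion in a separate set, so the user's own expansion state is never overwritten; clearing the filter can then simply render the original tree again.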
### T23 — Folder reorder + context menus (standard)
- `TemplateFolderService.ReorderFolderAsync(folderId, direction, user)` (or `MoveUp`/`MoveDown`) — swap `SortOrder` with the adjacent sibling under the same parent; no-op at the ends. New `ReorderTemplateFolderCommand` + ManagementActor handler (Designer-gated, matching the other folder commands).
- Sibling loads ordered by `SortOrder` (then Name) everywhere the folder tree is built.
- `Templates.razor` — Move-up/Move-down items in the folder context menu; a **root-level** context menu (right-click empty/root → New Folder, New Template at root). Complete any missing folder-menu items.
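
The sibling swap can be sketched as follows. This is an illustrative Python model, not `TemplateFolderService`; `Folder` and `reorder_folder` are hypothetical, and the sketch assumes siblings have distinct `SortOrder` values (a plain swap would not change the order of two siblings that share a value).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Folder:
    id: int
    parent_id: int | None
    name: str
    sort_order: int


def reorder_folder(folders: list[Folder], folder_id: int, direction: int) -> bool:
    """Swap sort_order with the adjacent sibling; direction is -1 (up) or +1 (down).

    Returns False when the folder is already at the end it is moving toward.
    """
    target = next(f for f in folders if f.id == folder_id)
    siblings = sorted(
        (f for f in folders if f.parent_id == target.parent_id),
        key=lambda f: (f.sort_order, f.name),  # SortOrder, then Name
    )
    neighbor_index = siblings.index(target) + direction
    if not 0 <= neighbor_index < len(siblings):
        return False  # no-op at the ends
    neighbor = siblings[neighbor_index]
    target.sort_order, neighbor.sort_order = neighbor.sort_order, target.sort_order
    return True
```

Only the two affected rows change, which keeps the write small and leaves folders under other parents untouched.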
### T24 — Move connection between sites (high-risk)
- `MoveDataConnectionCommand(DataConnectionId, TargetSiteId)` + ManagementActor `HandleMoveDataConnection` (Designer-gated).
- Guards (D5), all server-side: target site exists; no name collision at target; **reject if any `InstanceConnectionBinding` references the connection** with an error naming blockers; validate name-based native-alarm-source references won't collide/orphan at the target.
- Persist via the existing `ISiteRepository.UpdateDataConnectionAsync` (sets `SiteId`); emit an audit row.
- UI: a "Move to Site…" action + `MoveDataConnectionDialog` (target-site picker, error surface) on `DataConnections.razor`.
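
The guard order above can be sketched like this. It is a simplified Python model under stated assumptions, not the `HandleMoveDataConnection` handler: the types, `MoveBlocked`, and the audit format are hypothetical, and the name-based native-alarm-source validation is left out for brevity.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Connection:
    id: int
    site_id: int
    name: str


@dataclass
class Binding:
    instance_name: str
    connection_id: int


class MoveBlocked(Exception):
    """Raised when a guard rejects the move."""


def move_connection(conn_id: int, target_site_id: int, site_ids: set[int],
                    connections: list[Connection], bindings: list[Binding],
                    audit_log: list[str]) -> None:
    conn = next(c for c in connections if c.id == conn_id)
    # Guard (a): the target site must exist.
    if target_site_id not in site_ids:
        raise MoveBlocked(f"target site {target_site_id} does not exist")
    # Guard (b): no other connection with the same name at the target site.
    if any(c.id != conn.id and c.site_id == target_site_id and c.name == conn.name
           for c in connections):
        raise MoveBlocked(f"a connection named '{conn.name}' already exists at the target site")
    # Guard (c): no instance binding may reference the connection.
    blockers = sorted(b.instance_name for b in bindings if b.connection_id == conn_id)
    if blockers:
        raise MoveBlocked("connection is bound to instances: " + ", ".join(blockers))
    source_site_id = conn.site_id
    conn.site_id = target_site_id
    audit_log.append(f"moved connection {conn.id} from site {source_site_id} to site {target_site_id}")
```

All guards run before any state changes, so a blocked move leaves both the connection and the audit log untouched.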
### T25 — Connection live-status (standard)
- A central-side query (extend the health query service or inject `ICentralHealthAggregator`) returning a `connectionId → ConnectionHealth` map for a site: read the latest `SiteHealthReport.DataConnectionStatuses`, resolve names→ids via the repo.
- `DataConnections.razor` — render a health badge per connection node (reuse `GetConnectionHealthBadge`/`AlarmStateBadges`-style classes), refresh on a ~10s poll timer (mirror the Health page). Register the injected service in the existing `DataConnections` bUnit fixtures.
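
The name→id resolution step can be sketched as below. This is an illustrative Python function, not the central query service; `connection_health_by_id` is a hypothetical name and the `"Unknown"` fallback for a missing report or missing entry is an assumption.

```python
from __future__ import annotations


def connection_health_by_id(statuses_by_name: dict[str, str] | None,
                            connections: list[tuple[int, str]]) -> dict[int, str]:
    """Map each connection id to its health, given the latest name-keyed statuses.

    `statuses_by_name` is None when the site has not reported yet.
    `connections` holds (id, name) pairs for one site, as loaded from the repo.
    """
    if statuses_by_name is None:
        return {conn_id: "Unknown" for conn_id, _ in connections}
    return {conn_id: statuses_by_name.get(name, "Unknown") for conn_id, name in connections}
```

Keying the result by id keeps the page independent of renames between polls, while the report itself stays name-keyed as it is today.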
### T26 — Inheritance authoring resolve + staleness banner (high-risk)
- A resolve service/method that, given a derived or child template, walks the full inheritance chain (`BuildInheritanceChain`) and returns the **effective inherited member set** — including base members added *after* the derived template was created, across ≥2 inheritance levels — annotated per member with origin (own override / inherited-from-X / locked).
- A new read-only query command (e.g. `GetResolvedTemplateMembersCommand`) + ManagementActor handler returning that set (plus a staleness summary).
- `TemplateEdit.razor` renders the **full** resolved inherited set (not just the immediate base) and a read-only banner when the stored derived rows differ from the freshly-resolved chain ("Base changed — N inherited members differ"). **No mutation** — flattening at deploy is already correct; the banner is informational and the editor's own override actions are unchanged.
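
A minimal sketch of the read-only resolve and staleness count, assuming a simplified model where each template holds only its own members as name→value pairs. This is Python for illustration, not `TemplateResolver`; all names are hypothetical and the `LockedInDerived` flag is omitted.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Template:
    name: str
    parent: str | None
    own_members: dict[str, str] = field(default_factory=dict)


def build_chain(templates: dict[str, Template], name: str) -> list[Template]:
    """Walk parent links to the root and return the chain root-first, cycle-guarded."""
    chain, seen, current = [], set(), name
    while current is not None:
        if current in seen:
            raise ValueError(f"inheritance cycle at {current}")
        seen.add(current)
        chain.append(templates[current])
        current = templates[current].parent
    chain.reverse()
    return chain


def resolve_members(templates: dict[str, Template], name: str) -> dict[str, tuple[str, str]]:
    """Return member -> (value, origin); templates nearer the derived one win."""
    resolved = {}
    for template in build_chain(templates, name):
        origin = "own" if template.name == name else f"inherited from {template.name}"
        for member, value in template.own_members.items():
            resolved[member] = (value, origin)
    return resolved


def count_stale(stored_inherited: dict[str, str], resolved: dict[str, tuple[str, str]]) -> int:
    """Count inherited members whose stored placeholder differs from the fresh resolve."""
    fresh = {m: v for m, (v, origin) in resolved.items() if origin != "own"}
    return sum(1 for m in fresh.keys() | stored_inherited.keys()
               if fresh.get(m) != stored_inherited.get(m))
```

Nothing here writes back to the stored rows: the count only feeds the banner text ("N inherited members differ"), which matches the no-mutation decision in D1.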
### T28 — Strict expression-trigger kind (small)
- Add `AnalysisKind` (Advisory default / Strict) to the trigger config (carried in the existing `TriggerConfiguration` JSON or a small dedicated field — chosen to stay additive to the flattened model and avoid a migration if feasible).
- `CheckExpressionTrigger` — when Strict, promote the currently-advisory findings to errors (deploy-blocking); Advisory preserves today's behavior exactly.
- Trigger editor selector (alarm/script trigger UI) + CLI flag (`--trigger-kind`/`--strict`). Right-size after confirming exact current advisory set.
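
The escalation branch can be sketched as a small function over the separate error and warning lists. This is an illustrative Python model, not `CheckExpressionTrigger`; it assumes every advisory finding is escalated under Strict, whereas the exact advisory set is still to be confirmed.

```python
from enum import Enum


class AnalysisKind(Enum):
    ADVISORY = "Advisory"
    STRICT = "Strict"


def apply_analysis_kind(errors: list[str], warnings: list[str],
                        kind: AnalysisKind) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) after applying the per-trigger analysis kind."""
    if kind is AnalysisKind.STRICT:
        # Strict: advisory findings become deploy-blocking errors.
        return errors + warnings, []
    # Advisory: today's behavior, unchanged.
    return list(errors), list(warnings)


def blocks_deploy(errors: list[str]) -> bool:
    return bool(errors)
```

Since Advisory returns the inputs unchanged, existing triggers that never set the field keep their current pass/fail outcome.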
### T30 — Schema-driven nested forms (standard)
- Extend `ParameterValueForm.razor` to recursively render object fields and list items as typed inputs (replacing the JSON textarea for object/list), driven by the parsed `InboundApiSchema` (including `$ref`-resolved schemas from T32). Per-field validation via `InboundApiSchema.Validate`; collect to canonical JSON. Re-register in existing fixtures.
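
The recursive walk that turns a schema into typed inputs can be sketched as follows. This is Python for illustration, not the Razor component; it assumes the standard JSON Schema keywords `type`, `properties`, and `items`, and the path notation (`a.b`, `a[]`) and the `form_fields` name are hypothetical.

```python
def form_fields(schema: dict, path: str = "") -> list[tuple[str, str]]:
    """Flatten a schema into (field path, scalar type) pairs, one per input to render."""
    kind = schema.get("type")
    if kind == "object":
        fields = []
        for name, sub_schema in schema.get("properties", {}).items():
            child_path = f"{path}.{name}" if path else name
            fields.extend(form_fields(sub_schema, child_path))
        return fields
    if kind == "array":
        # One entry describes the item template; the UI would repeat it per list item.
        return form_fields(schema.get("items", {}), f"{path}[]")
    return [(path, kind or "any")]
```

For example, an object with a string `name`, a nested `limits` object, and a `tags` list of strings yields four inputs: `name`, `limits.low`, `limits.high`, and `tags[]`.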
### T31 — Monaco hover/completion (standard)
- Feed the resolved JSON Schema to the existing Monaco editor's JSON language config so the value-entry JSON surface gets schema-driven hover + completion. Reuses `MonacoEditor.razor`; no new package (Monaco's built-in JSON schema support).
### T32 — `$ref` resolver + template-level schema library (high-risk; build first)
- New `SharedSchema` entity (Id, Name unique, optional scope, `SchemaJson`) + EF config + **idempotent** migration + repository.
- Custom `$ref` resolver in `InboundApiSchema.Parse` (resolve `{"$ref":"lib:Name"}`-style pointers to library entries; depth/cycle-guarded, System.Text.Json only).
- ManagementActor CRUD commands (Designer-gated) + a Central UI schema-library page (reuse `SchemaBuilder`).
- Deploy-time validation that every `$ref` target exists (block on dangling ref), wired into the existing validation pipeline.
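
A minimal sketch of a depth- and cycle-guarded resolver for `{"$ref":"lib:Name"}`-style pointers. It is Python for illustration, not the `InboundApiSchema` extension; `resolve_refs`, `SchemaRefError`, and the depth cap of 16 are assumptions.

```python
MAX_DEPTH = 16  # hypothetical cap; the real parser has its own depth limit


class SchemaRefError(ValueError):
    """Raised for dangling, cyclic, or too deeply nested references."""


def resolve_refs(node, library: dict, active: tuple = (), depth: int = 0):
    """Return a copy of `node` with every lib: reference replaced by its library schema."""
    if depth > MAX_DEPTH:
        raise SchemaRefError("schema nesting too deep")
    if isinstance(node, list):
        return [resolve_refs(item, library, active, depth + 1) for item in node]
    if not isinstance(node, dict):
        return node
    ref = node.get("$ref")
    if isinstance(ref, str) and ref.startswith("lib:"):
        name = ref[len("lib:"):]
        if name in active:
            raise SchemaRefError("cyclic $ref: " + " -> ".join(active + (name,)))
        if name not in library:
            raise SchemaRefError(f"dangling $ref: {ref}")
        # Track the names currently being expanded so a loop is caught early.
        return resolve_refs(library[name], library, active + (name,), depth + 1)
    return {key: resolve_refs(value, library, active, depth + 1) for key, value in node.items()}
```

The same dangling-reference error could back the deploy-time check: validation would call the resolver and treat any `SchemaRefError` as a blocking finding.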
### CLI — cached-call Retry/Discard (small)
- New `CachedCallCommands.cs` (`cached-call retry|discard --site-id … --tracked-operation-id …`) calling the existing Deployer-gated `RetryParkedMessageCommand`/`DiscardParkedMessageCommand` via `CommandHelpers.ExecuteCommandAsync`. **Verify `ManagementCommandRegistry` maps both command names** (the CLI's `GetCommandName` depends on it). Update `CLI/README.md` + `Component-CLI.md`.
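
The command shape can be sketched with Python's `argparse`, purely to illustrate the `cached-call retry|discard` grouping and its two options; the real CLI is C# and goes through `CommandHelpers.ExecuteCommandAsync`. The program name, the payload field names, and the use of the command class names as registry names are all assumptions.

```python
import argparse

# Assumption: the registry name equals the command class name.
COMMAND_BY_ACTION = {
    "retry": "RetryParkedMessageCommand",
    "discard": "DiscardParkedMessageCommand",
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="cli")  # hypothetical program name
    groups = parser.add_subparsers(dest="group", required=True)
    cached_call = groups.add_parser("cached-call")
    actions = cached_call.add_subparsers(dest="action", required=True)
    for action in COMMAND_BY_ACTION:
        sub = actions.add_parser(action)
        sub.add_argument("--site-id", required=True)
        sub.add_argument("--tracked-operation-id", required=True)
    return parser


def to_command(argv: list[str]) -> tuple[str, dict]:
    """Translate CLI arguments into (command name, payload) for the management endpoint."""
    args = build_parser().parse_args(argv)
    payload = {"siteId": args.site_id, "trackedOperationId": args.tracked_operation_id}
    return COMMAND_BY_ACTION[args.action], payload
```

Both actions share the same options and differ only in the command they map to, which is why a missing registry entry for either name would surface as a CLI failure rather than a parsing error.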
## Dependencies & wave plan
Execute **subagent-driven** in the `worktree-m9-templates-authoring` worktree. Implementers do **not** create worktrees; commit in **pathspec** form (`-m` before `--`, never `git add -A`); keep ≤2-3 concurrent committers with a post-wave HEAD-presence check; targeted builds/tests per task; full-solution build + docker rebuild only at integration.
- **Wave 1 (low-risk, parallel — disjoint files):** T22 (`Templates.razor`) ‖ CLI Retry/Discard (CLI) ‖ T28 (Template Engine validation + trigger editor).
- **Wave 2:** T23 (`Templates.razor`, after T22 — same file) ‖ T25 (`DataConnections.razor`, additive).
- **Wave 3:** T24 (`DataConnections.razor`, after T25 — same file) ‖ T32 foundation (entity + migration + `$ref` resolver — Commons/ConfigDB/ManagementActor).
- **Wave 4:** T30 ‖ T31 (consume the resolver) ‖ T26 (`TemplateEdit.razor` + resolve service).
- **Wave 5 — integration.**
Classifications: T24, T26, T32, integration = **high-risk**; T23, T25, T30, T31 = **standard**; T22, T28, CLI = **small**.
## Integration (first-class verification phase)
Per `integration-catches-cross-cutting-gaps`:
- Full-solution `dotnet build ZB.MOM.WW.ScadaBridge.slnx`; **EF model-drift check** for the new `SharedSchema` entity (the M2-pre `PendingModelChangesWarning` lesson — idempotent migration, no pending changes).
- **Trace every new ManagementActor command end-to-end** through the registry + handler routing: `ReorderTemplateFolderCommand`, `MoveDataConnectionCommand`, the `GetResolvedTemplateMembersCommand`, the schema-library CRUD commands, and the CLI's two cached-call commands (confirm `ManagementCommandRegistry` mappings so the CLI resolves names).
- **Re-run the full bUnit suites of every shared component touched** (TreeView, `TemplateFolderTree`, `TemplateEdit`, `DataConnections`, `ParameterValueForm`, `SchemaBuilder`) — register substitutes for any newly-injected service in their existing fixtures.
- `bash docker/deploy.sh` rebuild + `/health/ready` smoke on central-a/central-b/LB; Playwright coverage for the new UI surfaces (search, reorder menu, move dialog, connection health badge, schema library, schema-driven form).
## Testing strategy
- **T22:** filter unit/bUnit (match, auto-expand, clear).
- **T23:** reorder swap (ends no-op, ordering persists); root-menu render.
- **T24:** guard tests — binding-blocks (error names instances), name-collision-blocks, success path; audit row asserted.
- **T25:** health map query (name→id resolution, missing report), badge render.
- **T26:** multi-level chain (A→B→C), base member added after derive shows in editor, locked member display, staleness banner true/false; adversarial chain/composition-derived cases.
- **T28:** Advisory preserves current pass/fail; Strict escalates each advisory finding to a deploy-block.
- **T30:** nested object/list render + per-field validation (incl. `$ref`-resolved schema).
- **T31:** schema fed to Monaco (smoke/bUnit where feasible).
- **T32:** `$ref` resolution (valid, dangling→deploy-block, depth/cycle guard); migration idempotency; CRUD round-trip.
- **CLI:** command-name registry mapping; retry/discard happy-path + not-parked/unreachable mapping.
## Risks
- **T32 migration ↔ EF model drift** — idempotent migration + a model-drift assertion in integration.
- **T26 resolution semantics** on multi-level + locked + composition-derived templates — adversarial chain tests; keep it strictly read-only to avoid any deploy-path regression.
- **T28 may be near-complete** — confirm the exact current advisory set before sizing; the deliverable is the toggle + escalation, not re-building analysis.
- **Shared-component injection regressions** (T25/T26/T30 inject into reused components) — the integration wave re-runs each touched component's full fixture suite.
- **CLI registry gap** — if `RetryParkedMessageCommand`/`DiscardParkedMessageCommand` aren't registered for name resolution, the CLI call fails; verified in Wave 1 and re-asserted at integration.
## Next step
Hand off to the writing-plans skill to produce the bite-sized, per-task implementation plan and `.tasks.json`, then execute subagent-driven wave-by-wave. Finish via finishing-a-development-branch (FF-merge to main + push, no force; docker rebuild to match main).