Locked decisions: T26 authoring-only (resolve + staleness banner, no stored-row mutation); T32 full ($ref resolver + template-level schema library, no new package); unified-outbox page deferred (CLI retry/discard ships instead); T23 menu-based reorder + root context menu (no drag-drop); guarded move-connection; reuse existing health transport for live status; T28 opt-in strict escalation layer.
Design: M9 — Templates & Authoring (T22–T26, T28, T30–T32 + CLI cached-call Retry/Discard)
Date: 2026-06-18
Status: Approved (brainstorming session) — ready for writing-plans
Milestone: M9 of the system-completion roadmap (docs/plans/2026-06-15-stillpending-completion-design.md line 105)
Branch: worktree-m9-templates-authoring off origin/main @ 72aec3b4
Source backlog: stillpending.md Tier 3 — "Templates / Data Connections / Triggers UI" + "Cached-call tracking"
Goal
Deliver the in-scope authoring and templates backlog: make the template tree searchable and reorderable, let operators move and live-monitor data connections, surface multi-level inheritance + base-change staleness in the template editor, add an opt-in strict trigger-analysis mode, build schema-driven value entry (nested forms + Monaco hover/completion + a reusable $ref schema library), and put cached-call Retry/Discard on the CLI. One unified-outbox page item is explicitly deferred.
Scope
In scope (10 deliverables)
- T22 — Template tree search/filter.
- T23 — Folder sibling reorder + root-level context menu (menu-based; no drag-drop).
- T24 — Move a data connection between sites.
- T25 — Connection live-status indicators on the design page.
- T26 — Base-template versioning authoring: multi-level inherited-member resolution in the editor + a read-only staleness banner.
- T28 — Strict expression-trigger analysis kind (opt-in escalation).
- T30 — Schema-driven nested value-entry forms.
- T31 — Monaco JSON-Schema hover/completion on value-entry.
- T32 — JSON Schema $ref resolver + a template-level schema library.
- CLI — cached-call retry|discard for site-local cached calls.
Deferred (logged follow-ups, not in M9)
- Unified notifications + site-calls outbox page — the two data models diverge hard (enum vs string status lifecycles, offset vs keyset pagination, string vs strong-typed GUID ids, asymmetric provenance). A true union view-model is high-risk for marginal operator gain. Deferred; the CLI Retry/Discard ships instead.
- Folder drag-drop (HTML5 DnD via JS interop) — [PERM]-tagged; menu-based reorder delivers the capability without the Blazor-Server interop fragility.
- T27 (promote-derived-to-base, cross-tenant libraries) and T29 (WhileTrue alarm trigger) remain excluded per the roadmap.
Locked decisions
- D1 — T26 is authoring-only. Instance flattening already re-walks the full inheritance chain fresh on every deploy (TemplateResolver.BuildInheritanceChain is arbitrary-depth; CycleDetector covers inheritance/composition/cross-graph). The only real gap is in the editor: it loads the immediate base only, derived templates carry stale IsInherited placeholder rows, and there is no staleness signal. M9 closes that with a read-only resolve + staleness banner. No stored-row mutation, no RefreshDerivedTemplate command.
- D2 — T32 is fully in scope. Build a template-level schema library (new entity + idempotent migration + repo) and a custom $ref resolver. No new NuGet package — Directory.Packages.props has no JSON-schema library and CLAUDE.md forbids adding one; everything uses System.Text.Json and extends the existing InboundApiSchema parser.
- D3 — Unified outbox page deferred; ship the CLI Retry/Discard from this cluster.
- D4 — T23 is menu-based: sibling Move-up/Move-down (uses the existing TemplateFolder.SortOrder) + a root-level context menu (New Folder / New Template at root) + completing the folder context menu. No drag-drop.
- D5 — Move-connection is guarded. A MoveDataConnectionCommand succeeds only when: (a) the target site exists; (b) no name collision with an existing connection at the target site; (c) no InstanceConnectionBinding references the connection (instances are site-scoped — a bound connection cannot leave its site without orphaning the binding). On block, return a clear error naming the blocking instances. Also re-point/validate name-based references (TemplateNativeAlarmSource.ConnectionName, InstanceNativeAlarmSourceOverride.ConnectionNameOverride) for collisions. Every move emits an audit-log row.
- D6 — T25 reuses existing health transport. Health already flows DCL → ISiteHealthCollector.UpdateConnectionHealth → SiteHealthReport.DataConnectionStatuses (name→ConnectionHealth) → ICentralHealthAggregator → the Health page renders badges. M9 surfaces the same data on the design DataConnections page (per-node badge + ~10s poll). No new transport, no SignalR.
- D7 — T28 is an opt-in escalation layer. Expression triggers already get a real Roslyn semantic compile + forbidden-API + undefined-attribute analysis that blocks deploy (delivered in M2/M3). T28 adds a per-trigger AnalysisKind (default Advisory = today's behavior; Strict escalates the currently-advisory findings — blank expression, ambiguous coercion — to deploy-blocking errors). The increment is the toggle + the escalation branch; implementation right-sizes after confirming exact current behavior.
Current-state map (reconnaissance evidence)
Cluster A — Template tree UI
- CentralUI/Components/Shared/TreeView.razor — generic tree; external-filter model (R8), ContextMenu render-fragment (R15); no built-in DnD.
- CentralUI/Components/Shared/TemplateFolderTree.razor:68 — already exposes a Filter parameter with recursive substring match + ancestor auto-expand (ApplyFilter/CopyMatching).
- CentralUI/Components/Pages/Design/Templates.razor — uses TemplateFolderTree but wires no search box; folder context menu present (New Folder/Template, Rename, Move…, Delete); MoveFolderDialog.razor exists.
- Commons/Entities/Templates/TemplateFolder.cs:12 — has SortOrder.
- TemplateEngine/Services/TemplateFolderService.cs — Create/Rename/Move (cycle + collision checks)/Delete; no sort-order update method.
- Commons/Messages/Management/TemplateFolderCommands.cs — Create/Move/Rename/Delete commands; no reorder command.
Cluster B — Data connections
- Commons/Entities/Sites/DataConnection.cs:8 — SiteId FK.
- Commons/Messages/Management/DataConnectionCommands.cs — Create/Update/Delete; Update does not change SiteId; no move command.
- CentralUI/Components/Pages/Design/DataConnectionForm.razor — site locked after creation.
- DataConnections.razor — site→connection tree, search box present, Edit/Delete actions; no move, no health badge.
- FK/blockers for move: Instances/InstanceConnectionBinding.cs:12 (DataConnectionId FK), Templates/TemplateNativeAlarmSource.cs:21 (ConnectionName, name-based), Instances/InstanceNativeAlarmSourceOverride.cs:22 (ConnectionNameOverride, name-based).
- Health: HealthMonitoring/ISiteHealthCollector.cs:58, Commons/Messages/Health/SiteHealthReport.cs:10 (DataConnectionStatuses), ICentralHealthAggregator (GetSiteState), CentralUI/.../Monitoring/Health.razor (existing badge render + GetConnectionHealthBadge).
Cluster C — Inheritance authoring
- Commons/Entities/Templates/Template.cs — ParentTemplateId (inheritance), IsDerived + OwnerCompositionId (composition-materialized slots). Members carry IsInherited + LockedInDerived flags (TemplateAttribute/TemplateAlarm/TemplateScript/TemplateNativeAlarmSource).
- TemplateEngine/TemplateResolver.cs:119 — BuildInheritanceChain walks arbitrary depth (root-first), cycle-guarded.
- TemplateEngine/Flattening/FlatteningService.cs — derived wins, IsInherited placeholders skip in favor of the live base value.
- CycleDetector.cs — inheritance/composition/cross-graph checks on save.
- TemplateEngine/Flattening/RevisionHashService.cs — deterministic SHA-256 of flattened config (already used for staleness in Transport/M8 via IStaleInstanceProbe).
- CentralUI/Components/Pages/Design/TemplateEdit.razor:58 — loads only the immediate base (_baseTemplate, _baseAttributesByName, …); no multi-level resolution, no staleness banner.
- ManagementService/ManagementActor.cs:178 — template command block; no resolve/update-derived command.
Cluster D — Triggers + schema entry
- TemplateEngine/Validation/ValidationService.cs:263 (CheckExpressionTrigger) — real Roslyn compile + forbidden-API + undefined-attribute checks; blank expression = warning; errors block deploy. Separate error/warning lists make selective escalation a clean seam.
- JSON Schema is canonical storage (migration 20260512211204_MigrateParametersToJsonSchema); Commons/Types/InboundApi/InboundApiSchema.cs (Parse/ParseSchema recursive, depth-capped; Validate).
- CentralUI/Components/Shared/SchemaBuilder.razor authors schemas; ParameterValueForm.razor:52 renders scalars but falls back to a JSON textarea for object/list. Monaco already integrated (MonacoEditor.razor). No $ref resolution anywhere; no schema-library entity.
- Directory.Packages.props — no JSON-schema package (System.Text.Json only).
Cluster E — CLI cached-call Retry/Discard
- Backend relay fully exists: ManagementActor.cs:220,380 (RetryParkedMessageCommand/DiscardParkedMessageCommand, Deployer-gated) → SiteCallAuditActor.cs:877,909 (HandleRetrySiteCall/HandleDiscardSiteCall → RetryParkedOperation/DiscardParkedOperation relay) → site. Central UI Site Calls page already uses it.
- CLI pattern: CLI/Commands/CommandHelpers.cs:34 (ExecuteCommandAsync → ManagementHttpClient.SendCommandAsync), command-name via ManagementCommandRegistry.GetCommandName. Model on NotificationCommands.cs. No cached-call command group today — must verify the registry maps the two commands.
Design by feature
T22 — Template tree search (small)
Add a search <input> to Templates.razor, bound to a local field, passed to TemplateFolderTree.Filter. The recursive filter + auto-expand already exist. UI-only — no service/entity/command change. Clear-filter restores the full tree and prior expansion state.
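The filter behavior T22 relies on (recursive substring match, matching nodes keep their ancestors, a blank filter restores the full tree) can be sketched as below. This is a minimal illustration only: Node, TreeFilter.Apply and TreeFilter.Flatten are hypothetical names, not the actual ApplyFilter/CopyMatching code in TemplateFolderTree.razor, and case-insensitive matching is an assumption.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Demo: filtering by "pump" keeps the matching leaf plus its ancestor chain.
var root = new Node("Root", new[]
{
    new Node("Site A", new[] { new Node("Pump Template"), new Node("Valve Template") }),
    new Node("Site B", new[] { new Node("Motor Template") }),
});
var filtered = TreeFilter.Apply(root, "pump");
Console.WriteLine(string.Join(" > ", TreeFilter.Flatten(filtered)));
// prints "Root > Site A > Pump Template"

// Hypothetical node shape; the real tree item type is not shown in this document.
public sealed record Node(string Name, IReadOnlyList<Node>? Children = null);

public static class TreeFilter
{
    // Returns a pruned copy of the tree. A node survives when its own name
    // matches or when at least one descendant survives. A blank filter
    // returns the original tree untouched (the "clear filter" case).
    public static Node? Apply(Node node, string? filter)
    {
        if (string.IsNullOrWhiteSpace(filter)) return node;

        var keptChildren = (node.Children ?? Array.Empty<Node>())
            .Select(child => Apply(child, filter))
            .OfType<Node>()          // drops nulls, i.e. pruned subtrees
            .ToList();

        bool selfMatches = node.Name.Contains(filter, StringComparison.OrdinalIgnoreCase);
        return selfMatches || keptChildren.Count > 0
            ? node with { Children = keptChildren }
            : null;
    }

    // Depth-first list of surviving names, used here only for display.
    public static IEnumerable<string> Flatten(Node? node)
    {
        if (node is null) yield break;
        yield return node.Name;
        foreach (var child in node.Children ?? Array.Empty<Node>())
            foreach (var name in Flatten(child))
                yield return name;
    }
}
```

Because the filter produces a copy rather than mutating the tree, clearing the search box can simply re-render the original nodes with whatever expansion state the page had kept.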
T23 — Folder reorder + context menus (standard)
- TemplateFolderService.ReorderFolderAsync(folderId, direction, user) (or MoveUp/MoveDown) — swap SortOrder with the adjacent sibling under the same parent; no-op at the ends. New ReorderTemplateFolderCommand + ManagementActor handler (Designer-gated, matching the other folder commands).
- Sibling loads ordered by SortOrder (then Name) everywhere the folder tree is built.
- Templates.razor — Move-up/Move-down items in the folder context menu; a root-level context menu (right-click empty/root → New Folder, New Template at root). Complete any missing folder-menu items.
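The sibling swap described for T23 might look roughly like the following in-memory sketch. Folder, Direction and FolderReorder are hypothetical stand-ins for the real entity and service. Renumbering siblings to 0..n-1 before swapping is an assumption added here so that folders sharing the same SortOrder (for example, all zero) still reorder; the design itself only specifies the swap and the no-op at the ends.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Demo: three root folders that all still carry a SortOrder of 0.
var folders = new List<Folder>
{
    new(1, null, "Alpha", 0), new(2, null, "Beta", 0), new(3, null, "Gamma", 0),
};
FolderReorder.Reorder(folders, 3, Direction.Up);
Console.WriteLine(string.Join(", ", folders.OrderBy(f => f.SortOrder).Select(f => f.Name)));
// prints "Alpha, Gamma, Beta"

public enum Direction { Up, Down }

// Hypothetical, simplified folder shape (the real TemplateFolder has more members).
public sealed class Folder
{
    public Folder(int id, int? parentId, string name, int sortOrder)
        => (Id, ParentId, Name, SortOrder) = (id, parentId, name, sortOrder);

    public int Id { get; }
    public int? ParentId { get; }
    public string Name { get; }
    public int SortOrder { get; set; }
}

public static class FolderReorder
{
    // Swaps SortOrder with the adjacent sibling under the same parent.
    // Returns false when the folder is already first (Up) or last (Down).
    public static bool Reorder(IReadOnlyCollection<Folder> all, int folderId, Direction direction)
    {
        var target = all.Single(f => f.Id == folderId);
        var siblings = all
            .Where(f => f.ParentId == target.ParentId)
            .OrderBy(f => f.SortOrder)
            .ThenBy(f => f.Name, StringComparer.Ordinal)
            .ToList();

        int index = siblings.IndexOf(target);
        int neighbor = direction == Direction.Up ? index - 1 : index + 1;
        if (neighbor < 0 || neighbor >= siblings.Count) return false; // no-op at the ends

        // Assumption: normalize to 0..n-1 first so tied values cannot make the swap a no-op.
        for (int i = 0; i < siblings.Count; i++) siblings[i].SortOrder = i;

        (target.SortOrder, siblings[neighbor].SortOrder) =
            (siblings[neighbor].SortOrder, target.SortOrder);
        return true;
    }
}
```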
T24 — Move connection between sites (high-risk)
- MoveDataConnectionCommand(DataConnectionId, TargetSiteId) + ManagementActor HandleMoveDataConnection (Designer-gated).
- Guards (D5), all server-side: target site exists; no name collision at target; reject if any InstanceConnectionBinding references the connection with an error naming blockers; validate name-based native-alarm-source references won't collide/orphan at the target.
- Persist via the existing ISiteRepository.UpdateDataConnectionAsync (sets SiteId); emit an audit row.
- UI: a "Move to Site…" action + MoveDataConnectionDialog (target-site picker, error surface) on DataConnections.razor.
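The first three D5 guards can be expressed as a pure check that runs before anything is persisted. The sketch below is illustrative: Connection, Binding, MoveResult and MoveGuards are hypothetical types, the case-insensitive name comparison is an assumption, and the name-based native-alarm-source validation is deliberately left out.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Demo: PLC-1 is bound to an instance at site 1, so moving it to site 2 is rejected.
var sites = new HashSet<int> { 1, 2 };
var connections = new[] { new Connection(10, 1, "PLC-1"), new Connection(11, 2, "Historian") };
var bindings = new[] { new Binding("Mixer-01", 10) };

var result = MoveGuards.Check(connections[0], 2, sites, connections, bindings);
Console.WriteLine(result.Ok ? "move allowed" : result.Error);

// Hypothetical, simplified shapes for illustration only.
public sealed record Connection(int Id, int SiteId, string Name);
public sealed record Binding(string InstanceName, int DataConnectionId);
public sealed record MoveResult(bool Ok, string? Error = null);

public static class MoveGuards
{
    public static MoveResult Check(
        Connection connection,
        int targetSiteId,
        ICollection<int> existingSiteIds,
        IEnumerable<Connection> allConnections,
        IEnumerable<Binding> allBindings)
    {
        // (a) the target site must exist
        if (!existingSiteIds.Contains(targetSiteId))
            return new MoveResult(false, $"Target site {targetSiteId} does not exist.");

        // (b) no connection with the same name at the target site
        bool collides = allConnections.Any(c =>
            c.SiteId == targetSiteId &&
            c.Id != connection.Id &&
            string.Equals(c.Name, connection.Name, StringComparison.OrdinalIgnoreCase));
        if (collides)
            return new MoveResult(false, $"A connection named '{connection.Name}' already exists at the target site.");

        // (c) no instance binding may reference the connection; name the blockers
        var blockers = allBindings
            .Where(b => b.DataConnectionId == connection.Id)
            .Select(b => b.InstanceName)
            .Distinct()
            .OrderBy(n => n, StringComparer.Ordinal)
            .ToList();
        if (blockers.Count > 0)
            return new MoveResult(false, $"Connection '{connection.Name}' is bound to instances: {string.Join(", ", blockers)}.");

        return new MoveResult(true);
    }
}
```

Keeping the guard free of side effects makes the three T24 guard tests (binding blocks, name collision blocks, success path) straightforward to write without a database.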
T25 — Connection live-status (standard)
- A central-side query (extend the health query service or inject ICentralHealthAggregator) returning a connectionId → ConnectionHealth map for a site: read the latest SiteHealthReport.DataConnectionStatuses, resolve names→ids via the repo.
- DataConnections.razor — render a health badge per connection node (reuse GetConnectionHealthBadge/AlarmStateBadges-style classes), refresh on a ~10s poll timer (mirror the Health page). Register the injected service in the existing DataConnections bUnit fixtures.
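The name→id resolution for the T25 query can be sketched as a small mapping function. Everything here is a stand-in: the members of the ConnectionHealth enum are invented for the example (the real enum is not shown in this document), and ConnectionRef and HealthMap are hypothetical names. Falling back to Unknown when no report or no entry exists is also an assumption.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Demo: one connection has a reported status, the other does not.
var connections = new[] { new ConnectionRef(10, "PLC-1"), new ConnectionRef(11, "Historian") };
var latest = new Dictionary<string, ConnectionHealth> { ["PLC-1"] = ConnectionHealth.Connected };

foreach (var (id, health) in HealthMap.Build(connections, latest))
    Console.WriteLine($"{id}: {health}");

// Hypothetical enum members; the real ConnectionHealth values may differ.
public enum ConnectionHealth { Unknown, Connected, Disconnected }

public sealed record ConnectionRef(int Id, string Name);

public static class HealthMap
{
    // statusesByName mirrors SiteHealthReport.DataConnectionStatuses (name -> health).
    // Pass null when no report has arrived for the site yet.
    public static Dictionary<int, ConnectionHealth> Build(
        IEnumerable<ConnectionRef> siteConnections,
        IReadOnlyDictionary<string, ConnectionHealth>? statusesByName)
        => siteConnections.ToDictionary(
            c => c.Id,
            c => statusesByName is not null && statusesByName.TryGetValue(c.Name, out var health)
                ? health
                : ConnectionHealth.Unknown);
}
```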
T26 — Inheritance authoring resolve + staleness banner (high-risk)
- A resolve service/method that, given a derived or child template, walks the full inheritance chain (BuildInheritanceChain) and returns the effective inherited member set — including base members added after the derived template was created, across ≥2 inheritance levels — annotated per member with origin (own override / inherited-from-X / locked).
- A new read-only query command (e.g. GetResolvedTemplateMembersCommand) + ManagementActor handler returning that set (plus a staleness summary).
- TemplateEdit.razor renders the full resolved inherited set (not just the immediate base) and a read-only banner when the stored derived rows differ from the freshly-resolved chain ("Base changed — N inherited members differ"). No mutation — flattening at deploy is already correct; the banner is informational and the editor's own override actions are unchanged.
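To make the resolve + staleness idea concrete, here is a read-only sketch over an in-memory model. All types (Member, Template, Resolved, InheritanceResolver) are hypothetical simplifications of the real entities. Two behaviors are assumptions rather than confirmed semantics: a locked base member is treated as not overridable further down the chain, and staleness counts placeholder rows that are missing, changed, or orphaned relative to the fresh resolve.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Demo chain A -> B -> C. Member "z" was added to A after B and C were derived.
var templates = new Dictionary<int, Template>
{
    [1] = new(1, "A", null, new[] { new Member("x", "1"), new Member("y", "2"), new Member("z", "3") }),
    [2] = new(2, "B", 1, new[] { new Member("y", "20"), new Member("x", "1", IsInherited: true) }),
    [3] = new(3, "C", 2, new[] { new Member("x", "1", IsInherited: true), new Member("y", "20", IsInherited: true) }),
};
Console.WriteLine($"Base changed: {InheritanceResolver.CountStale(templates, 3)} inherited member(s) differ");

public sealed record Member(string Name, string Value, bool IsInherited = false, bool Locked = false);
public sealed record Template(int Id, string Name, int? ParentId, IReadOnlyList<Member> Members);
public sealed record Resolved(string Name, string Value, string Origin, bool Locked);

public static class InheritanceResolver
{
    // Root-first chain with a cycle guard, any depth.
    public static List<Template> BuildChain(IReadOnlyDictionary<int, Template> byId, int templateId)
    {
        var chain = new List<Template>();
        var seen = new HashSet<int>();
        int? current = templateId;
        while (current is int id)
        {
            if (!seen.Add(id)) throw new InvalidOperationException("Inheritance cycle detected.");
            chain.Add(byId[id]);
            current = byId[id].ParentId;
        }
        chain.Reverse();
        return chain;
    }

    // Effective member set: later (more derived) templates win unless the member is locked.
    public static Dictionary<string, Resolved> Resolve(IReadOnlyDictionary<int, Template> byId, int templateId)
    {
        var effective = new Dictionary<string, Resolved>(StringComparer.Ordinal);
        foreach (var template in BuildChain(byId, templateId))
        {
            foreach (var member in template.Members.Where(x => !x.IsInherited)) // skip placeholder rows
            {
                if (effective.TryGetValue(member.Name, out var existing) && existing.Locked) continue;
                var origin = template.Id == templateId ? "own" : $"inherited-from-{template.Name}";
                effective[member.Name] = new Resolved(member.Name, member.Value, origin, member.Locked);
            }
        }
        return effective;
    }

    // Number of inherited members whose stored placeholder row disagrees with the fresh resolve.
    public static int CountStale(IReadOnlyDictionary<int, Template> byId, int templateId)
    {
        var inherited = Resolve(byId, templateId).Values
            .Where(r => r.Origin != "own")
            .ToDictionary(r => r.Name, r => r.Value);
        var stored = byId[templateId].Members
            .Where(x => x.IsInherited)
            .ToDictionary(x => x.Name, x => x.Value);

        int missingOrChanged = inherited.Count(kv => !stored.TryGetValue(kv.Key, out var v) || v != kv.Value);
        int orphaned = stored.Keys.Count(k => !inherited.ContainsKey(k));
        return missingOrChanged + orphaned;
    }
}
```

Nothing in the sketch writes back to the stored rows, which matches the D1 constraint that the banner is informational only.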
T28 — Strict expression-trigger kind (small)
- Add AnalysisKind (Advisory default / Strict) to the trigger config (carried in the existing TriggerConfiguration JSON or a small dedicated field — chosen to stay additive to the flattened model and avoid a migration if feasible).
- CheckExpressionTrigger — when Strict, promote the currently-advisory findings to errors (deploy-blocking); Advisory preserves today's behavior exactly.
- Trigger editor selector (alarm/script trigger UI) + CLI flag (--trigger-kind/--strict). Right-size after confirming exact current advisory set.
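Since validation already keeps separate error and warning lists, the escalation branch can be a small post-processing step. The sketch below assumes a hypothetical Findings record and TriggerEscalation helper; the real CheckExpressionTrigger result type is not shown in this document, and which warnings count as "advisory" still has to be confirmed.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Demo: a blank-expression warning only blocks deploy under Strict.
var findings = new Findings(Array.Empty<string>(), new[] { "Blank expression" });
Console.WriteLine(TriggerEscalation.Apply(findings, AnalysisKind.Advisory).BlocksDeploy); // False
Console.WriteLine(TriggerEscalation.Apply(findings, AnalysisKind.Strict).BlocksDeploy);   // True

public enum AnalysisKind { Advisory, Strict }

// Hypothetical result shape: errors block deploy, warnings do not.
public sealed record Findings(IReadOnlyList<string> Errors, IReadOnlyList<string> Warnings)
{
    public bool BlocksDeploy => Errors.Count > 0;
}

public static class TriggerEscalation
{
    // Advisory returns the findings unchanged (today's behavior).
    // Strict moves every warning into the error list.
    public static Findings Apply(Findings findings, AnalysisKind kind) =>
        kind == AnalysisKind.Strict
            ? new Findings(findings.Errors.Concat(findings.Warnings).ToList(), Array.Empty<string>())
            : findings;
}
```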
T30 — Schema-driven nested forms (standard)
- Extend ParameterValueForm.razor to recursively render object fields and list items as typed inputs (replacing the JSON textarea for object/list), driven by the parsed InboundApiSchema (including $ref-resolved schemas from T32). Per-field validation via InboundApiSchema.Validate; collect to canonical JSON. Re-register in existing fixtures.
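One way to picture the recursion behind the nested form is a walk over the schema that yields one typed leaf per input the form would render. The sketch reads raw JSON Schema with System.Text.Json rather than the parsed InboundApiSchema model, whose shape is not shown in this document; SchemaFields.Walk, the path notation, and the depth cap of 16 are all assumptions for illustration.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Text.Json;

const string schemaJson = @"{
  ""type"": ""object"",
  ""properties"": {
    ""name"":    { ""type"": ""string"" },
    ""address"": { ""type"": ""object"", ""properties"": { ""city"": { ""type"": ""string"" } } },
    ""tags"":    { ""type"": ""array"", ""items"": { ""type"": ""integer"" } }
  }
}";

using var doc = JsonDocument.Parse(schemaJson);
foreach (var (path, type) in SchemaFields.Walk(doc.RootElement))
    Console.WriteLine($"{path}: {type}");

public static class SchemaFields
{
    // Yields (path, scalar type) for every leaf input a nested form would render.
    // Objects recurse into "properties"; arrays recurse into "items" with a "[]" suffix.
    public static IEnumerable<(string Path, string Type)> Walk(JsonElement schema, string path = "", int depth = 0)
    {
        if (depth > 16) throw new InvalidOperationException("Schema nesting too deep.");
        if (schema.ValueKind != JsonValueKind.Object) yield break;

        // A schema without an explicit "type" is treated as an object here (assumption).
        string type = schema.TryGetProperty("type", out var t) && t.ValueKind == JsonValueKind.String
            ? t.GetString()!
            : "object";

        switch (type)
        {
            case "object":
                if (schema.TryGetProperty("properties", out var props) && props.ValueKind == JsonValueKind.Object)
                    foreach (var prop in props.EnumerateObject())
                        foreach (var field in Walk(prop.Value, path.Length == 0 ? prop.Name : $"{path}.{prop.Name}", depth + 1))
                            yield return field;
                break;
            case "array":
                if (schema.TryGetProperty("items", out var items))
                    foreach (var field in Walk(items, path + "[]", depth + 1))
                        yield return field;
                break;
            default:
                yield return (path, type);
                break;
        }
    }
}
```

A renderer built this way would emit one input per leaf and reassemble the entered values into canonical JSON along the same paths.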
T31 — Monaco hover/completion (standard)
- Feed the resolved JSON Schema to the existing Monaco editor's JSON language config so the value-entry JSON surface gets schema-driven hover + completion. Reuses MonacoEditor.razor; no new package (Monaco's built-in JSON schema support).
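Monaco's JSON language service takes its schemas through monaco.languages.json.jsonDefaults.setDiagnosticsOptions, whose options object carries validate and a schemas array of { uri, fileMatch, schema } entries. The sketch below only assembles that payload on the .NET side with System.Text.Json; how MonacoEditor.razor hands it to JavaScript is not described in this document and is left out. MonacoSchemaOptions and both URIs are hypothetical.

```csharp
#nullable enable
using System;
using System.Text.Json;
using System.Text.Json.Nodes;

// Demo: wrap an already-resolved schema for a single editor model.
var schema = JsonNode.Parse(@"{ ""type"": ""object"", ""properties"": { ""name"": { ""type"": ""string"" } } }")!;
var options = MonacoSchemaOptions.Build(
    modelUri: "inmemory://model/value-entry.json",    // hypothetical model URI
    schemaUri: "inmemory://schema/value-entry.json",  // hypothetical schema id
    resolvedSchema: schema);
Console.WriteLine(options.ToJsonString(new JsonSerializerOptions { WriteIndented = true }));

public static class MonacoSchemaOptions
{
    // Builds the argument for jsonDefaults.setDiagnosticsOptions(...).
    // resolvedSchema must not already be attached to another JsonNode tree.
    public static JsonObject Build(string modelUri, string schemaUri, JsonNode resolvedSchema) =>
        new JsonObject
        {
            ["validate"] = true,
            ["schemas"] = new JsonArray(new JsonObject
            {
                ["uri"] = schemaUri,
                // fileMatch ties the schema to the editor model with this URI
                ["fileMatch"] = new JsonArray(JsonValue.Create(modelUri)),
                ["schema"] = resolvedSchema,
            }),
        };
}
```

Passing the schema inline (rather than by URL) avoids any fetch from the browser, which fits the constraint that $ref pointers are resolved server-side first.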
T32 — $ref resolver + template-level schema library (high-risk; build first)
- New SharedSchema entity (Id, Name unique, optional scope, SchemaJson) + EF config + idempotent migration + repository.
- Custom $ref resolver in InboundApiSchema.Parse (resolve {"$ref":"lib:Name"}-style pointers to library entries; depth/cycle-guarded, System.Text.Json only).
- ManagementActor CRUD commands (Designer-gated) + a Central UI schema-library page (reuse SchemaBuilder).
- Deploy-time validation that every $ref target exists (block on dangling ref), wired into the existing validation pipeline.
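A depth- and cycle-guarded resolver for lib:Name pointers can be written with System.Text.Json.Nodes alone. The sketch below is a standalone illustration, not the planned change to InboundApiSchema.Parse: RefResolver is a hypothetical name, the library is modeled as a plain name-to-JSON dictionary instead of the SharedSchema repository, and the depth cap of 32 and the exception types are assumptions.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Text.Json.Nodes;

var library = new Dictionary<string, string>
{
    ["Address"] = @"{ ""type"": ""object"", ""properties"": { ""city"": { ""type"": ""string"" } } }",
};
var schema = JsonNode.Parse(@"{ ""type"": ""object"", ""properties"": { ""home"": { ""$ref"": ""lib:Address"" } } }");
Console.WriteLine(RefResolver.Resolve(schema, library)!.ToJsonString());

public static class RefResolver
{
    private const string Prefix = "lib:";
    private const int MaxDepth = 32; // assumed cap

    // Returns a new tree in which every {"$ref":"lib:Name"} object is replaced
    // by the (recursively resolved) library schema of that name.
    public static JsonNode? Resolve(JsonNode? node, IReadOnlyDictionary<string, string> library)
        => Resolve(node, library, new HashSet<string>(StringComparer.Ordinal), 0);

    private static JsonNode? Resolve(JsonNode? node, IReadOnlyDictionary<string, string> library,
                                     HashSet<string> active, int depth)
    {
        if (depth > MaxDepth) throw new InvalidOperationException("Schema nesting exceeds the depth cap.");

        switch (node)
        {
            case JsonObject obj when obj.TryGetPropertyValue("$ref", out var r)
                                     && r is JsonValue v
                                     && v.TryGetValue<string>(out var target)
                                     && target.StartsWith(Prefix, StringComparison.Ordinal):
            {
                var name = target[Prefix.Length..];
                if (!library.TryGetValue(name, out var json))
                    throw new KeyNotFoundException($"Dangling $ref: '{name}' is not in the schema library.");
                if (!active.Add(name)) // name is already being expanded further up the stack
                    throw new InvalidOperationException($"Cyclic $ref through '{name}'.");
                try { return Resolve(JsonNode.Parse(json), library, active, depth + 1); }
                finally { active.Remove(name); }
            }
            case JsonObject obj:
            {
                var copy = new JsonObject();
                foreach (var (key, value) in obj)
                    copy[key] = Resolve(value, library, active, depth + 1);
                return copy;
            }
            case JsonArray array:
            {
                var copy = new JsonArray();
                foreach (var item in array)
                    copy.Add(Resolve(item, library, active, depth + 1));
                return copy;
            }
            default:
                // Scalars are re-parsed so the result never shares nodes with the input.
                return node is null ? null : JsonNode.Parse(node.ToJsonString());
        }
    }
}
```

The same traversal, run without substitution, could back the deploy-time check: any KeyNotFoundException it raises corresponds to a dangling reference that should block the deploy.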
CLI — cached-call Retry/Discard (small)
- New CachedCallCommands.cs (cached-call retry|discard --site-id … --tracked-operation-id …) calling the existing Deployer-gated RetryParkedMessageCommand/DiscardParkedMessageCommand via CommandHelpers.ExecuteCommandAsync. Verify ManagementCommandRegistry maps both command names (the CLI's GetCommandName depends on it). Update CLI/README.md + Component-CLI.md.
Dependencies & wave plan
Execute subagent-driven in the worktree-m9-templates-authoring worktree. Implementers do not create worktrees; commit pathspec form (-m before --, never git add -A); keep ≤2–3 concurrent committers with a post-wave HEAD-presence check; targeted builds/tests per task; full-solution build + docker rebuild only at integration.
- Wave 1 (low-risk, parallel — disjoint files): T22 (Templates.razor) ‖ CLI Retry/Discard (CLI) ‖ T28 (Template Engine validation + trigger editor).
- Wave 2: T23 (Templates.razor, after T22 — same file) ‖ T25 (DataConnections.razor, additive).
- Wave 3: T24 (DataConnections.razor, after T25 — same file) ‖ T32 foundation (entity + migration + $ref resolver — Commons/ConfigDB/ManagementActor).
- Wave 4: T30 ‖ T31 (consume the resolver) ‖ T26 (TemplateEdit.razor + resolve service).
- Wave 5 — integration.
Classifications: T24, T26, T32, integration = high-risk; T23, T25, T30, T31 = standard; T22, T28, CLI = small.
Integration (first-class verification phase)
Per integration-catches-cross-cutting-gaps:
- Full-solution dotnet build ZB.MOM.WW.ScadaBridge.slnx; EF model-drift check for the new SharedSchema entity (the M2-pre PendingModelChangesWarning lesson — idempotent migration, no pending changes).
- Trace every new ManagementActor command end-to-end through the registry + handler routing: ReorderTemplateFolderCommand, MoveDataConnectionCommand, the GetResolvedTemplateMembersCommand, the schema-library CRUD commands, and the CLI's two cached-call commands (confirm ManagementCommandRegistry mappings so the CLI resolves names).
- Re-run the full bUnit suites of every shared component touched (TreeView, TemplateFolderTree, TemplateEdit, DataConnections, ParameterValueForm, SchemaBuilder) — register substitutes for any newly-injected service in their existing fixtures.
- bash docker/deploy.sh rebuild + /health/ready smoke on central-a/central-b/LB; Playwright coverage for the new UI surfaces (search, reorder menu, move dialog, connection health badge, schema library, schema-driven form).
Testing strategy
- T22: filter unit/bUnit (match, auto-expand, clear).
- T23: reorder swap (ends no-op, ordering persists); root-menu render.
- T24: guard tests — binding-blocks (error names instances), name-collision-blocks, success path; audit row asserted.
- T25: health map query (name→id resolution, missing report), badge render.
- T26: multi-level chain (A→B→C), base member added after derive shows in editor, locked member display, staleness banner true/false; adversarial chain/composition-derived cases.
- T28: Advisory preserves current pass/fail; Strict escalates each advisory finding to a deploy-block.
- T30: nested object/list render + per-field validation (incl. $ref-resolved schema).
- T31: schema fed to Monaco (smoke/bUnit where feasible).
- T32: $ref resolution (valid, dangling→deploy-block, depth/cycle guard); migration idempotency; CRUD round-trip.
- CLI: command-name registry mapping; retry/discard happy-path + not-parked/unreachable mapping.
Risks
- T32 migration ↔ EF model drift — idempotent migration + a model-drift assertion in integration.
- T26 resolution semantics on multi-level + locked + composition-derived templates — adversarial chain tests; keep it strictly read-only to avoid any deploy-path regression.
- T28 may be near-complete — confirm the exact current advisory set before sizing; the deliverable is the toggle + escalation, not re-building analysis.
- Shared-component injection regressions (T25/T26/T30 inject into reused components) — the integration wave re-runs each touched component's full fixture suite.
- CLI registry gap — if RetryParkedMessageCommand/DiscardParkedMessageCommand aren't registered for name resolution, the CLI call fails; verified in Wave 1 and re-asserted at integration.
Next step
Hand off to the writing-plans skill to produce the bite-sized, per-task implementation plan and .tasks.json, then execute subagent-driven wave-by-wave. Finish via finishing-a-development-branch (FF-merge to main + push, no force; docker rebuild to match main).