ScadaBridge/docs/plans/2026-06-18-m9-templates-authoring-design.md
Joseph Doherty 4b152958df docs(m9): approved design — Templates & authoring (T22-T26, T28, T30-T32 + CLI cached-call retry/discard)
Locked decisions: T26 authoring-only (resolve + staleness banner, no stored-row
mutation); T32 full ($ref resolver + template-level schema library, no new package);
unified-outbox page deferred (CLI retry/discard ships instead); T23 menu-based
reorder + root context menu (no drag-drop); guarded move-connection; reuse existing
health transport for live status; T28 opt-in strict escalation layer.
2026-06-18 10:01:38 -04:00


Design: M9 Templates & Authoring (T22–T26, T28, T30–T32 + CLI cached-call Retry/Discard)

Date: 2026-06-18
Status: Approved (brainstorming session); ready for writing-plans
Milestone: M9 of the system-completion roadmap (docs/plans/2026-06-15-stillpending-completion-design.md line 105)
Branch: worktree-m9-templates-authoring off origin/main @ 72aec3b4
Source backlog: stillpending.md Tier 3, "Templates / Data Connections / Triggers UI" + "Cached-call tracking"

Goal

Deliver the in-scope authoring and templates backlog: make the template tree searchable and reorderable, let operators move and live-monitor data connections, surface multi-level inheritance + base-change staleness in the template editor, add an opt-in strict trigger-analysis mode, build schema-driven value entry (nested forms + Monaco hover/completion + a reusable $ref schema library), and put cached-call Retry/Discard on the CLI. One unified-outbox page item is explicitly deferred.

Scope

In scope (10 deliverables)

  • T22 — Template tree search/filter.
  • T23 — Folder sibling reorder + root-level context menu (menu-based; no drag-drop).
  • T24 — Move a data connection between sites.
  • T25 — Connection live-status indicators on the design page.
  • T26 — Base-template versioning authoring: multi-level inherited-member resolution in the editor + a read-only staleness banner.
  • T28 — Strict expression-trigger analysis kind (opt-in escalation).
  • T30 — Schema-driven nested value-entry forms.
  • T31 — Monaco JSON-Schema hover/completion on value-entry.
  • T32 — JSON Schema $ref resolver + a template-level schema library.
  • CLI: cached-call retry|discard for site-local cached calls.

Deferred (logged follow-ups, not in M9)

  • Unified notifications + site-calls outbox page — the two data models diverge hard (enum vs string status lifecycles, offset vs keyset pagination, string vs strong-typed GUID ids, asymmetric provenance). A true union view-model is high-risk for marginal operator gain. Deferred; the CLI Retry/Discard ships instead.
  • Folder drag-drop (HTML5 DnD via JS interop) — [PERM]-tagged; menu-based reorder delivers the capability without the Blazor-Server interop fragility.
  • T27 (promote-derived-to-base, cross-tenant libraries) and T29 (WhileTrue alarm trigger) remain excluded per the roadmap.

Locked decisions

  • D1 — T26 is authoring-only. Instance flattening already re-walks the full inheritance chain fresh on every deploy (TemplateResolver.BuildInheritanceChain is arbitrary-depth; CycleDetector covers inheritance/composition/cross-graph). The only real gap is in the editor: it loads the immediate base only, derived templates carry stale IsInherited placeholder rows, and there is no staleness signal. M9 closes that with a read-only resolve + staleness banner. No stored-row mutation, no RefreshDerivedTemplate command.
  • D2 — T32 is fully in scope. Build a template-level schema library (new entity + idempotent migration + repo) and a custom $ref resolver. No new NuGet package: Directory.Packages.props has no JSON-schema library and CLAUDE.md forbids adding one; everything uses System.Text.Json and extends the existing InboundApiSchema parser.
  • D3 — Unified outbox page deferred; ship the CLI Retry/Discard from this cluster.
  • D4 — T23 is menu-based: sibling Move-up/Move-down (uses the existing TemplateFolder.SortOrder) + a root-level context menu (New Folder / New Template at root) + completing the folder context menu. No drag-drop.
  • D5 — Move-connection is guarded. A MoveDataConnectionCommand succeeds only when: (a) the target site exists; (b) no name collision with an existing connection at the target site; (c) no InstanceConnectionBinding references the connection (instances are site-scoped — a bound connection cannot leave its site without orphaning the binding). On block, return a clear error naming the blocking instances. Also re-point/validate name-based references (TemplateNativeAlarmSource.ConnectionName, InstanceNativeAlarmSourceOverride.ConnectionNameOverride) for collisions. Every move emits an audit-log row.
  • D6 — T25 reuses existing health transport. Health already flows DCL → ISiteHealthCollector.UpdateConnectionHealth → SiteHealthReport.DataConnectionStatuses (name→ConnectionHealth) → ICentralHealthAggregator → the Health page, which renders badges. M9 surfaces the same data on the design DataConnections page (per-node badge + ~10s poll). No new transport, no SignalR.
  • D7 — T28 is an opt-in escalation layer. Expression triggers already get a real Roslyn semantic compile + forbidden-API + undefined-attribute analysis that blocks deploy (delivered in M2/M3). T28 adds a per-trigger AnalysisKind (default Advisory = today's behavior; Strict escalates the currently advisory findings, such as blank expression and ambiguous coercion, to deploy-blocking errors). The increment is the toggle + the escalation branch; the implementation is right-sized after confirming the exact current behavior.

Current-state map (reconnaissance evidence)

Cluster A — Template tree UI

  • CentralUI/Components/Shared/TreeView.razor — generic tree; external-filter model (R8), ContextMenu render-fragment (R15); no built-in DnD.
  • CentralUI/Components/Shared/TemplateFolderTree.razor:68 — already exposes a Filter parameter with recursive substring match + ancestor auto-expand (ApplyFilter/CopyMatching).
  • CentralUI/Components/Pages/Design/Templates.razor — uses TemplateFolderTree but wires no search box; folder context menu present (New Folder/Template, Rename, Move…, Delete); MoveFolderDialog.razor exists.
  • Commons/Entities/Templates/TemplateFolder.cs:12 — has SortOrder. TemplateEngine/Services/TemplateFolderService.cs — Create/Rename/Move (cycle + collision checks)/Delete; no sort-order update method.
  • Commons/Messages/Management/TemplateFolderCommands.cs — Create/Move/Rename/Delete commands; no reorder command.

Cluster B — Data connections

  • Commons/Entities/Sites/DataConnection.cs:8 carries the SiteId FK. Commons/Messages/Management/DataConnectionCommands.cs has Create/Update/Delete; Update does not change SiteId, and there is no move command.
  • CentralUI/Components/Pages/Design/DataConnectionForm.razor — site locked after creation. DataConnections.razor — site→connection tree, search box present, Edit/Delete actions; no move, no health badge.
  • FK/blockers for move: Instances/InstanceConnectionBinding.cs:12 (DataConnectionId FK), Templates/TemplateNativeAlarmSource.cs:21 (ConnectionName, name-based), Instances/InstanceNativeAlarmSourceOverride.cs:22 (ConnectionNameOverride, name-based).
  • Health: HealthMonitoring/ISiteHealthCollector.cs:58, Commons/Messages/Health/SiteHealthReport.cs:10 (DataConnectionStatuses), ICentralHealthAggregator (GetSiteState), CentralUI/.../Monitoring/Health.razor (existing badge render + GetConnectionHealthBadge).

Cluster C — Inheritance authoring

  • Commons/Entities/Templates/Template.cs has ParentTemplateId (inheritance), IsDerived + OwnerCompositionId (composition-materialized slots). Members carry IsInherited + LockedInDerived flags (TemplateAttribute/TemplateAlarm/TemplateScript/TemplateNativeAlarmSource).
  • TemplateEngine/TemplateResolver.cs:119 (BuildInheritanceChain) walks arbitrary depth (root-first), cycle-guarded. In TemplateEngine/Flattening/FlatteningService.cs, derived wins and IsInherited placeholders are skipped in favor of the live base value. CycleDetector.cs runs inheritance/composition/cross-graph checks on save.
  • TemplateEngine/Flattening/RevisionHashService.cs — deterministic SHA-256 of flattened config (already used for staleness in Transport/M8 via IStaleInstanceProbe).
  • CentralUI/Components/Pages/Design/TemplateEdit.razor:58 — loads only the immediate base (_baseTemplate, _baseAttributesByName, …); no multi-level resolution, no staleness banner.
  • ManagementService/ManagementActor.cs:178 — template command block; no resolve/update-derived command.
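RevisionHashService is described only as a deterministic SHA-256 of the flattened config. A minimal Python sketch of what makes such a hash stable (the codebase is C#; this only illustrates the idea), assuming canonical JSON with sorted keys and fixed separators as the serialization. The real service's canonical form is not specified here, and `revision_hash` and `is_stale` are hypothetical names:

```python
import hashlib
import json


def revision_hash(flattened_config: dict) -> str:
    """Hypothetical sketch: SHA-256 over a canonical JSON rendering."""
    # Sorted keys and fixed separators make the bytes independent of
    # dictionary insertion order, so equal configs always hash equally.
    canonical = json.dumps(
        flattened_config, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_stale(deployed_hash: str, current_config: dict) -> bool:
    """An instance is stale when its deployed hash no longer matches."""
    return deployed_hash != revision_hash(current_config)
```

Note that sort_keys does not reorder lists, so this only holds if the flattener emits member lists in a deterministic order.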

Cluster D — Triggers + schema entry

  • TemplateEngine/Validation/ValidationService.cs:263 (CheckExpressionTrigger) — real Roslyn compile + forbidden-API + undefined-attribute checks; blank expression = warning; errors block deploy. Separate error/warning lists make selective escalation a clean seam.
  • JSON Schema is canonical storage (migration 20260512211204_MigrateParametersToJsonSchema); Commons/Types/InboundApi/InboundApiSchema.cs (Parse/ParseSchema recursive, depth-capped; Validate). CentralUI/Components/Shared/SchemaBuilder.razor authors schemas; ParameterValueForm.razor:52 renders scalars but falls back to a JSON textarea for object/list. Monaco already integrated (MonacoEditor.razor). No $ref resolution anywhere; no schema-library entity. Directory.Packages.props — no JSON-schema package (System.Text.Json only).

Cluster E — CLI cached-call Retry/Discard

  • Backend relay fully exists: ManagementActor.cs:220,380 (RetryParkedMessageCommand/DiscardParkedMessageCommand, Deployer-gated) → SiteCallAuditActor.cs:877,909 (HandleRetrySiteCall/HandleDiscardSiteCall → RetryParkedOperation/DiscardParkedOperation relay) → site. The Central UI Site Calls page already uses it.
  • CLI pattern: CLI/Commands/CommandHelpers.cs:34 (ExecuteCommandAsync → ManagementHttpClient.SendCommandAsync), with the command name resolved via ManagementCommandRegistry.GetCommandName. Model on NotificationCommands.cs. No cached-call command group exists today, and the registry's mapping of the two commands must be verified.

Design by feature

T22 — Template tree search (small)

Add a search <input> to Templates.razor, bound to a local field, passed to TemplateFolderTree.Filter. The recursive filter + auto-expand already exist. UI-only — no service/entity/command change. Clear-filter restores the full tree and prior expansion state.
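TemplateFolderTree's ApplyFilter/CopyMatching already implement the filter, so T22 is wiring only. For readers new to the component, a Python sketch of the same idea (recursive substring match plus ancestor auto-expand), assuming case-insensitive matching and using node names as ids. `Node`, `copy_matching` and `apply_filter` are illustrative, not the Razor code:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)


def copy_matching(node: Node, needle: str, expand: Set[str]) -> Optional[Node]:
    """Copy `node` if it or any descendant matches; record ancestors to expand."""
    kept = [child for child in (copy_matching(c, needle, expand) for c in node.children)
            if child is not None]
    if kept:
        expand.add(node.name)  # has a matching descendant: auto-expand it
    if kept or needle in node.name.lower():
        return Node(node.name, kept)
    return None


def apply_filter(roots: List[Node], text: str) -> Tuple[List[Node], Set[str]]:
    needle = text.strip().lower()
    if not needle:
        return roots, set()  # cleared filter: caller restores saved expansion state
    expand: Set[str] = set()
    matches = (copy_matching(r, needle, expand) for r in roots)
    return [m for m in matches if m is not None], expand
```

In this sketch a matching folder keeps only its matching descendants; whether the real CopyMatching keeps a matching folder's whole subtree should be checked against TemplateFolderTree.razor:68.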

T23 — Folder reorder + context menus (standard)

  • TemplateFolderService.ReorderFolderAsync(folderId, direction, user) (or MoveUp/MoveDown) — swap SortOrder with the adjacent sibling under the same parent; no-op at the ends. New ReorderTemplateFolderCommand + ManagementActor handler (Designer-gated, matching the other folder commands).
  • Sibling loads ordered by SortOrder (then Name) everywhere the folder tree is built.
  • Templates.razor — Move-up/Move-down items in the folder context menu; a root-level context menu (right-click empty/root → New Folder, New Template at root). Complete any missing folder-menu items.
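A Python sketch of the Move-up/Move-down swap (logic only; the service is C#), assuming siblings are ordered by (SortOrder, Name) and that the caller persists the changed SortOrder values together. `Folder` and `reorder_folder` are hypothetical stand-ins for the entity and the proposed ReorderFolderAsync. One design choice worth flagging: if existing rows share a SortOrder (for example, all 0), a plain swap of the two values changes nothing, so the sketch renumbers the sibling group densely before swapping.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Folder:
    id: int
    parent_id: Optional[int]
    name: str
    sort_order: int


def ordered_siblings(folders: List[Folder], parent_id: Optional[int]) -> List[Folder]:
    """Siblings under one parent, ordered by SortOrder then Name."""
    return sorted((f for f in folders if f.parent_id == parent_id),
                  key=lambda f: (f.sort_order, f.name))


def reorder_folder(folders: List[Folder], folder_id: int, direction: int) -> bool:
    """Move a folder one slot up (-1) or down (+1); return False for a no-op."""
    target = next(f for f in folders if f.id == folder_id)
    siblings = ordered_siblings(folders, target.parent_id)
    i = next(k for k, s in enumerate(siblings) if s.id == folder_id)
    j = i + direction
    if not 0 <= j < len(siblings):
        return False  # already first/last: nothing to persist
    for position, sibling in enumerate(siblings):
        sibling.sort_order = position  # dense renumber so tied SortOrders still move
    siblings[i].sort_order, siblings[j].sort_order = j, i
    return True
```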

T24 — Move connection between sites (high-risk)

  • MoveDataConnectionCommand(DataConnectionId, TargetSiteId) + ManagementActor HandleMoveDataConnection (Designer-gated).
  • Guards (D5), all server-side: target site exists; no name collision at target; reject if any InstanceConnectionBinding references the connection with an error naming blockers; validate name-based native-alarm-source references won't collide/orphan at the target.
  • Persist via the existing ISiteRepository.UpdateDataConnectionAsync (sets SiteId); emit an audit row.
  • UI: a "Move to Site…" action + MoveDataConnectionDialog (target-site picker, error surface) on DataConnections.razor.
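A Python sketch of guards (a) to (c) from D5 plus the audit write, with in-memory collections standing in for the repositories and case-insensitive connection names as an assumption. The name-based native-alarm-source check is left out because its exact rule is not specified. `Connection`, `check_move` and `move_connection` are hypothetical:

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class Connection:
    id: int
    site_id: int
    name: str


def check_move(conn: Connection, target_site_id: int, sites: Set[int],
               connections: List[Connection],
               bindings: List[Tuple[str, int]]) -> List[str]:
    """Return every blocking reason; an empty list means the move may proceed."""
    errors = []
    if target_site_id not in sites:  # (a) target site exists
        errors.append(f"Site {target_site_id} does not exist.")
    clash = any(c.id != conn.id and c.site_id == target_site_id
                and c.name.lower() == conn.name.lower() for c in connections)
    if clash:  # (b) no name collision at the target (case-insensitive: assumption)
        errors.append(f"Site {target_site_id} already has a connection named '{conn.name}'.")
    blockers = sorted(instance for instance, conn_id in bindings if conn_id == conn.id)
    if blockers:  # (c) a bound connection cannot leave its site
        errors.append("Bound by instance(s): " + ", ".join(blockers))
    return errors


def move_connection(conn: Connection, target_site_id: int, sites: Set[int],
                    connections: List[Connection], bindings: List[Tuple[str, int]],
                    audit_log: List[Dict], user: str) -> List[str]:
    errors = check_move(conn, target_site_id, sites, connections, bindings)
    if errors:
        return errors  # nothing persisted, nothing audited
    previous = conn.site_id
    conn.site_id = target_site_id  # stands in for UpdateDataConnectionAsync
    audit_log.append({"action": "MoveDataConnection", "connectionId": conn.id,
                      "fromSiteId": previous, "toSiteId": target_site_id, "user": user})
    return []
```

Collecting all reasons rather than stopping at the first lets the dialog show every blocker in one round trip.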

T25 — Connection live-status (standard)

  • A central-side query (extend the health query service or inject ICentralHealthAggregator) returning a connectionId → ConnectionHealth map for a site: read the latest SiteHealthReport.DataConnectionStatuses, resolve names→ids via the repo.
  • DataConnections.razor — render a health badge per connection node (reuse GetConnectionHealthBadge/AlarmStateBadges-style classes), refresh on a ~10s poll timer (mirror the Health page). Register the injected service in the existing DataConnections bUnit fixtures.
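A Python sketch of the name→id join (logic only), assuming the latest report's DataConnectionStatuses is a name-keyed map of status strings and that a missing report or missing name renders as "Unknown" (an assumption; the Health page's own fallback should be reused). `connection_health_by_id` is hypothetical:

```python
from typing import Dict, Iterable, Optional, Tuple

UNKNOWN = "Unknown"  # assumed fallback label


def connection_health_by_id(
    latest_statuses: Optional[Dict[str, str]],
    site_connections: Iterable[Tuple[int, str]],
) -> Dict[int, str]:
    """Map connection id -> health for one site's design-page nodes.

    latest_statuses: name -> health from the latest SiteHealthReport, or None
    when the site has not reported yet. site_connections: (id, name) pairs
    loaded from the repository for the same site.
    """
    statuses = latest_statuses or {}
    return {conn_id: statuses.get(name, UNKNOWN) for conn_id, name in site_connections}
```

Because the report is keyed by name, a connection renamed at central can show Unknown until the site reports under the new name; each ~10s poll simply re-runs this join.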

T26 — Inheritance authoring resolve + staleness banner (high-risk)

  • A resolve service/method that, given a derived or child template, walks the full inheritance chain (BuildInheritanceChain) and returns the effective inherited member set — including base members added after the derived template was created, across ≥2 inheritance levels — annotated per member with origin (own override / inherited-from-X / locked).
  • A new read-only query command (e.g. GetResolvedTemplateMembersCommand) + ManagementActor handler returning that set (plus a staleness summary).
  • TemplateEdit.razor renders the full resolved inherited set (not just the immediate base) and a read-only banner when the stored derived rows differ from the freshly-resolved chain ("Base changed — N inherited members differ"). No mutation — flattening at deploy is already correct; the banner is informational and the editor's own override actions are unchanged.
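A Python sketch of the read-only resolve and the staleness count (logic only), assuming members are keyed by name, a member locked in a base cannot be overridden further down the chain, and stored IsInherited placeholders are compared by member name only. `Tmpl`, `build_chain`, `resolve_members` and `stale_members` are hypothetical; the real chain walk is TemplateResolver.BuildInheritanceChain.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Tmpl:
    name: str
    parent: Optional[str]
    own: Dict[str, str] = field(default_factory=dict)  # members declared or overridden here
    locked: Set[str] = field(default_factory=set)      # members locked for derived templates


def build_chain(templates: Dict[str, Tmpl], name: str) -> List[Tmpl]:
    """Walk parent links to the root, cycle-guarded; return root-first."""
    chain, seen, current = [], set(), name
    while current is not None:
        if current in seen:
            raise ValueError(f"inheritance cycle at {current}")
        seen.add(current)
        chain.append(templates[current])
        current = templates[current].parent
    return chain[::-1]


def resolve_members(templates: Dict[str, Tmpl], name: str) -> Dict[str, dict]:
    """Effective member set for `name`, annotated with origin and lock state."""
    resolved: Dict[str, dict] = {}
    for tmpl in build_chain(templates, name):
        for member, value in tmpl.own.items():
            previous = resolved.get(member)
            if previous is not None and previous["locked"]:
                continue  # a locked base member wins over later overrides
            origin = "own" if tmpl.name == name else f"inherited-from-{tmpl.name}"
            resolved[member] = {"value": value, "origin": origin,
                                "locked": member in tmpl.locked}
    return resolved


def stale_members(stored_inherited: Set[str], resolved: Dict[str, dict]) -> List[str]:
    """Inherited members added to or removed from the chain since the rows were stored."""
    live = {m for m, r in resolved.items() if r["origin"] != "own"}
    return sorted(stored_inherited ^ live)
```

The banner's N is then the length of `stale_members(...)`; nothing is written back, in line with D1.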

T28 — Strict expression-trigger kind (small)

  • Add AnalysisKind (Advisory default / Strict) to the trigger config (carried in the existing TriggerConfiguration JSON or a small dedicated field — chosen to stay additive to the flattened model and avoid a migration if feasible).
  • CheckExpressionTrigger — when Strict, promote the currently-advisory findings to errors (deploy-blocking); Advisory preserves today's behavior exactly.
  • Trigger editor selector (alarm/script trigger UI) + CLI flag (--trigger-kind/--strict). Right-size after confirming the exact current advisory set.
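A Python sketch of the escalation seam (logic only), assuming the analysis result keeps separate error and warning lists, as CheckExpressionTrigger does, and that the mode is read from a hypothetical `analysisKind` key in the TriggerConfiguration JSON, where an absent key means Advisory so existing rows need no migration. The sketch escalates every warning; the real change should escalate only the confirmed advisory set.

```python
import json
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class AnalysisKind(Enum):
    ADVISORY = "Advisory"
    STRICT = "Strict"


@dataclass
class TriggerFindings:
    errors: List[str] = field(default_factory=list)    # deploy-blocking
    warnings: List[str] = field(default_factory=list)  # advisory only


def read_kind(trigger_config_json: str) -> AnalysisKind:
    """Absent key -> Advisory; an unknown value raises ValueError."""
    data = json.loads(trigger_config_json or "{}")
    return AnalysisKind(data.get("analysisKind", AnalysisKind.ADVISORY.value))


def apply_kind(findings: TriggerFindings, kind: AnalysisKind) -> TriggerFindings:
    if kind is AnalysisKind.ADVISORY:
        return findings  # today's behavior, untouched
    # Strict: every advisory finding becomes a deploy-blocking error.
    return TriggerFindings(errors=findings.errors + findings.warnings, warnings=[])


def blocks_deploy(findings: TriggerFindings) -> bool:
    return bool(findings.errors)
```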

T30 — Schema-driven nested forms (standard)

  • Extend ParameterValueForm.razor to recursively render object fields and list items as typed inputs (replacing the JSON textarea for object/list), driven by the parsed InboundApiSchema (including $ref-resolved schemas from T32). Per-field validation via InboundApiSchema.Validate; collect to canonical JSON. Register substitutes for any newly injected service in the existing bUnit fixtures.
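A Python sketch of the recursive walk behind nested forms (logic only): it turns raw form input (strings for scalar fields) into a canonical JSON value and collects per-field errors keyed by a JSON-path-like location. It covers only the object/array/integer/number/boolean/string subset, assumes $ref has already been resolved, drops input keys the schema does not declare, and stands in for InboundApiSchema.Validate rather than reproducing it; `collect` and `to_canonical_json` are hypothetical.

```python
import json
from typing import Any, List, Tuple


def _coerce_scalar(kind: str, raw: Any, path: str) -> Tuple[Any, List[str]]:
    """Convert one form input to its declared JSON type."""
    try:
        if kind == "integer":
            return int(raw), []
        if kind == "number":
            return float(raw), []
        if kind == "boolean":
            text = str(raw).strip().lower()
            if text not in ("true", "false"):
                raise ValueError(raw)
            return text == "true", []
    except (TypeError, ValueError):
        return None, [f"{path}: expected {kind}"]
    return raw, []  # strings and untyped values pass through unchanged


def collect(schema: dict, value: Any, path: str = "$") -> Tuple[Any, List[str]]:
    """Build the JSON value for `value` under `schema`, plus per-field errors."""
    kind = schema.get("type")
    if kind == "object":
        out, errors = {}, []
        value = value or {}
        for name, sub_schema in schema.get("properties", {}).items():
            child = f"{path}.{name}"
            if name not in value:
                if name in schema.get("required", []):
                    errors.append(f"{child}: required")
                continue
            out[name], sub_errors = collect(sub_schema, value[name], child)
            errors.extend(sub_errors)
        return out, errors
    if kind == "array":
        items, errors = [], []
        for index, item in enumerate(value or []):
            item_value, item_errors = collect(schema.get("items", {}), item, f"{path}[{index}]")
            items.append(item_value)
            errors.extend(item_errors)
        return items, errors
    return _coerce_scalar(kind, value, path)


def to_canonical_json(value: Any) -> str:
    return json.dumps(value, sort_keys=True, separators=(",", ":"))
```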

T31 — Monaco hover/completion (standard)

  • Feed the resolved JSON Schema to the existing Monaco editor's JSON language config so the value-entry JSON surface gets schema-driven hover + completion. Reuses MonacoEditor.razor; no new package (Monaco's built-in JSON schema support).
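Monaco's JSON language service receives schemas through `monaco.languages.json.jsonDefaults.setDiagnosticsOptions`, which replaces the whole option set and is shared by every editor on the page. A Python sketch of the payload the Blazor side could build and pass to that call through JS interop, assuming one schema per value-entry model URI; `MonacoSchemaRegistry` and the `.schema.json` URI suffix are hypothetical. Keeping all registered schemas in one list is the design point: re-sending options for one editor must not drop another editor's schema.

```python
from typing import Any, Dict


class MonacoSchemaRegistry:
    """Hypothetical helper: one schema entry per editor model URI."""

    def __init__(self) -> None:
        self._by_model: Dict[str, Any] = {}

    def register(self, model_uri: str, schema: Any) -> Dict[str, Any]:
        self._by_model[model_uri] = schema
        return self.options()

    def unregister(self, model_uri: str) -> Dict[str, Any]:
        self._by_model.pop(model_uri, None)
        return self.options()

    def options(self) -> Dict[str, Any]:
        """Full DiagnosticsOptions payload, rebuilt from every registered model."""
        return {
            "validate": True,
            "enableSchemaRequest": False,  # schemas are inlined; never fetched
            "schemas": [
                {"uri": f"{uri}.schema.json", "fileMatch": [uri], "schema": schema}
                for uri, schema in sorted(self._by_model.items(), key=lambda kv: kv[0])
            ],
        }
```

Each returned dict would be serialized and handed to the JS side, which passes it unchanged to `setDiagnosticsOptions`; hover and completion then come from Monaco's built-in JSON worker.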

T32 — $ref resolver + template-level schema library (high-risk; build first)

  • New SharedSchema entity (Id, Name unique, optional scope, SchemaJson) + EF config + idempotent migration + repository.
  • Custom $ref resolver in InboundApiSchema.Parse (resolve {"$ref":"lib:Name"}-style pointers to library entries; depth/cycle-guarded, System.Text.Json only).
  • ManagementActor CRUD commands (Designer-gated) + a Central UI schema-library page (reuse SchemaBuilder).
  • Deploy-time validation that every $ref target exists (block on dangling ref), wired into the existing validation pipeline.
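A Python sketch of the resolver (logic only; the real one extends InboundApiSchema.Parse in C#), assuming library entries are addressed as {"$ref": "lib:Name"} and that sibling keywords next to a lib: $ref are dropped, as in draft-07. Other $ref forms, such as local #/ pointers, pass through untouched. `resolve_refs`, `dangling_refs` and `MAX_DEPTH` are hypothetical; the cap should match InboundApiSchema's existing depth limit.

```python
from typing import Any, Dict, List, Tuple

LIB_PREFIX = "lib:"
MAX_DEPTH = 64  # assumption: align with InboundApiSchema's depth cap


class SchemaRefError(ValueError):
    """Raised for dangling, cyclic or too-deep library references."""


def resolve_refs(schema: Any, library: Dict[str, Any],
                 _stack: Tuple[str, ...] = (), _depth: int = 0) -> Any:
    """Return a copy of `schema` with every lib:Name $ref inlined."""
    if _depth > MAX_DEPTH:
        raise SchemaRefError("schema nesting exceeds depth cap")
    if isinstance(schema, list):
        return [resolve_refs(item, library, _stack, _depth + 1) for item in schema]
    if not isinstance(schema, dict):
        return schema
    ref = schema.get("$ref")
    if isinstance(ref, str) and ref.startswith(LIB_PREFIX):
        name = ref[len(LIB_PREFIX):]
        if name in _stack:  # only the current path is tracked, so reuse in siblings is fine
            raise SchemaRefError("cycle: " + " -> ".join(_stack + (name,)))
        if name not in library:
            raise SchemaRefError(f"dangling $ref: {ref}")
        return resolve_refs(library[name], library, _stack + (name,), _depth + 1)
    return {key: resolve_refs(value, library, _stack, _depth + 1)
            for key, value in schema.items()}


def dangling_refs(schema: Any, library: Dict[str, Any]) -> List[str]:
    """Every unresolvable lib: reference in this document (for a deploy-time report)."""
    missing = set()

    def walk(node: Any) -> None:
        if isinstance(node, dict):
            ref = node.get("$ref")
            if isinstance(ref, str) and ref.startswith(LIB_PREFIX):
                if ref[len(LIB_PREFIX):] not in library:
                    missing.add(ref)
            for child in node.values():
                walk(child)
        elif isinstance(node, list):
            for child in node:
                walk(child)

    walk(schema)
    return sorted(missing)
```

`dangling_refs` reports all missing names in one pass for a deploy-blocking message; it checks only the given document, while `resolve_refs` also catches dangling references nested inside library entries.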

CLI — cached-call Retry/Discard (small)

  • New CachedCallCommands.cs (cached-call retry|discard --site-id … --tracked-operation-id …) calling the existing Deployer-gated RetryParkedMessageCommand/DiscardParkedMessageCommand via CommandHelpers.ExecuteCommandAsync. Verify ManagementCommandRegistry maps both command names (the CLI's GetCommandName depends on it). Update CLI/README.md + Component-CLI.md.
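A Python sketch of the verb shape and a fail-fast registry check, using argparse purely for illustration (the real CLI follows the NotificationCommands.cs pattern in C#). `to_request`, the payload keys, and passing the registry's known command names in as a set are assumptions; the point is that an unmapped command should fail before any HTTP call.

```python
import argparse
from typing import Dict, List, Set

VERB_TO_COMMAND = {
    "retry": "RetryParkedMessageCommand",
    "discard": "DiscardParkedMessageCommand",
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="cached-call")
    verbs = parser.add_subparsers(dest="verb", required=True)
    for verb in VERB_TO_COMMAND:
        sub = verbs.add_parser(verb)
        sub.add_argument("--site-id", required=True)
        sub.add_argument("--tracked-operation-id", required=True)
    return parser


def to_request(argv: List[str], registered_names: Set[str]) -> Dict[str, str]:
    """Parse the verb and refuse to build a request for an unmapped command."""
    args = build_parser().parse_args(argv)
    command = VERB_TO_COMMAND[args.verb]
    if command not in registered_names:
        raise LookupError(f"{command} is not mapped in the command registry")
    return {"command": command, "siteId": args.site_id,
            "trackedOperationId": args.tracked_operation_id}
```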

Dependencies & wave plan

Execute subagent-driven in the worktree-m9-templates-authoring worktree. Implementers do not create worktrees; commit with the pathspec form (-m before --, never git add -A); keep ≤2–3 concurrent committers with a post-wave HEAD-presence check; run targeted builds/tests per task; run the full-solution build + docker rebuild only at integration.

  • Wave 1 (low-risk, parallel — disjoint files): T22 (Templates.razor) ‖ CLI Retry/Discard (CLI) ‖ T28 (Template Engine validation + trigger editor).
  • Wave 2: T23 (Templates.razor, after T22 — same file) ‖ T25 (DataConnections.razor, additive).
  • Wave 3: T24 (DataConnections.razor, after T25 — same file) ‖ T32 foundation (entity + migration + $ref resolver — Commons/ConfigDB/ManagementActor).
  • Wave 4: T30 ‖ T31 (consume the resolver) ‖ T26 (TemplateEdit.razor + resolve service).
  • Wave 5 — integration.
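The wave plan relies on parallel tasks touching disjoint files. A small Python sketch of a pre-wave check, assuming each task's file set is listed by hand from its design bullet; `overlaps` is hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def overlaps(wave: Dict[str, Set[str]]) -> List[Tuple[str, str, List[str]]]:
    """Pairs of tasks in one wave that would edit the same file."""
    return [(a, b, sorted(wave[a] & wave[b]))
            for a, b in combinations(sorted(wave), 2)
            if wave[a] & wave[b]]
```

Taking file sets from the T24 and T32 bullets, both add ManagementActor handlers, so a Wave 3 check would report ManagementActor.cs as shared.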

Classifications: T24, T26, T32, integration = high-risk; T23, T25, T30, T31 = standard; T22, T28, CLI = small.

Integration (first-class verification phase)

Per integration-catches-cross-cutting-gaps:

  • Full-solution dotnet build ZB.MOM.WW.ScadaBridge.slnx; EF model-drift check for the new SharedSchema entity (the M2-pre PendingModelChangesWarning lesson — idempotent migration, no pending changes).
  • Trace every new ManagementActor command end-to-end through the registry + handler routing: ReorderTemplateFolderCommand, MoveDataConnectionCommand, the GetResolvedTemplateMembersCommand, the schema-library CRUD commands, and the CLI's two cached-call commands (confirm ManagementCommandRegistry mappings so the CLI resolves names).
  • Re-run the full bUnit suites of every shared component touched (TreeView, TemplateFolderTree, TemplateEdit, DataConnections, ParameterValueForm, SchemaBuilder) — register substitutes for any newly-injected service in their existing fixtures.
  • bash docker/deploy.sh rebuild + /health/ready smoke on central-a/central-b/LB; Playwright coverage for the new UI surfaces (search, reorder menu, move dialog, connection health badge, schema library, schema-driven form).

Testing strategy

  • T22: filter unit/bUnit (match, auto-expand, clear).
  • T23: reorder swap (ends no-op, ordering persists); root-menu render.
  • T24: guard tests — binding-blocks (error names instances), name-collision-blocks, success path; audit row asserted.
  • T25: health map query (name→id resolution, missing report), badge render.
  • T26: multi-level chain (A→B→C), base member added after derive shows in editor, locked member display, staleness banner true/false; adversarial chain/composition-derived cases.
  • T28: Advisory preserves current pass/fail; Strict escalates each advisory finding to a deploy-block.
  • T30: nested object/list render + per-field validation (incl. $ref-resolved schema).
  • T31: schema fed to Monaco (smoke/bUnit where feasible).
  • T32: $ref resolution (valid, dangling→deploy-block, depth/cycle guard); migration idempotency; CRUD round-trip.
  • CLI: command-name registry mapping; retry/discard happy-path + not-parked/unreachable mapping.

Risks

  • T32 migration ↔ EF model drift — idempotent migration + a model-drift assertion in integration.
  • T26 resolution semantics on multi-level + locked + composition-derived templates — adversarial chain tests; keep it strictly read-only to avoid any deploy-path regression.
  • T28 may be near-complete — confirm the exact current advisory set before sizing; the deliverable is the toggle + escalation, not re-building analysis.
  • Shared-component injection regressions (T25/T26/T30 inject into reused components) — the integration wave re-runs each touched component's full fixture suite.
  • CLI registry gap — if RetryParkedMessageCommand/DiscardParkedMessageCommand aren't registered for name resolution, the CLI call fails; verified in Wave 1 and re-asserted at integration.

Next step

Hand off to the writing-plans skill to produce the bite-sized, per-task implementation plan and .tasks.json, then execute subagent-driven wave-by-wave. Finish via finishing-a-development-branch (FF-merge to main + push, no force; docker rebuild to match main).