259 Commits

Author SHA1 Message Date
Joseph Doherty 6fb313cf58 feat(ui/templates): structured trigger editor for template scripts
The script add/edit modal exposed a script's trigger as two raw free-text
inputs — a type string and hand-written config JSON — with no validation
and no parity with the alarm trigger UI.

Replace them with a ScriptTriggerEditor component (mirroring
AlarmTriggerEditor): a trigger-type dropdown plus type-specific panels for
Interval, ValueChange, Conditional, and Call, a grouped attribute picker,
and an auto-generated hint. A ScriptTriggerConfigCodec round-trips the
TriggerConfiguration JSON the site runtime's ScriptActor consumes, tolerant
of legacy keys; an unrecognized stored type is preserved untouched in a
read-only panel.
2026-05-16 04:03:42 -04:00
Joseph Doherty 295150751f feat(scripts): realign Test Run with runtime API, add anonymous-object calls and instance binding
The Test Run sandbox and Monaco analysis modelled a script API that had
drifted from the site runtime's ScriptGlobals, so real scripts failed to
compile in Test Run. Realign both to the runtime surface
(Instance/Scripts/ExternalSystem/Attributes/Children/Parent) and drop the
duplicate ScriptHost stub so the two cannot diverge again.

- Script calls (Scripts.CallShared, Instance.CallScript, Route.To().Call)
  accept an anonymous object instead of a hand-built dictionary, via a
  shared ScriptArgs normalizer; existing dictionary calls still compile.
- Test Run can optionally bind to a deployed instance, so Instance/
  Attributes/CallScript route to it cross-site; adds site-side
  RouteToGetAttributes/RouteToSetAttributes handlers.
- Adds Test Run panels to the API method and template script editors.
- Fixes the TestDatabaseQuery seed script, which queried a table that
  never existed.

Also commits unrelated in-progress work already in the tree: the health
monitoring report loop, site streaming changes, and the Admin/Design
data-connection and SMTP page reorganization.
2026-05-16 03:37:56 -04:00
Joseph Doherty d7b05b40e9 fix(host): drop UseStaticFiles so MapStaticAssets controls caching
UseStaticFiles middleware ran before the MapStaticAssets endpoints and
served static assets (monaco-init.js, site.css, etc.) with no
Cache-Control header. Browsers then heuristically cached them and kept
serving stale copies across deploys — e.g. the Monaco editor ran an old
monaco-init.js that did not send the script kind, so inbound API method
scripts were analysed against the wrong globals and 'Route' was flagged
as undefined.

MapStaticAssets alone now serves every static asset, tagging
non-fingerprinted files with Cache-Control: no-cache so the browser
always revalidates via ETag.
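
A minimal sketch of the revalidation flow described above, assuming a content-hash ETag (this message does not state the tag format MapStaticAssets actually uses). `etag_for` and `serve_static` are hypothetical names, and Python is used purely for illustration, not as the host's code.

```python
import hashlib
from typing import Optional


def etag_for(body: bytes) -> str:
    """Derive a quoted ETag from the content (hypothetical scheme)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def serve_static(body: bytes, if_none_match: Optional[str]):
    """Return (status, headers, payload) for a non-fingerprinted asset."""
    tag = etag_for(body)
    # no-cache lets the browser store the file but forces revalidation.
    headers = {"Cache-Control": "no-cache", "ETag": tag}
    if if_none_match == tag:
        return 304, headers, b""  # unchanged: no body is re-sent
    return 200, headers, body     # first request, or the file changed
```

With this shape, a deploy that changes monaco-init.js changes its tag, so the next conditional request gets a 200 with the new body rather than a stale cached copy.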
2026-05-15 12:29:14 -04:00
Joseph Doherty e54c4a6c2e feat(ui/auth): use a minimal layout for the login page
The login page previously rendered inside MainLayout, showing the full
nav sidebar and the authenticated-user footer. It now uses a bare
LoginLayout (no nav, no session-expiry watchdog, no dialog host) and
just renders its own centred card.
2026-05-15 12:16:36 -04:00
Joseph Doherty fc18239b97 fix(ui/auth): stop /login redirect loop when the session is expired
SessionExpiry renders inside MainLayout, which also wraps the login
page. For a user with a still-present auth cookie but an expired
expires_at claim, it redirected /login back to /login indefinitely.
It now skips the redirect when already on the login page.
2026-05-15 12:14:57 -04:00
Joseph Doherty 1d5465f31c fix(deployment): instance delete fully removes the record
Deleting an instance only undeployed it from the site and set the state
to NotDeployed, leaving an orphan record that could never be removed —
the state-transition matrix rejected delete from NotDeployed.

Delete now removes the instance record entirely (deployment history,
snapshot, attribute/alarm overrides, and connection bindings go with
it), and is permitted from any state.
2026-05-15 12:05:13 -04:00
Joseph Doherty 17e24ddd20 fix(site-event-log): record script errors and route queries to the active node
Script execution failures were only written to Serilog, never to the
site event log — SiteRuntime did not reference the SiteEventLogging
project. ScriptExecutionActor now resolves ISiteEventLogger and emits a
'script'/'Error' event on timeout and exception.

The event-log query handler was a per-node actor bound to that node's
local SQLite. A ClusterClient query could land on the standby (which
records no events) and return nothing. The handler is now a cluster
singleton with a proxy, so queries always reach the active node.
2026-05-15 12:04:59 -04:00
Joseph Doherty 80ec16a6d0 feat(ui/auth): redirect to /login when the session times out
Previously a user idling past the 30-minute cookie expiry stayed parked
on a stale page until they tried to navigate. The auth cookie's UTC
expiry is now also stamped onto an expires_at claim at sign-in, and a
SessionExpiry component mounted in MainLayout schedules a delay until
expiry + 2s grace, then force-loads /login — at which point the standard
cookie middleware confirms the session is gone and serves the login page.
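
The timing above can be sketched as follows. This is an illustration in Python, not the Blazor component: `redirect_delay` and `should_redirect` are hypothetical names, and the login-page guard reflects the later fix in fc18239b97.

```python
from datetime import datetime, timedelta, timezone

GRACE = timedelta(seconds=2)  # "expiry + 2s grace" from the message


def redirect_delay(expires_at: datetime, now: datetime) -> timedelta:
    """How long to wait before force-loading /login (never negative)."""
    return max(expires_at + GRACE - now, timedelta(0))


def should_redirect(current_path: str) -> bool:
    """Skip the redirect when the user is already on the login page."""
    path = current_path.split("?", 1)[0].rstrip("/").lower()
    return path != "/login"
```

Clamping at zero means an already-expired claim triggers the redirect immediately instead of scheduling a negative delay.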
2026-05-13 16:13:53 -04:00
Joseph Doherty 3f37584728 feat(ui/topology): open instance in Debug View from context menu
Adds a Debug View item to the instance context menu on /deployment/topology
that navigates to /deployment/debug-view with siteId and instanceId query
parameters; the page now auto-connects when those are present (falling
back to the existing localStorage auto-reconnect otherwise). Disabled for
non-Enabled instances since debug streaming only targets enabled ones.

Also fixes a latent NRE in DebugView.OnInitializedAsync: the toast ref
isn't bound yet during init, so transient load failures are now stashed
and surfaced from OnAfterRenderAsync where the toast is ready.
2026-05-13 13:41:20 -04:00
Joseph Doherty 733679a376 feat(ui/api-keys): grant API method access on edit page
Admins can now check/uncheck which API methods this key is approved to
invoke directly on /admin/api-keys/{id}/edit, instead of having to bounce
through the Design role's API method editor. Membership is diffed against
the initial state and applied by mutating ApprovedApiKeyIds on each
affected ApiMethod in the same SaveChangesAsync.
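
A rough sketch of the membership diff, assuming each method stores its approved key IDs as a comma-separated string (as the method editor in 1c2dc45803 does). The function names and the dictionary standing in for the ApiMethod rows are hypothetical.

```python
def parse_ids(csv: str) -> set:
    """Parse a comma-separated ID list, ignoring empty segments."""
    return {int(part) for part in csv.split(",") if part.strip()}


def format_ids(ids: set) -> str:
    return ",".join(str(i) for i in sorted(ids))


def apply_key_access(key_id, approved_by_method, initial, selected):
    """Mutate only the methods whose checkbox state actually changed.

    approved_by_method maps method name -> comma-separated key IDs;
    initial/selected are sets of method names checked before/after editing.
    """
    for name in selected - initial:          # newly granted
        ids = parse_ids(approved_by_method[name])
        ids.add(key_id)
        approved_by_method[name] = format_ids(ids)
    for name in initial - selected:          # revoked
        ids = parse_ids(approved_by_method[name])
        ids.discard(key_id)
        approved_by_method[name] = format_ids(ids)
```

Methods that appear in both sets, or in neither, are never touched, so a single save only dirties the affected rows.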
2026-05-13 13:41:13 -04:00
Joseph Doherty 7044791a55 docs(plans): scrub LmxProxy references from design plans
Remove the LmxProxy work package (WP-8) from phase-3b, the CD-DCL-1..6
protocol details, Q9/Q-P3B-2 from the questions log, the LmxProxy
component-design rows in requirements-traceability, and the inline
mentions across phase-0, phase-4, the gRPC streaming plans, and the
primary/backup data-connection plans.
2026-05-13 13:30:07 -04:00
Joseph Doherty 72e7bbe968 chore: remove deprecated LmxProxy reference implementation
Delete the standalone ZB.MOM.WW.LmxProxy solution and loose adapter
stubs under deprecated/, plus the lone LmxProxy mention in
deprecated/windev.md. The protocol was never wired into the active
codebase and the runtime artifact has been removed from the cluster.
2026-05-13 13:29:58 -04:00
Joseph Doherty f66dc031a4 fix(health): route site heartbeats into the aggregator
CentralCommunicationActor.HandleHeartbeat was forwarding each incoming
HeartbeatMessage to Context.Parent, which resolves to the /user
guardian — a non-actor. Every site heartbeat went straight to dead
letters (~1026 per central node per 30 minutes at the default ~2s
interval across three sites).

The aggregator now exposes MarkHeartbeat(siteId, receivedAt) which
bumps LastReportReceivedAt on already-known sites (and clears IsOnline
if it had flipped) without touching LatestReport. Heartbeats from
unregistered sites are dropped — first registration still happens on
the first full report. CentralCommunicationActor calls this in place
of the no-op Tell.

The result: heartbeats now serve their stated health-monitoring
purpose (per CLAUDE.md) by keeping a site marked online between the
30s full reports if a single report is briefly delayed, and the dead
letter noise disappears entirely.
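
The MarkHeartbeat behaviour can be sketched roughly as below. This is a Python illustration with hypothetical class and field names. It reads "clears IsOnline if it had flipped" as restoring the online flag, which is an interpretation rather than something the message spells out.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class SiteState:
    last_report_received_at: datetime
    is_online: bool = True
    latest_report: Optional[dict] = None


class Aggregator:
    def __init__(self):
        self.sites = {}

    def process_report(self, site_id, report, received_at):
        # A full report is the only thing that registers a site.
        self.sites[site_id] = SiteState(received_at, True, report)

    def mark_heartbeat(self, site_id, received_at) -> bool:
        state = self.sites.get(site_id)
        if state is None:
            return False  # unregistered site: heartbeat is dropped
        state.last_report_received_at = received_at
        state.is_online = True  # assumption: undo an offline flip
        return True             # latest_report is left untouched
```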
2026-05-13 08:11:43 -04:00
Joseph Doherty 7bba48a14a feat(ui/monitoring): redesign Parked Messages page with filters, drawer, and bulk actions
Triage was painful on the old layout: a lone Site dropdown sat on a sparse
row, errors were truncated mid-sentence with a per-row View/Hide toggle
that on expand pushed an unwrapped <pre> through the table and shoved the
Actions column off-screen, all rows looked the same regardless of age or
attempt count, and OriginInstance — which tells you which instance
produced the failure — wasn't displayed at all even though the data was
on the entity.

This pass:

- Adds a real filter bar: Site, Category, Target system, Origin instance,
  Age window, free-text search. Category/Target/Origin/Age/Search filter
  the loaded page client-side; Site still drives the server query (and
  changing site now auto-queries — one fewer click).
- Replaces the in-table expansion with an Offcanvas detail drawer.
  Clicking a row slides in a side panel with full message ID + copy,
  category label, origin, attempts, both timestamps in relative + absolute
  form, the complete error (pre-wrap, scrollable), and big Retry / Discard
  buttons. The table never overflows.
- Stacks Target + Method into one column (target in semibold, method
  small/muted below) and surfaces Origin as a code-styled chip in a new
  column ("—" muted when null).
- Severity left-border on each row, derived client-side from
  AttemptCount/MaxAttempts and age of the last attempt: red when retries
  are exhausted and last attempt was in the past hour, amber when
  exhausted but stale, muted grey otherwise.
- Mini attempt progress bar under the n/max count, red when fully
  exhausted and amber while partial.
- Relative timestamps ("5m ago", "1h ago", "2d ago") with absolute UTC on
  hover via the title attribute — applies in both the table and the drawer.
- Bulk select: header checkbox selects the filtered set, per-row
  checkboxes. When ≥1 selected, a sticky action strip slides in below the
  filter bar offering Retry selected / Discard selected with the usual
  confirm dialog. Toast reports per-item success/failure counts.
- Summary line next to the title: "N parked · K target systems · oldest
  Xh ago" (and "(showing M of N)" when filters are active).
- ParkedMessageEntry contract extended additively with MaxAttempts,
  Category, and OriginInstance so the UI has the data it needs for
  severity, the category filter, and the new column.
- Bumped page size from 25 to 50 to better match the dense layout.
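
The severity and relative-timestamp rules from the list above can be sketched as follows. This is an illustrative Python version with hypothetical function names. The colour rules follow the message, while the sub-minute label and the exact hour boundary are assumptions.

```python
from datetime import datetime, timedelta


def severity(attempt_count, max_attempts, last_attempt_at, now) -> str:
    """Row border colour from attempt exhaustion and age of last attempt."""
    if attempt_count < max_attempts:
        return "grey"                      # retries remain
    recent = now - last_attempt_at <= timedelta(hours=1)
    return "red" if recent else "amber"    # exhausted: fresh vs stale


def relative_age(then: datetime, now: datetime) -> str:
    """Compact relative label such as '5m ago', '1h ago', '2d ago'."""
    seconds = int((now - then).total_seconds())
    if seconds < 60:
        return "just now"                  # assumption: not in the message
    if seconds < 3600:
        return f"{seconds // 60}m ago"
    if seconds < 86400:
        return f"{seconds // 3600}h ago"
    return f"{seconds // 86400}d ago"
```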
2026-05-13 08:05:22 -04:00
Joseph Doherty 1c2dc45803 feat(ui/api-methods): pick approved API keys when editing a method
The ApiMethod entity had an ApprovedApiKeyIds column and ApiKeyValidator
read it, but no UI/CLI/seed code ever wrote to it. Result: any inbound
POST /api/{method} was rejected with 403 "API key not approved for this
method" regardless of which key was sent.

Add an "Approved API Keys" subsection to the method form, between
Timeout and Parameters: vertical list of checkboxes, one per ApiKey
row (with a "Disabled" badge for disabled keys, and a link to
/admin/api-keys when none exist). OnInitializedAsync loads all keys and
parses the existing comma-separated IDs; Save() serializes the selected
set back to the entity on both create and edit paths.

Re-uses IInboundApiRepository.GetAllApiKeysAsync — no repo or migration
changes needed.
2026-05-13 07:12:44 -04:00
Joseph Doherty 1822e3c76f fix(store-and-forward): wire up parked-message handler and start S&F service on sites
The Parked Messages page returned "Parked message handler not available"
because no actor was ever registered for ParkedMessages, and Retry/Discard
requests had no Receive at all (would have hit deadletters). On top of
that, StoreAndForwardService.StartAsync() was never called anywhere, so
the sf_messages SQLite table was never created and the retry timer never
ran — silently breaking all of S&F.

- New ParkedMessageHandlerActor bridges StoreAndForwardService.{Get,Retry,Discard}
  using the Sender→Task→PipeTo pattern already used in DeploymentManagerActor.
- SiteCommunicationActor now routes ParkedMessageRetryRequest and
  ParkedMessageDiscardRequest the same way as the existing Query handler.
- AkkaHostedService.RegisterSiteActors() resolves StoreAndForwardService,
  calls StartAsync() to create the schema and start the timer, then
  creates and registers the handler actor.
2026-05-13 07:12:37 -04:00
Joseph Doherty 6f1f6b8467 fix(health): replicate site health reports between central nodes
CentralHealthAggregator is a per-node hosted singleton, but site health
reports flow through ClusterClient which round-robins each report to one
central node only. The other node's aggregator never saw those reports
and marked sites offline at the 60s threshold — sites constantly flapped
between online and offline on the monitoring page.

On receive, the active CentralCommunicationActor now republishes a
SiteHealthReportReplica wrapper on a DistributedPubSub topic. Both
central nodes subscribe to the topic and process replicas through a
dedicated path that updates the local aggregator without re-broadcasting
(avoids fan-out loops). The aggregator's existing sequence-number
idempotency makes self-delivery a cheap no-op.

DistributedPubSubExtensionProvider is now listed in the HOCON
`akka.extensions` block so the mediator is initialised at cluster
start, eliminating a race where the first Subscribe arrived before the
extension was loaded.
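
Why self-delivery is harmless can be sketched like this, assuming the aggregator tracks the highest sequence number seen per site (the message confirms sequence-number idempotency but not its exact form). The names are hypothetical and the pub/sub topic is modelled as a plain list of subscribers.

```python
class HealthAggregator:
    def __init__(self):
        self.last_sequence = {}
        self.applied = 0

    def process(self, site_id: str, sequence: int) -> bool:
        """Apply a report once; repeats and stale copies are ignored."""
        if sequence <= self.last_sequence.get(site_id, -1):
            return False
        self.last_sequence[site_id] = sequence
        self.applied += 1
        return True


def receive_from_site(local, site_id, sequence, subscribers):
    """Receiving node applies the report, then republishes a replica."""
    local.process(site_id, sequence)
    for node in subscribers:       # includes `local` itself
        node.process(site_id, sequence)  # replica path: no re-broadcast
```

The publishing node sees its own replica, but the sequence check rejects it, so each node applies every report exactly once.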
2026-05-13 06:20:07 -04:00
Joseph Doherty d9caa3dd7e fix(ui/shared-scripts): show real param count and return type on cards
The card badges were stuck on the pre-migration data shape: the param
counter only handled flat arrays (now JSON Schema objects), and the
return badge said "returns" regardless of the actual type. Count
`properties` for object schemas with array fallback, and label the
return badge with the schema's `type` (or `T[]` for arrays).
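
A small sketch of the badge logic, written in Python for illustration. `param_count` and `return_label` are hypothetical names, and the fallbacks for missing `type` or `items` are assumptions.

```python
def param_count(schema) -> int:
    """Count parameters for a JSON Schema object, or a legacy flat array."""
    if isinstance(schema, dict) and schema.get("type") == "object":
        return len(schema.get("properties", {}))
    if isinstance(schema, list):           # pre-migration shape
        return len(schema)
    return 0


def return_label(schema) -> str:
    """Label the return badge with the schema type, using T[] for arrays."""
    if not isinstance(schema, dict) or "type" not in schema:
        return "returns"                   # assumption: old generic label
    if schema["type"] == "array":
        item_type = schema.get("items", {}).get("type", "any")
        return f"{item_type}[]"
    return schema["type"]
```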
2026-05-13 05:52:53 -04:00
Joseph Doherty 352c93d5a2 fix(alarms): surface composed-member attributes across flatten/validate/UI
Three layers were each blind to nested composition in different ways:

- FlatteningPipeline only loaded compositions for templates in the parent's
  inheritance chain, so depth-2 composed attributes (e.g.
  Pump.AlarmSensor.SensorReading) never materialized. Walk composed chains
  breadth-first so the flattener's nested step has the data it needs.

- InstanceConfigure's alarm trigger picker was fed only direct, non-locked
  attributes, hiding inherited and composed-member paths. Feed it the full
  flattened attribute list via FlatteningPipeline.

- ValidationService.ExtractAttributeNameFromTriggerConfig only recognized
  "attributeName", silently passing alarms still using the legacy
  "attribute" key. Accept both keys, matching FlatteningService,
  AlarmActor, and AlarmTriggerConfigCodec.
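
The dual-key lookup in the last item can be sketched as below. This is a Python illustration and `extract_attribute_name` is a hypothetical stand-in for the C# method. The two key names come from the message.

```python
import json


def extract_attribute_name(trigger_config_json):
    """Return the trigger's attribute, accepting current and legacy keys."""
    try:
        config = json.loads(trigger_config_json)
    except (TypeError, ValueError):
        return None                        # missing or malformed JSON
    if not isinstance(config, dict):
        return None
    for key in ("attributeName", "attribute"):   # current key wins
        value = config.get(key)
        if isinstance(value, str) and value:
            return value
    return None
```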
2026-05-13 05:33:32 -04:00
Joseph Doherty 164d914ba8 feat(ui): rich AlarmTriggerEditor in instance override modal
Replaces the per-row JSON textbox with an Edit button that opens a modal
hosting the full AlarmTriggerEditor. The editor pre-populates with the
merged inherited + override config so the operator sees the effective
state, not the override delta.

On Save:
  - HiLo: diff against inherited, store only changed keys
  - Binary trigger types: whole-replace if the edited config differs

Value comparison in the diff is type-aware (decoded strings, numeric
GetDouble) so JSON-escape differences (e.g., a literal em-dash vs its \u2014 escape)
don't produce false-positive diffs that pollute the override JSON.

FlatteningService.MergeHiLoConfig is now public so the UI can pre-merge
the editor seed; new public DiffHiLoConfig handles the symmetric
direction. +2 encoding tests cover the new equivalence behavior.

The override row's summary column shows the diff'd keys + priority chip
so operators see what's overridden at a glance.
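
A rough sketch of the merge/diff pair, assuming both configs are decoded from JSON before comparison so that escape style and int-vs-float spelling do not matter. The Python function names mirror MergeHiLoConfig and DiffHiLoConfig but are illustrative only, and the config keys are hypothetical.

```python
import json


def merge_hilo(inherited: dict, override: dict) -> dict:
    """Effective config: override keys win over inherited ones."""
    merged = dict(inherited)
    merged.update(override)
    return merged


def diff_hilo(inherited: dict, edited: dict) -> dict:
    """Keep only keys whose decoded value differs from the inherited one."""
    return {
        key: value
        for key, value in edited.items()
        if key not in inherited or inherited[key] != value
    }


# Escaped vs literal em-dash, and 80 vs 80.0, compare equal once decoded.
inherited = json.loads('{"hi": 80, "hiMessage": "Level \\u2014 high"}')
edited = {"hi": 80.0, "hiMessage": "Level " + chr(0x2014) + " high", "hiHi": 95}
override = diff_hilo(inherited, edited)
```

Here only the genuinely new `hiHi` key lands in the stored override, and merging it back over the inherited config reproduces the edited state.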
2026-05-13 04:05:08 -04:00
Joseph Doherty 4e446a7170 feat(ui): instance alarm override editor in InstanceConfigure
Adds an Alarm Overrides card to the per-instance Configure page (next to
the existing Attribute Overrides and Connection Bindings cards). Each
non-locked template alarm gets a row showing its trigger type, inherited
config, and inputs for an override JSON + priority override. A Clear
button removes the override; the Save Alarm Overrides button upserts
all dirty rows.

The HiLo merge / binary whole-replace semantics are surfaced via the JSON
placeholder hint per trigger type. Wired to the existing
InstanceService.SetAlarmOverrideAsync / DeleteAlarmOverrideAsync flow.
2026-05-13 03:28:39 -04:00
Joseph Doherty 751248feb6 feat(alarms): HiLo trigger type with per-band level, hysteresis, messages, overrides
Adds a new HiLo alarm trigger type with four configurable setpoints
(LoLo / Lo / Hi / HiHi). Each setpoint carries an optional priority,
deadband (for hysteresis), and operator message. The site runtime emits
AlarmStateChanged with an AlarmLevel field so consumers can differentiate
warning vs critical bands.

Plumbing:
  - new AlarmLevel enum + AlarmStateChanged.Level/Message init properties
  - AlarmTriggerEditor (Blazor) gets a HiLo render with severity tinting
  - AlarmTriggerConfigCodec extracted from the editor for testability
  - sitestream.proto carries level + message over gRPC
  - SemanticValidator enforces numeric attribute, setpoint ordering,
    non-negative deadband
  - on-trigger scripts get an Alarm global (Name/Level/Priority/Message)
    so notification routing can branch by severity
  - per-instance InstanceAlarmOverride entity + EF migration + flattening
    step + CLI commands; HiLo overrides merge setpoint-by-setpoint, binary
    types whole-replace
  - DebugView shows a Level badge + per-band message tooltip
  - App.razor auto-reloads on permanent Blazor circuit failure
  - docker/regen-proto.sh automates the proto regen workflow (the linux/arm64
    protoc segfault means generated files are checked in for now)
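
The setpoint rules enforced by SemanticValidator might look roughly like this. It is a Python sketch under stated assumptions: ordering is taken to be strictly ascending LoLo < Lo < Hi < HiHi over whichever setpoints are configured, and the dictionary shapes and function name are hypothetical.

```python
ORDER = ("LoLo", "Lo", "Hi", "HiHi")


def validate_hilo(setpoints: dict, deadbands: dict = None) -> list:
    """Return a list of validation errors (empty when the config is valid)."""
    errors = []
    present = [(n, setpoints[n]) for n in ORDER if setpoints.get(n) is not None]
    # Each configured setpoint must sit below the next configured one.
    for (low, low_v), (high, high_v) in zip(present, present[1:]):
        if not low_v < high_v:
            errors.append(f"{low} ({low_v}) must be below {high} ({high_v})")
    for name, band in (deadbands or {}).items():
        if band < 0:
            errors.append(f"{name} deadband must be non-negative")
    return errors
```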
2026-05-13 03:23:32 -04:00
Joseph Doherty 783da8e21a feat(ui): structured editors for script schemas and alarm triggers
Replace raw-JSON text inputs with rich UI: script parameter/return types use
a JSON Schema builder (SchemaBuilder + JsonSchemaShapeParser, with a migration
to convert existing definitions); alarm trigger config uses a type-aware
editor with a flattened attribute picker (AlarmTriggerEditor). AlarmActor
gains optional direction (rising/falling/either) on RateOfChange triggers.
2026-05-13 00:33:00 -04:00
Joseph Doherty 57f477fd28 fix(templates): cascade delete through nested derived templates
DeleteCompositionAsync only dropped the top-level derived template — the
cascaded inner derived rows (created when composing a composite source)
were left orphaned with dangling OwnerCompositionId references. Any
subsequent attempt to recompose the same source hit the name-collision
guard ('Motor Controller.Pump.TempSensor' already exists).

New CascadeDeleteDerivedAsync walks each composition on the derived
template, recursively removes the slot-owned child derived first, then
the composition row, then the derived itself. Mirrors the recursive
shape of CreateCascadedCompositionAsync.
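
The deletion order can be sketched over in-memory tables. This is a Python illustration only: the dictionaries stand in for the Templates and TemplateCompositions tables, and the field names are hypothetical.

```python
def cascade_delete_derived(derived_id, templates, compositions):
    """Remove a derived template and everything its slots own, depth-first."""
    owned = [cid for cid, row in compositions.items()
             if row["owner_id"] == derived_id]
    for cid in owned:
        # 1. the slot-owned child derived template (recursively)
        cascade_delete_derived(compositions[cid]["composed_id"],
                               templates, compositions)
        # 2. the composition row itself
        del compositions[cid]
    # 3. finally the derived template
    del templates[derived_id]


def delete_composition(composition_id, templates, compositions):
    """Delete a slot and the derived template tree hanging off it."""
    derived_id = compositions[composition_id]["composed_id"]
    cascade_delete_derived(derived_id, templates, compositions)
    del compositions[composition_id]
```

Deleting children before their owners means no row is ever left pointing at a template that has already been removed.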
2026-05-12 10:34:55 -04:00
Joseph Doherty 85769486df fix(ui/templates): expand composition leaves to show cascaded slots
Composition leaves were rendered flat — the cascaded inner derived
templates existed in the DB but the tree only showed the outer slot
name (e.g. "Tank Monitor > DrivePump") with no way to see DrivePump's
own TempSensor + AlarmSensor slots.

BuildCompositionLeaves now recurses: for each composition under a
template, look up the composed template (which after derive-on-compose
is a derived row carrying its own Compositions) and build its slot
leaves as children. HasChildrenSelector loses the
"not a composition" guard so nested leaves render with the expand
chevron.
2026-05-12 10:29:52 -04:00
Joseph Doherty 4f90f952d0 fix(templates): cascade child compositions when composing a composite
When the user composes a template that already has compositions of its
own (e.g. $Sensor → Probe1 slot), only the outer derived was created
— the source's children weren't replicated. AddCompositionAsync now
walks the source's composition graph and creates a parallel derived for
every slot it encounters, each linked back through ParentTemplateId so
override chains stay intact ($Probe → $Sensor.Probe1 → $Pump.TempSensor.Probe1).

The cascade pre-flights every name it would create — a deep collision
aborts before any rows mutate. Internal helper
CreateCascadedCompositionAsync skips the "base templates only" check
since it operates on the source side which may legitimately reference
derived rows.
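
The pre-flight can be sketched as: compute every name the cascade would create, then check them all before writing anything. This is a Python illustration with hypothetical names. `compositions_of` maps a template name to its (slot, composed template) pairs.

```python
def planned_names(parent_name, slot, source, compositions_of):
    """List every derived name the cascade would create, outermost first."""
    name = f"{parent_name}.{slot}"
    names = [name]
    for child_slot, child_source in compositions_of.get(source, []):
        names += planned_names(name, child_slot, child_source, compositions_of)
    return names


def preflight(parent_name, slot, source, compositions_of, existing):
    """Abort before any mutation if a planned name already exists."""
    names = planned_names(parent_name, slot, source, compositions_of)
    collisions = [n for n in names if n in existing]
    if collisions:
        raise ValueError(f"name collision: {', '.join(collisions)}")
    return names
```

For the example in the message, composing Sensor (which has a Probe1 slot) into Pump as TempSensor plans both `Pump.TempSensor` and `Pump.TempSensor.Probe1`.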
2026-05-12 09:57:07 -04:00
Joseph Doherty 1f86945d46 refactor(ui/templates): drop row kebabs; double-click opens templates
The right-click context menu is now the single entry point for every
per-row action — folders, templates, and composition leaves. Drop the
⋮ kebab buttons that duplicated the menu and the click-to-open
behavior that was easy to trigger by accident while navigating the
tree. Templates and composition slots open on double-click instead.

- RenderNodeKebab removed entirely.
- Selectable / SelectedKeyChanged / OnTreeNodeSelected dropped from
  the TreeView wiring — single-click no longer navigates.
- New OpenTemplate(id) helper bound to @ondblclick on Template and
  Composition labels.
2026-05-12 09:50:22 -04:00
Joseph Doherty 54338abdce refactor(ui/templates): drop the "Show derived" toggle
Derived templates are slot-owned and reached only via their owning
parent's composition leaf in the tree — there's no scenario where
listing them as standalone root nodes is useful, so the toggle was
dead UI. Remove the form-switch, the _showDerived state, and the
OnToggleShowDerived handler; BuildTemplateTree filters derived
templates out unconditionally.
2026-05-12 09:46:26 -04:00
Joseph Doherty 78de4a6492 fix(ui/treeview): dismiss right-click context menu when a menu item runs
The custom right-click context menu didn't close after a menu item
opened a modal dialog (e.g. "Compose into…"), leaving the menu
floating behind the modal until the user clicked elsewhere or hit
Escape. Add @onclick="DismissContextMenu" on the menu container so
any click inside it (button, divider, padding) closes the menu after
the button's own handler bubbles up.
2026-05-12 09:30:26 -04:00
Joseph Doherty 5c3dc79b8a feat(templates/ui): manage compositions from the tree
Move composition CRUD off the TemplateEdit page and onto the tree
context menu, matching Aveva's Template Toolbox flow.

- New ComposeIntoDialog: pick a parent template, slot name (defaults
  to the source template's name).
- "Compose into…" on every base template's context menu (kebab + right
  click) opens the dialog and calls AddCompositionAsync.
- "Rename…" on composition leaves opens a prompt and calls
  TemplateService.RenameCompositionAsync. The owning composition row
  AND its owned derived template are renamed atomically; duplicate
  slot names or derived-name collisions abort with a clear error.
- "Delete" on composition leaves confirms + cascade-deletes the
  composition (and its derived template via DeleteCompositionAsync).
- "New Derived Template" menu item renamed to "New Inheriting Template"
  to disambiguate from the new derive-on-compose meaning.

TemplateEdit's Compositions tab, Add Composition form, and
Add/DeleteComposition handlers + state fields are deleted — the tree
is now the single source of truth.
2026-05-12 09:22:55 -04:00
Joseph Doherty 552c9e4065 docs(templates): record phase 4-9 completion + verification TODOs
All nine derive-on-compose phases are now implemented. The status doc
captures what shipped per phase, what was deferred (LockedInDerived
override warning toast, SCADA008 base-Parent hint), and the live-DB /
UI smoke checks worth running before merge.
2026-05-12 08:59:19 -04:00
Joseph Doherty a965d4a5bd feat(templates/ui): phase 9 — single-parent editor context
Derive-on-compose guarantees at most one slot owner per template, so the
Parent.* context in the Monaco editor resolves directly via
OwnerCompositionId without a picker. Base templates suppress Parent.*
assistance entirely (empty context).

Removed the multi-parent <select> dropdown from the Add Script form and
the now-redundant _selectedParentIndex / OnParentContextChanged plumbing.
ActiveEditorParent collapses to _editorParents.FirstOrDefault().
2026-05-12 08:57:42 -04:00
Joseph Doherty f05b03f1cc feat(templates/ui): phase 6-8 — derived template UX
Templates tree hides IsDerived templates by default. A "Show derived"
form-switch in the page header toggles them into the listing so users
can reach orphaned derived templates when they need to.

TemplateEdit:
- Banner on derived templates: links to the base + the composing owner /
  slot name pulled from OwnerCompositionId.
- Attributes/Scripts tables grew a context-aware column:
  * On derived templates: a Source badge (Inherited / Override / Local)
    plus a 🔒 Base-locked badge when the base marks LockedInDerived.
  * On base templates: a switch that flips LockedInDerived through
    UpdateAttribute/UpdateScript.
- Effective Value / Code now resolves from the base when an inherited row
  carries a stale snapshot — matches the flatten-time behavior so the UI
  doesn't lie.
- Override / Revert-to-base actions added to the row kebab; delete is
  hidden on inherited rows (the base owns those).
2026-05-12 08:55:20 -04:00
Joseph Doherty f599809486 feat(templates): phase 4+5 — inherit/override resolution + lock enforcement
FlatteningService now treats IsInherited rows as placeholders: when a
derived template carries an inherited attribute or script, the live base
value resolves through the ParentTemplateId chain instead of the
(possibly stale) copy. An IsInherited=false row is a real override and
wins as before.

ValidateLockedInDerived runs once per chain (main + composed) and returns
a flatten-time failure if a derived template overrides a base row that
the base marked LockedInDerived.

TemplateService.Update{Attribute,Script}Async reject mid-flight when a
derived target tries to override a LockedInDerived base member, and now
persist IsInherited/LockedInDerived from the proposed payload so the UI
can flip override state or set base-locks via the same endpoints.
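
A minimal sketch of the resolution and lock rules, in Python with a hypothetical in-memory shape: each template has a `parent` and a dictionary of attribute rows carrying `value`, `is_inherited`, and `locked_in_derived`.

```python
def resolve(template_id, attr, templates):
    """Walk the parent chain; inherited rows are placeholders, not values."""
    current = template_id
    while current is not None:
        row = templates[current]["attributes"].get(attr)
        if row is not None and not row["is_inherited"]:
            return row["value"]            # real override or base value
        current = templates[current]["parent"]
    return None


def locked_violations(template_id, templates):
    """Names this template overrides although a base marks them locked."""
    template = templates[template_id]
    violations = []
    for name, row in template["attributes"].items():
        if row["is_inherited"]:
            continue                       # placeholders never violate
        parent = template["parent"]
        while parent is not None:
            base_row = templates[parent]["attributes"].get(name)
            if base_row and base_row.get("locked_in_derived"):
                violations.append(name)
                break
            parent = templates[parent]["parent"]
    return violations
```

An inherited row's stored value is ignored entirely, which is what keeps a stale snapshot from leaking into the flattened result.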
2026-05-12 08:50:49 -04:00
Joseph Doherty 8b8b85c839 docs(templates): record phase 2+3 completion in status doc
Phase 1 → 3 marked done; remaining work is phases 4-9. Sanity script now
targets the post-Phase-3 commit (03a8c4a) and notes the pre-existing
NU1608 build error in IntegrationTests / Host.Tests so future sessions
don't chase a phantom regression.
2026-05-12 08:31:20 -04:00
Joseph Doherty 03a8c4a632 feat(templates): phase 3 — migrate existing compositions to derived
EF migration MigrateCompositionsToDerived. Aborts with a clear error if
any '<parent>.<slot>' derived name would collide with an existing
template. Otherwise it cursor-walks every TemplateComposition that still
points at a non-derived template:

  1. Insert a derived Template (name "<parent>.<slot>",
     ParentTemplateId=base, IsDerived=1, OwnerCompositionId=composition).
  2. Copy base attributes / scripts into the derived row with
     IsInherited=1, LockedInDerived=0.
  3. Repoint TemplateComposition.ComposedTemplateId at the new derived.

Idempotent: only touches compositions whose target is IsDerived=0, so
re-runs and freshly-created Phase 2 compositions are skipped.

Down() reverses by repointing compositions back to derived.ParentTemplateId
and dropping all derived templates (with cascade copy rows).
2026-05-12 08:30:17 -04:00
Joseph Doherty fa86750717 feat(templates): phase 2 — derive-on-compose for new compositions
AddCompositionAsync creates a derived Template ("<parent>.<slot>") that
inherits from the base via ParentTemplateId. Base attributes and scripts
are copied with IsInherited=true so the derived template carries its own
override-able rows. The composition row points at the derived template,
and the derived's OwnerCompositionId back-refs the composition for cascade
delete.

DeleteCompositionAsync cascade-deletes the owned derived template.
DeleteTemplateAsync blocks direct deletion of derived templates and
distinguishes derivatives from regular children, listing slot owners
("'Pump' (as 'TempSensor')") in the error.

Composing a derived template is rejected — only bases can be composed.
Existing compositions still resolve until phase 3 migrates them.
2026-05-12 08:27:13 -04:00
Joseph Doherty 91b786eb1c docs(templates): derive-on-compose phase status + resume plan
Companion to the design doc — captures current state, the four
decisions already made, what's done (phase 1, commit 5615f3d),
and a full play-by-play for phases 2 through 9 with exact files,
methods, and tests to touch. Written so a future session after
context compaction can pick up cleanly.
2026-05-12 08:18:43 -04:00
Joseph Doherty 5615f3d0c7 feat(templates): phase 1 — derived-template schema (additive)
Phase 1 of the design at
docs/plans/2026-05-12-derive-on-compose-design.md.

Additive schema only — no behavior changes. Existing data and code
paths continue to work; subsequent phases will start writing the
new fields.

Template gains:
  IsDerived            true when this row was auto-created to back
                       a composition slot
  OwnerCompositionId   back-ref to the owning TemplateComposition
                       (plain int, not an EF nav property — managed
                       by TemplateService for cascade-delete)

TemplateAttribute / TemplateScript each gain:
  IsInherited          row copied from base and not yet overridden;
                       changes to the base flow downward
  LockedInDerived      on a base, blocks derived from overriding;
                       enforced at the service layer in later phases

EF Core migration AddDerivedTemplateFields adds six columns across three tables:
  Templates.IsDerived              bit NOT NULL DEFAULT 0
  Templates.OwnerCompositionId     int NULL
  TemplateAttributes.IsInherited   bit NOT NULL DEFAULT 0
  TemplateAttributes.LockedInDerived bit NOT NULL DEFAULT 0
  TemplateScripts.IsInherited      bit NOT NULL DEFAULT 0
  TemplateScripts.LockedInDerived  bit NOT NULL DEFAULT 0

Existing rows get the defaults. Tests across SiteRuntime / TemplateEngine
/ CentralUI suites stay green (129 / 199 / 159).

Next: phase 2 — wire AddCompositionAsync to derive on compose for
new compositions. Old data still flows the direct-reference path
until phase 3's migration script.
2026-05-12 08:16:24 -04:00
Joseph Doherty a968cefbc2 docs(templates): record derive-on-compose decisions (naming, migration, tree UX)
2026-05-12 08:13:11 -04:00
Joseph Doherty 68548432b3 docs(templates): design for derive-on-compose specialization
Aveva-style composition: composing $Sensor into $Pump creates a
derived template Pump.TempSensor that inherits from $Sensor and can
override values, override script bodies, add new fields, with
LockedInDerived on the base preventing specific overrides.

Schema sketch: Template gains IsDerived + OwnerCompositionId;
TemplateAttribute/Script gain IsInherited + LockedInDerived.
TemplateComposition.ComposedTemplateId pivots to point at the
derived template (the base is reachable via derived.ParentTemplateId).

Phased rollout (9 phases), starting from additive schema, then
flow change for new compositions, then EF Core migration of
existing data, then resolution, lock semantics, tree UI, derived
template edit UI, base template lock-toggle UI, editor metadata
simplification (multi-parent picker becomes mostly obsolete —
derived templates always have a single owner).

Open questions captured at the end for review before phase 1.
2026-05-12 08:12:12 -04:00
Joseph Doherty 0139c9ca83 refactor(scripts): scoped parent query + parent picker for multi-parent templates
Two caveats from the script-scope rollout addressed:

1. ITemplateEngineRepository.GetTemplatesComposingAsync — a scoped
   query that returns only the templates referencing a given template
   via Compositions, eager-loaded with their Attributes / Scripts /
   Compositions. Replaces the GetAllTemplatesAsync + filter pattern
   in TemplateEdit so the Monaco metadata fetch doesn't pull the
   entire template catalog to find one parent.

2. Multi-parent picker. The previous implementation suppressed Parent
   assistance entirely when more than one template composes the open
   one. Now TemplateEdit collects every parent into _editorParents
   and renders a small `select` above the script editor when there
   are >1, letting the user choose which parent's metadata drives
   Parent.Attributes / Parent.CallScript completion + diagnostics.
   Single-parent templates skip the picker (no UI change). Zero
   parents (root template) hide the picker and surface no Parent
   assistance.

Browser-verified on the Sensor Module template (composed by both Pump
and Variable Speed Motor): picker shows both options, switching
updates the editor's parent metadata immediately via the existing
GetContext callback.

Test counts unchanged (159 / 199); the new repo method is exercised
end-to-end by the parent-picker browser path.
2026-05-12 06:00:02 -04:00
Joseph Doherty 0b24b4537d feat(ui/scripts): editor support for self/child/parent accessors
Phases 3+4 of the script-scope rollout. Wires the runtime accessors
landed in efba01d through to Monaco completion, diagnostics, and
hover.

New analyzer surface in ScriptAnalysisService:

  String-literal completion contexts (added to TryStringLiteralCompletions):
    Attributes["..."]                       -> SelfAttributes
    Children["..."]                         -> composition names
    Children["X"].Attributes["..."]         -> child template's attributes
    Children["X"].CallScript("...")         -> child template's scripts
    Parent.Attributes["..."]                -> parent template's attributes
    Parent.CallScript("...")                -> parent template's scripts

  Diagnostics:
    SCADA006   Attribute "Typo" is not declared on {this template,
               child composition 'X', the parent}.  (warning)
    SCADA007   Composition "Unknown" is not declared on this template.
               (warning)

  CallShared / CallScript snippet-expansion now routes through the
  child / parent shape catalogs when invoked on Children["X"] /
  Parent — picking a child script accepts `Sample", ${1:count})`.

Contract additions:
  - AttributeShape (Name, Type) record
  - CompositionContext (Name, Attributes, Scripts) record
  - SelfAttributes / Children / Parent fields on DiagnoseRequest,
    CompletionsRequest, HoverRequest, SignatureHelpRequest

ScriptHost (analyzer-side globals) gains stub AttributeBag /
ChildrenBag / CompositionBag types so Roslyn doesn't emit CS0103 on
Attributes / Children / Parent. The stubs are never invoked — only
their signatures are read by the analyzer's compilation pass.

MonacoEditor.razor exposes SelfAttributes / Children / Parent
parameters; GetContext returns them; monaco-init.js forwards all
three on completion / hover / signature-help / diagnostics requests.

TemplateEdit fetches each composition's resolved child template
shape via GetTemplateWithChildrenAsync, and queries GetAllTemplatesAsync
for any single parent that composes the open template. Multi-parent
or no-parent → Parent is suppressed.

11 new xUnit tests on the new completion / diagnostic paths. Total:
149 -> 159.

Browser-verified via curl:
  - Children["..."] suggests composition names
  - Attributes["..."] suggests attributes with type detail
  - Attributes["Typo"] squiggles SCADA006
  - Children["Unknown"] squiggles SCADA007
  - No spurious CS0103 on the new accessors

Hover, signature help, and inlay hints for the new accessors keep
working because they reuse the same dispatch logic.
2026-05-12 05:53:13 -04:00
Joseph Doherty efba01d10a feat(scripts): self/child/parent attribute and script accessors
Phases 1+2 of the design at
docs/plans/2026-05-12-script-scope-access-design.md.

Adds ergonomic scope-aware accessors to compiled scripts. A script
on a composed TempSensor reads its own attribute via
Attributes["Temperature"]; reaches up to the parent via
Parent.Attributes["SpeedRPM"]; invokes a child script via
Children["TempSensor"].CallScript("Sample"). All resolve to the
existing flat Instance.GetAttribute / SetAttribute / CallScript
delegates by prepending the script's canonical path prefix.
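
A minimal sketch of the path arithmetic described above. The ScriptScope
record here is a stand-in with the two fields named in this commit; the
"." separator and the ScopePaths helper are assumptions made for
illustration, not the runtime's actual code:

```csharp
using System;

// Stand-in for Commons.Types.Scripts.ScriptScope; "." as separator is an assumption.
public sealed record ScriptScope(string SelfPath, string ParentPath)
{
    public static readonly ScriptScope Root = new("", "");
}

// Hypothetical helper showing how scoped names could map to flat keys.
public static class ScopePaths
{
    // Joins a prefix and a name, omitting the separator for an empty (root) prefix.
    public static string Join(string prefix, string name) =>
        string.IsNullOrEmpty(prefix) ? name : prefix + "." + name;

    // Attributes["name"]: flat key under the script's own path.
    public static string Self(ScriptScope scope, string name) =>
        Join(scope.SelfPath, name);

    // Parent.Attributes["name"]: flat key under the parent's path.
    public static string Parent(ScriptScope scope, string name) =>
        Join(scope.ParentPath, name);

    // Children["child"].Attributes["name"]: one level below the script's own path.
    public static string Child(ScriptScope scope, string child, string name) =>
        Join(Join(scope.SelfPath, child), name);
}
```

Under these assumptions, a script with SelfPath "TempSensor" reads
Attributes["Temperature"] as the flat key "TempSensor.Temperature", while
Parent.Attributes["SpeedRPM"] resolves to plain "SpeedRPM".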

Runtime types (SiteRuntime.Scripts.ScopeAccessors):
  AttributeAccessor   sync indexer + GetAsync / SetAsync
  CompositionAccessor Attributes + CallScript
  ChildrenAccessor    Children["name"] => CompositionAccessor

ScriptGlobals gains Scope, Attributes, Children, Parent properties.
Sync indexer blocks on the Instance Actor Ask; explicit GetAsync /
SetAsync are also available for callers that want to await.

Plumbing:
  - Commons.Types.Scripts.ScriptScope record (SelfPath / ParentPath).
  - ResolvedScript.Scope (defaults to ScriptScope.Root for back-compat).
  - FlatteningService emits new ScriptScope(prefix, "") for each
    composed script so a script defined on TempSensor composed under
    a parent gets SelfPath = "TempSensor".
  - ScriptActor reads the Scope from its ResolvedScript and forwards
    it through ScriptExecutionActor into ScriptGlobals on each call.

RevisionHashService not touched: the per-script canonical name
already encodes the composition path, so any structural change
already flips the hash.

10 new unit tests on the path arithmetic. Site/Template engine
suites stay green (129 + 199).

Editor surface (Phase 3: metadata fetch, Phase 4: completion +
SCADA006 / SCADA007 diagnostics) follows in the next commits.
2026-05-12 05:45:24 -04:00
Joseph Doherty 3ed05f0595 docs(scripts): design for template-script scope access
Self / Children / Parent accessors with sync-indexer + async-method
shape. Flattening pipeline emits ScriptScope per resolved script;
ScriptCompilationService seeds the accessors at execution time with
no new actor messages or lookup paths.

Phased: (1) runtime accessors + Scope on ResolvedScript, (2)
flattening + deploy round-trip, (3) editor metadata fetch for child
+ parent shapes, (4) Monaco completion / hover / diagnostics
(SCADA006 unknown attribute, SCADA007 unknown composition).

Out of scope: per-template Roslyn-generated typed accessors,
locking-aware writes (covered by lock-enforcement pass), and
sibling-of-sibling chained navigation.
2026-05-12 05:38:58 -04:00
Joseph Doherty 0528c65cba feat(ui/scripts): format, inlay hints, problems panel, type diagnostic
Six more editor features rolled in:

1. Roslyn Format command.
   New POST /api/script-analysis/format runs Formatter.Format() from
   Microsoft.CodeAnalysis.CSharp.Workspaces on the parsed script
   tree. monaco-init.js registers a DocumentFormattingEditProvider
   so Ctrl/Cmd-Shift-F and the toolbar "Format" button both work.

2. Inlay hints with parameter names.
   New POST /api/script-analysis/inlay-hints walks CallShared /
   CallScript invocations and emits InlayHint records positioned at
   each argument with the matching parameter's name (e.g. "name:").
   Ghost text appears via Monaco's InlayHintsProvider.

3. SCADA005 argument-type diagnostic.
   Literal type vs. declared parameter type check on every
   CallShared/CallScript argument. Float accepts Integer literals;
   Object/List accept anything; null only matches reference-ish
   types. Legacy lowercase types ("string" etc) from the DB are
   normalized to the canonical set before comparison so existing
   data doesn't false-negative. Non-literal args (variables,
   expressions) are skipped — out of scope for a cheap pass.

4. Parameters["name"] hover.
   Hover endpoint now also resolves Parameters["X"] element-access
   keys against the form's DeclaredParameterShapes and returns
   "parameter `name: String`"-style markdown. MonacoEditor surfaces
   the new DeclaredParameterShapes parameter; ScriptParameterNames
   gets a ParseShapes companion.

5. Problems panel.
   Bootstrap card under the editor listing every marker with
   severity badge, line number, message, and SCADA / CS code. Click
   a row to scroll the editor to that line and focus. JS now
   invokes OnMarkersChanged on the .NET side whenever
   setModelMarkers fires, so the panel stays in sync with the
   editor.

6. Editor toolbar.
   Small top-right strip on each editor with Format / Wrap /
   Minimap / Theme toggles. New MonacoBlazor.format,
   setEditorOption, and revealLine JS APIs back the buttons and the
   problems-panel scroll-to-line.
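
The SCADA005 compatibility rules from item 3 can be sketched as a small
lookup. This is an illustration only: the LiteralKind enum and
ArgumentTypeCheck class are hypothetical names, the declared type is
assumed to be already normalized to the canonical set, and treating
String / Object / List as the "reference-ish" types that accept null is
an assumption:

```csharp
using System;

// Hypothetical classification of an argument expression.
public enum LiteralKind { Boolean, Integer, Float, String, Null, NonLiteral }

public static class ArgumentTypeCheck
{
    // True when the literal is acceptable for the (canonical) declared parameter type.
    public static bool IsCompatible(LiteralKind literal, string declaredType)
    {
        // Variables and expressions are skipped by the cheap pass.
        if (literal == LiteralKind.NonLiteral) return true;

        switch (declaredType)
        {
            case "Object":
            case "List":
                return true; // accept anything, including null
            case "String":
                return literal == LiteralKind.String || literal == LiteralKind.Null;
            case "Float":
                // Float also accepts Integer literals.
                return literal == LiteralKind.Float || literal == LiteralKind.Integer;
            case "Integer":
                return literal == LiteralKind.Integer;
            case "Boolean":
                return literal == LiteralKind.Boolean;
            default:
                return true; // unknown declared type: stay silent (assumption)
        }
    }
}
```

With this table, `CallShared("Greet", 42)` against a String parameter is
flagged, while passing `42` to a Float parameter is not.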

Contracts:
  - FormatRequest / FormatResponse
  - InlayHintsRequest / InlayHintsResponse / InlayHint
  - HoverRequest.DeclaredParameters
  - MonacoEditor.DeclaredParameterShapes parameter
  - MonacoEditor.MarkersChanged callback
  - ScadaContext.DeclaredParameterShapes

10 new xUnit tests covering format, inlay hints, SCADA005 (string-
expects-integer, integer-expects-string, float-accepts-integer,
object-accepts-anything, non-literal-skipped), and Parameters key
hover. Total: 139 -> 149.

Microsoft.CodeAnalysis.CSharp.Workspaces 4.13.0 added to pull in
Formatter and AdhocWorkspace.

Browser-verified: typing `CallShared("Greet", 42)` now shows the
"name:" inlay hint and a SCADA005 squiggle on `42`; Parameters["typo"]
shows SCADA003 as before; the toolbar buttons all work.
2026-05-12 05:28:13 -04:00
Joseph Doherty 004c5da582 feat(ui/scripts): shape-aware Monaco features for script calls
Now that the form holds parameter + return shapes for declared
parameters, sibling scripts (template Scripts tab), and shared
scripts (via SharedScriptCatalog), the editor leverages them four
ways:

1. Snippet expansion on accept.
   Picking a CallShared or CallScript completion inserts the full
   call template with tabstops, e.g. `Greet", ${1:name})`. The JS
   provider extends the completion range over Monaco's auto-closed
   `")` so the snippet replaces the closing pair cleanly. Items
   carry insertTextRules=4 (InsertAsSnippet) and a command to
   immediately trigger parameter hints after acceptance.

2. Hover info.
   Hovering the script name token inside CallShared("X") or
   CallScript("Y") shows a markdown tooltip with the call signature
   and return type. New endpoint POST /api/script-analysis/hover.

3. Signature help.
   Inside CallShared(...) / CallScript(...) Monaco shows the
   parameter strip with the active parameter highlighted. The
   service walks up from the cursor to the nearest enclosing
   InvocationExpression and resolves which argument index the
   cursor is on. New endpoint POST /api/script-analysis/signature-help.

4. Argument-count diagnostic (SCADA004) and unknown-Parameters-key
   diagnostic (SCADA003). The Diagnose pipeline now consults the
   declared parameters and sibling/shared shapes to flag:
     - Parameters["typo"] when "typo" isn't on the form        (warn)
     - CallScript("Calc", 1) when Calc declares 2 required args (err)
     - CallShared("Greet", 1, 2, 3) when Greet declares 1 arg   (err)
   Optional parameters relax the required-count bound.
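
A rough sketch of the SCADA004 bound check, assuming each declared
parameter carries a required flag as in the stored parameter JSON. The
ParameterShape fields and the ArgumentCountCheck helper are illustrative
guesses, not the service's real contract; `supplied` counts only the
arguments after the script name:

```csharp
#nullable enable
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape; field names are assumptions based on the parameter JSON.
public sealed record ParameterShape(string Name, string Type, bool Required = true);

public static class ArgumentCountCheck
{
    // Returns null when the argument count is acceptable, otherwise a message.
    public static string? Check(string script, IReadOnlyList<ParameterShape> declared, int supplied)
    {
        int required = declared.Count(p => p.Required); // lower bound
        int max = declared.Count;                       // upper bound

        if (supplied < required)
            return $"'{script}' expects at least {required} argument(s) but got {supplied}.";
        if (supplied > max)
            return $"'{script}' accepts at most {max} argument(s) but got {supplied}.";
        return null;
    }
}
```

Optional parameters lower `required` without changing `max`, which is
how they relax the bound.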

Contract changes:
  - ScriptShape / ParameterShape records
  - ISharedScriptCatalog.GetShapesAsync (replaces GetNamesAsync)
  - new HoverRequest/Response, SignatureHelpRequest/Response
  - CompletionsRequest.SiblingScripts: string[] -> ScriptShape[]
  - DiagnoseRequest gains DeclaredParameters + SiblingScripts
  - CompletionItem gains InsertTextRules (Monaco snippet rule)

Form wiring:
  - TemplateEdit passes ScriptShapeParser.Parse(...) per sibling
  - MonacoEditor surfaces SiblingScripts: IReadOnlyList<ScriptShape>
  - GetContext returns shapes to JS on each completion/hover/sig
    request

10 new ScriptAnalysisServiceTests covering all four features plus
optional-parameter edge cases. Existing tests updated for the
contract changes. Total: 129 -> 139.

Browser-verified via direct curl + Monaco marker readback:
  - SCADA003 squiggle on Parameters["typo"]
  - Snippet item Greet", ${1:name}) with insertTextRules=4
  - Hover markdown shape signature
  - Signature help parameter strip
2026-05-12 05:17:59 -04:00
Joseph Doherty cd0ec583e1 refactor(ui/scripts): cache diagnostics + semantic forbidden-API check
Two pre-flagged follow-ups from the Monaco integration:

1. IMemoryCache for diagnostics keyed by SHA256 of the script body.
   Same-code Diagnose() now short-circuits the Roslyn compile and
   forbidden-API walk. SizeLimit 200 entries with 5-minute sliding
   expiration. Completions aren't cached — position + form context
   vary too much for a useful hit rate.

2. Forbidden-API analyzer now resolves identifiers through the
   SemanticModel instead of matching names. A user identifier
   named File / Thread / Process / etc. no longer false-positives
   — only references that resolve to a NamedTypeSymbol whose
   containing namespace is on the banned list are flagged. The
   diagnostic message now names the offending namespace, e.g.
   "Type 'File' from forbidden namespace 'System.IO' is not
   allowed in scripts."
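
The cache key from item 1 can be illustrated as follows. This sketch
swaps IMemoryCache for a plain Dictionary so it stays self-contained, so
it has no size limit or sliding expiration; DiagnosticsCache and KeyFor
are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Dictionary-backed stand-in for the IMemoryCache usage (no eviction here).
public sealed class DiagnosticsCache<TResult>
{
    private readonly Dictionary<string, TResult> _entries = new();

    // SHA256 of the UTF-8 script body, hex-encoded (requires .NET 5 or later).
    public static string KeyFor(string code) =>
        Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(code)));

    // Runs the analysis only on a cache miss; identical code returns the stored result.
    public TResult GetOrAdd(string code, Func<string, TResult> analyze)
    {
        var key = KeyFor(code);
        if (_entries.TryGetValue(key, out var cached)) return cached;

        var result = analyze(code);
        _entries[key] = result;
        return result;
    }
}
```

Hashing the body keeps the key a fixed size regardless of script length.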

Refactor: extracted ISharedScriptCatalog so ScriptAnalysisService
can be unit-tested without standing up SharedScriptService's EF
chain. Concrete SharedScriptCatalog wraps the existing service.

16 new xUnit tests in ScriptAnalysisServiceTests:
  - Empty / clean / missing-semicolon paths
  - SCADA001 on each banned using namespace (theory)
  - SCADA002 on real File.ReadAllText through System.IO
  - No-false-positive checks for user-defined File / Thread locals
  - Cache returns the same response instance on repeat
  - Different code → different cache entries
  - String-literal completions for Parameters / CallScript / CallShared
  - General completion at file scope returns ScriptHost members

Total CentralUI test count: 113 -> 129.
2026-05-12 05:05:35 -04:00
Joseph Doherty 225817eac9 feat(ui/scripts): SCADA-specific Monaco extensions
Wave 3 of the Monaco/Roslyn integration. Adds the four extensions
agreed in the design Q&A:

  1. Parameters["..."] keys — when the cursor is inside a string
     literal that's the index of a Parameters[] element-access,
     completions return the parameter names declared in the form's
     ParameterListEditor.
  2. CallShared("...") names — when the cursor is inside a string
     literal argument to a CallShared(...) invocation, completions
     return the names of all shared scripts (resolved server-side
     via SharedScriptService).
  3. CallScript("...") names — same shape, but uses sibling-script
     names passed from the form (TemplateEdit's _scripts list).
  4. Forbidden-API diagnostic — squiggles uses of the documented
     script trust model bans: System.IO / Diagnostics / Reflection /
     Net / Threading.Thread namespaces, plus the named types File,
     Directory, Process, Thread, Socket, etc. New diagnostic codes
     SCADA001 (using directive) and SCADA002 (type identifier).

ScriptAnalysisService gains a SharedScriptService dependency
(scoped, hence the analyzer is now scoped too); CompletionsRequest
carries DeclaredParameters and SiblingScripts; Complete is now async.

MonacoEditor.razor exposes DeclaredParameters / SiblingScripts
parameters plus a [JSInvokable] GetContext() so the JS side asks
for the latest form state on every completion request. The
provider in monaco-init.js looks up the owning editor from the
internal editors map and forwards the context.

ScriptParameterNames helper parses the ParameterListEditor JSON
into a name list — used by SharedScriptForm, ApiMethodForm, and
TemplateEdit's Add-Script form to populate the Monaco context.

Smoke-verified via direct fetch + Monaco trigger:
  - var x = Parameters["  →  popup: "name" (declared parameter)
  - var y = CallShared("  →  popup: GetWeather, Greet
  - using System.IO;      →  SCADA001 squiggle
  - Process.Start(...)    →  SCADA002 squiggle
  - File.ReadAllText(...) →  SCADA002 squiggle

Also fixed: ScriptAnalysisService scoped (was singleton, broke DI
because SharedScriptService is scoped); JS normalizes Pascal-case
context keys from Blazor's record serialization to camel-case for
the request body.
2026-05-12 04:56:56 -04:00
Joseph Doherty cf9548e9ed feat(ui/scripts): Roslyn-backed C# completions + diagnostics for Monaco
Adds Microsoft.CodeAnalysis.CSharp.Scripting (4.13.0). Scripts are
compiled as C# script fragments against a ScriptHost globals type
that mirrors what the runtime exposes (Parameters bag, CallShared,
CallScript) — Roslyn reads the signatures so those identifiers are
in scope for analysis without executing anything.

ScriptAnalysisService:
  - Diagnose(code): Compilation.GetDiagnostics() projected to
    Monaco-shaped DiagnosticMarker records (severity 8/4/2/1).
  - Complete(code, line, col): dot-member lookup via SemanticModel
    when the token at position is part of a MemberAccessExpression;
    falls back to LookupSymbols at position for the general case.
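
A sketch of the severity projection mentioned for Diagnose. Monaco's
MarkerSeverity uses Hint = 1, Info = 2, Warning = 4, Error = 8; the local
RoslynSeverity enum mirrors Roslyn's DiagnosticSeverity values so the
example compiles without the Roslyn package. Mapping Hidden to Hint is
an assumption:

```csharp
// Local mirror of Microsoft.CodeAnalysis.DiagnosticSeverity, for illustration only.
public enum RoslynSeverity { Hidden = 0, Info = 1, Warning = 2, Error = 3 }

public static class MarkerSeverityMap
{
    // Projects a compiler severity onto Monaco's MarkerSeverity numbers.
    public static int ToMonaco(RoslynSeverity severity) => severity switch
    {
        RoslynSeverity.Error => 8,
        RoslynSeverity.Warning => 4,
        RoslynSeverity.Info => 2,
        _ => 1, // Hidden -> Hint (assumption)
    };
}
```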

Two endpoints exposed by the existing CentralUI endpoint pipeline,
both behind RequireDesign policy:
  POST /api/script-analysis/diagnostics
  POST /api/script-analysis/completions

monaco-init.js registers a csharp CompletionItemProvider with dot/
paren/quote trigger chars, plus a 500 ms debounced diagnostics pass
on every keystroke that pushes markers via setModelMarkers. Initial
pass fires on editor create so existing scripts surface errors right
away. Auth uses the existing cookie via credentials: same-origin.

Smoke-verified:
  - Typing `DateTimeOffset.UtcNow` (no semicolon) shows the missing
    semicolon squiggle in real time.
  - Ctrl-Space at file scope returns the full type universe
    (AccessViolationException, Action, Akka, AppDomain, ...).

Wave 2 of three. SCADA-specific extensions (declared param keys,
shared/sibling script names, forbidden-API diagnostic) follow.
2026-05-12 04:40:07 -04:00
Joseph Doherty 7f01c5547a feat(ui/design): Monaco editor for script code fields
Vendors Monaco 0.55.1 min/vs/ (~15 MB) at
wwwroot/lib/monaco/vs/. No CDN dependency; works on air-gapped
deployments. Loaded lazily on first script-edit via the AMD loader.

wwwroot/js/monaco-init.js exposes window.MonacoBlazor with
createEditor / setValue / getValue / setMarkers / dispose. Handles
loader bootstrap, DotNet round-trip on content change, and marker
sets for later diagnostic wiring.

Components/Shared/MonacoEditor.razor is a Blazor wrapper with
Value / ValueChanged / Language / Height / ReadOnly parameters and
IAsyncDisposable teardown. Bidirectional binding tracks
_lastSentValue to avoid push/pull loops.
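
One way such a loop guard might work, sketched without Blazor or JS
interop. EditorBinding and its members are hypothetical; only the
_lastSentValue idea comes from this commit, and PushCount stands in for
the real setValue call:

```csharp
// Hypothetical model of the two-way binding guard.
public sealed class EditorBinding
{
    private string _lastSentValue = "";

    // Counts pushes into the editor; a stand-in for the JS setValue call.
    public int PushCount { get; private set; }

    // Parent re-rendered with a Value parameter (possibly an echo of our own change).
    public void OnParameterSet(string value)
    {
        if (value == _lastSentValue) return; // editor already shows this text
        _lastSentValue = value;
        PushCount++;
    }

    // Editor reported a content change; remember it so the echo is ignored.
    public string OnEditorChanged(string value)
    {
        _lastSentValue = value;
        return value; // stand-in for invoking ValueChanged
    }
}
```

Without the comparison, each keystroke would round-trip through the
parent and be pushed back into the editor.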

Replaces the plain textareas in SharedScriptForm, TemplateEdit's
Add-Script form, and ApiMethodForm. Default height 320px ≈ the
previous rows=10. Build / tests / dialog flow unaffected.

Wave 1 of three. Roslyn-backed completions and SCADA-specific
extensions follow in subsequent commits.
2026-05-12 04:34:41 -04:00
Joseph Doherty e667ea2b50 test(ui/design): roundtrip tests + normalization notice for IO editors
Editors now set a _normalized flag when ParseFromJson coalesces a
legacy type name (lowercase "string", "Int32", "Double", etc.) to the
canonical set. When flagged, render a small alert-info inline:
"Some parameter types were normalized... Save to persist the
canonical form." The flag clears on any user edit so the notice
doesn't linger after Emit overwrites the JSON.

31 new bUnit tests in tests/.../Shared/:
  - ParameterListEditorTests: null/empty rendering, row count per
    JSON entry, legacy type normalization across .NET names +
    lowercase, the normalized notice trigger, add/remove emission,
    List/non-List item-type column visibility, required-flag round
    trip, invalid JSON + non-array error paths.
  - ReturnTypeEditorTests: null vs simple vs List shape, legacy type
    normalization, change-type / clear-type emission, invalid JSON
    + non-object error paths.

Total CentralUI test count: 82 -> 113.
2026-05-12 04:27:00 -04:00
Joseph Doherty 1b98d37919 refactor(ui/design): replace JSON inputs with structured editors
Two new shared components in Components/Shared:
  - ParameterListEditor: table of rows (name + type + item type + required + remove)
  - ReturnTypeEditor: single type (+ item type when List)

Both round-trip the same JSON shape already stored on the entity:
  parameters: [{"name":"x","type":"String","required":true},...]
  return:     {"type":"List","itemType":"Integer"} | null

Type set follows the Inbound API validator (Boolean, Integer, Float,
String, Object, List). Legacy values normalize on read — Int32 / int64
/ Double / Decimal / lowercase string / etc all coalesce to the new
set so existing rows render correctly. Re-saving persists the
normalized form.
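
A minimal sketch of the read-side normalization, assuming a simple
case-insensitive lookup. The canonical set and the Int32 / int64 /
Double / Decimal / lowercase-string cases come from this commit; the
remaining aliases, the pass-through for unknown names, and the
ParameterTypeNormalizer class itself are assumptions:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; the real editors normalize inside their JSON parsing.
public static class ParameterTypeNormalizer
{
    private static readonly Dictionary<string, string> Map =
        new(StringComparer.OrdinalIgnoreCase)
        {
            ["Boolean"] = "Boolean", ["bool"] = "Boolean",
            ["Integer"] = "Integer", ["int"] = "Integer",
            ["Int32"] = "Integer", ["Int64"] = "Integer",
            ["Float"] = "Float", ["Double"] = "Float", ["Decimal"] = "Float",
            ["String"] = "String",
            ["Object"] = "Object",
            ["List"] = "List",
        };

    // Returns the canonical name and whether the stored text differed from it.
    public static (string Canonical, bool WasNormalized) Normalize(string stored)
    {
        if (Map.TryGetValue(stored.Trim(), out var canonical))
            return (canonical, !string.Equals(stored, canonical, StringComparison.Ordinal));

        return (stored, false); // unknown names pass through untouched (assumption)
    }
}
```

The second tuple element tells the caller whether a re-save would change
the stored JSON.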

Applied to:
  - SharedScriptForm
  - TemplateEdit Add Script form (also surfaces ParameterDefinitions
    + ReturnDefinition which the entity supported but the form was
    never wiring through)
  - ApiMethodForm

Graceful degradation: invalid JSON is shown with a "Start fresh"
escape hatch instead of crashing the form.
2026-05-12 04:22:58 -04:00
Joseph Doherty eb1d6872ef refactor(ui/shared): migrate sidebar CSS to Bootstrap variables
Replaces hardcoded sidebar / nav-link hex colors with Bootstrap CSS
custom properties (var(--bs-dark), var(--bs-primary), var(--bs-gray-*),
var(--bs-white)). Visual parity preserved; rebrand/dark-mode work
later can override the variables without touching this file.

Only the reconnect overlay rgba(0,0,0,0.5) is left as a literal —
Bootstrap doesn't ship a backdrop-overlay token.
2026-05-12 03:57:45 -04:00
Joseph Doherty 8038aa7cb5 refactor(ui/shared): introduce IDialogService + DialogHost
Eliminates the per-page <ConfirmDialog @ref="_confirmDialog"
ConfirmButtonClass="btn-danger" /> boilerplate. Pages now inject
IDialogService and call ConfirmAsync(title, message, danger: true)
programmatically.

New scoped service holds a single active dialog (throws on nested
calls), with a global DialogHost mounted once in MainLayout that
renders the modal markup, owns body scroll-lock via Bootstrap's
modal-open class, traps focus on the modal element, and handles
Escape-to-cancel.
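
The single-active-dialog rule could look roughly like this. It is a
sketch under assumptions: DialogServiceSketch, Close, and the exception
type are made up for illustration, and only the ConfirmAsync(title,
message, danger) call shape is taken from this commit:

```csharp
#nullable enable
using System;
using System.Threading.Tasks;

// Hypothetical service; a host component would render Title and call Close.
public sealed class DialogServiceSketch
{
    private TaskCompletionSource<bool>? _active;

    public string? Title { get; private set; }

    // Opens a confirm dialog; throws if one is already open (no nesting).
    public Task<bool> ConfirmAsync(string title, string message, bool danger = false)
    {
        if (_active is not null)
            throw new InvalidOperationException("A dialog is already open.");

        _active = new TaskCompletionSource<bool>();
        Title = title;
        return _active.Task;
    }

    // Called by the host on Confirm, Cancel, or Escape.
    public void Close(bool confirmed)
    {
        var pending = _active;
        _active = null; // clear first so the awaiting caller may open another dialog
        Title = null;
        pending?.SetResult(confirmed);
    }
}
```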

Same service also exposes PromptAsync, used to replace the bespoke
NewFolderDialog. Both ConfirmDialog and NewFolderDialog components
are deleted — their callers (~13 pages across Admin/Design/Deployment
/Monitoring) now go through the service.

DiffDialog stays as-is — different use case (before/after content).

bUnit tests in TopologyPageTests, DataConnectionsPageTests, and
TemplatesPageTests register IDialogService in their service
collection.

Also: a top-of-file Razor comment on Sites.razor pointing future
implementers at it as the reference list-page pattern.
2026-05-12 03:57:37 -04:00
Joseph Doherty e21791adb0 refactor(ui/monitoring): KPI dashboard, message expand, copy, pagination fix
Dashboard: user-info card demoted; 4 KPI cards (Sites, Data
connections, Templates, API keys) sourced from existing repositories;
3 Quick-action link cards (Health, Audit Log, Templates). Inline
max-width style replaced with Bootstrap utilities.

Health: KPI row condensed to Online / Offline / Sites with active
errors (Total Sites and Total Script Errors dropped). Per-site cards
re-laid out 2-column with each subsection (Data Connections,
Instances & Queues, Errors & Parked Messages) inside Bootstrap
collapse panels collapsed by default. Online / Offline / Primary /
Standby badges paired with shape glyphs (o / * / triangle) plus
aria-label.

EventLogs: filter row wrapped in a Bootstrap collapse toggled by
"Filter options (n active)"; per-row View toggle reveals the full
message in a collapse row; "Keyword" relabeled "Message contains";
all filter inputs gain id+label-for+aria-label; severity badges paired
with a leading glyph; explicit "End of results" terminator on
Load more.

ParkedMessages: Message ID rendered as <code>{first 12}...</code>
plus a clipboard button; per-row View toggle reveals full error;
action buttons get aria-label="{Retry|Discard} message {id}";
in-flight spinner inside the active button.

AuditLog: pagination Next-disabled now uses
_page * _pageSize >= _totalCount via HasMore helper (fixes the
exactly-page-size edge case). Clear filters button added. Entity ID
rendered as code + clipboard button. View/Hide buttons gain
aria-label referencing the entry id. State JSON larger than 1 KB
renders a "View in modal" button instead of the inline overflow.
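
To make the AuditLog edge case concrete, a tiny sketch assuming a 1-based
page index. The Paging class and its signature are hypothetical; the
comparison is the one quoted above:

```csharp
// Hypothetical helper mirroring the Next-disabled condition.
public static class Paging
{
    // True when rows exist beyond the current (1-based) page.
    public static bool HasMore(int page, int pageSize, int totalCount) =>
        page * pageSize < totalCount;
}
```

With 50 rows and a page size of 25, page 2 gives 2 * 25 >= 50, so Next
is disabled instead of leading to an empty third page.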
2026-05-12 03:33:06 -04:00
Joseph Doherty 321ca0bbbf refactor(ui/deployment): live-updates toggle, DebugView guardrails
New shared DiffDialog mirroring ConfirmDialog's API
(ShowAsync(title, before, after)) so live-data pages stop
hand-rolling Bootstrap modal markup.

Topology: <h4> in flex header, aria-labels on Expand/Collapse/Refresh
and the inline rename input, Live-updates toggle (suppresses the 15s
timer when off), instance/area counts moved into a summary alert
above the tree, Stale badge paired with bi-exclamation-triangle icon
+ aria-label, hand-rolled Diff modal replaced with <DiffDialog @ref>.

Deployments: pause/resume auto-refresh button replaces the static
"Auto-refresh: 10s" text; summary cards switch to
col-lg-3 col-md-6 col-12; InProgress spinner gets role="status" +
aria-label; failed rows pick up a bi-x-circle icon next to the
Status badge; Deployment ID + Revision folded into one
{id}@{revision[..8]} cell; inline Error column collapses behind a
per-row "View error" toggle; bare empty-state text upgraded to the
centered muted block.

DebugView: status-strip card at the top showing instance / connection
state / last snapshot timestamp plus a "Start fresh" button when the
page auto-reconnected from localStorage. Per-table filter input,
scroll-lock toggle, Clear button, and a 200-row queue-style cap.
<tbody> elements gain aria-live="polite" aria-atomic="false" for
screen-reader announcements. Quality and Alarm-State badges get
aria-labels; timestamps display HH:mm:ss with full ms in a hover
tooltip. Auto-reconnect surfaces a toast with autoDismissMs: 8000.
2026-05-12 03:32:53 -04:00
Joseph Doherty b6e2ec8a50 refactor(ui/design): card grid, SMTP split, TemplateEdit vertical-stack
Templates: <h4> in flex header, Expand/Collapse moved into a Bulk
actions dropdown, hover-visible kebab on tree nodes with aria-labels.
TreeView CSS gets a .tv-kebab opacity-on-hover utility.

TemplateCreate: form-control (not -sm) for primary inputs; accessible
Back button.

TemplateEdit: Properties card vertical-stacked with Save at the
bottom-right and Parent rendered as readonly plaintext. Add-member
forms (Attributes, Alarms, Scripts, Compositions) reflowed from
horizontal row g-2 align-items-end into cards with stacked col-12
inputs (Scripts gets rows=10). Lock/Unlock badges show full words.
Per-row Delete moved into a kebab dropdown. Tab nav gains
role="tablist" / role="tab" / aria-selected / aria-controls and panels
get role="tabpanel". Validation entries get consistent strong-and-
muted styling.

SharedScripts: migrated from table to card grid (col-lg-6) matching
Sites; cards show code preview + param/return badges + Edit + kebab.
Search filter, empty state CTA, @key.

SharedScriptForm: small ?-icon tooltips next to Parameters and Return
Definition labels.

ExternalSystems: SMTP split out to its own page; remaining tabs (
External Systems, DB Connections, Notification Lists, API Methods,
API Keys) unified as card grids with per-tab search + empty-state CTA.
Tab nav gets full ARIA instrumentation. Header gains a link to the
new SMTP page.

New page SmtpConfiguration.razor at /design/smtp: vertical-stacked
form using the existing Credentials field on the entity.

ExternalSystemForm: AuthConfig placeholder updates based on the
selected AuthType (None / ApiKey / BasicAuth).

DbConnectionForm: form-text below Connection String noting that the
value is stored in plain text and is admin-only.

ApiMethodForm: Script textarea rows=10; JSON example placeholders
for Params and Returns.

NotificationListForm: form-control sizing on Name/email inputs;
thead.table-dark -> table-light on the recipients table.
2026-05-12 03:32:39 -04:00
Joseph Doherty da2c0d714e refactor(ui/admin): card grid, search, kebab; LDAP scope-rule chips
LdapMappings: flex header, search filter, per-row Edit + kebab Delete,
@key, dropped Site-Scope-Rules cell in favor of a {n rule(s)} badge.

LdapMappingForm: two stacked cards (Mapping then Site Scope Rules);
scope rules render as removable chips with an inline "Add scope rule"
form; create-mode disables the scope card with an explainer; role
select gets form-text help.

DataConnections: <h4> in flex header, Bulk actions dropdown holding
Expand/Collapse, hover-visible kebab on tree nodes mirroring the
right-click context menu, aria-labels, "No connections match the
filter." inline empty state.

DataConnectionForm: Site rendered as readonly plaintext + lock-after-
creation note in edit mode; parallel Primary endpoint / Backup endpoint
headings; "Optional" badge on Backup when null; form-text on
FailoverRetryCount.

ApiKeys: search filter, Status column dropped (state now lives in the
kebab menu label "Disable"/"Enable"), Edit + kebab actions, @key,
aria-labels.

ApiKeyForm: nested card removed; fixed-text Back header; real
clipboard copy via IJSRuntime + toast confirmation.

Test selector fix in DataConnectionFormTests for the new Site
readonly-plaintext rendering.
2026-05-12 03:32:17 -04:00
Joseph Doherty f7b10f2ff7 refactor(ui/shared): scroll-lock, escape, aria-live, responsive sidebar
ConfirmDialog locks body scroll via IJSRuntime + Bootstrap's
modal-open class on show, restores on hide. Escape key now closes
the dialog; default ConfirmButtonClass flipped from btn-danger to
btn-primary so non-destructive confirms aren't red. Destructive
callsites (Delete, Discard) get explicit ConfirmButtonClass="btn-danger".

ToastNotification adds aria-live="polite" + aria-atomic="true" on the
container and an optional autoDismissMs parameter on every Show* method.

LoadingSpinner text-muted -> text-secondary for contrast.

DataTable gains a clear (x) button on the search input and applies
disabled / aria-disabled directly to the pagination buttons.

NewFolderDialog splits backdrop and modal markup to match ConfirmDialog.

NavMenu wraps the nav list in an overflow-y scroll container so the
username/sign-out footer stays anchored, and section headers convert
from <li> to <div role="presentation">.

MainLayout adds a hamburger toggle for <lg viewports; sidebar collapses
via Bootstrap collapse data attributes.

App.razor extracts inline <style> block to a shared site.css; adds a
left-border accent on the active nav link; switches the reconnect
modal to modal-dialog-centered.

Login uses d-flex / min-vh-100 centering. NotAuthorizedView gets the
same centered layout plus the ScadaLink brand heading.

Sites.razor: only the new ConfirmButtonClass="btn-danger" follow-up.
2026-05-12 03:32:07 -04:00
Joseph Doherty ff5f5a10ef docs(ui): UI audit findings (2026-05-12)
Audit of every page in CentralUI against the Sites.razor card-grid
pattern, the no-third-party-UI-libs constraint, and accessibility
basics. Findings + per-page severity + suggested implementation
order live in docs/plans/. Implementation follows in subsequent commits.
2026-05-12 03:31:54 -04:00
Joseph Doherty 0805e18e9c refactor(ui/sites): replace 10-col table with card grid + collapsible cluster panel
The dense table buried high-signal fields (name, identifier, connections)
under four 80-character Akka/gRPC URLs truncated mid-string. Replace with
a 2-column responsive card grid; cluster-node addresses now live in a
collapsed disclosure with copy-to-clipboard. Adds client-side filter,
empty/no-match states, kebab menu for less-frequent actions, and
@key=site.Id to keep Bootstrap collapse state from leaking across cards
when the filter changes.
2026-05-12 02:55:37 -04:00
Joseph Doherty 22d91c858a feat(ui): Layer E2 OpcUaEndpointEditor gains Authentication / Advanced / Deadband sections
Three new sections inserted into <OpcUaEndpointEditor>:

1. Authentication (between the existing Connection row and Timing)
   - 'Enable Authentication' button when Config.UserIdentity is null
   - TokenType select (Anonymous / UsernamePassword / X509Certificate)
   - Conditional Username + Password inputs for UsernamePassword
   - Conditional Certificate path + Certificate password for X509Certificate
   - 'Remove Authentication' button

2. Advanced subscription (after the existing Subscription row)
   - Subscription display name (text)
   - Subscription priority (number 0-255)
   - Timestamps to return (Source / Server / Both select)
   - Discard oldest (checkbox)

3. Deadband filter (after Advanced subscription)
   - 'Enable Deadband' button when Config.Deadband is null
   - Type select (Absolute / Percent), Value number input
   - 'Remove Deadband' button

EnableAuthentication and EnableDeadband helpers complement EnableHeartbeat.
All new fields use the existing RenderFieldError helper for validator errors.

82/82 CentralUI tests pass (the 10 new editor tests drove the design).
2026-05-12 02:30:06 -04:00
Joseph Doherty f89f234558 test(ui): failing bUnit tests for OpcUaEndpointEditor new sections
Adds 10 new tests covering:
- Authentication section label + Enable/Remove toggle (creates/nulls UserIdentity)
- TokenType conditional rendering: UsernamePassword shows Username/Password,
  X509Certificate shows Certificate path/password, Anonymous shows no extras
- Deadband Enable/Remove toggle
- Advanced Subscription section labels (Discard oldest, Subscription display
  name, Subscription priority, Timestamps to return)
- UserIdentity per-field error rendering under Username

9 new tests fail because the editor component hasn't been extended yet
(TDD red phase). Layer E2 implements the sections.
2026-05-12 02:28:47 -04:00
Joseph Doherty 8faaa8fe2b feat(dcl): Layer D OpcUaGlobalOptions for app-wide identity + cert paths
New deployment-wide options bound from the "OpcUa" section of appsettings.json:
- ApplicationName (default "ScadaLink-DCL")
- TrustedIssuerStorePath / TrustedPeerStorePath / RejectedCertificateStorePath

Empty paths fall back to Path.GetTempPath()/ScadaLink/pki/* so dev runs work
without explicit config — same defaults the hardcoded values previously used.

Wiring:
- ServiceCollectionExtensions binds OpcUaGlobalOptions to the OpcUa section.
- DataConnectionFactory takes IOptions<OpcUaGlobalOptions> and constructs
  RealOpcUaClientFactory with the snapshot.
- RealOpcUaClient(globalOptions) replaces the hardcoded ApplicationName and
  the three CertificateTrustList store paths in ApplicationConfiguration.
- Parameterless ctors on factory and client preserved for the existing test
  suite (32/32 DCL tests still green).
2026-05-12 02:27:58 -04:00
Joseph Doherty e6a5b558f3 feat(dcl): Layer C runtime wires new OPC UA settings through to OPC SDK
OpcUaConnectionOptions record gains DiscardOldest, SubscriptionPriority,
SubscriptionDisplayName, TimestampsToReturn, plus OpcUaDeadbandOptions and
OpcUaUserIdentityOptions nullable sub-records.

OpcUaDataConnection.ConnectAsync copies all new fields from the typed
OpcUaEndpointConfig (including the Deadband and UserIdentity sub-objects)
into the OpcUaConnectionOptions record.

RealOpcUaClient:
- BuildUserIdentity translates TokenType into Opc.Ua.UserIdentity:
  Anonymous → null, UsernamePassword → new UserIdentity(name, utf8(pass)),
  X509Certificate → new UserIdentity(X509CertificateLoader.LoadPkcs12FromFile(...)).
- Subscription uses opts.SubscriptionDisplayName and opts.SubscriptionPriority.
- MonitoredItem.DiscardOldest is opts.DiscardOldest (was hardcoded true).
- BuildDataChangeFilter materializes a DataChangeFilter when Deadband is set.
- ReadAsync uses MapTimestampsToReturn for opts.TimestampsToReturn (was hardcoded Source).

X509CertificateLoader replaces obsolete X509Certificate2(string,string) ctor
(SYSLIB0057 on .NET 10). UserIdentity(string,byte[]) ctor used because the
(string,string) overload was removed in OPC Foundation 1.5.378.106.
2026-05-12 02:26:15 -04:00
Joseph Doherty b60a8ef409 feat(commons): Layer B serializer + validator handle new OPC UA settings
OpcUaEndpointConfigSerializer:
- ToFlatDict emits new scalar keys (DiscardOldest, SubscriptionPriority,
  SubscriptionDisplayName, TimestampsToReturn).
- ToFlatDict emits dotted sub-object keys (UserIdentity.TokenType / Username /
  Password / CertificatePath / CertificatePassword, Deadband.Type / Value)
  when those sub-objects are non-null.
- FromFlatDict reads the same keys back; missing keys preserve POCO defaults.
- Deadband.Value uses InvariantCulture for double parsing/formatting.

OpcUaEndpointConfigValidator:
- SubscriptionDisplayName required (non-empty).
- UserIdentity.UsernamePassword requires Username.
- UserIdentity.X509Certificate requires CertificatePath.
- Deadband.Value must be > 0 when Deadband is set.
- fieldPrefix propagates through sub-object error EntityNames.

Drives the 11 previously-failing tests green; 51/51 in the suite now pass.
2026-05-12 02:22:51 -04:00
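Illustrative sketch for b60a8ef409 above, written in Python rather than the repository's C#. It shows the dotted-key idea only: sub-object keys are emitted when the sub-object is non-null, and missing keys leave the defaults alone. The class names, the reduced field set, the "true"/"false" string form, and the SubscriptionPriority default of 0 are assumptions; only the key names come from the commit message.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Deadband:  # hypothetical stand-in for OpcUaDeadbandConfig
    type: str = "Absolute"
    value: float = 0.0


@dataclass
class EndpointConfig:  # hypothetical, reduced stand-in for OpcUaEndpointConfig
    discard_oldest: bool = True  # default stated in the commit log
    subscription_priority: int = 0  # assumed default
    deadband: Optional[Deadband] = None


def to_flat_dict(cfg: EndpointConfig) -> Dict[str, str]:
    flat = {
        "DiscardOldest": "true" if cfg.discard_oldest else "false",
        "SubscriptionPriority": str(cfg.subscription_priority),
    }
    # Dotted sub-object keys are emitted only when the sub-object exists.
    if cfg.deadband is not None:
        flat["Deadband.Type"] = cfg.deadband.type
        # repr() of a float never depends on the locale, which plays the
        # role InvariantCulture plays in the original.
        flat["Deadband.Value"] = repr(cfg.deadband.value)
    return flat


def from_flat_dict(flat: Dict[str, str]) -> EndpointConfig:
    cfg = EndpointConfig()  # start from defaults; missing keys keep them
    if "DiscardOldest" in flat:
        cfg.discard_oldest = flat["DiscardOldest"].lower() == "true"
    if "SubscriptionPriority" in flat:
        cfg.subscription_priority = int(flat["SubscriptionPriority"])
    if "Deadband.Type" in flat or "Deadband.Value" in flat:
        cfg.deadband = Deadband(
            type=flat.get("Deadband.Type", "Absolute"),
            value=float(flat.get("Deadband.Value", "0")),
        )
    return cfg
```

Round-tripping a config through to_flat_dict and from_flat_dict should give back an equal object, and an empty dictionary should give back the defaults.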
Joseph Doherty 91450ec390 test(commons): failing tests for Layer B serializer + validator extensions
Adds 11 new tests covering:
- Roundtrip of DiscardOldest/SubscriptionPriority/SubscriptionDisplayName/TimestampsToReturn
- Roundtrip of UserIdentity sub-object across all three TokenTypes
- Roundtrip of Deadband sub-object
- ToFlatDict/FromFlatDict for UserIdentity.* and Deadband.* dotted keys
- Validator rules: empty SubscriptionDisplayName, UsernamePassword w/o Username,
  X509 w/o CertificatePath, Deadband Value <= 0, prefix propagation

Build passes; tests fail because serializer/validator have not been extended yet
(TDD red phase). Task B2 will implement the changes to drive them green.
2026-05-12 02:21:33 -04:00
Joseph Doherty 16f7ab0d0a feat(commons): extend OpcUaEndpointConfig with auth, subscription tuning, read/filter knobs
Adds POCOs and enums for upcoming OPC UA editor expansion:
- OpcUaUserTokenType (Anonymous | UsernamePassword | X509Certificate)
- OpcUaUserIdentityConfig (TokenType + Username/Password + CertificatePath/Password)
- OpcUaDeadbandType (Absolute | Percent) + OpcUaDeadbandConfig
- OpcUaTimestampsToReturn (Source | Server | Both)

OpcUaEndpointConfig grows three new scalars (DiscardOldest, SubscriptionPriority,
SubscriptionDisplayName) plus optional UserIdentity and Deadband sub-objects.
Defaults preserve current runtime behavior (anonymous, no deadband, DiscardOldest=true).
2026-05-12 02:20:12 -04:00
Joseph Doherty 084da55ad6 fix(commons): LoadLegacy handles mixed-type JSON values (number/bool/string) 2026-05-12 02:08:32 -04:00
Joseph Doherty cfb90d2078 fix(ui/admin): always clear _loading in DataConnectionForm.OnInitializedAsync 2026-05-12 01:14:18 -04:00
Joseph Doherty 9916aeaa47 refactor(ui/admin): DataConnectionForm uses OpcUaEndpointEditor and typed model 2026-05-12 01:11:49 -04:00
Joseph Doherty 505731fcef test(ui): drive DataConnectionForm tests via NavigationManager for SupplyParameterFromQuery 2026-05-12 01:09:31 -04:00
Joseph Doherty 46260f30ee test(ui): failing tests for DataConnectionForm refactor 2026-05-12 01:07:55 -04:00
Joseph Doherty 1c71d3342a feat(ui): OpcUaEndpointEditor Blazor component 2026-05-12 01:05:32 -04:00
Joseph Doherty 304ebec121 test(ui): failing bUnit tests for OpcUaEndpointEditor 2026-05-12 01:02:41 -04:00
Joseph Doherty 496d2a68e3 refactor(site-runtime): route OPC UA connection JSON through serializer 2026-05-12 00:59:25 -04:00
Joseph Doherty f98d29fc36 refactor(dcl): OpcUaDataConnection uses OpcUaEndpointConfig via FromFlatDict 2026-05-12 00:57:09 -04:00
Joseph Doherty 80d4d3e252 feat(commons): OpcUaEndpointConfigValidator 2026-05-12 00:52:55 -04:00
Joseph Doherty b53221e44a test(commons): failing tests for OpcUaEndpointConfigValidator 2026-05-12 00:50:28 -04:00
Joseph Doherty 4608adcd53 refactor(commons): defensive legacy-parse + FromFlatDict starts from POCO defaults 2026-05-12 00:48:17 -04:00
Joseph Doherty 8fbf167389 feat(commons): OpcUaEndpointConfigSerializer with legacy fallback + flat-dict interop 2026-05-12 00:44:21 -04:00
Joseph Doherty 90b252047e test(commons): decouple serializer tests from JSON whitespace and verify defaults symmetrically 2026-05-12 00:41:55 -04:00
Joseph Doherty 2220bfcf58 test(commons): failing tests for OpcUaEndpointConfigSerializer 2026-05-12 00:38:56 -04:00
Joseph Doherty b16606d97e feat(commons): OpcUaEndpointConfig POCOs + ConnectionConfig ValidationCategory 2026-05-12 00:35:27 -04:00
Joseph Doherty a9c4c2c655 docs(plans): implementation plan for OPC UA config model refactor
14 bite-sized tasks (TDD pattern) covering:
- Commons foundation: POCOs, serializer, validator
- Runtime adoption: OpcUaDataConnection + DeploymentManagerActor swap
- UI build: <OpcUaEndpointEditor> + DataConnectionForm rewrite
- Verification: build/test green + Docker browser smoke + push

Tasks #45-#58 created with blocking dependencies; companion
.tasks.json sidecar persists the plan for executing-plans skill.
2026-05-12 00:33:51 -04:00
Joseph Doherty c906e73441 docs(plans): OPC UA endpoint config model & form refactor design
Captures the design decisions from the brainstorming session:
- OpcUaEndpointConfig POCO + validator + serializer in Commons
- Single source of truth: both UI and site runtime consume the model
- Typed nested JSON storage (camelCase), legacy flat-dict fallback
- Shared <OpcUaEndpointEditor> Blazor component used twice
- Custom protocol removed from dropdown; Protocol field hidden
- Validation timing on Save only; per-field red text via ValidationEntry
2026-05-12 00:27:35 -04:00
Joseph Doherty da5fdf0e63 feat(ui/admin): Topology-style refresh of Data Connections page
Brings the Data Connections admin page up to the same UX standard as the
Topology page:
- Search box with dim non-matches (opacity 0.4, shape preserved)
- Toolbar: + Connection (disabled until a site is selected), Refresh,
  Expand, Collapse
- Site context menu gains "Add Connection here" that navigates with
  ?siteId= so the form preselects + locks the Site field
- Form gains "Primary Endpoint" / "Backup Endpoint" h6 subsection
  headers matching the SiteForm convention; Failover Retry Count moved
  inside the Backup subsection
- URL renamed: /admin/connections (primary) + /admin/data-connections
  (legacy secondary @page). Same dual-route treatment on the form
- Nav label: "Data Connections" -> "Connections"
- Adds DataConnectionsPageTests bUnit suite (6 tests)
2026-05-11 22:42:48 -04:00
Joseph Doherty f3386d0278 feat(ui/deployment): consolidate sites/areas/instances into Topology page
Single /deployment/topology page replaces /deployment/instances (legacy URL
preserved as a secondary @page directive) and the /admin/areas* CRUD pages.
TreeView with Site → Area → Instance, V1–V7 visual guide (bi-building /
bi-diagram-3 / bi-box), always-visible empty containers, search dim, F2
inline area rename, and right-click context menus per node kind (Add Area,
Move to Area…, lifecycle actions, etc.).

Adds AreaService.MoveAreaAsync with cycle prevention, same-site enforcement,
and name-collision check at the new parent. Instance rename is intentionally
out of scope: UniqueName is the site-side actor identity, so renaming it
requires its own design pass.
2026-05-11 22:03:55 -04:00
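Illustrative sketch for f3386d0278 above, in Python rather than the repository's C#. It shows one common way to do the cycle prevention that AreaService.MoveAreaAsync is described as having: walk up from the proposed parent and refuse the move if the area being moved is reached. The function name, the parent_of map, and the max_depth bound are assumptions, not the actual implementation.

```python
from typing import Dict, Optional


def would_create_cycle(
    parent_of: Dict[int, Optional[int]],
    area_id: int,
    new_parent_id: Optional[int],
    max_depth: int = 1000,  # assumed bound, guards against malformed data
) -> bool:
    """Return True if moving area_id under new_parent_id would form a cycle."""
    current = new_parent_id
    steps = 0
    while current is not None:
        if current == area_id:
            return True  # the new parent is the area itself or a descendant
        steps += 1
        if steps > max_depth:
            raise ValueError("parent chain too deep; data is likely malformed")
        current = parent_of.get(current)
    return False  # reached a root without meeting area_id


# 1 is a root, 2 is under 1, 3 is under 2.
parents = {1: None, 2: 1, 3: 2}
move_root_under_leaf = would_create_cycle(parents, area_id=1, new_parent_id=3)
move_leaf_under_root = would_create_cycle(parents, area_id=3, new_parent_id=1)
```

Moving an area to the top level (new_parent_id of None) never forms a cycle, so the walk is skipped entirely in that case.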
Joseph Doherty b2eddd9713 feat(ui/templates): derived-template action and slimmer composition row
Right-clicking a template now offers "New Derived Template", which opens
TemplateCreate with the parent pre-selected via a new ?parentId query
parameter. Composition rows in the tree drop the trailing
"→ TargetName" muted text; the kind glyph plus the instance name carry
enough meaning, and the composed template is one click away from the
row's right-click menu.
2026-05-11 21:29:32 -04:00
Joseph Doherty b4cb7e6f5f feat(templates): lock ParentTemplateId after creation
Template inheritance is set once at create time and immutable on update.
UpdateTemplateAsync now returns "Parent template cannot be changed after
creation." when the caller sends a parent that differs from the stored
value — server-side enforcement covers UI, ManagementService, and CLI.
TemplateEdit renders the parent as static plaintext rather than an
editable dropdown; TemplateCreate's parent picker is unchanged.
2026-05-11 21:29:21 -04:00
Joseph Doherty 8e388a89c5 feat(ui/templates): adopt TreeView design guide; split editor to /design/templates/{id}
Templates page is now a tree-only browser; editing happens on a dedicated
TemplateEdit page. Drag-drop is replaced by context-menu Move-to-Folder.
TreeView gains Bootstrap Icons (chevron + per-kind glyphs), ancestor guide
lines, defined hover/selected/focus tokens, and Escape-dismisses-menu per
the new Visual Design Guide (V1-V7) in Component-TreeView.md.
2026-05-11 20:52:34 -04:00
Joseph Doherty f3b33e7e1d fix(ui/treeview): union sessionStorage keys instead of overwriting
The previous fix tried to defer page-side RevealNode to the second
render so TreeView's async sessionStorage load could finish first. In
practice Blazor Server didn't always fire a second OnAfterRenderAsync
on the page after the deep-link load, so the reveal never ran.

Real fix: change TreeView's storage-load to UNION the restored keys
with whatever's already in _expandedKeys, instead of REPLACING. That
way the page can call RevealNode whenever it wants and the storage
restore can't clobber the reveal regardless of completion order. The
page-side guard simplifies back to a one-shot reveal on first render.

Semantic note: if a deep-link reveal expands an ancestor that the user
had previously collapsed, the deep link wins. Intentional — the URL
expresses the navigation intent.
2026-05-11 12:42:38 -04:00
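Illustrative sketch for f3b33e7e1d above, in Python rather than the component's C#. It models the expanded-key set as a plain set to show why a union is order-independent while a replace is not. The function names and the key strings are made up for the example.

```python
from typing import Iterable, Set


def reveal(expanded: Set[str], ancestors: Iterable[str]) -> Set[str]:
    """Deep-link reveal: expand every ancestor of the target node."""
    return expanded | set(ancestors)


def restore_replace(expanded: Set[str], stored: Iterable[str]) -> Set[str]:
    """Old behaviour: the stored set overwrites whatever is there."""
    return set(stored)


def restore_union(expanded: Set[str], stored: Iterable[str]) -> Set[str]:
    """New behaviour: the stored set is merged into whatever is there."""
    return expanded | set(stored)


stored = {"folder:1"}
ancestors = ["folder:2", "folder:3"]

# Reveal finishes first, storage restore finishes second.
lost = restore_replace(reveal(set(), ancestors), stored)
kept = restore_union(reveal(set(), ancestors), stored)

# Storage restore finishes first, reveal finishes second.
kept_other_order = reveal(restore_union(set(), stored), ancestors)
```

With the replace variant the revealed ancestors disappear when the restore lands second; with the union variant both completion orders end in the same set.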
Joseph Doherty d8e6f44616 fix(ui/templates): defer deep-link reveal until TreeView restores sessionStorage
Both page.OnAfterRenderAsync(firstRender=true) and
TreeView.OnAfterRenderAsync(firstRender=true) ran concurrently:
- Page called RevealNode → added ancestor keys to _expandedKeys
- TreeView awaited treeviewStorage.load → replaced _expandedKeys with
  the persisted set (often empty if user collapsed before navigating)

Whichever JS interop completed second won. When TreeView won, the deep-link
reveal silently lost. Gate the reveal on firstRender==false so it runs
strictly after TreeView's restore is done.
2026-05-11 12:39:21 -04:00
Joseph Doherty ca164dca03 fix(ui/templates): stop drop propagation on folder nodes
Without stopPropagation, dropping a template onto a folder fires both
OnDrop(folder) and OnDropOnRoot via event bubbling. The two async handlers
race on the same scoped DbContext, which is not thread-safe — the second
throws ObjectDisposedException and tears down the Blazor circuit. Surfaced
during browser smoke testing via JS-dispatched DragEvent sequence.
2026-05-11 12:28:05 -04:00
Joseph Doherty acead212b2 fix(ui/templates): dereference string params with @ and stack toolbar below title
Smoke testing revealed two issues introduced by the modal extraction commit:

1. ErrorMessage / InitialName / TemplateName parameters on the dialog
   components were passed as bare strings (e.g. ErrorMessage="_newFolderError")
   instead of dereferenced C# expressions (ErrorMessage="@_newFolderError").
   Razor treats a quoted value without an @ prefix, when bound to a string
   parameter, as a string literal, so the error block rendered the literal field name in
   red whenever the modal opened. Non-string parameters (int/IEnumerable)
   were fine since Razor treats those as C# expressions by default.

2. The Templates header + 4-button toolbar shared one flex row, but at
   col-md-4 / col-lg-3 width the buttons overflowed into the right-column
   empty-state area. Stack title above a full-width btn-group instead.
2026-05-11 12:20:40 -04:00
Joseph Doherty 3587ab4fcb refactor(ui/templates): extract dialog modals into shared components 2026-05-11 12:03:35 -04:00
Joseph Doherty 17e690f6ef test(ui/templates): cover drag-template-to-root via bUnit DragEventArgs 2026-05-11 12:00:07 -04:00
Joseph Doherty 8155dbc411 docs(templates): describe folder hierarchy and management commands 2026-05-11 11:28:09 -04:00
Joseph Doherty d54013cb88 test(ui/templates): bUnit rendering tests for folder tree 2026-05-11 11:25:15 -04:00
Joseph Doherty ca3b34223d feat(ui/templates): reveal deep-linked template on initial render 2026-05-11 11:21:53 -04:00
Joseph Doherty c60aad9df4 feat(ui/templates): native HTML5 drag-drop reorganization 2026-05-11 11:20:42 -04:00
Joseph Doherty fc105acd7c feat(ui/templates): new-folder, new-template, move-template dialogs 2026-05-11 11:18:36 -04:00
Joseph Doherty 39e6e0a525 feat(ui/templates): per-kind context menus + folder rename/delete 2026-05-11 11:15:25 -04:00
Joseph Doherty 4977f99a74 feat(ui/templates): split-pane layout with folder + composition tree 2026-05-11 11:12:40 -04:00
Joseph Doherty 78165b3d99 feat(ui/templates): replace flat tree model with TmplNode discriminated by kind 2026-05-11 11:10:39 -04:00
Joseph Doherty 20f60c88f9 feat(ui/templates): load folders alongside templates 2026-05-11 11:09:16 -04:00
Joseph Doherty 3d28f0d2eb feat(management): handler + authorization for TemplateFolder commands 2026-05-11 11:07:19 -04:00
Joseph Doherty a293f5a365 feat(management): add TemplateFolder command records 2026-05-11 11:05:32 -04:00
Joseph Doherty 2c301c6fe1 feat(di): register TemplateFolderService 2026-05-11 11:04:26 -04:00
Joseph Doherty e3315781cb refactor(template-engine): align MoveTemplateAsync audit/save order with sibling methods 2026-05-11 11:03:44 -04:00
Joseph Doherty 72b9f7e66e feat(template-engine): TemplateService.MoveTemplateAsync 2026-05-11 11:02:03 -04:00
Joseph Doherty 723ab61bd8 feat(template-folder): delete folder blocked if non-empty 2026-05-11 10:59:29 -04:00
Joseph Doherty e44bbc0caf fix(template-folder): bound cycle-walk to defend against malformed graphs 2026-05-11 10:58:02 -04:00
Joseph Doherty 1269054651 feat(template-folder): move with cycle detection and sibling uniqueness 2026-05-11 10:55:52 -04:00
Joseph Doherty 3dfc7180c5 feat(template-folder): rename folder with sibling uniqueness check 2026-05-11 10:53:43 -04:00
Joseph Doherty ff23f64cf8 feat(template-folder): add TemplateFolderService.CreateFolderAsync with validation 2026-05-11 10:50:28 -04:00
Joseph Doherty 44c6e4a553 refactor(repo): align TemplateFolder methods with sibling repository conventions 2026-05-11 10:48:18 -04:00
Joseph Doherty 4b1077d686 feat(repo): add TemplateFolder repository methods 2026-05-11 10:45:20 -04:00
Joseph Doherty 978ac79ad8 feat(db): EF migration AddTemplateFolders 2026-05-11 10:43:04 -04:00
Joseph Doherty e0b098d200 feat(db): map TemplateFolder entity and Template.FolderId 2026-05-11 10:42:19 -04:00
Joseph Doherty 1d27ec3b85 feat(templates): add TemplateFolder entity and Template.FolderId 2026-05-11 10:42:19 -04:00
Joseph Doherty 80f407ae0d fix(db): EF migration AddPrimaryBackupDataConnections
Captures the pre-existing entity drift from commit 04af039 (rename
Configuration to PrimaryConfiguration, add BackupConfiguration and
FailoverRetryCount), which was committed without a corresponding
migration. Generating this here unblocks the upcoming AddTemplateFolders
migration on the templates-folder-hierarchy branch.
2026-05-11 10:42:12 -04:00
Joseph Doherty 18387df8cb plan(templates-page): use ScadaLink.slnx (repo uses slnx, not sln) 2026-05-11 10:30:15 -04:00
Joseph Doherty 892204ea3a plan(templates-page): implementation plan for folder hierarchy 2026-05-11 10:27:39 -04:00
Joseph Doherty daa01261f3 design(templates-page): folder hierarchy and split-pane tree layout
Replaces the current /design/templates list view with a Wonderware-style
template toolbox: nested TemplateFolder entity, FolderId on Template,
composition children as inline tree leaves, persistent split-pane with
editor on the right, context menus + drag-drop reorg.
2026-05-11 10:20:50 -04:00
Joseph Doherty 872d358ad3 chore(docker): drop stale lmxproxy paths from Dockerfile
The lmxproxy workspace was relocated to deprecated/ in 9dccf8e but the
Dockerfile still tried to COPY lmxproxy/src/ZB.MOM.WW.LmxProxy.Client/,
breaking docker build. Remove the two stale COPY lines.
2026-05-08 09:34:23 -04:00
Joseph Doherty ec1d8f1393 chore(deps): bump packages flagged by NU190x advisories
Restore inside the docker build was failing because TreatWarningsAsErrors
promotes NU1902/NU1903/NU1904 (vulnerable package warnings) to errors.
Bump the flagged packages to advisory-free versions:

- MailKit                                          4.15.1 -> 4.16.0    (GHSA-9j88-vvj5-vhgr)
- Microsoft.AspNetCore.DataProtection.EFCore       10.0.5 -> 10.0.7    (GHSA-9mv3-2cwr-p262, transitively pulls fixed System.Security.Cryptography.Xml — GHSA-37gx-xxp4-5rgx, GHSA-w3x6-4m5h-cxqf)
- OpenTelemetry.Api  (transitive via Akka.Hosting) 1.9.0  -> 1.15.3    (GHSA-g94r-2vxg-569j, GHSA-8785-wc3w-h8q6) — added as a direct PackageReference in ScadaLink.Host to override the Akka.Hosting pin

To resolve the NU1605 downgrade chain triggered by DataProtection.EFCore
10.0.7 (which transitively requires Microsoft.EntityFrameworkCore >= 10.0.7
and friends), bump every Microsoft.* 10.0.5 reference across src/ and
tests/ to 10.0.7 in lockstep.
2026-05-08 09:34:17 -04:00
Joseph Doherty 5da779db17 fix(host): wait for configuration database before applying migrations
Central nodes crashed at startup with `CREATE DATABASE permission denied`
when MSSQL accepted connections before recovering user databases —
DB_ID(@db) returned null, so EF Core's MigrateAsync fell through to
SqlServerDatabaseCreator.CreateAsync. The non-privileged app login then
failed CREATE DATABASE and the host terminated with FTL, leaving Traefik's
/health/active probe unable to find an upstream ("no available server" at
localhost:9000).

Add MigrationHelper.WaitForDatabaseReadyAsync that polls
Database.CanConnectAsync() for up to 60s before invoking MigrateAsync, and
thread an ILogger through so retry attempts surface in normal logs. This
removes the startup race without requiring depends_on across compose stacks
or granting dbcreator to the app login.
2026-05-08 09:33:59 -04:00
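Illustrative sketch for 5da779db17 above, in Python rather than the repository's C#. It shows the poll-until-ready shape described for MigrationHelper.WaitForDatabaseReadyAsync: try to connect, log each failed attempt, give up after a deadline. The 60 second limit is from the commit message; the 2 second retry delay, the boolean return value, and every name here are assumptions.

```python
import time
from typing import Callable


def wait_for_database_ready(
    can_connect: Callable[[], bool],
    timeout_s: float = 60.0,
    retry_delay_s: float = 2.0,  # assumed; the commit does not state a delay
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
    log: Callable[[str], None] = print,
) -> bool:
    """Poll can_connect() until it succeeds or timeout_s has elapsed."""
    deadline = clock() + timeout_s
    attempt = 0
    while True:
        attempt += 1
        if can_connect():
            return True
        if clock() >= deadline:
            return False  # caller decides what to do when the wait runs out
        log(f"database not ready (attempt {attempt}); retrying in {retry_delay_s}s")
        sleep(retry_delay_s)
```

The clock and sleep parameters exist only so the loop can be exercised without real waiting; a caller would run migrations only after this returns True.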
Joseph Doherty 9dccf8e72f deprecate(lmxproxy): move all LmxProxy code, tests, and docs to deprecated/
LmxProxy is no longer needed. Moved the entire lmxproxy/ workspace, DCL
adapter files, and related docs to deprecated/. Removed LmxProxy registration
from DataConnectionFactory, project reference from DCL, protocol option from
UI, and cleaned up all requirement docs.
2026-04-08 15:56:23 -04:00
Joseph Doherty 8423915ba1 fix(site-runtime): publish quality changes to site stream for real-time debug view updates
HandleConnectionQualityChanged now publishes AttributeValueChanged events
to the SiteStreamManager for all affected attributes. This ensures the
central UI debug view updates in real-time when a data connection
disconnects and attributes go bad quality.

Only publishes to the stream — does NOT notify script or alarm actors,
since the value hasn't changed and firing scripts/alarms on quality-only
changes would cause spurious evaluations.
2026-03-24 16:32:00 -04:00
Joseph Doherty 6df2cbdf90 fix(lmxproxy): support multiple subscriptions per session
Key subscriptions by unique subscriptionId instead of sessionId to prevent
overwrites when the same session calls Subscribe multiple times (e.g. DCL
StaleTagMonitor). Add session-to-subscription reverse lookup for cleanup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24 16:30:06 -04:00
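Illustrative sketch for 6df2cbdf90 above, in Python rather than the service's C#. It shows the data-structure change the message describes: subscriptions keyed by their own id, plus a session-to-subscription reverse lookup used for cleanup. The class, its method names, and the id format are hypothetical.

```python
import itertools
from typing import Dict, Iterable, List, Set, Tuple


class SubscriptionRegistry:
    def __init__(self) -> None:
        # subscription id -> (session id, tags)
        self._by_id: Dict[str, Tuple[str, Tuple[str, ...]]] = {}
        # session id -> subscription ids (the reverse lookup)
        self._by_session: Dict[str, Set[str]] = {}
        self._ids = itertools.count(1)

    def subscribe(self, session_id: str, tags: Iterable[str]) -> str:
        """Register a subscription; repeated calls never overwrite each other."""
        subscription_id = f"sub-{next(self._ids)}"
        self._by_id[subscription_id] = (session_id, tuple(tags))
        self._by_session.setdefault(session_id, set()).add(subscription_id)
        return subscription_id

    def unsubscribe_session(self, session_id: str) -> List[str]:
        """Remove every subscription of a session and return the freed tags."""
        freed: List[str] = []
        for subscription_id in self._by_session.pop(session_id, set()):
            _, tags = self._by_id.pop(subscription_id)
            freed.extend(tags)
        return sorted(freed)
```

Keyed by session id alone, the second subscribe call from the same session would have replaced the first entry; here both survive and both are released on cleanup.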
Joseph Doherty b3076e18db docs(lmxproxy): add stale session subscription fix plan 2026-03-24 16:19:39 -04:00
Joseph Doherty de7c4067e4 feat(dcl): add debug-level logging for heartbeat subscription callbacks 2026-03-24 16:19:39 -04:00
Joseph Doherty 5fdeaf613f feat(dcl): failover on repeated unstable connections (connect-then-stale pattern)
Previously, failover only triggered when ConnectAsync failed consecutively.
If a connection succeeded but went stale quickly (e.g., heartbeat timeout),
the failure counter reset on each successful connect and failover never
triggered.

Added a separate _consecutiveUnstableDisconnects counter that increments
when a connection lasts less than StableConnectionThreshold (60s) before
disconnecting. When this counter reaches failoverRetryCount, the actor
fails over to the backup endpoint. Stable connections (lasting >60s)
reset this counter.

The original connection-failure failover path is unchanged.
2026-03-24 16:19:39 -04:00
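Illustrative sketch for 5fdeaf613f above, in Python rather than the actor's C#. It isolates the counting rule: a disconnect after a short-lived connection increments the counter, a disconnect after a stable connection resets it, and reaching failoverRetryCount triggers failover. The class name, treating exactly 60s as stable, and resetting the counter once failover fires are assumptions.

```python
STABLE_CONNECTION_THRESHOLD_S = 60.0  # value stated in the commit message


class UnstableDisconnectTracker:
    def __init__(self, failover_retry_count: int) -> None:
        self.failover_retry_count = failover_retry_count
        self.consecutive_unstable_disconnects = 0

    def on_disconnected(self, connected_for_s: float) -> bool:
        """Record a disconnect; return True when failover should happen."""
        if connected_for_s < STABLE_CONNECTION_THRESHOLD_S:
            self.consecutive_unstable_disconnects += 1
        else:
            # A stable connection breaks the streak.
            self.consecutive_unstable_disconnects = 0

        if self.consecutive_unstable_disconnects >= self.failover_retry_count:
            # Assumed: start counting afresh against the backup endpoint.
            self.consecutive_unstable_disconnects = 0
            return True
        return False


tracker = UnstableDisconnectTracker(failover_retry_count=3)
decisions = [tracker.on_disconnected(s) for s in (5.0, 120.0, 5.0, 8.0, 2.0)]
```

In the usage lines the 120 second connection resets the streak, so failover is only signalled on the third short-lived connection after it.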
Joseph Doherty ff2784b862 fix(site-runtime): add SQLite schema migration for backup_configuration column
Existing site databases created before the primary/backup data connections
feature lack the backup_configuration and failover_retry_count columns.
Added TryAddColumnAsync migration that runs on startup after table creation.
2026-03-24 16:19:39 -04:00
Joseph Doherty 0d03aec4f2 feat(dcl): log connection disconnect events to site event log 2026-03-24 16:19:39 -04:00
Joseph Doherty d4397910f0 feat(dcl): add StaleTagMonitor for heartbeat-based disconnect detection
Composable StaleTagMonitor class in Commons fires a Stale event when no
value is received within a configurable max silence period. Integrated
into both LmxProxyDataConnection and OpcUaDataConnection adapters via
optional HeartbeatTagPath/HeartbeatMaxSilence connection config keys.
When stale, the adapter fires Disconnected triggering the standard
reconnect cycle. 10 unit tests cover timer behavior.
2026-03-24 16:19:39 -04:00
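Illustrative sketch for d4397910f0 above, in Python rather than the Commons C# class. It captures the rule "fire Stale when no value arrives within the max silence period". The original is described as timer-based; this sketch replaces the timer with an explicit check() call and an injectable clock so it stays deterministic, and all member names are assumptions.

```python
import time
from typing import Callable


class StaleTagMonitor:
    def __init__(
        self,
        max_silence_s: float,
        on_stale: Callable[[], None],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max_silence_s = max_silence_s
        self._on_stale = on_stale
        self._clock = clock
        self._last_value_at = clock()
        self._stale = False

    def on_value_received(self) -> None:
        """Call for every heartbeat value; restarts the silence window."""
        self._last_value_at = self._clock()
        self._stale = False

    def check(self) -> bool:
        """Fire on_stale once per silence period; return the stale state."""
        silent_for = self._clock() - self._last_value_at
        if not self._stale and silent_for > self._max_silence_s:
            self._stale = True
            self._on_stale()
        return self._stale
```

An adapter would pass a callback that raises its own Disconnected event, which is what hands control to the normal reconnect cycle.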
Joseph Doherty 02a7e8abc6 feat(health): show all cluster nodes (online/offline, primary/standby) in health dashboard
Add NodeStatus record, IClusterNodeProvider interface, and AkkaClusterNodeProvider
that queries Akka cluster membership for all site-role nodes. HealthReportSender
populates ClusterNodes before each report. UI shows a row per node with
hostname, Online/Offline badge, and Primary/Standby badge. Falls back to
single-node display if ClusterNodes is not populated.
2026-03-24 16:19:39 -04:00
Joseph Doherty 65cc7b69cd feat(health): wire up NodeHostname, ConnectionEndpoint, TagQuality, ParkedMessageCount collectors
- AkkaHostedService: SetNodeHostname from NodeOptions
- DataConnectionActor: UpdateConnectionEndpoint on state transitions,
  track per-tag quality counts and UpdateTagQuality on value changes
- HealthReportSender: query StoreAndForwardStorage for parked message count
- StoreAndForwardStorage: add GetParkedMessageCountAsync()
2026-03-24 16:19:39 -04:00
Joseph Doherty e84a831a02 feat(health): redesign health dashboard with 4-column layout and new metrics
New fields in SiteHealthReport: NodeHostname, DataConnectionEndpoints
(primary/secondary), DataConnectionTagQuality (good/bad/uncertain),
ParkedMessageCount. New collector methods to populate them.

Health dashboard redesigned to match mockup: Nodes | Data Connections
(with per-connection tag quality) | Instances + S&F Buffers | Error
Counts + Parked Messages. Site names resolved from repository.
2026-03-24 16:19:39 -04:00
Joseph Doherty 5e2a4c9080 fix(ui): align TreeView node text by giving toggle and spacer equal fixed width 2026-03-24 16:19:39 -04:00
Joseph Doherty 0abaa47de2 fix(ui): normalize TreeView expanded keys to strings for sessionStorage compatibility
Keys from KeySelector (e.g. boxed int) were compared against string keys
restored from sessionStorage, causing expansion state to be lost on
navigation. All keys are now normalized to strings internally.
2026-03-24 16:19:39 -04:00
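Illustrative sketch for 0abaa47de2 above, in Python rather than the component's C#. It shows the mismatch in miniature: an integer key never equals its string form, so the expanded set and the restored set only agree once every key is normalized to a string. The helper name and the sample keys are made up.

```python
from typing import Hashable, List, Set


def normalize_key(key: Hashable) -> str:
    """Return the string form used for every comparison and for storage."""
    return str(key)


# Keys as a hypothetical KeySelector might return them (integers).
selector_keys = [42, 7]

# Session storage only ever hands back strings.
restored: List[str] = ["42", "7"]

# Without normalization nothing matches, so expansion state looks lost.
raw_expanded: Set[Hashable] = set(selector_keys)
matches_without_fix = [k for k in restored if k in raw_expanded]

# With normalization both sides use the same representation.
expanded: Set[str] = {normalize_key(k) for k in selector_keys}
matches_with_fix = [k for k in restored if k in expanded]
```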
Joseph Doherty a0a6bb4986 refactor(ui): replace manual template inheritance tree with TreeView component 2026-03-24 16:19:39 -04:00
Joseph Doherty 2b5dabb336 refactor(ui): redesign Areas page with TreeView and dedicated Add/Edit/Delete pages
Areas page now shows a single TreeView with sites as roots and areas as
children. Context menus: sites get "Add Area", areas get "Add Child Area",
"Edit Area", "Delete Area" — each navigating to a dedicated page.

The Delete Area page shows a TreeView of the area and all recursive children
with assigned instances. Deletion is blocked if any instances are assigned
to the area or its descendants.
2026-03-24 16:19:39 -04:00
Joseph Doherty 968fc4adc7 fix(ui): disable site and instance dropdowns while debug view is connected 2026-03-24 16:19:39 -04:00
Joseph Doherty 4c7fa03c07 fix(ui): remove default list-style bullets from TreeView ul elements 2026-03-24 16:19:39 -04:00
Joseph Doherty addbb6ffeb fix(ui): move treeview-storage.js to Host wwwroot where static files are served 2026-03-24 16:19:39 -04:00
Joseph Doherty f1537b62ca refactor(ui): replace instances table with hierarchical TreeView (Site → Area → Instance) 2026-03-24 16:19:39 -04:00
Joseph Doherty 71894f4ba9 refactor(ui): replace manual area tree rendering with TreeView component 2026-03-24 16:19:39 -04:00
Joseph Doherty 4426f3e928 refactor(ui): replace data connections table with TreeView grouped by site 2026-03-24 16:19:39 -04:00
Joseph Doherty 08d511f609 test(ui): add external filtering tests for TreeView (R8) 2026-03-24 16:19:39 -04:00
Joseph Doherty 4e5b5facec feat(ui): add right-click context menu to TreeView (R15) 2026-03-24 16:19:39 -04:00
Joseph Doherty f127efe6ea feat(ui): add ExpandAll, CollapseAll, RevealNode to TreeView (R12, R13) 2026-03-24 16:19:39 -04:00
Joseph Doherty d3a6ed5f68 feat(ui): add sessionStorage persistence for TreeView expansion state (R11) 2026-03-24 16:19:39 -04:00
Joseph Doherty da4f29f6ee feat(ui): add selection support to TreeView (R5) 2026-03-24 16:19:39 -04:00
Joseph Doherty 75648c0c76 feat(ui): add TreeView<TItem> component with core rendering, expand/collapse, ARIA (R1-R4, R14) 2026-03-24 16:19:39 -04:00
Joseph Doherty 4db93cae2b fix(lmxproxy): fix orphaned tag subscriptions when client subscribes per-tag
When a client calls Subscribe multiple times with the same session ID
(one tag per RPC), each call overwrites the ClientSubscription entry.
UnsubscribeClient only cleaned up tags from the last entry, leaving
earlier tags orphaned in _tagSubscriptions. Now scans all tag
subscriptions for client references during cleanup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24 15:43:29 -04:00
Joseph Doherty eecd82b787 fix(lmxproxy): clean up stale session subscriptions on scavenge and add stream timeout
Grpc.Core doesn't reliably fire CancellationToken on client disconnect,
so Subscribe RPCs can hang forever and leak session subscriptions. Bridge
SessionManager scavenging to SubscriptionManager cleanup, and add a
30-second periodic session validity check in the Subscribe loop so stale
streams exit within 30s of session scavenge rather than hanging until
process restart.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24 15:21:06 -04:00
Joseph Doherty b74e139a85 fix(lmxproxy): reset probe timer after reconnect to prevent false stale triggers
Without this, the staleness check could fire immediately after reconnect
before the first OnDataChange callback arrives, causing a reconnect loop.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24 15:06:42 -04:00
Joseph Doherty 488a7b534b feat(lmxproxy): add Connected Since and Reconnect Count to status page
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24 13:32:46 -04:00
Joseph Doherty 73fe618953 fix(lmxproxy): protect probe subscription from ReadAsync teardown, add instance configs
ReadAsync internally subscribes/unsubscribes the same ScanTime tag used
by the persistent probe, which was tearing down the probe subscription
and triggering false reconnects every ~5s. Guard UnsubscribeInternal and
stored subscription state so the probe tag is never removed by other
callers. Also removes DetailedHealthCheckService (redundant with the
persistent probe), adds per-instance config files (appsettings.v2.json,
appsettings.v2b.json) loaded via LMXPROXY_INSTANCE env var so deploys
no longer overwrite port settings.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24 12:20:05 -04:00
Joseph Doherty 95168253fc feat(lmxproxy): replace subscribe/unsubscribe health probe with persistent subscription
The old probe did a subscribe-read-unsubscribe cycle every 5 seconds to
check connection health. This created unnecessary churn and didn't detect
the failure mode where long-lived subscriptions silently stop receiving
COM callbacks (e.g. stalled STA message pump). The new approach keeps a
persistent subscription on the health check tag and forces reconnect if
no value update arrives within a configurable threshold (ProbeStaleThresholdMs,
default 5s). Also adds STA message pump debug logging (5-min heartbeat with
message counters) and fixes log file path resolution for Windows services.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24 11:57:35 -04:00
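The persistent-subscription probe in the commit above can be pictured as a small watchdog. This is an illustrative Python sketch, not the project's C# code: the class `StaleProbe`, its method names, and the injectable clock are hypothetical. Only the 5-second default threshold and the reset-after-reconnect behavior come from the commit messages.

```python
import time


class StaleProbe:
    """Tracks the last update seen on a persistent health-check subscription."""

    def __init__(self, stale_threshold_ms=5000, clock=time.monotonic):
        # Hypothetical stand-in for the ProbeStaleThresholdMs setting.
        self._threshold_s = stale_threshold_ms / 1000.0
        self._clock = clock
        self._last_update = clock()

    def on_data_change(self):
        # Called from the subscription callback whenever a value arrives.
        self._last_update = self._clock()

    def reset(self):
        # Called right after a reconnect so an old timestamp cannot
        # trigger an immediate second reconnect.
        self._last_update = self._clock()

    def is_stale(self):
        # True when no update has arrived within the threshold.
        return (self._clock() - self._last_update) > self._threshold_s
```

A monitor loop would poll `is_stale()` and force a reconnect when it returns true, then call `reset()` once the new connection is up.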
Joseph Doherty b3222cf30b fix(site-runtime): wire EventLogHandlerActor so site event log queries work
The SiteCommunicationActor expected an event log handler but none was
registered, causing "Event log handler not available" on the Event Logs
page and CLI. Bridge IEventLogQueryService to Akka via a simple actor.
2026-03-23 00:37:33 -04:00
Joseph Doherty 64c914019d feat(lmxproxy): always show RPC Operations table, rename from 'Operations'
Table now displays all 5 RPC types (Read, ReadBatch, Write, WriteBatch,
Subscribe) with dashes for zero-count operations instead of hiding the
table entirely.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 00:12:19 -04:00
Joseph Doherty 7f74b660b3 feat(lmxproxy): add delivered/dropped message counts to subscription stats
Subscription metrics (totalDelivered, totalDropped) now visible in
/api/status JSON and HTML dashboard. Card turns yellow if drops > 0.
Aggregated from per-client counters in SubscriptionManager.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 00:07:58 -04:00
Joseph Doherty 59d143e4c8 docs(lmxproxy): update deviations for STA resolution, OnWriteComplete, subscribe fix
- Deviation #2: document three STA iterations (failed → Task.Run → StaComThread)
- Deviation #7: mark resolved — OnWriteComplete now works via STA message pump
- Deviation #8: note awaited subscription creation fixes flaky subscribe test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 23:52:09 -04:00
Joseph Doherty b218773ab0 fix(lmxproxy): await COM subscription creation to fix Subscribe flakiness
SubscriptionManager.Subscribe was fire-and-forgetting the MxAccess COM
subscription creation. The initial OnDataChange callback could fire
before the subscription was established, losing the first (and possibly
only) value update. Changed to async SubscribeAsync that awaits
CreateMxAccessSubscriptionsAsync before returning the channel reader.

Subscribe_ReceivesUpdates now passes 5/5 consecutive runs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 23:48:01 -04:00
Joseph Doherty 84b7b6a7a9 feat(lmxproxy): re-enable OnWriteComplete callback via STA message pump
With StaComThread's GetMessage loop in place, OnWriteComplete callbacks
are now delivered properly. Write flow: dispatch Write() on STA thread,
await OnWriteComplete via TCS, clean up on STA thread. Falls back to
fire-and-forget on timeout as safety net. OnWriteComplete now resolves
or rejects the TCS with MxStatus error details.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 23:35:09 -04:00
Joseph Doherty a326a8cbde fix(lmxproxy): make MxAccess client name unique per instance
Multiple instances registering with the same name may cause MxAccess to
conflict on callback routing. ClientName is now configurable via
appsettings.json, defaulting to a GUID-suffixed name if not set.
Instances A and B use "LmxProxy-A" and "LmxProxy-B" respectively.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 23:18:09 -04:00
Joseph Doherty a59d4ad76c fix(lmxproxy): use raw Win32 message pump instead of WinForms Application.Run
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 23:18:09 -04:00
Joseph Doherty b6408726bc feat(lmxproxy): add STA thread with message pump for MxAccess COM callbacks
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 23:18:09 -04:00
Joseph Doherty c96e71c83c Revert "fix(lmxproxy): resolve subscribe/unsubscribe race condition on client reconnect"
This reverts commit 9e9efbecab399fd7dcfb3e7e14e8b08418c3c3fc.
2026-03-22 23:18:09 -04:00
Joseph Doherty fa33e1acf1 fix(lmxproxy): resolve subscribe/unsubscribe race condition on client reconnect
Three fixes for the SubscriptionManager/MxAccessClient subscription pipeline:

1. Serialize Subscribe and UnsubscribeClient with a SemaphoreSlim gate to prevent
   race where old-session unsubscribe removes new-session COM subscriptions.
   CreateMxAccessSubscriptionsAsync is now awaited instead of fire-and-forget.

2. Fix dual VTQ delivery in MxAccessClient.OnDataChange — each update was delivered
   twice (once via stored callback, once via OnTagValueChanged property). Now uses
   stored callback as the single delivery path.

3. Store pending tag addresses when CreateMxAccessSubscriptionsAsync fails (MxAccess
   down) and retry them on reconnect via NotifyReconnection/RetryPendingSubscriptionsAsync.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 23:18:08 -04:00
Joseph Doherty bc4fc97652 refactor(ui): extract instance bindings and overrides to dedicated Configure page
Move connection bindings, attribute overrides, and area assignment from
inline expandable rows on the Instances table to a separate page at
/deployment/instances/{id}/configure for a cleaner, less cramped UX.
2026-03-22 15:58:32 -04:00
Joseph Doherty 161dc406ed feat(scripts): add typed Parameters.Get<T>() helpers for script API
Replace raw dictionary casting with ScriptParameters wrapper that provides
Get<T>, Get<T?>, Get<T[]>, and Get<List<T>> with clear error messages,
numeric conversion, and JsonElement support for Inbound API parameters.
2026-03-22 15:47:18 -04:00
Joseph Doherty a0e036fb6b chore(lmxproxy): switch health probe tag to DevPlatform.Scheduler.ScanTime
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 15:21:00 -04:00
Joseph Doherty ecf4b434c2 refactor(dcl): simplify ValueFormatter now that SDK returns native .NET arrays
The LmxProxy client's ExtractArrayValue now returns proper .NET arrays
(bool[], int[], DateTime[], etc.) instead of ArrayValue objects. Removed
the reflection-based FormatArrayContainer logic — IEnumerable handling
is sufficient for all array types.
2026-03-22 15:15:38 -04:00
Joseph Doherty af7335f9e2 docs(dcl): update protocol and type mapping docs to reflect v2 TypedValue and SDK integration 2026-03-22 15:11:58 -04:00
Joseph Doherty ce3942990e feat(lmxproxy): add DatetimeArray proto type for DateTime[] round-trip fidelity
Added DatetimeArray message (repeated int64, UTC ticks) to proto and
code-first contracts. Host serializes DateTime[] → DatetimeArray.
Client deserializes DatetimeArray → DateTime[] (not raw long[]).
Client ExtractArrayValue now unpacks all array types including DateTime.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 15:08:15 -04:00
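The tick encoding mentioned above can be illustrated with a short round-trip. This is a Python sketch under one assumption: that "UTC ticks" means .NET `DateTime.Ticks` (100-nanosecond intervals since 0001-01-01T00:00:00). The helper names `to_ticks` and `from_ticks` are hypothetical and do not appear in the project.

```python
from datetime import datetime, timedelta, timezone

# Assumed epoch for .NET-style ticks: midnight, 1 January 0001 (UTC).
_EPOCH = datetime(1, 1, 1, tzinfo=timezone.utc)


def to_ticks(dt):
    """Convert an aware datetime to 100 ns ticks since the epoch."""
    micros = (dt.astimezone(timezone.utc) - _EPOCH) // timedelta(microseconds=1)
    return micros * 10


def from_ticks(ticks):
    """Convert ticks back to an aware UTC datetime (microsecond precision)."""
    return _EPOCH + timedelta(microseconds=ticks // 10)


def encode_array(values):
    # One int64 per element, as in a repeated int64 field.
    return [to_ticks(v) for v in values]
```

Python datetimes carry microsecond precision, so the final tick digit is lost on decode; a .NET implementation would not have that limitation.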
Joseph Doherty b050371dd5 fix(lmxproxy): handle DateTime[] COM arrays in TypedValueConverter
DateTime[] from MxAccess was falling through to ToString() fallback,
producing "System.DateTime[]" instead of actual values. Now converts
each DateTime to UTC ticks and stores in Int64Array.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 14:56:08 -04:00
Joseph Doherty dcdf79afdc fix(dcl): format ArrayValue objects as comma-separated strings for display
ArrayValue from LmxProxy client was showing as type name in debug views.
Added ValueFormatter utility and NormalizeValue in LmxProxyDataConnection
to convert arrays at the adapter boundary. DateTime arrays remain as
"System.DateTime[]" due to server-side v1 string serialization.
2026-03-22 14:46:15 -04:00
Joseph Doherty ea9c2857a7 fix(docker,cli): add LmxProxy.Client to Docker build, fix set-bindings JSON parsing
Docker: include lmxproxy/src/ZB.MOM.WW.LmxProxy.Client in build context
so the project reference resolves during container image build.

CLI: fix set-bindings JSON parsing — use JsonElement.GetString()/GetInt32()
instead of object.ToString() which returned null for deserialized elements.
2026-03-22 14:25:09 -04:00
Joseph Doherty 847302e297 test(dcl): add failover state machine tests for DataConnectionActor 2026-03-22 08:47:44 -04:00
Joseph Doherty 5de6c8d052 docs(dcl): document primary/backup endpoint redundancy across requirements and test infra 2026-03-22 08:43:59 -04:00
Joseph Doherty e8df71ea64 feat(cli): add --primary-config, --backup-config, --failover-retry-count to data connection commands
Thread backup data connection fields through management command messages,
ManagementActor handlers, SiteService, site-side SQLite storage, and
deployment/replication actors. The old --configuration CLI flag is kept
as a hidden alias for backwards compatibility.
2026-03-22 08:41:57 -04:00
Joseph Doherty ab4e88f17f feat(ui): add primary/backup endpoint fields to data connection form 2026-03-22 08:36:18 -04:00
Joseph Doherty 801c0c1df2 feat(dcl): add active endpoint to health reports and log failover events
Add ActiveEndpoint field to DataConnectionHealthReport showing which
endpoint is active (Primary, Backup, or Primary with no backup configured).
Log failover transitions and connection restoration events to the site
event log via ISiteEventLogger, passed as an optional parameter through
the actor hierarchy for backwards compatibility.
2026-03-22 08:34:05 -04:00
Joseph Doherty da290fa4f8 feat(dcl): add failover state machine to DataConnectionActor with round-robin endpoint switching 2026-03-22 08:30:03 -04:00
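The round-robin failover described in this series can be sketched as a tiny state machine. This is an illustrative Python sketch, not the `DataConnectionActor` implementation: the class `FailoverState`, its method names, and the default retry count of 3 are assumptions. The endpoint labels and the switch-after-N-failures rule follow the commit messages.

```python
class FailoverState:
    """Switches between Primary and Backup after repeated connect failures."""

    def __init__(self, has_backup, failover_retry_count=3):
        # With no backup configured, Primary is the only endpoint.
        self._endpoints = ["Primary", "Backup"] if has_backup else ["Primary"]
        self._retry_limit = failover_retry_count
        self._active = 0
        self._failures = 0

    @property
    def active_endpoint(self):
        return self._endpoints[self._active]

    def on_connect_failed(self):
        """Record a failed attempt; return True if the active endpoint switched."""
        self._failures += 1
        if self._failures < self._retry_limit or len(self._endpoints) == 1:
            return False
        # Round-robin to the next endpoint and start counting again.
        self._active = (self._active + 1) % len(self._endpoints)
        self._failures = 0
        return True

    def on_connected(self):
        # A successful connection clears the failure counter.
        self._failures = 0
```

Returning a flag from `on_connect_failed` gives the caller a natural place to log the failover event and update the health report's active endpoint.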
Joseph Doherty 46304678da feat(dcl): extend CreateConnectionCommand with backup config and failover retry count
Update CreateConnectionCommand to carry PrimaryConnectionDetails,
BackupConnectionDetails, and FailoverRetryCount. Update all callers:
DataConnectionManagerActor, DataConnectionActor, DeploymentManagerActor,
FlatteningService, and ConnectionConfig. The actor stores both configs
but continues using primary only — failover logic comes in Task 3.
2026-03-22 08:24:39 -04:00
Joseph Doherty 04af03980e feat(dcl): rename Configuration to PrimaryConfiguration, add BackupConfiguration and FailoverRetryCount 2026-03-22 08:18:31 -04:00
Joseph Doherty 5ca1be328c docs(dcl): add primary/backup data connections implementation plan
8 tasks with TDD steps, exact file paths, and code samples.
Covers entity model, failover state machine, health reporting,
UI, CLI, management API, deployment, and documentation.
2026-03-22 08:13:23 -04:00
Joseph Doherty 6267ff882c docs(dcl): add primary/backup data connection endpoints design
Covers entity model, failover state machine, health reporting,
UI/CLI changes, and deployment flow for optional backup endpoints
with automatic failover after configurable retry count.
2026-03-22 08:09:25 -04:00
Joseph Doherty 5ec7f35150 feat(dcl): replace hand-rolled LmxProxy gRPC client with real LmxProxyClient library
Switches from v1 string-based proto stubs to the production LmxProxyClient
(v2 native TypedValue protocol) via project reference. Deletes 6k+ lines of
generated proto code. Preserves ILmxProxyClient adapter interface for testability.
2026-03-22 07:55:50 -04:00
Joseph Doherty abb7579227 chore(infra): remove LmxFakeProxy — replaced by real LmxProxy v2 instances on windev
LmxFakeProxy is no longer needed now that two real LmxProxy v2 instances
are available for testing. Added remote test infra section to test_infra.md
documenting the windev instances. Removed tagsim (never committed).
2026-03-22 07:42:13 -04:00
Joseph Doherty efed8352c3 feat(infra): add second OPC UA server instance (opcua2) on port 50010
Enables multi-server testing with independent state. Both instances
share the same nodes.json tag config. Updated all infra documentation.
2026-03-22 07:31:56 -04:00
Joseph Doherty ac44122bf7 docs(lmxproxy): add dual-instance configuration (A on 50100, B on 50101)
Both instances share API keys and connect to the same AVEVA platform.
Verified: 17/17 integration tests pass against both instances.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 07:26:06 -04:00
Joseph Doherty 2c99b370a0 chore(lmxproxy): switch health probe tag to DevAppEngine.Scheduler.ScanTime, remove temp prompts
AppEngine built-in tag is always present and constantly updating (~1s),
making it a more reliable probe than a user-deployed TestChildObject tag.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 07:18:39 -04:00
Joseph Doherty ec21a9a2a0 docs(lmxproxy): mark gap 1 and gap 2 as resolved with test verification
Gap 1: Active health probing verified — 60s recovery after platform restart.
Gap 2: Address-based subscription cleanup — no stale handles.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 07:10:38 -04:00
Joseph Doherty a6c01d73e2 feat(lmxproxy): active health probing + address-based subscription cleanup (gap 1 & 2)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 06:44:21 -04:00
Joseph Doherty 86a15c0a65 docs(lmxproxy): document reconnection gaps from platform restart testing
Tested aaBootstrap kill on windev — three gaps identified:
1. No active health probing (IsConnected stays true on dead connection)
2. Stale SubscriptionManager handles after reconnect cycle
3. AVEVA objects don't auto-start after platform crash (platform behavior)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 06:19:30 -04:00
Joseph Doherty 5a9574fb95 feat(lmxproxy): add MxAccess status detail mapping for richer error messages
- MxStatusMapper: maps all 40+ MxStatusDetail codes, MxStatusCategory,
  and MxStatusSource to human-readable names and client messages
- OnDataChange: checks MXSTATUS_PROXY.success and overrides quality with
  specific OPC UA code when MxAccess reports a failure (e.g., CommFailure,
  ConfigError, WaitingForInitialData)
- OnWriteComplete: uses MxStatusMapper.FormatStatus for structured logging
- Write errors: catches COMException separately with HRESULT in message
- Read errors: distinguishes COM, timeout, and generic failures in logging

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 05:10:50 -04:00
Joseph Doherty 73b2b2f6d7 docs(lmxproxy): add STA message pump gap analysis with implementation guide
Documents when the full STA+Application.Run() approach is needed
(secured/verified writes), why our first attempt failed, the correct
pattern using Form.BeginInvoke(), and tradeoffs vs fire-and-forget.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 05:02:15 -04:00
Joseph Doherty 467fdc34d8 docs(lmxproxy): correct deviation #7 — OnWriteComplete is a COM threading issue, not MxAccess behavior
The MxAccess docs explicitly state OnWriteComplete always fires after Write().
The real cause is no Windows message pump in the headless service process to
marshal the COM callback. Fire-and-forget is safe for supervisory writes but
would miss secured/verified write rejections (errors 1012/1013).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 04:53:54 -04:00
Joseph Doherty 866c73dcd4 docs(lmxproxy): add deviation #8 — SubscriptionManager COM subscription wiring
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 04:47:23 -04:00
Joseph Doherty 7bed4b901a fix(lmxproxy): wire MxAccess COM subscriptions in SubscriptionManager
SubscriptionManager tracked client-to-tag routing but never called
MxAccessClient.SubscribeAsync to create the actual COM subscriptions,
so OnDataChange never fired. Now creates MxAccess subscriptions for
new tags and disposes them when the last client unsubscribes.

All 17 integration tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 04:46:15 -04:00
Joseph Doherty c5d4849bd3 fix(lmxproxy): resolve write timeout — bypass OnWriteComplete callback for supervisory writes
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 04:39:14 -04:00
Joseph Doherty e2c204b62b docs(lmxproxy): add execution prompt to fix failing write integration tests 2026-03-22 04:38:30 -04:00
Joseph Doherty 7079f6eed4 docs(lmxproxy): add ArchestrA MXAccess Toolkit reference documentation 2026-03-22 04:30:39 -04:00
Joseph Doherty f4386bc518 docs(lmxproxy): record v2 rebuild deviations and key technical decisions
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 04:21:36 -04:00
Joseph Doherty 779598d962 feat(lmxproxy): phase 7 — integration tests, deployment to windev, v1 cutover
- Replaced STA dispatch thread with Task.Run pattern for COM interop
- Fixed TypedValue oneof tracking with property-level _setCase field
- Added x-api-key DelegatingHandler for gRPC metadata authentication
- Fixed CheckApiKey RPC to validate request body key (not header)
- Integration tests: 15/17 pass (reads, subscribes, API keys, connections)
- 2 write tests pending (OnWriteComplete callback timing issue)
- v2 service deployed on windev port 50100

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 01:11:44 -04:00
Joseph Doherty 6d9bf594ec feat(lmxproxy): phase 7 — integration test project and test scenarios
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 00:31:26 -04:00
Joseph Doherty 215cfa29f3 feat(lmxproxy): phase 6 — client extras (builder, factory, DI, streaming extensions)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 00:29:16 -04:00
Joseph Doherty 8ba75b50e8 feat(lmxproxy): phase 5 — client core (ILmxProxyClient, connection, read/write/subscribe)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 00:22:29 -04:00
Joseph Doherty 9eb81180c0 feat(lmxproxy): phase 4 — host health monitoring, metrics, status web server
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 00:14:40 -04:00
Joseph Doherty 16d1b95e9a feat(lmxproxy): phase 3 — host gRPC server, security, configuration, service hosting
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 00:05:36 -04:00
Joseph Doherty 64c92c63e5 feat(lmxproxy): phase 2 — host core (MxAccessClient, SessionManager, SubscriptionManager)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 23:58:17 -04:00
Joseph Doherty 0d63fb1105 feat(lmxproxy): phase 1 — v2 protocol types and domain model
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 23:41:56 -04:00
Joseph Doherty 08d2a07d8b docs(lmxproxy): update test tags to TestChildObject namespace for v2 type coverage
Replace JoeAppEngine tags with TestChildObject tags (TestBool, TestInt, TestFloat,
TestDouble, TestString, TestDateTime, and array variants) in Phase 4 and Phase 7
plans. These tags cover all TypedValue oneof cases for comprehensive v2 testing.
2026-03-21 23:35:15 -04:00
Joseph Doherty 4303f06fc3 docs(lmxproxy): add v2 rebuild design, 7-phase implementation plans, and execution prompt
Design doc covers architecture, v2 protocol (TypedValue/QualityCode), COM threading
model, session lifecycle, subscription semantics, error model, and guardrails.
Implementation plans are detailed enough for autonomous Claude Code execution.
Verified all dev tooling on windev (Grpc.Tools, protobuf-net.Grpc, Polly v8, xUnit).
2026-03-21 23:29:42 -04:00
Joseph Doherty 683aea0fbe docs: add LmxProxy requirements documentation with v2 protocol as authoritative design
Generate high-level requirements and 10 component documents derived from source code
and protocol specs. Uses lmxproxy_updates.md (v2 TypedValue/QualityCode) as the source
of truth, with v1 string-based encoding documented as legacy context.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 22:38:11 -04:00
Joseph Doherty 970d0a5cb3 refactor: simplify data connections from many-to-many site assignment to direct site ownership
Replace SiteDataConnectionAssignment join table with a direct SiteId FK on DataConnection,
simplifying the data model, repositories, UI, CLI, and deployment service.
2026-03-21 21:07:10 -04:00
Joseph Doherty cd6efeea90 docs: add requirements generation prompt for LmxProxy project 2026-03-21 21:06:59 -04:00
Joseph Doherty 2810306415 feat: add standalone LmxProxy solution, windev VM documentation
Split LmxProxy Host and Client into a self-contained solution under lmxproxy/,
ported from the ScadaBridge monorepo with updated namespaces (ZB.MOM.WW.LmxProxy.*).
Client project (.NET 10) inlines Core/DataEngine dependencies and builds clean.
Host project (.NET Fx 4.8) retains ArchestrA.MXAccess for Windows deployment.
Added windev.md documenting the WW_DEV_VM development environment setup.
2026-03-21 20:50:05 -04:00
Joseph Doherty 512153646a test: add role-based navigation tests verifying correct nav sections per user role 2026-03-21 15:25:34 -04:00
Joseph Doherty d3194e3634 feat: separate create/edit form pages, Playwright test infrastructure, /auth/token endpoint
Move all CRUD create/edit forms from inline on list pages to dedicated form pages
with back-button navigation and post-save redirect. Add Playwright Docker container
(browser server on port 3000) with 25 passing E2E tests covering login, navigation,
and site CRUD workflows. Add POST /auth/token endpoint for clean JWT retrieval.
2026-03-21 15:17:24 -04:00
Joseph Doherty b3f8850711 docs: document script hot-reload mechanisms for all script types 2026-03-21 13:42:06 -04:00
Joseph Doherty eeca930cbd fix: add EF migration for GrpcNodeAAddress/GrpcNodeBAddress columns on Sites table 2026-03-21 12:44:21 -04:00
Joseph Doherty 416a03b782 feat: complete gRPC streaming channel — site host, docker config, docs, integration tests
Switch site host to WebApplicationBuilder with Kestrel HTTP/2 gRPC server,
add GrpcPort/keepalive config, wire SiteStreamManager as ISiteStreamSubscriber,
expose gRPC ports in docker-compose, add site seed script, update all 10
requirement docs + CLAUDE.md + README.md for the new dual-transport architecture.
2026-03-21 12:38:33 -04:00
Joseph Doherty 3fe3c4161b test: add proto contract, cleanup verification, and regression guardrail tests 2026-03-21 12:36:27 -04:00
Joseph Doherty 49f042a937 refactor: remove ClusterClient streaming path (DebugStreamEvent), events flow via gRPC 2026-03-21 12:18:52 -04:00
Joseph Doherty 2cd43b6992 feat: update DebugStreamBridgeActor to use gRPC for streaming events
After receiving the initial snapshot via ClusterClient, the bridge actor
now opens a gRPC server-streaming subscription via SiteStreamGrpcClient
for ongoing AttributeValueChanged/AlarmStateChanged events. Adds NodeA/
NodeB failover with max 3 retries, retry count reset on successful event,
and IWithTimers-based reconnect scheduling.

- DebugStreamBridgeActor: gRPC stream after snapshot, reconnect state machine
- DebugStreamService: inject SiteStreamGrpcClientFactory, resolve gRPC addresses
- ServiceCollectionExtensions: register SiteStreamGrpcClientFactory singleton
- SiteStreamGrpcClient: make SubscribeAsync/Unsubscribe virtual for testability
- SiteStreamGrpcClientFactory: make GetOrCreate virtual for testability
- New test suite: DebugStreamBridgeActorTests (8 tests)
2026-03-21 12:14:24 -04:00
Joseph Doherty 25a6022f7b feat: add SiteStreamGrpcClient and SiteStreamGrpcClientFactory
Per-site gRPC client for central-side streaming subscriptions to site
servers. SiteStreamGrpcClient manages server-streaming calls with
keepalive, converts proto events to domain types, and supports
cancellation via Unsubscribe. SiteStreamGrpcClientFactory caches one
client per site identifier.

Includes InternalsVisibleTo for test access to conversion helpers and
comprehensive unit tests for event mapping, quality/alarm-state
conversion, unsubscribe behavior, and factory caching.
2026-03-21 12:06:38 -04:00
Joseph Doherty 55a05914d0 feat: add SiteStreamGrpcServer with Channel<T> bridge and stream limits
- Define ISiteStreamSubscriber interface for decoupling from SiteRuntime
- Implement SiteStreamGrpcServer (inherits SiteStreamServiceBase) with:
  - Readiness gate (SetReady)
  - Max concurrent stream enforcement
  - Duplicate correlationId replacement (cancels previous stream)
  - StreamRelayActor creation per subscription
  - Bounded Channel<SiteStreamEvent> bridge (1000 capacity, drop-oldest)
  - Clean teardown: unsubscribe, stop actor, remove tracking entry
- Identity-safe cleanup using ConcurrentDictionary.TryRemove(KeyValuePair)
  to prevent replacement streams from being removed by predecessor cleanup
- 7 unit tests covering reject-not-ready, max-streams, duplicate cancel,
  cleanup-on-cancel, subscribe/remove lifecycle, event forwarding
2026-03-21 11:52:31 -04:00
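Two ideas from the commit above, the drop-oldest bounded buffer and identity-safe cleanup, can be shown in a few lines. This is a single-threaded Python sketch, not the `SiteStreamGrpcServer` code: `DropOldestBuffer` and `remove_if_same` are hypothetical names, and the real server uses a .NET `Channel<T>` and a `ConcurrentDictionary` instead.

```python
from collections import deque


class DropOldestBuffer:
    """Bounded buffer that evicts the oldest event when full."""

    def __init__(self, capacity=1000):
        self._items = deque(maxlen=capacity)
        self.dropped = 0

    def write(self, event):
        if len(self._items) == self._items.maxlen:
            self.dropped += 1  # the append below evicts the oldest event
        self._items.append(event)

    def read_all(self):
        # Drain everything currently buffered, oldest first.
        items = list(self._items)
        self._items.clear()
        return items


def remove_if_same(streams, correlation_id, stream):
    """Remove the entry only if it still refers to this exact stream object."""
    if streams.get(correlation_id) is stream:
        del streams[correlation_id]
        return True
    return False
```

The identity check is what stops a cancelled predecessor from deleting the tracking entry of the stream that replaced it under the same correlation ID.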
Joseph Doherty d70bbbe739 feat: add StreamRelayActor bridging Akka events to gRPC proto channel 2026-03-21 11:48:04 -04:00
Joseph Doherty 9b0a80dcbd feat: add GrpcNodeAAddress/GrpcNodeBAddress to Site entity, CLI, and UI 2026-03-21 11:45:22 -04:00
Joseph Doherty 64ee316609 feat: add GrpcPort config to NodeOptions with startup validation 2026-03-21 11:42:41 -04:00
Joseph Doherty deb58e1f17 feat: add sitestream.proto definition and generated gRPC stubs
Proto3 definition with SiteStreamService (server streaming), Quality and
AlarmStateEnum enums with UNSPECIFIED=0, google.protobuf.Timestamp for
cross-platform timestamps. Pre-generated C# stubs checked in (no protoc
at build time). 10 roundtrip tests covering serialization, oneof
discrimination, and Timestamp<->DateTimeOffset conversion.
2026-03-21 11:41:01 -04:00
Joseph Doherty 826cfbee31 feat: add sitestream.proto definition and generated gRPC stubs
Define the SiteStreamService proto for real-time instance event
streaming (attribute value changes, alarm state changes) from site
nodes to central. Add pre-generated C# stubs following the existing
LmxProxy pattern, gRPC NuGet packages with FrameworkReference for
ASP.NET Core server types, and proto roundtrip tests.
2026-03-21 11:37:39 -04:00
Joseph Doherty b76ce09221 docs: add gRPC streaming channel implementation plan with task tracking 2026-03-21 11:32:24 -04:00
Joseph Doherty 3efec91386 fix: route debug stream events through ClusterClient site→central path
ClusterClient Sender refs are temporary proxies — valid for immediate reply
but not durable for future Tells. Events now flow as DebugStreamEvent through
SiteCommunicationActor → ClusterClient → CentralCommunicationActor → bridge
actor (same pattern as health reports). Also fix DebugStreamHub to use
IHubContext for long-lived callbacks instead of transient hub instance.
2026-03-21 11:32:17 -04:00
Joseph Doherty 41aff339b2 docs: add gRPC streaming channel design plan for site→central real-time data
Replaces ClusterClient-based event streaming with dedicated gRPC server-streaming
channels. Covers proto definition, server/client patterns, Channel<T> bridging,
keepalive/orphan prevention, failover scenarios, port/address configuration,
extensibility guide for new event types, testing strategy, and implementation guardrails.
2026-03-21 11:26:09 -04:00
Joseph Doherty fd2e96fea2 feat: replace debug view polling with real-time SignalR streaming
The debug view polled every 2s by re-subscribing for full snapshots. Now a
persistent DebugStreamBridgeActor on central subscribes once and receives
incremental Akka stream events from the site, forwarding them to the Blazor
component via callbacks and to the CLI via a new SignalR hub at
/hubs/debug-stream. Adds `debug stream` CLI command with auto-reconnect.
2026-03-21 01:34:53 -04:00
Joseph Doherty d91aa83665 refactor(docs): move requirements and test infra docs into docs/ subdirectories
Organize documentation by moving requirements (HighLevelReqs, Component-*,
lmxproxy_protocol) to docs/requirements/ and test infrastructure docs to
docs/test_infra/. Updates all cross-references in README, CLAUDE.md,
infra/README, component docs, and 23 plan files.
2026-03-21 01:11:35 -04:00
Joseph Doherty 0a85a839a2 feat(infra): add Traefik load balancer with active node health check for central cluster failover
Add ActiveNodeHealthCheck that returns 200 only on the Akka.NET cluster
leader, enabling Traefik to route traffic to the active central node and
automatically fail over when the leader changes. Also fixes AkkaClusterHealthCheck
to resolve ActorSystem from AkkaHostedService (was always null via DI).
2026-03-21 00:44:37 -04:00
Joseph Doherty 1a540f4f0a feat: add HTTP Management API, migrate CLI from Akka ClusterClient to HTTP
Replace the CLI's Akka.NET ClusterClient transport with a simple HTTP client
targeting a new POST /management endpoint on the Central Host. The endpoint
handles Basic Auth, LDAP authentication, role resolution, and ManagementActor
dispatch in a single round-trip — eliminating the CLI's Akka, LDAP, and
Security dependencies.

Also fixes DCL ReSubscribeAll losing subscriptions on repeated reconnect by
deriving the tag list from _subscriptionsByInstance instead of _subscriptionIds.
2026-03-20 23:55:31 -04:00
Joseph Doherty 7740a3bcf9 feat: add JoeAppEngine OPC UA nodes, fix DCL auto-reconnect and quality push
- Add JoeAppEngine folder to OPC UA nodes.json (BTCS, AlarmCntsBySeverity, Scheduler/ScanTime)
- Fix DataConnectionActor: capture Self in PreStart for use from non-actor threads,
  preventing Self.Tell failure in Disconnected event handler
- Implement InstanceActor.HandleConnectionQualityChanged to mark attributes Bad on disconnect
- Fix LmxFakeProxy TagMapper to serialize arrays as JSON instead of "System.Int32[]"
- Allow DataType and DataSourceReference updates in TemplateService.UpdateAttributeAsync
- Update test_infra_opcua.md with JoeAppEngine documentation
2026-03-19 13:27:54 -04:00
Joseph Doherty ffdda51990 fix(infra): use appConfig.Validate instead of CheckApplicationInstanceCertificate
Replace with Validate() which validates config without requiring a cert,
matching the RealOpcUaClient pattern. Fixes OPC UA connection failure.
2026-03-19 11:30:58 -04:00
Joseph Doherty 8f2700f11e test(infra): add integration smoke test for RealLmxProxyClient against LmxFakeProxy 2026-03-19 11:29:00 -04:00
Joseph Doherty 2cb592ad00 docs: add LmxFakeProxy to test infrastructure documentation 2026-03-19 11:27:30 -04:00
Joseph Doherty edb2ab98cb feat(infra): add LmxFakeProxy Dockerfile and docker-compose service 2026-03-19 11:26:19 -04:00
Joseph Doherty aef70bec7f feat(infra): wire up Program.cs with CLI args, env vars, and OPC UA bridge startup 2026-03-19 11:25:36 -04:00
Joseph Doherty 6852250497 feat(infra): add ScadaServiceImpl with full proto parity for all RPCs 2026-03-19 11:24:26 -04:00
Joseph Doherty 9cc8a1ae80 feat(infra): add TagMapper with address mapping, value parsing, and quality mapping 2026-03-19 11:20:57 -04:00
Joseph Doherty efbedc60a8 feat(infra): add SessionManager with full session tracking and API key validation 2026-03-19 11:20:44 -04:00
Joseph Doherty 1d498b94b4 feat(infra): add IOpcUaBridge interface and OpcUaBridge with OPC UA reconnection 2026-03-19 11:20:25 -04:00
Joseph Doherty 1b27b89ca0 feat(infra): scaffold LmxFakeProxy project with proto and test project 2026-03-19 11:15:54 -04:00
Joseph Doherty 3e93a0d8c3 docs: add LmxFakeProxy implementation plan with 10 tasks
Detailed task-by-task plan covering scaffolding, TagMapper, SessionManager,
OpcUaBridge, ScadaServiceImpl, Program.cs, Docker, docs, and integration test.
2026-03-19 11:13:51 -04:00
Joseph Doherty e19a568b9b docs: add LmxFakeProxy design — OPC UA-backed test proxy for LmxProxy protocol
Defines a gRPC server implementing the scada.ScadaService proto that bridges
to the existing OPC UA test server. Enables end-to-end testing of
RealLmxProxyClient without a Windows LmxProxy deployment.
2026-03-19 11:08:47 -04:00
592 changed files with 132456 additions and 13132 deletions
+23 -11
@@ -5,17 +5,18 @@ This project contains design documentation for a distributed SCADA system built
## Project Structure
- `README.md` — Master index with component table and architecture diagrams.
- `HighLevelReqs.md` — Complete high-level requirements covering all functional areas.
- `Component-*.md` — Individual component design documents (one per component).
- `docs/requirements/HighLevelReqs.md` — Complete high-level requirements covering all functional areas.
- `docs/requirements/Component-*.md` — Individual component design documents (one per component).
- `docs/test_infra/test_infra.md` — Master test infrastructure doc (OPC UA, LDAP, MS SQL, SMTP, REST API, Traefik).
- `docs/plans/` — Design decision documents from refinement sessions.
- `AkkaDotNet/` — Akka.NET reference documentation and best practices notes.
- `test_infra.md` — Master test infrastructure doc (OPC UA, LDAP, MS SQL).
- `infra/` — Docker Compose and config files for local test services.
- `docker/` — Docker infrastructure for the 8-node cluster topology (2 central + 3 sites). See [`docker/README.md`](docker/README.md) for cluster setup, port allocation, and management commands.
## Document Conventions
- All documents are markdown files in the project root directory.
- Requirements documents (high-level and component-level) live in `docs/requirements/`.
- Test infrastructure documentation lives in `docs/test_infra/`.
- Component documents are named `Component-<Name>.md` (PascalCase, hyphen-separated).
- Each component document follows a consistent structure: Purpose, Location, Responsibilities, detailed design sections, Dependencies, and Interactions.
- The README.md component table must stay in sync with actual component documents. When a component is added, removed, or renamed, update the table.
@@ -35,13 +36,13 @@ This project contains design documentation for a distributed SCADA system built
- Use `git diff` to review changes before committing.
- Commit related changes together with a descriptive message summarizing the design decision.
## Current Component List (19 components)
## Current Component List (20 components)
1. Template Engine — Template modeling, inheritance, composition, validation, flattening, diffs.
2. Deployment Manager — Central-side deployment pipeline, system-wide artifact deployment, instance lifecycle.
3. Site Runtime — Site-side actor hierarchy (Deployment Manager singleton, Instance/Script/Alarm Actors), script compilation, Akka stream.
4. Data Connection Layer — Protocol abstraction (OPC UA, custom), subscription management, clean data pipe.
5. Central–Site Communication — Akka.NET ClusterClient/ClusterClientReceptionist, message patterns, debug streaming.
5. Central–Site Communication — Akka.NET ClusterClient (command/control) + gRPC server-streaming (real-time data), message patterns, debug streaming.
6. Store-and-Forward Engine — Buffering, fixed-interval retry, parking, SQLite persistence, replication.
7. External System Gateway — External system definitions, API method invocation, database connections.
8. Notification Service — Notification lists, email delivery, store-and-forward integration.
@@ -55,7 +56,8 @@ This project contains design documentation for a distributed SCADA system built
16. Commons — Shared types, POCO entity classes, repository interfaces, message contracts.
17. Configuration Database — EF Core data access layer, repositories, unit-of-work, audit logging (IAuditService), migrations.
18. Management Service — Akka.NET actor providing programmatic access to all admin operations, ClusterClientReceptionist registration.
19. CLI — Command-line tool using ClusterClient to interact with Management Service, System.CommandLine, JSON/table output.
19. CLI — Command-line tool using HTTP Management API, System.CommandLine, JSON/table output.
20. Traefik Proxy — Reverse proxy/load balancer fronting central cluster, active node routing via `/health/active`, automatic failover.
## Key Design Decisions (for context across sessions)
@@ -78,7 +80,8 @@ This project contains design documentation for a distributed SCADA system built
- Tag path resolution retried periodically for devices still booting.
- Static attribute writes persisted to local SQLite (survive restart/failover, reset on redeployment).
- All timestamps are UTC throughout the system.
- Inter-cluster communication uses ClusterClient/ClusterClientReceptionist. Both CentralCommunicationActor and SiteCommunicationActor registered with receptionist. Central creates one ClusterClient per site using NodeA/NodeB as contact points. Sites configure multiple central contact points for failover. Addresses cached in CentralCommunicationActor, refreshed periodically (60s) and on admin changes. Heartbeats serve health monitoring only.
- Inter-cluster communication uses two transports: ClusterClient for command/control (deployments, lifecycle, subscribe/unsubscribe handshake, snapshots) and gRPC server-streaming for real-time data (attribute values, alarm states). Both CentralCommunicationActor and SiteCommunicationActor registered with receptionist. Central creates one ClusterClient per site using NodeA/NodeB as contact points. Sites configure multiple central contact points for failover. Addresses cached in CentralCommunicationActor, refreshed periodically (60s) and on admin changes. Heartbeats serve health monitoring only.
- gRPC streaming channel: SiteStreamGrpcServer on each site node (Kestrel HTTP/2, port 8083); central creates per-site SiteStreamGrpcClient via SiteStreamGrpcClientFactory. Site entity has GrpcNodeAAddress/GrpcNodeBAddress fields. Proto: sitestream.proto with SiteStreamService, SiteStreamEvent (oneof: AttributeValueUpdate, AlarmStateUpdate). DebugStreamEvent message removed (no longer flows through ClusterClient).
### External Integrations
- External System Gateway: HTTP/REST only, JSON serialization, API key + Basic Auth.
@@ -109,9 +112,9 @@ This project contains design documentation for a distributed SCADA system built
### Security & Auth
- Authentication: direct LDAP bind (username/password), no Kerberos/NTLM. LDAPS/StartTLS required.
- JWT sessions: HMAC-SHA256 shared symmetric key, 15-minute expiry with sliding refresh, 30-minute idle timeout.
- Cookie+JWT hybrid sessions: HttpOnly/Secure cookie carries an embedded JWT (HMAC-SHA256 shared symmetric key), 15-minute expiry with sliding refresh, 30-minute idle timeout. Cookies are the correct transport for Blazor Server (SignalR circuits).
- LDAP failure: new logins fail; active sessions continue with current roles.
- Load balancer in front of central UI; JWT + shared Data Protection keys for failover transparency.
- Load balancer in front of central UI; cookie-embedded JWT + shared Data Protection keys for failover transparency.
### Cluster & Failover
- Keep-oldest split-brain resolver with `down-if-alone = on`, 15s stable-after.
@@ -123,7 +126,7 @@ This project contains design documentation for a distributed SCADA system built
### UI & Monitoring
- Central UI: Blazor Server (ASP.NET Core + SignalR) with Bootstrap CSS. No third-party component frameworks (no Blazorise, MudBlazor, Radzen, etc.). Build custom Blazor components for tables, grids, forms, etc.
- UI design: Clean, corporate, internal-use aesthetic. Not flashy. Use the `frontend-design` skill when designing UI pages/components.
- Real-time push for debug view, health dashboard, deployment status.
- Debug view: real-time streaming via DebugStreamBridgeActor + gRPC (events via SiteStreamGrpcClient, snapshot via ClusterClient). Health dashboard: 10s polling timer. Deployment status: real-time push via SignalR.
- Health reports: 30s interval, 60s offline threshold, monotonic sequence numbers, raw error counts per interval.
- Dead letter monitoring as a health metric.
- Site Event Logging: 30-day retention, 1GB storage cap, daily purge, paginated queries with keyword search.
@@ -149,3 +152,12 @@ This project contains design documentation for a distributed SCADA system built
- When consulting with the Codex MCP tool, use model `gpt-5.4`.
- When a task requires setting up or controlling system state (sites, templates, instances, data connections, deployments, security, etc.) and the Central UI is not needed, prefer the ScadaLink CLI over manual DB edits or UI navigation. See [`src/ScadaLink.CLI/README.md`](src/ScadaLink.CLI/README.md) for the full command reference.
### CLI Quick Reference (Docker / OrbStack)
- **Management URL**: `http://localhost:9000` — the CLI connects via the Traefik load balancer, which routes to the active central node. Direct access: central-a on port 9001, central-b on port 9002.
- **Test user**: `--username multi-role --password password` — has Admin, Design, and Deployment roles. The `admin` user only has the Admin role and cannot create templates, data connections, or deploy.
- **Config file**: `~/.scadalink/config.json` — stores `managementUrl` and default format. See `docker/README.md` for a ready-to-use test config.
- **Rebuild cluster**: `bash docker/deploy.sh` — builds the `scadalink:latest` image and recreates all containers. Run this after code changes to ManagementActor, Host, or any server-side component.
- **Infrastructure services**: `cd infra && docker compose up -d` — starts LDAP, MS SQL, OPC UA, SMTP, and REST API. These are separate from the cluster containers in `docker/`.
- **All test LDAP passwords**: `password` (see `infra/glauth/config.toml` for users and groups).
-180
@@ -1,180 +0,0 @@
# Component: Data Connection Layer
## Purpose
The Data Connection Layer provides a uniform interface for reading from and writing to physical machines at site clusters. It abstracts protocol-specific details behind a common interface, manages subscriptions, and delivers live tag value updates to Instance Actors. It is a **clean data pipe** — it performs no evaluation of triggers, alarm conditions, or business logic.
## Location
Site clusters only. Central does not interact with machines directly.
## Responsibilities
- Manage data connections defined centrally and deployed to sites as part of artifact deployment (OPC UA servers, LmxProxy endpoints). Data connection definitions are stored in local SQLite after deployment.
- Establish and maintain connections to data sources based on deployed instance configurations.
- Subscribe to tag paths as requested by Instance Actors (based on attribute data source references in the flattened configuration).
- Deliver tag value updates to the requesting Instance Actors.
- Support writing values to machines (when Instance Actors forward `SetAttribute` write requests for data-connected attributes).
- Report data connection health status to the Health Monitoring component.
## Common Interface
Both OPC UA and LmxProxy implement the same interface:
```
IDataConnection : IAsyncDisposable
├── Connect(connectionDetails) → void
├── Disconnect() → void
├── Subscribe(tagPath, callback) → subscriptionId
├── Unsubscribe(subscriptionId) → void
├── Read(tagPath) → value
├── ReadBatch(tagPaths) → values
├── Write(tagPath, value) → void
├── WriteBatch(values) → void
├── WriteBatchAndWait(values, flagPath, flagValue, responsePath, responseValue, timeout) → bool
└── Status → ConnectionHealth
```
Additional protocols can be added by implementing this interface.
### Concrete Type Mappings
| IDataConnection | OPC UA SDK | LmxProxy (`RealLmxProxyClient`) |
|---|---|---|
| `Connect()` | OPC UA session establishment | gRPC `Connect` RPC with `x-api-key` metadata header, server returns `SessionId` |
| `Disconnect()` | Close OPC UA session | gRPC `Disconnect` RPC |
| `Subscribe(tagPath, callback)` | OPC UA Monitored Items | gRPC `Subscribe` server-streaming RPC (`stream VtqMessage`), cancelled via `CancellationTokenSource` |
| `Unsubscribe(id)` | Remove Monitored Item | Cancel the `CancellationTokenSource` for that subscription (stops streaming RPC) |
| `Read(tagPath)` | OPC UA Read | gRPC `Read` RPC → `VtqMessage` → `LmxVtq` |
| `ReadBatch(tagPaths)` | OPC UA Read (multiple nodes) | gRPC `ReadBatch` RPC → `repeated VtqMessage` → `IDictionary<string, LmxVtq>` |
| `Write(tagPath, value)` | OPC UA Write | gRPC `Write` RPC (throws on failure) |
| `WriteBatch(values)` | OPC UA Write (multiple nodes) | gRPC `WriteBatch` RPC (throws on failure) |
| `WriteBatchAndWait(...)` | OPC UA Write + poll for confirmation | `WriteBatch` + poll `Read` at 100ms intervals until response value matches or timeout |
| `Status` | OPC UA session state | `IsConnected` — true when `SessionId` is non-empty |
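The LmxProxy `WriteBatchAndWait` row describes a batch write followed by polling `Read` every 100 ms until the response value matches or the timeout expires. A minimal sketch of that loop, written in Python for brevity rather than the project's C#. `FakeClient`, the tag names, and the choice to write the flag together with the batch are assumptions made for illustration, not details taken from the adapter.

```python
import time


class FakeClient:
    """Hypothetical in-memory stand-in for a data connection client."""

    def __init__(self):
        self.tags = {}

    def write_batch(self, values):
        self.tags.update(values)
        # Simulate a device that acknowledges the flag by setting a response tag.
        if values.get("Cmd.Flag") is True:
            self.tags["Cmd.Done"] = True

    def read(self, tag_path):
        return self.tags.get(tag_path)


def write_batch_and_wait(client, values, flag_path, flag_value,
                         response_path, response_value,
                         timeout_s, poll_interval_s=0.1):
    """Write the batch plus the flag, then poll the response tag.

    Returns True if the response tag equals response_value before the
    timeout, False otherwise. Assumption: the flag is written in the same
    batch as the values.
    """
    batch = dict(values)
    batch[flag_path] = flag_value
    client.write_batch(batch)

    deadline = time.monotonic() + timeout_s
    while True:
        if client.read(response_path) == response_value:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_s)
```

The boolean return mirrors the `→ bool` in the interface listing: a timeout is reported as `False` rather than raised, while a failing `write_batch` would still propagate its exception to the caller.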
### Common Value Type
Both protocols produce the same value tuple consumed by Instance Actors. Before the first value update arrives from the DCL, data-sourced attributes are held at **uncertain** quality by the Instance Actor (see Site Runtime — Initialization):
| Concept | ScadaLink Design | LmxProxy Wire Format | Local Type |
|---|---|---|---|
| Value container | `TagValue(Value, Quality, Timestamp)` | `VtqMessage { Tag, Value, TimestampUtcTicks, Quality }` | `LmxVtq(Value, TimestampUtc, Quality)` — readonly record struct |
| Quality | `QualityCode` enum: Good / Bad / Uncertain | String: `"Good"` / `"Uncertain"` / `"Bad"` | `LmxQuality` enum: Good / Uncertain / Bad |
| Timestamp | `DateTimeOffset` (UTC) | `int64` (DateTime.Ticks, UTC) | `DateTime` (UTC) |
| Value type | `object?` | `string` (parsed by client to double, bool, or string) | `object?` |
## Supported Protocols
### OPC UA
- Standard OPC UA client implementation.
- Supports subscriptions (monitored items) and read/write operations.
### LmxProxy (Custom Protocol)
LmxProxy is a gRPC-based protocol for communicating with LMX data servers. The DCL includes its own proto-generated gRPC client (`RealLmxProxyClient`) — no external SDK dependency.
**Transport & Connection**:
- gRPC over HTTP/2, using proto-generated client stubs from `scada.proto` (service: `scada.ScadaService`). Pre-generated C# files are checked into `Adapters/LmxProxyGrpc/` to avoid running `protoc` in Docker (ARM64 compatibility).
- Default port: **50051**.
- Session-based: `Connect` RPC returns a `SessionId` used for all subsequent operations.
- Keep-alive: Managed by the LmxProxy server's session timeout. The DCL reconnect cycle handles session loss.
**Authentication & TLS**:
- API key-based authentication sent as `x-api-key` gRPC metadata header on every call. The server's `ApiKeyInterceptor` validates the header before the request reaches the service method. The API key is also included in the `ConnectRequest` body for session-level validation.
- Plain HTTP/2 (no TLS) for current deployments. The server supports TLS when configured.
**Subscriptions**:
- Server-streaming gRPC (`Subscribe` RPC returns `stream VtqMessage`).
- Configurable sampling interval (default: 0 = on-change).
- Wire format: `VtqMessage { tag, value (string), timestamp_utc_ticks (int64), quality (string: "Good"/"Uncertain"/"Bad") }`.
- Subscription lifetime managed by `CancellationTokenSource` — cancellation stops the streaming RPC.
**Client Implementation** (`RealLmxProxyClient`):
- Uses `Google.Protobuf` + `Grpc.Net.Client` (standard proto-generated stubs, no protobuf-net runtime IL emit).
- `ILmxProxyClientFactory` creates instances configured with host, port, and API key.
- Value conversion: string values from `VtqMessage` are parsed to `double`, `bool`, or left as `string`.
- Quality mapping: `"Good"` → `LmxQuality.Good`, `"Uncertain"` → `LmxQuality.Uncertain`, else `LmxQuality.Bad`.
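The value conversion, quality mapping, and tick-based timestamp described above can be sketched as follows. This is an illustrative Python model, not the C# client. The function names are hypothetical, and the parse order (bool before number) is an assumption, since the source only lists the target types.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum


class Quality(Enum):
    GOOD = "Good"
    UNCERTAIN = "Uncertain"
    BAD = "Bad"


# .NET DateTime.Ticks counts 100 ns intervals since 0001-01-01T00:00:00.
_DOTNET_EPOCH = datetime(1, 1, 1, tzinfo=timezone.utc)


def parse_value(raw: str):
    """Parse a wire string to bool or float, else leave it as a string."""
    lowered = raw.strip().lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    try:
        return float(raw)
    except ValueError:
        return raw


def map_quality(raw: str) -> Quality:
    """Map the wire quality string; anything unrecognized becomes BAD."""
    if raw == "Good":
        return Quality.GOOD
    if raw == "Uncertain":
        return Quality.UNCERTAIN
    return Quality.BAD


def ticks_to_utc(ticks: int) -> datetime:
    """Convert UTC ticks to a timezone-aware datetime (microsecond precision)."""
    return _DOTNET_EPOCH + timedelta(microseconds=ticks // 10)
```

Mapping every unrecognized quality string to `BAD` follows the "else `LmxQuality.Bad`" rule, so a malformed message degrades the tag rather than being silently treated as good.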
**Proto Source**: The `.proto` file originates from the LmxProxy server repository (`lmx/Proxy/Grpc/Protos/scada.proto` in ScadaBridge). The C# stubs are pre-generated and stored at `Adapters/LmxProxyGrpc/`.
## Subscription Management
- When an Instance Actor is created (as part of the Site Runtime actor hierarchy), it registers its data source references with the Data Connection Layer.
- The DCL subscribes to the tag paths using the concrete connection details from the flattened configuration.
- Tag value updates are delivered directly to the requesting Instance Actor.
- When an Instance Actor is stopped (due to disable, delete, or redeployment), the DCL cleans up the associated subscriptions.
- When a new Instance Actor is created for a redeployment, subscriptions are established fresh based on the new configuration.
## Write-Back Support
- When a script calls `Instance.SetAttribute` for an attribute with a data source reference, the Instance Actor sends a write request to the DCL.
- The DCL writes the value to the physical device via the appropriate protocol.
- The existing subscription picks up the confirmed new value from the device and delivers it back to the Instance Actor as a standard value update.
- The Instance Actor's in-memory value is **not** updated until the device confirms the write.
## Value Update Message Format
Each value update delivered to an Instance Actor includes:
- **Tag path**: The relative path of the attribute's data source reference.
- **Value**: The new value from the device.
- **Quality**: Data quality indicator (good, bad, uncertain).
- **Timestamp**: When the value was read from the device.
## Connection Actor Model
Each data connection is managed by a dedicated connection actor that uses the Akka.NET **Become/Stash** pattern to model its lifecycle as a state machine:
- **Connecting**: The actor attempts to establish the connection. Subscription requests and write commands received during this phase are **stashed** (buffered in the actor's stash).
- **Connected**: The actor is actively servicing subscriptions. On entering this state, all stashed messages are unstashed and processed.
- **Reconnecting**: The connection was lost. The actor transitions back to a connecting-like state, stashing new requests while it retries.
This pattern ensures no messages are lost during connection transitions and is the standard Akka.NET approach for actors with I/O lifecycle dependencies.
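The three states and the stash behaviour can be modelled with a toy, single-threaded class. This Python sketch only illustrates the idea. It does not use Akka.NET, and `ConnectionActor` and its method names are hypothetical.

```python
from collections import deque


class ConnectionActor:
    """Toy model of the Connecting / Connected / Reconnecting lifecycle."""

    def __init__(self):
        self.state = "Connecting"
        self._stash = deque()   # messages buffered while not connected
        self.processed = []     # messages actually handled, in order

    def receive(self, message):
        if self.state == "Connected":
            self.processed.append(message)
        else:
            # Connecting or Reconnecting: buffer until the connection is up.
            self._stash.append(message)

    def on_connected(self):
        self.state = "Connected"
        # Unstash in arrival order so nothing is lost or reordered.
        while self._stash:
            self.processed.append(self._stash.popleft())

    def on_connection_lost(self):
        self.state = "Reconnecting"
```

In Akka.NET the per-state branching in `receive` would be expressed by swapping message handlers with `Become`, and the buffer would be the actor's built-in stash rather than an explicit queue.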
**LmxProxy-specific notes**: The `RealLmxProxyClient` holds the `SessionId` returned by the `Connect` RPC and includes it in all subsequent operations. The `LmxProxyDataConnection` adapter has no keep-alive timer — session liveness is handled by the DCL's existing reconnect cycle. Subscriptions use server-streaming gRPC — a background task reads from the `ResponseStream` and invokes the callback for each `VtqMessage`. On connection failure, the DCL actor transitions to **Reconnecting**, disposes the client (which cancels active subscriptions), and retries at the fixed interval.
## Connection Lifecycle & Reconnection
The DCL manages connection lifecycle automatically:
1. **Connection drop detection**: When a connection to a data source is lost, the DCL immediately pushes a value update with quality `bad` for **every tag subscribed on that connection**. Instance Actors and their downstream consumers (alarms, scripts checking quality) see the staleness immediately.
2. **Auto-reconnect with fixed interval**: The DCL retries the connection at a configurable fixed interval (e.g., every 5 seconds). The retry interval is defined **per data connection**. This is consistent with the fixed-interval retry philosophy used throughout the system. For LmxProxy, the DCL's reconnect cycle owns all recovery — re-establishing the gRPC channel and session after any connection failure. Individual gRPC operations (reads, writes) fail immediately to the caller on error; there is no operation-level retry within the adapter.
3. **Connection state transitions**: The DCL tracks each connection's state as `connected`, `disconnected`, or `reconnecting`. All transitions are logged to Site Event Logging.
4. **Transparent re-subscribe**: On successful reconnection, the DCL automatically re-establishes all previously active subscriptions for that connection. Instance Actors require no action — they simply see quality return to `good` as fresh values arrive from restored subscriptions.
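Steps 1 and 4 above (bad quality on drop, then transparent re-subscribe) can be sketched as a small tracker. This is a hedged Python illustration. `ConnectionSubscriptions`, the `deliver` and `resubscribe` callbacks, and the use of `None` as the value accompanying bad quality are all assumptions.

```python
class ConnectionSubscriptions:
    """Tracks subscribed tags for one data connection (illustrative only)."""

    def __init__(self, deliver):
        self._deliver = deliver   # callable(tag_path, value, quality)
        self._tags = []           # subscribed tag paths, in subscription order
        self.state = "connected"

    def subscribe(self, tag_path):
        if tag_path not in self._tags:
            self._tags.append(tag_path)

    def on_connection_lost(self):
        # Step 1: push a bad-quality update for every subscribed tag.
        self.state = "reconnecting"
        for tag in self._tags:
            self._deliver(tag, None, "bad")

    def on_reconnected(self, resubscribe):
        # Step 4: re-establish every previously active subscription.
        self.state = "connected"
        for tag in self._tags:
            resubscribe(tag)
```

Because the tracker keeps the tag list across the outage, Instance Actors never need to re-register. They see quality drop to `bad` and later return to `good` as fresh values arrive.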
## Write Failure Handling
Writes to physical devices are **synchronous** from the script's perspective:
- If the write fails (connection down, device rejection, timeout), the error is **returned to the calling script**. Script authors can catch and handle write errors (log, notify, retry, etc.).
- Write failures are also logged to Site Event Logging.
- There is **no store-and-forward for device writes** — these are real-time control operations. Buffering stale setpoints for later application would be dangerous in an industrial context.
## Tag Path Resolution
When the DCL subscribes to a tag path from the flattened configuration but the path does not exist on the physical device (e.g., typo in the template, device firmware changed, device still booting):
1. The failure is **logged to Site Event Logging**.
2. The attribute is marked with quality `bad`.
3. The DCL **periodically retries resolution** at a configurable interval, accommodating devices that come online in stages or load modules after startup.
4. On successful resolution, the subscription activates normally and quality reflects the live value from the device.
Note: Pre-deployment validation at central does **not** verify that tag paths resolve to real tags on physical devices — that is a runtime concern handled here.
## Health Reporting
The DCL reports the following metrics to the Health Monitoring component via the existing periodic heartbeat:
- **Connection status**: `connected`, `disconnected`, or `reconnecting` per data connection.
- **Tag resolution counts**: Per connection, the number of total subscribed tags vs. successfully resolved tags. This gives operators visibility into misconfigured templates without needing to open the debug view for individual instances.
## Dependencies
- **Site Runtime (Instance Actors)**: Receives subscription registrations and delivers value updates. Receives write requests.
- **Health Monitoring**: Reports connection status.
- **Site Event Logging**: Logs connection status changes.
## Interactions
- **Site Runtime (Instance Actors)**: Bidirectional — delivers value updates, receives subscription registrations and write-back commands.
- **Health Monitoring**: Reports connection health periodically.
- **Site Event Logging**: Logs connection/disconnection events.
+28 -21
@@ -28,31 +28,32 @@ This document serves as the master index for the SCADA system design. The system
## Document Map
### Requirements
- [HighLevelReqs.md](HighLevelReqs.md) — Complete high-level requirements covering all functional areas.
- [HighLevelReqs.md](docs/requirements/HighLevelReqs.md) — Complete high-level requirements covering all functional areas.
### Component Design Documents
| # | Component | Document | Description |
|---|-----------|----------|-------------|
| 1 | Template Engine | [Component-TemplateEngine.md](Component-TemplateEngine.md) | Template modeling, inheritance, composition, path-qualified member addressing, override granularity, locking, alarms, flattening, semantic validation, revision hashing, and diff calculation. |
| 2 | Deployment Manager | [Component-DeploymentManager.md](Component-DeploymentManager.md) | Central-side deployment pipeline with deployment ID/idempotency, per-instance operation lock, state transition matrix, all-or-nothing site apply, system-wide artifact deployment with per-site status. |
| 3 | Site Runtime | [Component-SiteRuntime.md](Component-SiteRuntime.md) | Site-side actor hierarchy with explicit supervision strategies, staggered startup, script trust model (constrained APIs), Tell/Ask conventions, concurrency serialization, and site-wide Akka stream with per-subscriber backpressure. |
| 4 | Data Connection Layer | [Component-DataConnectionLayer.md](Component-DataConnectionLayer.md) | Common data connection interface (OPC UA, custom), Become/Stash connection actor model, auto-reconnect, immediate bad quality on disconnect, transparent re-subscribe, synchronous write failures, tag path resolution retry. |
| 5 | Central–Site Communication | [Component-Communication.md](Component-Communication.md) | Akka.NET remoting/cluster topology, 8 message patterns with per-pattern timeouts, application-level correlation IDs, transport heartbeat config, message ordering, connection failure behavior. |
| 6 | Store-and-Forward Engine | [Component-StoreAndForward.md](Component-StoreAndForward.md) | Buffering (transient failures only), fixed-interval retry, parking, async best-effort replication, SQLite persistence at sites. |
| 7 | External System Gateway | [Component-ExternalSystemGateway.md](Component-ExternalSystemGateway.md) | HTTP/REST + JSON, API key/Basic Auth, per-system timeout, dual call modes (Call/CachedCall), transient/permanent error classification, dedicated blocking I/O dispatcher, ADO.NET connection pooling. |
| 8 | Notification Service | [Component-NotificationService.md](Component-NotificationService.md) | SMTP with OAuth2 (M365) or Basic Auth, BCC delivery, plain text, transient/permanent SMTP error classification, store-and-forward integration. |
| 9 | Central UI | [Component-CentralUI.md](Component-CentralUI.md) | Blazor Server with SignalR real-time push, load balancer failover with JWT, all management workflows. |
| 10 | Security & Auth | [Component-Security.md](Component-Security.md) | Direct LDAP bind (LDAPS/StartTLS), JWT sessions (HMAC-SHA256, 15-min refresh, 30-min idle), role-based authorization, site-scoped permissions. |
| 11 | Health Monitoring | [Component-HealthMonitoring.md](Component-HealthMonitoring.md) | 30s report interval, 60s offline threshold, monotonic sequence numbers, raw error counts, tag resolution counts, dead letter monitoring. |
| 12 | Site Event Logging | [Component-SiteEventLogging.md](Component-SiteEventLogging.md) | SQLite storage, 30-day retention + 1GB cap, daily purge, paginated remote queries with keyword search. |
| 13 | Cluster Infrastructure | [Component-ClusterInfrastructure.md](Component-ClusterInfrastructure.md) | Akka.NET cluster, keep-oldest SBR with down-if-alone, min-nr-of-members=1, 2s/10s/15s failure detection, CoordinatedShutdown, automatic dual-node recovery. |
| 14 | Inbound API | [Component-InboundAPI.md](Component-InboundAPI.md) | POST /api/{methodName}, X-API-Key header, flat JSON, extended type system (Object/List), script-based implementations, failures-only logging. |
| 15 | Host | [Component-Host.md](Component-Host.md) | Single deployable binary, role-based component registration, per-component config binding (Options pattern), readiness gating, dead letter monitoring, Akka.NET bootstrap, ASP.NET Core hosting for central. |
| 16 | Commons | [Component-Commons.md](Component-Commons.md) | Namespace/folder convention (Types/Interfaces/Entities/Messages), shared data types, POCOs, repository interfaces, message contracts with additive-only versioning, UTC timestamp convention. |
| 17 | Configuration Database | [Component-ConfigurationDatabase.md](Component-ConfigurationDatabase.md) | EF Core data access, per-component repositories, unit-of-work, optimistic concurrency on deployment status, audit logging (IAuditService), migration management. |
| 18 | Management Service | [Component-ManagementService.md](Component-ManagementService.md) | Akka.NET ManagementActor on central, ClusterClientReceptionist registration, programmatic access to all admin operations, CLI interface. |
| 19 | CLI | [Component-CLI.md](Component-CLI.md) | Standalone command-line tool, System.CommandLine, Akka.NET ClusterClient transport, LDAP auth, JSON/table output, mirrors all Management Service operations. |
| 1 | Template Engine | [docs/requirements/Component-TemplateEngine.md](docs/requirements/Component-TemplateEngine.md) | Template modeling, inheritance, composition, path-qualified member addressing, override granularity, locking, alarms, flattening, semantic validation, revision hashing, diff calculation, and folder organization (nested folders, drag-drop). |
| 2 | Deployment Manager | [docs/requirements/Component-DeploymentManager.md](docs/requirements/Component-DeploymentManager.md) | Central-side deployment pipeline with deployment ID/idempotency, per-instance operation lock, state transition matrix, all-or-nothing site apply, system-wide artifact deployment with per-site status. |
| 3 | Site Runtime | [docs/requirements/Component-SiteRuntime.md](docs/requirements/Component-SiteRuntime.md) | Site-side actor hierarchy with explicit supervision strategies, staggered startup, script trust model (constrained APIs), Tell/Ask conventions, concurrency serialization, and site-wide Akka stream with per-subscriber backpressure. |
| 4 | Data Connection Layer | [docs/requirements/Component-DataConnectionLayer.md](docs/requirements/Component-DataConnectionLayer.md) | Common data connection interface (OPC UA, custom), Become/Stash connection actor model, auto-reconnect, immediate bad quality on disconnect, transparent re-subscribe, synchronous write failures, tag path resolution retry. |
| 5 | Central–Site Communication | [docs/requirements/Component-Communication.md](docs/requirements/Component-Communication.md) | Dual transport: Akka.NET ClusterClient (command/control) + gRPC server-streaming (real-time data). 8 message patterns with per-pattern timeouts, SiteStreamGrpcServer/Client, application-level correlation IDs, transport heartbeat config, gRPC keepalive, message ordering, connection failure behavior. |
| 6 | Store-and-Forward Engine | [docs/requirements/Component-StoreAndForward.md](docs/requirements/Component-StoreAndForward.md) | Buffering (transient failures only), fixed-interval retry, parking, async best-effort replication, SQLite persistence at sites. |
| 7 | External System Gateway | [docs/requirements/Component-ExternalSystemGateway.md](docs/requirements/Component-ExternalSystemGateway.md) | HTTP/REST + JSON, API key/Basic Auth, per-system timeout, dual call modes (Call/CachedCall), transient/permanent error classification, dedicated blocking I/O dispatcher, ADO.NET connection pooling. |
| 8 | Notification Service | [docs/requirements/Component-NotificationService.md](docs/requirements/Component-NotificationService.md) | SMTP with OAuth2 (M365) or Basic Auth, BCC delivery, plain text, transient/permanent SMTP error classification, store-and-forward integration. |
| 9 | Central UI | [docs/requirements/Component-CentralUI.md](docs/requirements/Component-CentralUI.md) | Blazor Server with SignalR real-time push, load balancer failover with JWT, all management workflows. |
| 10 | Security & Auth | [docs/requirements/Component-Security.md](docs/requirements/Component-Security.md) | Direct LDAP bind (LDAPS/StartTLS), JWT sessions (HMAC-SHA256, 15-min refresh, 30-min idle), role-based authorization, site-scoped permissions. |
| 11 | Health Monitoring | [docs/requirements/Component-HealthMonitoring.md](docs/requirements/Component-HealthMonitoring.md) | 30s report interval, 60s offline threshold, monotonic sequence numbers, raw error counts, tag resolution counts, dead letter monitoring. |
| 12 | Site Event Logging | [docs/requirements/Component-SiteEventLogging.md](docs/requirements/Component-SiteEventLogging.md) | SQLite storage, 30-day retention + 1GB cap, daily purge, paginated remote queries with keyword search. |
| 13 | Cluster Infrastructure | [docs/requirements/Component-ClusterInfrastructure.md](docs/requirements/Component-ClusterInfrastructure.md) | Akka.NET cluster, keep-oldest SBR with down-if-alone, min-nr-of-members=1, 2s/10s/15s failure detection, CoordinatedShutdown, automatic dual-node recovery. |
| 14 | Inbound API | [docs/requirements/Component-InboundAPI.md](docs/requirements/Component-InboundAPI.md) | POST /api/{methodName}, X-API-Key header, flat JSON, extended type system (Object/List), script-based implementations, failures-only logging. |
| 15 | Host | [docs/requirements/Component-Host.md](docs/requirements/Component-Host.md) | Single deployable binary, role-based component registration, per-component config binding (Options pattern), readiness gating, dead letter monitoring, Akka.NET bootstrap, ASP.NET Core hosting for central. |
| 16 | Commons | [docs/requirements/Component-Commons.md](docs/requirements/Component-Commons.md) | Namespace/folder convention (Types/Interfaces/Entities/Messages), shared data types, POCOs, repository interfaces, message contracts with additive-only versioning, UTC timestamp convention. |
| 17 | Configuration Database | [docs/requirements/Component-ConfigurationDatabase.md](docs/requirements/Component-ConfigurationDatabase.md) | EF Core data access, per-component repositories, unit-of-work, optimistic concurrency on deployment status, audit logging (IAuditService), migration management. |
| 18 | Management Service | [docs/requirements/Component-ManagementService.md](docs/requirements/Component-ManagementService.md) | Akka.NET ManagementActor on central, ClusterClientReceptionist registration, programmatic access to all admin operations, CLI interface. |
| 19 | CLI | [docs/requirements/Component-CLI.md](docs/requirements/Component-CLI.md) | Standalone command-line tool, System.CommandLine, HTTP transport via Management API, JSON/table output, mirrors all Management Service operations. |
| 20 | Traefik Proxy | [docs/requirements/Component-TraefikProxy.md](docs/requirements/Component-TraefikProxy.md) | Reverse proxy/load balancer fronting central cluster, active node routing via `/health/active`, automatic failover. |
### Reference Documentation
@@ -89,6 +90,8 @@ This document serves as the master index for the SCADA system design. The system
│ └──────────┘ │
│ ┌───────────────────────────────────┐ │
│ │ Akka.NET Communication Layer │ │
│ │ ClusterClient: command/control │ │
│ │ gRPC Client: real-time streams │ │
│ │ (correlation IDs, per-pattern │ │
│ │ timeouts, message ordering) │ │
│ └──────────────┬────────────────────┘ │
@@ -97,7 +100,8 @@ This document serves as the master index for the SCADA system design. The system
│ └───────────────────────────────────┘ (Config DB)│
│ │ Machine Data DB│
└─────────────────┼───────────────────────────────────┘
│ Akka.NET Remoting
│ Akka.NET Remoting (command/control)
│ gRPC HTTP/2 (real-time data, port 8083)
┌────────────┼────────────┐
▼ ▼ ▼
┌─────────┐ ┌─────────┐ ┌─────────┐
@@ -111,6 +115,9 @@ This document serves as the master index for the SCADA system design. The system
│ │Site │ │ │ │Site │ │ │ │Site │ │
│ │Runtm│ │ │ │Runtm│ │ │ │Runtm│ │
│ ├─────┤ │ │ ├─────┤ │ │ ├─────┤ │
│ │gRPC │ │ │ │gRPC │ │ │ │gRPC │ │
│ │Srvr │ │ │ │Srvr │ │ │ │Srvr │ │
│ ├─────┤ │ │ ├─────┤ │ │ ├─────┤ │
│ │S&F │ │ │ │S&F │ │ │ │S&F │ │
│ │Engine│ │ │ │Engine│ │ │ │Engine│ │
│ ├─────┤ │ │ ├─────┤ │ │ ├─────┤ │
+1
@@ -41,5 +41,6 @@
<Project Path="tests/ScadaLink.ManagementService.Tests/ScadaLink.ManagementService.Tests.csproj" />
<Project Path="tests/ScadaLink.IntegrationTests/ScadaLink.IntegrationTests.csproj" />
<Project Path="tests/ScadaLink.PerformanceTests/ScadaLink.PerformanceTests.csproj" />
<Project Path="tests/ScadaLink.CentralUI.PlaywrightTests/ScadaLink.CentralUI.PlaywrightTests.csproj" />
</Folder>
</Solution>
+224
@@ -0,0 +1,224 @@
# WinDev — Windows Development VM
Remote Windows 10 VM used for development and testing.
- **ESXi host**: See [esxi.md](/Users/dohertj2/Desktop/netfix/esxi.md) — VM name `WW_DEV_VM` on ESXi 8.0.3 at 10.2.0.12
- **Backup**: See [veeam.md](/Users/dohertj2/Desktop/netfix/veeam.md) — Veeam B&R 12.3 at 10.100.0.30. Dedicated job "Backup WW_DEV_VM" targeting NAS repo. First restore point (2026-03-21) = **Baseline**: Win10 + .NET 10 SDK + .NET Fx 4.8 + Git + 7-Zip + Chrome + Claude Code + csharp-ls.
## Connection Details
| Field | Value |
|-------|-------|
| **Hostname** | DESKTOP-6JL3KKO |
| **IP** | 10.100.0.48 |
| **OS** | Windows 10 Enterprise (10.0.19045), 64-bit |
| **CPU** | Intel Xeon E5-2697 v4 @ 2.30GHz |
| **RAM** | ~12 GB |
| **Disk** | C: 235 GB free / 256 GB total |
| **User** | `dohertj2` (local administrator) |
| **SSH** | OpenSSH Server (passwordless via ed25519 key) |
| **Default shell** | cmd.exe |
## SSH Access
Passwordless SSH is configured. An alias `windev` is set up in `~/.ssh/config`.
```bash
# Connect
ssh windev
# Run a command
ssh windev "hostname"
# Run PowerShell
ssh windev "powershell -Command \"Get-Process\""
```
### SSH Config Entry (`~/.ssh/config`)
```
Host windev
HostName 10.100.0.48
User dohertj2
IdentityFile ~/.ssh/id_ed25519
```
### How Passwordless Auth Works
Since `dohertj2` is in the local Administrators group, Windows OpenSSH uses a special authorized keys file instead of the per-user `~/.ssh/authorized_keys`:
```
C:\ProgramData\ssh\administrators_authorized_keys
```
This is configured in `C:\ProgramData\ssh\sshd_config` via the `Match Group administrators` block. If you need to add another key, append it to that file and ensure ACLs are correct:
```powershell
icacls C:\ProgramData\ssh\administrators_authorized_keys /inheritance:r /grant "Administrators:F" /grant "SYSTEM:F"
```
## File Transfer
```bash
# Copy file to Windows
scp localfile.txt windev:C:/Users/dohertj2/Desktop/
# Copy file from Windows
scp windev:C:/Users/dohertj2/Desktop/file.txt ./
# Copy directory recursively
scp -r ./mydir windev:C:/Users/dohertj2/Desktop/mydir
```
## Running Commands
The default shell is `cmd.exe`. To run PowerShell, invoke `powershell -Command` (or `powershell -File` for a script) explicitly.
```bash
# cmd (default)
ssh windev "dir C:\Users\dohertj2"
# PowerShell
ssh windev "powershell -Command \"Get-Service | Where-Object { \$_.Status -eq 'Running' }\""
# Multi-line PowerShell script
ssh windev "powershell -File C:\scripts\myscript.ps1"
```
### Service Management
```bash
# List services
ssh windev "sc query state= all"
# Start/stop a service
ssh windev "sc stop ServiceName"
ssh windev "sc start ServiceName"
# Check a specific service
ssh windev "sc query ServiceName"
```
### Process Management
```bash
# List processes
ssh windev "tasklist"
# Kill a process
ssh windev "taskkill /F /PID 1234"
ssh windev "taskkill /F /IM process.exe"
```
## Installed Software
### Package Manager
| Tool | Version | Install Path |
|------|---------|-------------|
| **winget** | v1.28.190 | AppX package |
The `msstore` source has been removed (requires interactive agreement acceptance). Only the `winget` community source is configured. To install packages:
```bash
ssh windev "winget install --id <PackageId> --silent --disable-interactivity"
```
### Development Tools
| Tool | Version | Install Path |
|------|---------|-------------|
| **7-Zip** | 26.00 (x64) | `C:\Program Files\7-Zip\` |
| **.NET Framework** | 4.8.1 (Developer Pack) | GAC / Reference Assemblies (v4.8.1 ref assemblies present) |
| **.NET SDK** | 10.0.201 | `C:\Program Files\dotnet\` |
| **.NET Runtime** | 10.0.5 (Core + ASP.NET + Desktop) | `C:\Program Files\dotnet\` |
| **Git** | 2.53.0.2 | `C:\Program Files\Git\` |
| **Claude Code** | 2.1.81 | `C:\Users\dohertj2\.local\bin\claude.exe` |
Launch with `cc` alias (cmd or Git Bash) which runs `claude --dangerously-skip-permissions --chrome`.
**C# LSP**: `csharp-ls` v0.22.0 installed as a dotnet global tool (`C:\Users\dohertj2\.dotnet\tools\csharp-ls.exe`). Configured via the `csharp-lsp@claude-plugins-official` plugin. Provides `goToDefinition`, `findReferences`, `hover`, `documentSymbol`, `workspaceSymbol`, `goToImplementation`, and call hierarchy operations on `.cs` files. First invocation in a session is slow (~1-2 min) while the solution loads.
Git is configured with `credential.helper=store` (not GCM — the bundled Git Credential Manager was removed from system config to avoid OAuth/tty issues over SSH). Credentials are stored in `C:\Users\dohertj2\.git-credentials`.
**Gitea** (`gitea.dohertylan.com`) is pre-authenticated — no login prompts. Clone repos with:
```bash
ssh windev "git clone https://gitea.dohertylan.com/dohertj2/<repo>.git C:\src\<repo>"
```
### Applications
| App | Version | Default For |
|-----|---------|-------------|
| **Google Chrome** | 146.0.7680.154 | HTTP, HTTPS, .htm, .html, .pdf |
| **Notepad++** | 8.9.2 | — |
Defaults set via Group Policy `DefaultAssociationsConfiguration` pointing to `C:\Windows\System32\DefaultAssociations.xml`.
### Not Installed
- **Python** — `winget install Python.Python.3.12`
- **Visual Studio** — `winget install Microsoft.VisualStudio.2022.BuildTools`
## Network
Single network interface:
| Interface | IP |
|-----------|-----|
| Ethernet0 | 10.100.0.48 (static) |
## Backup (Veeam)
Veeam job "Backup WW_DEV_VM" on the Veeam server (10.100.0.30). Targets the NAS repo (`nfs41://10.50.0.25:/mnt/mypool/veeam`).
```bash
# Incremental backup (changed blocks only)
ssh dohertj2@10.100.0.30 "powershell -Command \"Add-PSSnapin VeeamPSSnapin; Connect-VBRServer -Server localhost; Start-VBRJob -Job (Get-VBRJob -Name 'Backup WW_DEV_VM')\""
# Full backup
ssh dohertj2@10.100.0.30 "powershell -Command \"Add-PSSnapin VeeamPSSnapin; Connect-VBRServer -Server localhost; Start-VBRJob -Job (Get-VBRJob -Name 'Backup WW_DEV_VM') -FullBackup\""
# Check status
ssh dohertj2@10.100.0.30 "powershell -Command \"Add-PSSnapin VeeamPSSnapin; Connect-VBRServer -Server localhost; (Get-VBRJob -Name 'Backup WW_DEV_VM').FindLastSession() | Select-Object State, Result, CreationTime, EndTime\""
# List restore points
ssh dohertj2@10.100.0.30 "powershell -Command \"Add-PSSnapin VeeamPSSnapin; Connect-VBRServer -Server localhost; Get-VBRRestorePoint -Backup (Get-VBRBackup -Name 'Backup WW_DEV_VM') | Select-Object CreationTime, Type, @{N='SizeGB';E={[math]::Round(\$_.ApproxSize/1GB,2)}} | Format-Table -AutoSize\""
```
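The four commands above share the same snap-in and connection prefix. A minimal sketch of a wrapper that factors it out is shown here; the function name `veeam_ps`, the `VEEAM_HOST` variable, and the `DRY_RUN` switch are hypothetical and not part of the existing setup.

```bash
# Hypothetical wrapper: run a Veeam PowerShell snippet on the Veeam server.
# VEEAM_HOST and DRY_RUN are illustrative names, not existing configuration.
VEEAM_HOST="dohertj2@10.100.0.30"

veeam_ps() {
  local body="$1"
  # Every command needs the snap-in loaded and a server connection first.
  local ps="Add-PSSnapin VeeamPSSnapin; Connect-VBRServer -Server localhost; $body"
  if [[ -n "${DRY_RUN:-}" ]]; then
    # Print the PowerShell text instead of running it over SSH.
    printf '%s\n' "$ps"
  else
    ssh "$VEEAM_HOST" "powershell -Command \"$ps\""
  fi
}

# Example (commented out so sourcing this file has no side effects):
# veeam_ps "Start-VBRJob -Job (Get-VBRJob -Name 'Backup WW_DEV_VM')"
```

Callers still need to escape `$_` as `\$_` when the snippet is written inside a double-quoted bash string.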
### Restore Points
| ID | Date | Type | Notes |
|----|------|------|-------|
| `f2cd44a9` | 2026-03-21 14:28 | Full | **Baseline** — Win10 + .NET 10 SDK + .NET Fx 4.8 + Git + 7-Zip + Chrome + Claude Code + csharp-ls (old UUID) |
| `2879a744` | 2026-03-21 15:15 | Increment | UUID fixed to `1BFC4D56-8DFA-A897-D1E4-BF1FD7F0096C`, static IP 10.100.0.48 |
| `b4e87cfe` | 2026-03-21 16:43 | Increment | **Pre-licensing** — Notepad++ added, firewall/Defender disabled, licensing backups staged |
| `f38a8aed` | 2026-03-21 17:01 | Increment | **Post-licensing** — WPS2020 licensing applied and verified working |
## Troubleshooting
### "Permission denied" on SSH key auth
Windows OpenSSH is strict about file permissions on `administrators_authorized_keys`. Re-run:
```powershell
icacls C:\ProgramData\ssh\administrators_authorized_keys /inheritance:r /grant "Administrators:F" /grant "SYSTEM:F"
```
### Host key changed error
If the VM is rebuilt, clear the old key:
```bash
ssh-keygen -R 10.100.0.48
```
### Firewall blocking SSH
If the VM becomes unreachable, RDP in and check Windows Firewall or disable it:
```powershell
Set-NetFirewallProfile -Profile Domain,Public,Private -Enabled False
```
+1 -1
@@ -22,7 +22,7 @@ COPY src/ScadaLink.InboundAPI/ScadaLink.InboundAPI.csproj src/ScadaLink.InboundA
COPY src/ScadaLink.ConfigurationDatabase/ScadaLink.ConfigurationDatabase.csproj src/ScadaLink.ConfigurationDatabase/
COPY src/ScadaLink.ManagementService/ScadaLink.ManagementService.csproj src/ScadaLink.ManagementService/
# Restore NuGet packages via Host project (follows ProjectReferences to all 17 dependencies)
# Restore NuGet packages via Host project (follows ProjectReferences to all dependencies)
# This layer is cached until any .csproj changes — source-only changes skip restore entirely
RUN dotnet restore src/ScadaLink.Host/ScadaLink.Host.csproj
+48 -17
@@ -5,7 +5,12 @@ Local Docker deployment of the full ScadaLink cluster topology: a 2-node central
## Cluster Topology
```
┌─────────────────────────────────────────────────────┐
┌───────────────────┐
│ Traefik LB :9000 │ ◄── CLI / Browser
│ Dashboard :8180 │
└────────┬──────────┘
│ routes to active node
┌──────────────────────┼──────────────────────────────┐
│ Central Cluster │
│ │
│ ┌─────────────────┐ ┌─────────────────┐ │
@@ -24,7 +29,8 @@ Local Docker deployment of the full ScadaLink cluster topology: a 2-node central
│ (Test Plant A) │ │ (Test Plant B) │ │ (Test Plant C) │
│ │ │ │ │ │
│ node-a ◄──► node-b│ │ node-a ◄──► node-b│ │ node-a ◄──► node-b│
│ :9021 :9022 │ │ :9031 :9032 │ │ :9041 :9042 │
│ Akka :9021 :9022 │ │ Akka :9031 :9032 │ │ Akka :9041 :9042 │
│ gRPC :9023 :9024 │ │ gRPC :9033 :9034 │ │ gRPC :9043 :9044 │
└────────────────────┘ └────────────────────┘ └────────────────────┘
```
@@ -34,7 +40,7 @@ Runs the web UI (Blazor Server), Template Engine, Deployment Manager, Security,
### Site Clusters (active/standby each)
Each site cluster runs Site Runtime, Data Connection Layer, Store-and-Forward, and Site Event Logging. Sites connect to OPC UA for device data and to the central cluster via Akka.NET remoting. Deployed configurations and S&F buffers are stored in local SQLite databases per node.
Each site cluster runs Site Runtime, Data Connection Layer, Store-and-Forward, and Site Event Logging. Sites connect to OPC UA for device data and to the central cluster via Akka.NET remoting. Each site node also hosts a gRPC streaming server (port 8083) that central nodes connect to for real-time attribute value and alarm state streams. Deployed configurations and S&F buffers are stored in local SQLite databases per node.
| Site Cluster | Site Identifier | Central UI Name |
|-------------|-----------------|-----------------|
@@ -46,18 +52,19 @@ Each site cluster runs Site Runtime, Data Connection Layer, Store-and-Forward, a
### Application Nodes
| Node | Container Name | Host Web Port | Host Akka Port | Internal Ports |
|------|---------------|---------------|----------------|----------------|
| Central A | `scadalink-central-a` | 9001 | 9011 | 5000 (web), 8081 (Akka) |
| Central B | `scadalink-central-b` | 9002 | 9012 | 5000 (web), 8081 (Akka) |
| Site-A A | `scadalink-site-a-a` | — | 9021 | 8082 (Akka) |
| Site-A B | `scadalink-site-a-b` | — | 9022 | 8082 (Akka) |
| Site-B A | `scadalink-site-b-a` | — | 9031 | 8082 (Akka) |
| Site-B B | `scadalink-site-b-b` | — | 9032 | 8082 (Akka) |
| Site-C A | `scadalink-site-c-a` | — | 9041 | 8082 (Akka) |
| Site-C B | `scadalink-site-c-b` | — | 9042 | 8082 (Akka) |
| Node | Container Name | Host Web Port | Host Akka Port | Host gRPC Port | Internal Ports |
|------|---------------|---------------|----------------|----------------|----------------|
| Traefik LB | `scadalink-traefik` | 9000 | — | — | 80 (proxy), 8080 (dashboard) |
| Central A | `scadalink-central-a` | 9001 | 9011 | — | 5000 (web), 8081 (Akka) |
| Central B | `scadalink-central-b` | 9002 | 9012 | — | 5000 (web), 8081 (Akka) |
| Site-A A | `scadalink-site-a-a` | — | 9021 | 9023 | 8082 (Akka), 8083 (gRPC) |
| Site-A B | `scadalink-site-a-b` | — | 9022 | 9024 | 8082 (Akka), 8083 (gRPC) |
| Site-B A | `scadalink-site-b-a` | — | 9031 | 9033 | 8082 (Akka), 8083 (gRPC) |
| Site-B B | `scadalink-site-b-b` | — | 9032 | 9034 | 8082 (Akka), 8083 (gRPC) |
| Site-C A | `scadalink-site-c-a` | — | 9041 | 9043 | 8082 (Akka), 8083 (gRPC) |
| Site-C B | `scadalink-site-c-b` | — | 9042 | 9044 | 8082 (Akka), 8083 (gRPC) |
Port block pattern: `90X1`/`90X2` where X = 0 (central), 1 (web), 2 (site-a), 3 (site-b), 4 (site-c).
Port block pattern: `90X1`/`90X2` for nodes A/B, where X = 0 (central web), 1 (central Akka), 2 (site-a), 3 (site-b), 4 (site-c). Site nodes additionally expose gRPC on `90X3`/`90X4`. Central nodes use the gRPC streaming ports to subscribe to real-time site data streams.
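The site portion of this pattern can be sketched as a small lookup. The function name `site_host_ports` is hypothetical; the port numbers it produces are the ones listed in the Application Nodes table.

```bash
# Hypothetical helper: derive a site node's host ports from the 90Xn pattern.
# Site letter a/b/c maps to X = 2/3/4; node a/b maps to Akka 1/2 and gRPC 3/4.
site_host_ports() {
  local site="$1" node="$2" x n
  case "$site" in
    a) x=2 ;;
    b) x=3 ;;
    c) x=4 ;;
    *) echo "unknown site: $site" >&2; return 1 ;;
  esac
  case "$node" in
    a) n=1 ;;
    b) n=2 ;;
    *) echo "unknown node: $node" >&2; return 1 ;;
  esac
  # gRPC ports sit two above the Akka port within the same block.
  echo "akka=90${x}${n} grpc=90${x}$((n + 2))"
}

site_host_ports b a   # prints "akka=9031 grpc=9033"
```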
### Infrastructure Services (from `infra/docker-compose.yml`)
@@ -79,6 +86,7 @@ docker/
├── docker-compose.yml # 8-node application stack
├── build.sh # Build Docker image
├── deploy.sh # Build + deploy all containers
├── seed-sites.sh # Create test sites with Akka + gRPC addresses
├── teardown.sh # Stop and remove containers
├── central-node-a/
│ ├── appsettings.Central.json # Central node A configuration
@@ -124,6 +132,9 @@ cd infra && docker compose up -d && cd ..
# 2. Build and deploy all 8 ScadaLink nodes
docker/deploy.sh
# 3. Seed test sites (first-time only, after cluster is healthy)
docker/seed-sites.sh
```
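Step 3 should only run once the cluster is healthy. A minimal sketch of a wait loop for that is shown here, assuming the `/health/ready` endpoint returns a success status when the cluster is ready; the function name `wait_for_ready` and the `PROBE` override (a hook for testing without a network) are hypothetical.

```bash
# wait_for_ready URL [ATTEMPTS] [DELAY_SECONDS]
# Polls URL until the probe succeeds or the attempts run out.
# PROBE is a hypothetical test hook; by default the probe is curl.
wait_for_ready() {
  local url="$1" attempts="${2:-30}" delay="${3:-2}" i
  for ((i = 1; i <= attempts; i++)); do
    # curl -f turns HTTP errors (>= 400) into a non-zero exit code.
    if ${PROBE:-curl -fsS -o /dev/null} "$url"; then
      return 0
    fi
    sleep "$delay"
  done
  echo "timed out waiting for $url" >&2
  return 1
}

# Example (commented out so sourcing this file has no side effects):
# wait_for_ready http://localhost:9000/health/ready && docker/seed-sites.sh
```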
### After Code Changes
@@ -185,12 +196,32 @@ curl -s http://localhost:9002/health/ready | python3 -m json.tool
### CLI Access
Connect the ScadaLink CLI to the central cluster via host-mapped Akka remoting ports:
The CLI connects to the Central Host's HTTP management API via the Traefik load balancer at `http://localhost:9000`, which routes to the active central node:
```bash
dotnet run --project src/ScadaLink.CLI -- \
--contact-points akka.tcp://scadalink@localhost:9011 \
--username admin --password password \
--url http://localhost:9000 \
--username multi-role --password password \
template list
```
Direct access to individual nodes is also available at `http://localhost:9001` (central-a) and `http://localhost:9002` (central-b).
> **Note:** The `multi-role` test user has Admin, Design, and Deployment roles. The `admin` user only has the Admin role and cannot perform design or deployment operations. See `infra/glauth/config.toml` for all test users and their group memberships.
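When bypassing Traefik, it can help to know which central node is currently active. The sketch below probes `/health/active` on each node directly, assuming (as the Traefik health check implies) that only the active node answers with a success status. The function name `find_active_node` and the `PROBE` override are hypothetical.

```bash
# Print the first base URL whose /health/active probe succeeds.
# Assumption: a success response marks the active node; standby nodes fail.
# PROBE is a hypothetical test hook; by default the probe is curl.
find_active_node() {
  local base
  for base in "$@"; do
    if ${PROBE:-curl -fsS -o /dev/null --max-time 3} "$base/health/active"; then
      echo "$base"
      return 0
    fi
  done
  # No node reported itself as active.
  return 1
}

# Example (commented out so sourcing this file has no side effects):
# find_active_node http://localhost:9001 http://localhost:9002
```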
A recommended `~/.scadalink/config.json` for the Docker test environment:
```json
{
"managementUrl": "http://localhost:9000"
}
```
With this config file in place, `--url` can be omitted:
```bash
dotnet run --project src/ScadaLink.CLI -- \
--username multi-role --password password \
template list
```
+10 -5
@@ -18,10 +18,15 @@ docker compose -f "$SCRIPT_DIR/docker-compose.yml" ps
echo ""
echo "Access points:"
echo " Central UI (node A): http://localhost:9001"
echo " Central UI (node B): http://localhost:9002"
echo " Health check: http://localhost:9001/health/ready"
echo " CLI contact points: akka.tcp://scadalink@localhost:9011"
echo " akka.tcp://scadalink@localhost:9012"
echo " Central (Traefik LB): http://localhost:9000"
echo " Central UI (node A): http://localhost:9001"
echo " Central UI (node B): http://localhost:9002"
echo " Health check: http://localhost:9001/health/ready"
echo " Active node check: http://localhost:9001/health/active"
echo " Traefik dashboard: http://localhost:8180"
echo " Management API: http://localhost:9000/management"
echo ""
echo "To seed test sites (first-time setup):"
echo " docker/seed-sites.sh"
echo ""
echo "Logs: docker compose -f $SCRIPT_DIR/docker-compose.yml logs -f"
+19
@@ -40,6 +40,7 @@ services:
SCADALINK_CONFIG: Site
ports:
- "9021:8082" # Akka remoting (host access for debugging)
- "9023:8083" # gRPC streaming
volumes:
- ./site-a-node-a/appsettings.Site.json:/app/appsettings.Site.json:ro
- ./site-a-node-a/data:/app/data
@@ -55,6 +56,7 @@ services:
SCADALINK_CONFIG: Site
ports:
- "9022:8082" # Akka remoting
- "9024:8083" # gRPC streaming
volumes:
- ./site-a-node-b/appsettings.Site.json:/app/appsettings.Site.json:ro
- ./site-a-node-b/data:/app/data
@@ -70,6 +72,7 @@ services:
SCADALINK_CONFIG: Site
ports:
- "9031:8082" # Akka remoting
- "9033:8083" # gRPC streaming
volumes:
- ./site-b-node-a/appsettings.Site.json:/app/appsettings.Site.json:ro
- ./site-b-node-a/data:/app/data
@@ -85,6 +88,7 @@ services:
SCADALINK_CONFIG: Site
ports:
- "9032:8082" # Akka remoting
- "9034:8083" # gRPC streaming
volumes:
- ./site-b-node-b/appsettings.Site.json:/app/appsettings.Site.json:ro
- ./site-b-node-b/data:/app/data
@@ -100,6 +104,7 @@ services:
SCADALINK_CONFIG: Site
ports:
- "9041:8082" # Akka remoting
- "9043:8083" # gRPC streaming
volumes:
- ./site-c-node-a/appsettings.Site.json:/app/appsettings.Site.json:ro
- ./site-c-node-a/data:/app/data
@@ -115,6 +120,7 @@ services:
SCADALINK_CONFIG: Site
ports:
- "9042:8082" # Akka remoting
- "9044:8083" # gRPC streaming
volumes:
- ./site-c-node-b/appsettings.Site.json:/app/appsettings.Site.json:ro
- ./site-c-node-b/data:/app/data
@@ -123,6 +129,19 @@ services:
- scadalink-net
restart: unless-stopped
traefik:
image: traefik:v3.4
container_name: scadalink-traefik
ports:
- "9000:80" # Central load-balanced entrypoint
- "8180:8080" # Traefik dashboard
volumes:
- ./traefik/traefik.yml:/etc/traefik/traefik.yml:ro
- ./traefik/dynamic.yml:/etc/traefik/dynamic.yml:ro
networks:
- scadalink-net
restart: unless-stopped
networks:
scadalink-net:
external: true
+92
@@ -0,0 +1,92 @@
#!/usr/bin/env bash
#
# Regenerates the gRPC C# files from sitestream.proto.
#
# Background: protoc (linux/arm64) segfaults inside our Docker build container
# (Grpc.Tools 2.71.0). As a workaround the generated Sitestream.cs +
# SitestreamGrpc.cs are checked into src/ScadaLink.Communication/SiteStreamGrpc/
# and the Protobuf ItemGroup in the .csproj is commented out — Docker just
# compiles the checked-in C# files.
#
# Run this script ON YOUR DEV MACHINE whenever Protos/sitestream.proto changes:
#
# 1. Temporarily uncomments the Protobuf ItemGroup so Grpc.Tools runs.
# 2. dotnet build (regen writes fresh files to obj/).
# 3. Copies the regenerated files back into SiteStreamGrpc/.
# 4. Re-comments the Protobuf ItemGroup so Docker builds stay safe.
#
# Once we move to a Dockerfile base image that ships a working linux/arm64
# protoc, this script can be retired and Docker can regen the proto on every
# build like every other normal .NET project.
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
REPO_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)"
COMM_DIR="$REPO_ROOT/src/ScadaLink.Communication"
CSPROJ="$COMM_DIR/ScadaLink.Communication.csproj"
GEN_DIR="$COMM_DIR/SiteStreamGrpc"
echo "=== Regenerating gRPC files from sitestream.proto ==="
if [[ ! -f "$CSPROJ" ]]; then
echo "ERROR: csproj not found at $CSPROJ" >&2
exit 1
fi
# Backup so we can always restore the comment state on failure.
BACKUP="$(mktemp)"
cp "$CSPROJ" "$BACKUP"
trap 'cp "$BACKUP" "$CSPROJ"; rm -f "$BACKUP"; echo "Restored csproj from backup."' ERR
# 1. Uncomment the Protobuf ItemGroup (strip the surrounding <!-- ... --> wrapper).
python3 - <<PY
import re, pathlib
p = pathlib.Path("$CSPROJ")
src = p.read_text()
# Find the commented Protobuf block and unwrap it.
new = re.sub(
r"<!--\s*\n(\s*<ItemGroup>\s*\n\s*<Protobuf [^>]*/>\s*\n\s*</ItemGroup>)\s*\n\s*-->",
r"\1",
src,
count=1,
)
if new == src:
raise SystemExit("Couldn't find commented Protobuf ItemGroup to enable.")
p.write_text(new)
PY
# 2. Delete the stale files so any failure to regen is obvious.
rm -f "$GEN_DIR/Sitestream.cs" "$GEN_DIR/SitestreamGrpc.cs"
# 3. Regenerate by building.
echo "Building Communication project (regen)..."
dotnet build "$CSPROJ" --nologo -v minimal | tail -5
# 4. Copy generated files back into the source tree.
mkdir -p "$GEN_DIR"
cp "$COMM_DIR/obj/Debug/net10.0/Protos/Sitestream.cs" "$GEN_DIR/Sitestream.cs"
cp "$COMM_DIR/obj/Debug/net10.0/Protos/SitestreamGrpc.cs" "$GEN_DIR/SitestreamGrpc.cs"
echo "Copied regenerated files to $GEN_DIR/"
# 5. Re-comment the Protobuf ItemGroup so Docker builds keep working.
python3 - <<PY
import re, pathlib
p = pathlib.Path("$CSPROJ")
src = p.read_text()
new = re.sub(
r"(\s*<ItemGroup>\s*\n\s*<Protobuf [^>]*/>\s*\n\s*</ItemGroup>)",
r"\n <!--\1\n -->",
src,
count=1,
)
p.write_text(new)
PY
rm -f "$BACKUP"
trap - ERR
echo ""
echo "Done. Review and commit:"
echo " git diff src/ScadaLink.Communication/Protos/sitestream.proto"
echo " git diff src/ScadaLink.Communication/SiteStreamGrpc/"
+62
@@ -0,0 +1,62 @@
#!/bin/bash
set -euo pipefail
# Seed the three test sites with Akka and gRPC addresses.
# Run after deploy.sh once the central cluster is healthy.
#
# Prerequisites:
# - Infrastructure services running (cd infra && docker compose up -d)
# - Application containers running (docker/deploy.sh)
# - Central cluster healthy (curl http://localhost:9000/health/ready)
#
# Usage:
# docker/seed-sites.sh
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
PROJECT_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)"
CLI="dotnet run --project $PROJECT_ROOT/src/ScadaLink.CLI --"
AUTH="--username multi-role --password password"
URL="--url http://localhost:9000"
echo "=== Seeding ScadaLink Sites ==="
echo ""
echo "Creating Site-A (Test Plant A)..."
$CLI $URL $AUTH site create \
--name "Test Plant A" \
--identifier "site-a" \
--description "Test site A - two-node cluster" \
--node-a-address "akka.tcp://scadalink@scadalink-site-a-a:8082" \
--node-b-address "akka.tcp://scadalink@scadalink-site-a-b:8082" \
--grpc-node-a-address "http://scadalink-site-a-a:8083" \
--grpc-node-b-address "http://scadalink-site-a-b:8083" \
|| echo " (Site-A may already exist)"
echo ""
echo "Creating Site-B (Test Plant B)..."
$CLI $URL $AUTH site create \
--name "Test Plant B" \
--identifier "site-b" \
--description "Test site B - two-node cluster" \
--node-a-address "akka.tcp://scadalink@scadalink-site-b-a:8082" \
--node-b-address "akka.tcp://scadalink@scadalink-site-b-b:8082" \
--grpc-node-a-address "http://scadalink-site-b-a:8083" \
--grpc-node-b-address "http://scadalink-site-b-b:8083" \
|| echo " (Site-B may already exist)"
echo ""
echo "Creating Site-C (Test Plant C)..."
$CLI $URL $AUTH site create \
--name "Test Plant C" \
--identifier "site-c" \
--description "Test site C - two-node cluster" \
--node-a-address "akka.tcp://scadalink@scadalink-site-c-a:8082" \
--node-b-address "akka.tcp://scadalink@scadalink-site-c-b:8082" \
--grpc-node-a-address "http://scadalink-site-c-a:8083" \
--grpc-node-b-address "http://scadalink-site-c-b:8083" \
|| echo " (Site-C may already exist)"
echo ""
echo "=== Site seeding complete ==="
echo ""
echo "Verify with: $CLI $URL $AUTH site list"
+2 -1
@@ -4,7 +4,8 @@
"Role": "Site",
"NodeHostname": "scadalink-site-a-a",
"SiteId": "site-a",
"RemotingPort": 8082
"RemotingPort": 8082,
"GrpcPort": 8083
},
"Cluster": {
"SeedNodes": [
+2 -1
@@ -4,7 +4,8 @@
"Role": "Site",
"NodeHostname": "scadalink-site-a-b",
"SiteId": "site-a",
"RemotingPort": 8082
"RemotingPort": 8082,
"GrpcPort": 8083
},
"Cluster": {
"SeedNodes": [
+2 -1
@@ -4,7 +4,8 @@
"Role": "Site",
"NodeHostname": "scadalink-site-b-a",
"SiteId": "site-b",
"RemotingPort": 8082
"RemotingPort": 8082,
"GrpcPort": 8083
},
"Cluster": {
"SeedNodes": [
+2 -1
@@ -4,7 +4,8 @@
"Role": "Site",
"NodeHostname": "scadalink-site-b-b",
"SiteId": "site-b",
"RemotingPort": 8082
"RemotingPort": 8082,
"GrpcPort": 8083
},
"Cluster": {
"SeedNodes": [
+2 -1
@@ -4,7 +4,8 @@
"Role": "Site",
"NodeHostname": "scadalink-site-c-a",
"SiteId": "site-c",
"RemotingPort": 8082
"RemotingPort": 8082,
"GrpcPort": 8083
},
"Cluster": {
"SeedNodes": [
+2 -1
@@ -4,7 +4,8 @@
"Role": "Site",
"NodeHostname": "scadalink-site-c-b",
"SiteId": "site-c",
"RemotingPort": 8082
"RemotingPort": 8082,
"GrpcPort": 8083
},
"Cluster": {
"SeedNodes": [
+18
@@ -0,0 +1,18 @@
http:
routers:
central:
rule: "PathPrefix(`/`)"
service: central
entryPoints:
- web
services:
central:
loadBalancer:
healthCheck:
path: /health/active
interval: 5s
timeout: 3s
servers:
- url: "http://scadalink-central-a:5000"
- url: "http://scadalink-central-b:5000"
+11
@@ -0,0 +1,11 @@
entryPoints:
web:
address: ":80"
api:
dashboard: true
insecure: true
providers:
file:
filename: /etc/traefik/dynamic.yml
@@ -1,7 +1,7 @@
# Cluster Infrastructure Refinement — Design
**Date**: 2026-03-16
**Component**: Cluster Infrastructure (`Component-ClusterInfrastructure.md`)
**Component**: Cluster Infrastructure (`docs/requirements/Component-ClusterInfrastructure.md`)
**Status**: Approved
## Problem
@@ -33,7 +33,7 @@ The Cluster Infrastructure doc covered topology and failover behavior but lacked
| Document | Change |
|----------|--------|
| `Component-ClusterInfrastructure.md` | Added 3 new sections: Split-Brain Resolution, Failure Detection Timing, Dual-Node Recovery. Updated Node Configuration to clarify both-as-seed. |
| `docs/requirements/Component-ClusterInfrastructure.md` | Added 3 new sections: Split-Brain Resolution, Failure Detection Timing, Dual-Node Recovery. Updated Node Configuration to clarify both-as-seed. |
## Alternatives Considered
@@ -1,7 +1,7 @@
# Communication Layer Refinement — Design
**Date**: 2026-03-16
**Component**: Central-Site Communication (`Component-Communication.md`)
**Component**: Central-Site Communication (`docs/requirements/Component-Communication.md`)
**Status**: Approved
## Problem
@@ -36,7 +36,7 @@ The Communication Layer doc defined 8 message patterns clearly but lacked specif
| Document | Change |
|----------|--------|
| `Component-Communication.md` | Added 4 new sections: Message Timeouts, Transport Configuration, Message Ordering, Connection Failure Behavior |
| `docs/requirements/Component-Communication.md` | Added 4 new sections: Message Timeouts, Transport Configuration, Message Ordering, Connection Failure Behavior |
## Alternatives Considered
@@ -1,7 +1,7 @@
# Data Connection Layer Refinement — Design
**Date**: 2026-03-16
**Component**: Data Connection Layer (`Component-DataConnectionLayer.md`)
**Component**: Data Connection Layer (`docs/requirements/Component-DataConnectionLayer.md`)
**Status**: Approved
## Problem
@@ -38,9 +38,9 @@ The Data Connection Layer doc covered the happy path (interface, subscriptions,
| Document | Change |
|----------|--------|
| `Component-DataConnectionLayer.md` | Added 4 new sections: Connection Lifecycle & Reconnection, Write Failure Handling, Tag Path Resolution, Health Reporting |
| `Component-HealthMonitoring.md` | Added tag resolution counts to monitored metrics table |
| `Component-SiteRuntime.md` | Updated SetAttribute description to note synchronous write failure errors |
| `docs/requirements/Component-DataConnectionLayer.md` | Added 4 new sections: Connection Lifecycle & Reconnection, Write Failure Handling, Tag Path Resolution, Health Reporting |
| `docs/requirements/Component-HealthMonitoring.md` | Added tag resolution counts to monitored metrics table |
| `docs/requirements/Component-SiteRuntime.md` | Updated SetAttribute description to note synchronous write failure errors |
## Alternatives Considered
@@ -1,7 +1,7 @@
# External System Gateway Refinement — Design
**Date**: 2026-03-16
**Component**: External System Gateway (`Component-ExternalSystemGateway.md`)
**Component**: External System Gateway (`docs/requirements/Component-ExternalSystemGateway.md`)
**Status**: Approved
## Problem
@@ -45,10 +45,10 @@ The External System Gateway doc lacked specification for the invocation protocol
| Document | Change |
|----------|--------|
| `Component-ExternalSystemGateway.md` | Updated External System Definition fields. Added sections: External System Call Modes (dual-mode API), Invocation Protocol, Call Timeout & Error Handling, Database Connection Management. |
| `Component-StoreAndForward.md` | Clarified that only transient failures are buffered; 4xx errors are not queued. |
| `Component-SiteRuntime.md` | Updated Script Runtime API with `ExternalSystem.Call()` and `ExternalSystem.CachedCall()`. |
| `HighLevelReqs.md` | Updated script capabilities (Section 4.4) to reflect dual call modes. |
| `docs/requirements/Component-ExternalSystemGateway.md` | Updated External System Definition fields. Added sections: External System Call Modes (dual-mode API), Invocation Protocol, Call Timeout & Error Handling, Database Connection Management. |
| `docs/requirements/Component-StoreAndForward.md` | Clarified that only transient failures are buffered; 4xx errors are not queued. |
| `docs/requirements/Component-SiteRuntime.md` | Updated Script Runtime API with `ExternalSystem.Call()` and `ExternalSystem.CachedCall()`. |
| `docs/requirements/HighLevelReqs.md` | Updated script capabilities (Section 4.4) to reflect dual call modes. |
## Alternatives Considered
@@ -1,7 +1,7 @@
# Health Monitoring Refinement — Design
**Date**: 2026-03-16
**Component**: Health Monitoring (`Component-HealthMonitoring.md`)
**Component**: Health Monitoring (`docs/requirements/Component-HealthMonitoring.md`)
**Status**: Approved
## Problem
@@ -35,7 +35,7 @@ The Health Monitoring doc listed metrics and described the reporting concept but
| Document | Change |
|----------|--------|
| `Component-HealthMonitoring.md` | Expanded Reporting Protocol with concrete defaults and offline/online logic. Added Error Rate Metrics section. |
| `docs/requirements/Component-HealthMonitoring.md` | Expanded Reporting Protocol with concrete defaults and offline/online logic. Added Error Rate Metrics section. |
## Alternatives Considered
@@ -1,7 +1,7 @@
# Inbound API Refinement — Design
**Date**: 2026-03-16
**Component**: Inbound API (`Component-InboundAPI.md`)
**Component**: Inbound API (`docs/requirements/Component-InboundAPI.md`)
**Status**: Approved
## Problem
@@ -34,8 +34,8 @@ The Inbound API doc had good coverage of authentication, method definitions, and
| Document | Change |
|----------|--------|
| `Component-InboundAPI.md` | Added HTTP Contract section (URL structure, API key header, request/response format, extended type system). Added API Call Logging section. Updated Authentication Details to be definitive about X-API-Key header. |
| `Component-ExternalSystemGateway.md` | Updated method definition parameter/return types to note extended type system support. |
| `docs/requirements/Component-InboundAPI.md` | Added HTTP Contract section (URL structure, API key header, request/response format, extended type system). Added API Call Logging section. Updated Authentication Details to be definitive about X-API-Key header. |
| `docs/requirements/Component-ExternalSystemGateway.md` | Updated method definition parameter/return types to note extended type system support. |
## Alternatives Considered
@@ -1,7 +1,7 @@
# Notification Service Refinement — Design
**Date**: 2026-03-16
**Component**: Notification Service (`Component-NotificationService.md`)
**Component**: Notification Service (`docs/requirements/Component-NotificationService.md`)
**Status**: Approved
## Problem
@@ -32,7 +32,7 @@ The Notification Service doc covered notification lists and the script API but l
| Document | Change |
|----------|--------|
| `Component-NotificationService.md` | Expanded SMTP configuration with OAuth2 support, connection settings. Added Email Delivery Behavior section (recipient handling, error classification, rate limiting). Specified plain text content. |
| `docs/requirements/Component-NotificationService.md` | Expanded SMTP configuration with OAuth2 support, connection settings. Added Email Delivery Behavior section (recipient handling, error classification, rate limiting). Specified plain text content. |
## Alternatives Considered
@@ -1,7 +1,7 @@
# Security & Auth Refinement — Design
**Date**: 2026-03-16
**Component**: Security & Auth (`Component-Security.md`)
**Component**: Security & Auth (`docs/requirements/Component-Security.md`)
**Status**: Approved
## Problem
@@ -36,8 +36,8 @@ The Security & Auth doc defined roles and LDAP mapping but lacked specification
| Document | Change |
|----------|--------|
| `Component-Security.md` | Replaced Windows Integrated Auth with direct LDAP bind. Added Session Management, Token Lifecycle, Load Balancer Compatibility, and LDAP Connection Failure sections. |
| `HighLevelReqs.md` | Updated authentication description (Section 9.1) to reflect username/password with JWT. |
| `docs/requirements/Component-Security.md` | Replaced Windows Integrated Auth with direct LDAP bind. Added Session Management, Token Lifecycle, Load Balancer Compatibility, and LDAP Connection Failure sections. |
| `docs/requirements/HighLevelReqs.md` | Updated authentication description (Section 9.1) to reflect username/password with JWT. |
## Alternatives Considered
@@ -0,0 +1,634 @@
# gRPC Streaming Channel Implementation Plan
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers-extended-cc:executing-plans to implement this plan task-by-task.
**Goal:** Replace ClusterClient-based debug event streaming with a dedicated gRPC server-streaming channel from site nodes to central, following the design in `docs/plans/grpc_streams.md`.
**Architecture:** Each site node runs a gRPC server (`SiteStreamGrpcServer`) on a dedicated HTTP/2 port. Central creates per-site gRPC clients (`SiteStreamGrpcClient`) that open server-streaming subscriptions filtered by instance name. Events flow: SiteStreamManager → relay actor → Channel\<T\> → gRPC stream → central callback → DebugStreamBridgeActor → SignalR/Blazor. ClusterClient continues handling command/control (subscribe/unsubscribe/snapshot).
**Tech Stack:** .NET 10, gRPC (Grpc.AspNetCore + Grpc.Net.Client), Protocol Buffers, Akka.NET, ASP.NET Core Kestrel, System.Threading.Channels
**Design Reference:** `docs/plans/grpc_streams.md` — full architecture, proto definition, failover, keepalive, backpressure, and review notes.
---
### Task 0: Proto Definition & Stub Generation
**Files:**
- Create: `src/ScadaLink.Communication/Protos/sitestream.proto`
- Create: `src/ScadaLink.Communication/SiteStreamGrpc/` (generated stubs)
- Modify: `src/ScadaLink.Communication/ScadaLink.Communication.csproj`
**Step 1: Create the proto file**
Create `src/ScadaLink.Communication/Protos/sitestream.proto` with the proto definition from `docs/plans/grpc_streams.md` "Proto Improvements" section (V1 review notes version with enums and `google.protobuf.Timestamp`):
```protobuf
syntax = "proto3";
option csharp_namespace = "ScadaLink.Communication.Grpc";
package sitestream;
import "google/protobuf/timestamp.proto";
service SiteStreamService {
  rpc SubscribeInstance(InstanceStreamRequest) returns (stream SiteStreamEvent);
}
message InstanceStreamRequest {
  string correlation_id = 1;
  string instance_unique_name = 2;
}
message SiteStreamEvent {
  string correlation_id = 1;
  oneof event {
    AttributeValueUpdate attribute_changed = 2;
    AlarmStateUpdate alarm_changed = 3;
  }
}
enum Quality {
  QUALITY_UNSPECIFIED = 0;
  QUALITY_GOOD = 1;
  QUALITY_UNCERTAIN = 2;
  QUALITY_BAD = 3;
}
enum AlarmStateEnum {
  ALARM_STATE_UNSPECIFIED = 0;
  ALARM_STATE_NORMAL = 1;
  ALARM_STATE_ACTIVE = 2;
}
message AttributeValueUpdate {
  string instance_unique_name = 1;
  string attribute_path = 2;
  string attribute_name = 3;
  string value = 4;
  Quality quality = 5;
  google.protobuf.Timestamp timestamp = 6;
}
message AlarmStateUpdate {
  string instance_unique_name = 1;
  string alarm_name = 2;
  AlarmStateEnum state = 3;
  int32 priority = 4;
  google.protobuf.Timestamp timestamp = 5;
}
```
**Step 2: Add gRPC NuGet packages**
Add to `src/ScadaLink.Communication/ScadaLink.Communication.csproj`:
```xml
<PackageReference Include="Grpc.AspNetCore" Version="2.71.0" />
<PackageReference Include="Grpc.Net.Client" Version="2.71.0" />
<PackageReference Include="Google.Protobuf" Version="3.29.3" />
<PackageReference Include="Grpc.Tools" Version="2.71.0" PrivateAssets="All" />
```
Also add `<FrameworkReference Include="Microsoft.AspNetCore.App" />` if not already present (needed for `Grpc.AspNetCore`).
**Step 3: Generate C# stubs**
Run `protoc` locally to generate the stubs, then check the generated files into `src/ScadaLink.Communication/SiteStreamGrpc/`. The stubs are pre-generated and checked in, so `protoc` does not run at build time.
**Step 4: Verify build**
Run: `dotnet build src/ScadaLink.Communication/`
Expected: Build succeeded, 0 errors
**Step 5: Write proto roundtrip tests**
Create `tests/ScadaLink.Communication.Tests/Grpc/ProtoRoundtripTests.cs`:
- Test `AttributeValueUpdate` serialization/deserialization with all Quality enum values
- Test `AlarmStateUpdate` serialization/deserialization with all AlarmStateEnum values
- Test `SiteStreamEvent` oneof discrimination (attribute vs alarm)
- Test `google.protobuf.Timestamp` conversion to/from `DateTimeOffset`
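The timestamp roundtrip can be reasoned about without the generated types. The sketch below is a minimal illustration, assuming a hypothetical `ProtoTimestamp` stand-in for the two fields of `google.protobuf.Timestamp` (seconds and nanos); the real tests would presumably use `Timestamp.FromDateTimeOffset` and `ToDateTimeOffset` from `Google.Protobuf.WellKnownTypes` instead.

```csharp
using System;

// Hypothetical stand-in for google.protobuf.Timestamp (seconds + nanos since the Unix epoch).
public readonly record struct ProtoTimestamp(long Seconds, int Nanos);

// Illustrative helper, not part of the plan.
public static class TimestampConversion
{
    public static ProtoTimestamp FromDateTimeOffset(DateTimeOffset value)
    {
        var seconds = value.ToUnixTimeSeconds();
        // Sub-second part in ticks, converted to nanoseconds (1 tick = 100 ns).
        var nanos = (int)(value.UtcTicks % TimeSpan.TicksPerSecond) * 100;
        return new ProtoTimestamp(seconds, nanos);
    }

    // The result is always in UTC; the original offset is not carried by the proto type.
    public static DateTimeOffset ToDateTimeOffset(ProtoTimestamp ts) =>
        DateTimeOffset.FromUnixTimeSeconds(ts.Seconds).AddTicks(ts.Nanos / 100);
}
```

A roundtrip test would compare the instants rather than the offsets, since `DateTimeOffset` equality is based on the UTC instant.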
**Step 6: Run tests**
Run: `dotnet test tests/ScadaLink.Communication.Tests/`
Expected: All pass including new proto tests
**Step 7: Commit**
```bash
git add src/ScadaLink.Communication/Protos/ src/ScadaLink.Communication/SiteStreamGrpc/ src/ScadaLink.Communication/ScadaLink.Communication.csproj tests/ScadaLink.Communication.Tests/
git commit -m "feat: add sitestream.proto definition and generated gRPC stubs"
```
---
### Task 1: Site Config — GrpcPort in NodeOptions
**Files:**
- Modify: `src/ScadaLink.Host/NodeOptions.cs:8`
- Modify: `src/ScadaLink.Host/StartupValidator.cs:43-48`
- Modify: `src/ScadaLink.Host/appsettings.Site.json:7`
- Test: `tests/ScadaLink.Host.Tests/`
**Step 1: Write failing test for GrpcPort validation**
Add to existing startup validator tests: test that a site node with `GrpcPort` outside 1-65535 fails validation, and that a valid `GrpcPort` passes.
**Step 2: Run test to verify it fails**
Run: `dotnet test tests/ScadaLink.Host.Tests/`
Expected: New test FAILS (GrpcPort not validated yet)
**Step 3: Add GrpcPort to NodeOptions**
In `src/ScadaLink.Host/NodeOptions.cs`, add:
```csharp
public int GrpcPort { get; set; } = 8083;
```
**Step 4: Add validation in StartupValidator**
In `src/ScadaLink.Host/StartupValidator.cs`, after the existing site validation block (line ~43):
```csharp
if (role == "Site")
{
    var grpcPortStr = nodeSection["GrpcPort"];
    if (grpcPortStr != null && (!int.TryParse(grpcPortStr, out var gp) || gp < 1 || gp > 65535))
        errors.Add("ScadaLink:Node:GrpcPort must be 1-65535");
}
```
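The same rule can be exercised in isolation from the configuration system. This is a sketch only, assuming a hypothetical `GrpcPortRule.Validate` helper that takes the role and the raw config string; the plan itself keeps the check inline in `StartupValidator`.

```csharp
#nullable enable
using System.Collections.Generic;

// Hypothetical helper for illustration; not a type named in the plan.
public static class GrpcPortRule
{
    // Returns validation errors for a raw "ScadaLink:Node:GrpcPort" value.
    // A missing value (null) is accepted so that the NodeOptions default applies.
    public static List<string> Validate(string role, string? grpcPortStr)
    {
        var errors = new List<string>();
        if (role != "Site" || grpcPortStr is null)
            return errors;

        if (!int.TryParse(grpcPortStr, out var port) || port < 1 || port > 65535)
            errors.Add("ScadaLink:Node:GrpcPort must be 1-65535");

        return errors;
    }
}
```

Note that a missing key passes validation, which matches the `grpcPortStr != null` guard in the snippet above.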
**Step 5: Add GrpcPort to appsettings.Site.json**
In `src/ScadaLink.Host/appsettings.Site.json`, add after `"RemotingPort": 8082`:
```json
"GrpcPort": 8083
```
**Step 6: Run tests**
Run: `dotnet test tests/ScadaLink.Host.Tests/`
Expected: All pass
**Step 7: Commit**
```bash
git add src/ScadaLink.Host/ tests/ScadaLink.Host.Tests/
git commit -m "feat: add GrpcPort config to NodeOptions with startup validation"
```
---
### Task 2: Site Entity — gRPC Address Fields
**Files:**
- Modify: `src/ScadaLink.Commons/Entities/Sites/Site.cs:9-10`
- Modify: `src/ScadaLink.Commons/Messages/Management/SiteCommands.cs:5-6`
- Modify: `src/ScadaLink.ConfigurationDatabase/` (migration)
- Modify: `src/ScadaLink.ManagementService/ManagementActor.cs` (handlers)
- Modify: `src/ScadaLink.CLI/Commands/SiteCommands.cs`
- Modify: `src/ScadaLink.CentralUI/Components/Pages/Admin/Sites.razor`
- Test: `tests/ScadaLink.Commons.Tests/`
**Step 1: Add fields to Site entity**
In `src/ScadaLink.Commons/Entities/Sites/Site.cs`, add after `NodeBAddress`:
```csharp
public string? GrpcNodeAAddress { get; set; }
public string? GrpcNodeBAddress { get; set; }
```
**Step 2: Update management commands**
In `src/ScadaLink.Commons/Messages/Management/SiteCommands.cs`, add `GrpcNodeAAddress` and `GrpcNodeBAddress` optional params to `CreateSiteCommand` and `UpdateSiteCommand`.
**Step 3: Add EF Core migration**
Run: `dotnet ef migrations add AddGrpcNodeAddresses --project src/ScadaLink.ConfigurationDatabase/ --startup-project src/ScadaLink.Host/`
Alternatively, create a manual migration that adds nullable `GrpcNodeAAddress` and `GrpcNodeBAddress` string columns to the Sites table.
**Step 4: Update ManagementActor handlers**
In `src/ScadaLink.ManagementService/ManagementActor.cs`, update `HandleCreateSite` and `HandleUpdateSite` to pass gRPC addresses to the repository.
**Step 5: Update CLI SiteCommands**
In `src/ScadaLink.CLI/Commands/SiteCommands.cs`, add `--grpc-node-a-address` and `--grpc-node-b-address` options to `site create` and `site update` commands.
**Step 6: Update Central UI Sites.razor**
In `src/ScadaLink.CentralUI/Components/Pages/Admin/Sites.razor`:
- Add `_formGrpcNodeAAddress` and `_formGrpcNodeBAddress` form fields
- Add table columns for gRPC addresses
- Wire into create/update handlers
**Step 7: Run tests**
Run: `dotnet test tests/ScadaLink.Commons.Tests/ && dotnet test tests/ScadaLink.CLI.Tests/ && dotnet test tests/ScadaLink.Host.Tests/`
Expected: All pass
**Step 8: Commit**
```bash
git add src/ScadaLink.Commons/ src/ScadaLink.ConfigurationDatabase/ src/ScadaLink.ManagementService/ src/ScadaLink.CLI/ src/ScadaLink.CentralUI/
git commit -m "feat: add GrpcNodeAAddress/GrpcNodeBAddress to Site entity, CLI, and UI"
```
---
### Task 3: Site-Side gRPC Server — StreamRelayActor
**Files:**
- Create: `src/ScadaLink.Communication/Grpc/StreamRelayActor.cs`
- Test: `tests/ScadaLink.Communication.Tests/Grpc/StreamRelayActorTests.cs`
**Step 1: Write failing test**
Test that `StreamRelayActor` receives `AttributeValueChanged` and writes a correctly-converted `SiteStreamEvent` proto message to a `ChannelWriter<SiteStreamEvent>`. Use Akka.TestKit.
**Step 2: Run test to verify it fails**
Run: `dotnet test tests/ScadaLink.Communication.Tests/`
Expected: FAIL (class doesn't exist)
**Step 3: Implement StreamRelayActor**
Create `src/ScadaLink.Communication/Grpc/StreamRelayActor.cs`:
- `ReceiveActor` that receives `AttributeValueChanged` and `AlarmStateChanged`
- Converts each to the proto `SiteStreamEvent` with correct enum mappings and `Timestamp` conversion
- Writes to `ChannelWriter<SiteStreamEvent>` via `TryWrite`
- Logs dropped events when channel is full
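The non-blocking write path described above might look like the following sketch. It assumes a bounded channel in `Wait` mode, where `TryWrite` returns `false` once the channel is full (in the drop modes `TryWrite` always succeeds, so drops could not be counted this way). `DroppingWriter` and `RelayChannel` are hypothetical names; the real actor would log through Akka's logging rather than keep a counter.

```csharp
using System.Threading;
using System.Threading.Channels;

// Hypothetical stand-in for the relay actor's write path.
public sealed class DroppingWriter<T>
{
    private readonly ChannelWriter<T> _writer;
    private long _dropped;

    public DroppingWriter(ChannelWriter<T> writer) => _writer = writer;

    public long Dropped => Interlocked.Read(ref _dropped);

    // Never blocks: returns false and counts the drop when the channel is full or completed.
    public bool Write(T item)
    {
        if (_writer.TryWrite(item))
            return true;

        Interlocked.Increment(ref _dropped);
        return false;
    }
}

public static class RelayChannel
{
    // Assumption: one relay actor writes and one gRPC handler reads per subscription.
    public static Channel<T> Create<T>(int capacity) =>
        Channel.CreateBounded<T>(new BoundedChannelOptions(capacity)
        {
            FullMode = BoundedChannelFullMode.Wait,
            SingleReader = true,
            SingleWriter = true
        });
}
```

Using `TryWrite` keeps the actor's message loop from stalling behind a slow gRPC consumer, at the cost of dropping events under sustained backpressure.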
**Step 4: Run tests**
Run: `dotnet test tests/ScadaLink.Communication.Tests/`
Expected: All pass
**Step 5: Commit**
```bash
git add src/ScadaLink.Communication/Grpc/ tests/ScadaLink.Communication.Tests/
git commit -m "feat: add StreamRelayActor bridging Akka events to gRPC proto channel"
```
---
### Task 4: Site-Side gRPC Server — SiteStreamGrpcServer
**Files:**
- Create: `src/ScadaLink.Communication/Grpc/SiteStreamGrpcServer.cs`
- Test: `tests/ScadaLink.Communication.Tests/Grpc/SiteStreamGrpcServerTests.cs`
**Step 1: Write failing tests**
- Server accepts subscription, relays events from mock SiteStreamManager to gRPC stream
- Server cleans up SiteStreamManager subscription on cancellation
- Server handles a duplicate `correlation_id` by cancelling the old stream
- Server enforces max concurrent streams (100), rejects with `ResourceExhausted`
- Server rejects with `Unavailable` before actor system is ready
**Step 2: Run tests to verify they fail**
Run: `dotnet test tests/ScadaLink.Communication.Tests/`
Expected: FAIL
**Step 3: Implement SiteStreamGrpcServer**
Create `src/ScadaLink.Communication/Grpc/SiteStreamGrpcServer.cs`:
- Inherits `SiteStreamService.SiteStreamServiceBase`
- Injects `SiteStreamManager` (or interface), `ActorSystem`
- Tracks active streams in `ConcurrentDictionary<string, CancellationTokenSource>`
- `SubscribeInstance`: creates `Channel<SiteStreamEvent>`, creates `StreamRelayActor`, subscribes to SiteStreamManager, reads channel → writes to gRPC response stream
- `finally`: removes subscription, stops relay actor, removes from active streams
- Readiness gate: checks `ActorSystem` availability before accepting
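The stream-tracking part of the server could be sketched as below. `ActiveStreamRegistry` is a hypothetical helper, not a type from the plan; it assumes a duplicate `correlation_id` cancels the previous stream and that a `null` result is translated by the caller into a `ResourceExhausted` status.

```csharp
#nullable enable
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

// Hypothetical registry for illustration only.
public sealed class ActiveStreamRegistry
{
    private readonly ConcurrentDictionary<string, CancellationTokenSource> _streams = new();
    private readonly int _maxStreams;

    public ActiveStreamRegistry(int maxStreams) => _maxStreams = maxStreams;

    public int Count => _streams.Count;

    // Registers a stream; a duplicate correlation id cancels the previous one.
    // Returns null when the limit is reached. The limit check is not atomic with
    // the insert, so the cap is approximate under concurrent registration.
    public CancellationTokenSource? TryRegister(string correlationId, CancellationToken callToken)
    {
        if (_streams.TryRemove(correlationId, out var previous))
            previous.Cancel();

        if (_streams.Count >= _maxStreams)
            return null;

        var cts = CancellationTokenSource.CreateLinkedTokenSource(callToken);
        _streams[correlationId] = cts;
        return cts;
    }

    // Intended for the finally block: removes the entry only if it still belongs
    // to this call, so a replaced stream cannot evict its successor.
    public void Release(string correlationId, CancellationTokenSource cts)
    {
        _streams.TryRemove(new KeyValuePair<string, CancellationTokenSource>(correlationId, cts));
        cts.Dispose();
    }
}
```

Linking each source to the call's own token means client disconnects and server-initiated cancellation flow through the same token in the read loop.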
**Step 4: Run tests**
Run: `dotnet test tests/ScadaLink.Communication.Tests/`
Expected: All pass
**Step 5: Commit**
```bash
git add src/ScadaLink.Communication/Grpc/ tests/ScadaLink.Communication.Tests/
git commit -m "feat: add SiteStreamGrpcServer with Channel<T> bridge and stream limits"
```
---
### Task 5: Switch Site Host to WebApplicationBuilder + gRPC
**Files:**
- Modify: `src/ScadaLink.Host/Program.cs:157-174`
- Modify: `src/ScadaLink.Host/appsettings.Site.json`
- Modify: `docker/docker-compose.yml`
- Test: `tests/ScadaLink.Host.Tests/`
**Step 1: Write failing test**
Site host startup test: verify `WebApplicationBuilder` starts, gRPC port is configured, `MapGrpcService` is registered.
**Step 2: Switch site host from generic Host to WebApplicationBuilder**
In `src/ScadaLink.Host/Program.cs`, replace the `Host.CreateDefaultBuilder()` site section with `WebApplication.CreateBuilder()` + Kestrel HTTP/2 on `GrpcPort` + `AddGrpc()` + `MapGrpcService<SiteStreamGrpcServer>()`. Keep all existing service registrations via `SiteServiceRegistration.Configure()`.
Add gRPC keepalive settings from `CommunicationOptions`:
- `KeepAlivePingDelay = 15s`
- `KeepAlivePingTimeout = 10s`
**Step 3: Update docker-compose.yml**
Expose gRPC port 8083 for each site node:
- Site-A: `9023:8083` / `9024:8083`
- Site-B: `9033:8083` / `9034:8083`
- Site-C: `9043:8083` / `9044:8083`
**Step 4: Add gRPC keepalive config to CommunicationOptions**
Add to `src/ScadaLink.Communication/CommunicationOptions.cs`:
```csharp
public TimeSpan GrpcKeepAlivePingDelay { get; set; } = TimeSpan.FromSeconds(15);
public TimeSpan GrpcKeepAlivePingTimeout { get; set; } = TimeSpan.FromSeconds(10);
public TimeSpan GrpcMaxStreamLifetime { get; set; } = TimeSpan.FromHours(4);
public int GrpcMaxConcurrentStreams { get; set; } = 100;
```
**Step 5: Run tests and build**
Run: `dotnet build src/ScadaLink.Host/ && dotnet test tests/ScadaLink.Host.Tests/`
Expected: All pass
**Step 6: Commit**
```bash
git add src/ScadaLink.Host/ src/ScadaLink.Communication/ docker/
git commit -m "feat: switch site host to WebApplicationBuilder with Kestrel gRPC server"
```
---
### Task 6: Central-Side gRPC Client
**Files:**
- Create: `src/ScadaLink.Communication/Grpc/SiteStreamGrpcClient.cs`
- Create: `src/ScadaLink.Communication/Grpc/SiteStreamGrpcClientFactory.cs`
- Test: `tests/ScadaLink.Communication.Tests/Grpc/SiteStreamGrpcClientTests.cs`
- Test: `tests/ScadaLink.Communication.Tests/Grpc/SiteStreamGrpcClientFactoryTests.cs`
**Step 1: Write failing tests for SiteStreamGrpcClient**
- Client connects, reads stream, converts proto→domain types (`AttributeValueChanged`, `AlarmStateChanged`), invokes callback
- Client handles stream errors (throws on `RpcException`)
- Client cancellation stops the background reader
**Step 2: Implement SiteStreamGrpcClient**
- Creates `GrpcChannel` with keepalive settings from `CommunicationOptions`
- `SubscribeAsync`: calls `SiteStreamService.SubscribeInstance()`, launches background task to read `ResponseStream`, converts proto→domain, invokes callback
- `Unsubscribe`: cancels the `CancellationTokenSource` for the subscription
- `IAsyncDisposable`: disposes channel
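On the client side, keepalive is typically configured on the HTTP handler that the channel uses. A minimal sketch, assuming a hypothetical `GrpcKeepAlive.CreateHandler` helper fed from the `CommunicationOptions` values; only `System.Net.Http` types are used here so the snippet stands alone.

```csharp
using System;
using System.Net.Http;
using System.Threading;

// Hypothetical helper for illustration; not a type named in the plan.
public static class GrpcKeepAlive
{
    // Builds a handler that sends HTTP/2 PING frames to keep a long-lived stream open.
    public static SocketsHttpHandler CreateHandler(TimeSpan pingDelay, TimeSpan pingTimeout) =>
        new SocketsHttpHandler
        {
            // Keep the pooled connection even when it is momentarily idle.
            PooledConnectionIdleTimeout = Timeout.InfiniteTimeSpan,
            KeepAlivePingDelay = pingDelay,
            KeepAlivePingTimeout = pingTimeout,
            EnableMultipleHttp2Connections = true
        };
}
```

The handler would then be passed to the channel through `GrpcChannelOptions.HttpHandler` when calling `GrpcChannel.ForAddress`.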
**Step 3: Write failing tests for SiteStreamGrpcClientFactory**
- Creates and caches per-site clients
- Falls back to NodeB on NodeA connection failure
- Disposes clients on site removal
**Step 4: Implement SiteStreamGrpcClientFactory**
- `GetOrCreateAsync(siteIdentifier, grpcNodeAAddress, grpcNodeBAddress)` → `SiteStreamGrpcClient`
- Caches by `siteIdentifier` in `ConcurrentDictionary`
- Manages `GrpcChannel` lifecycle
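The caching and NodeA-to-NodeB fallback behaviour can be sketched generically. `PerSiteClientCache<TClient>` is a hypothetical type in which `TClient` stands in for `SiteStreamGrpcClient` and the `connect` delegate stands in for channel creation; disposal of removed clients is left to the caller in this sketch.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical cache for illustration only.
public sealed class PerSiteClientCache<TClient> where TClient : class
{
    private readonly ConcurrentDictionary<string, Lazy<Task<TClient>>> _clients = new();
    private readonly Func<string, Task<TClient>> _connect; // address -> connected client

    public PerSiteClientCache(Func<string, Task<TClient>> connect) => _connect = connect;

    // Lazy ensures the connect delegate runs once per site even under concurrent callers.
    // Caveat: a failed connection task stays cached until Remove is called.
    public Task<TClient> GetOrCreateAsync(string siteId, string nodeA, string? nodeB) =>
        _clients.GetOrAdd(siteId,
            _ => new Lazy<Task<TClient>>(() => ConnectWithFallbackAsync(nodeA, nodeB))).Value;

    private async Task<TClient> ConnectWithFallbackAsync(string nodeA, string? nodeB)
    {
        if (nodeB is null)
            return await _connect(nodeA);

        try { return await _connect(nodeA); }
        catch (Exception) { return await _connect(nodeB); }
    }

    // Drops the cached entry, for example when a site is deleted.
    public bool Remove(string siteId) => _clients.TryRemove(siteId, out _);
}
```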
**Step 5: Run tests**
Run: `dotnet test tests/ScadaLink.Communication.Tests/`
Expected: All pass
**Step 6: Commit**
```bash
git add src/ScadaLink.Communication/Grpc/ tests/ScadaLink.Communication.Tests/
git commit -m "feat: add SiteStreamGrpcClient and SiteStreamGrpcClientFactory"
```
---
### Task 7: Update DebugStreamBridgeActor to Use gRPC
**Files:**
- Modify: `src/ScadaLink.Communication/Actors/DebugStreamBridgeActor.cs`
- Modify: `src/ScadaLink.Communication/DebugStreamService.cs`
- Modify: `src/ScadaLink.Communication/ServiceCollectionExtensions.cs`
- Test: `tests/ScadaLink.Communication.Tests/`
**Step 1: Write failing tests for updated bridge actor**
- Bridge actor sends subscribe via ClusterClient, receives snapshot
- After snapshot, opens gRPC stream via `SiteStreamGrpcClient`
- Events from gRPC callback forwarded to `_onEvent`
- On gRPC stream error: reconnects to other node with backoff (max 3 retries)
- On stop: cancels gRPC + sends unsubscribe via ClusterClient
- Handles `DebugStreamTerminated` idempotently
**Step 2: Update DebugStreamBridgeActor**
Rewrite to:
1. `PreStart`: send `SubscribeDebugViewRequest` via ClusterClient (unchanged)
2. On `DebugViewSnapshot` received: open gRPC stream first (per handoff race mitigation — stream first, then apply snapshot)
3. gRPC callback delivers events to `Self` via `Tell` (marshals onto actor thread)
4. On gRPC error: enter reconnecting state, try other node, backoff, max retries
5. On stop: cancel gRPC subscription + send `UnsubscribeDebugViewRequest`
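The reconnect decision in step 4 can be expressed as a small pure function, which keeps it easy to unit test apart from the actor. This sketch assumes exponential backoff from a 1-second base (the plan only says "backoff") and uses hypothetical names `GrpcNode`, `ReconnectAttempt`, and `ReconnectPolicy`.

```csharp
using System;

public enum GrpcNode { A, B }

// Hypothetical value type describing one reconnect attempt.
public readonly record struct ReconnectAttempt(GrpcNode Node, TimeSpan Delay);

// Illustrative policy; the 3-retry cap comes from the test list, the delays are assumed.
public static class ReconnectPolicy
{
    public const int MaxRetries = 3;

    // Returns the next attempt against the other node, or null once retries are exhausted.
    public static ReconnectAttempt? Next(GrpcNode failedNode, int retriesSoFar)
    {
        if (retriesSoFar >= MaxRetries)
            return null;

        var other = failedNode == GrpcNode.A ? GrpcNode.B : GrpcNode.A;
        var delay = TimeSpan.FromSeconds(Math.Pow(2, retriesSoFar)); // 1s, 2s, 4s
        return new ReconnectAttempt(other, delay);
    }
}
```

Inside the actor, the returned delay would likely be applied with a scheduled message to `Self` rather than a blocking wait.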
**Step 3: Update DebugStreamService**
Inject `SiteStreamGrpcClientFactory`. Resolve `GrpcNodeAAddress`/`GrpcNodeBAddress` from `Site` entity. Pass to bridge actor.
**Step 4: Register factory in DI**
In `src/ScadaLink.Communication/ServiceCollectionExtensions.cs`:
```csharp
services.AddSingleton<SiteStreamGrpcClientFactory>();
```
**Step 5: Run tests**
Run: `dotnet test tests/ScadaLink.Communication.Tests/`
Expected: All pass
**Step 6: Commit**
```bash
git add src/ScadaLink.Communication/ tests/ScadaLink.Communication.Tests/
git commit -m "feat: update DebugStreamBridgeActor to use gRPC for streaming events"
```
---
### Task 8: Remove ClusterClient Streaming Path
**Files:**
- Modify: `src/ScadaLink.SiteRuntime/Actors/InstanceActor.cs`
- Modify: `src/ScadaLink.Communication/Actors/SiteCommunicationActor.cs`
- Modify: `src/ScadaLink.Communication/Actors/CentralCommunicationActor.cs`
- Delete: `src/ScadaLink.Commons/Messages/DebugView/DebugStreamEvent.cs`
- Test: `tests/ScadaLink.SiteRuntime.Tests/Actors/InstanceActorIntegrationTests.cs`
- Test: `tests/ScadaLink.Commons.Tests/ArchitecturalConstraintTests.cs`
**Step 1: Remove DebugStreamEvent from InstanceActor**
Remove `_debugSubscriberCorrelationIds`, `_siteCommActor`, and all `DebugStreamEvent` forwarding from `PublishAndNotifyChildren` and `HandleAlarmStateChanged`. InstanceActor just publishes to `SiteStreamManager` — the gRPC server picks up events from there.
Keep `HandleSubscribeDebugView` (for snapshot) and `HandleUnsubscribeDebugView`.
**Step 2: Remove DebugStreamEvent from SiteCommunicationActor**
Remove `Receive<DebugStreamEvent>` handler.
**Step 3: Remove DebugStreamEvent from CentralCommunicationActor**
Remove `Receive<DebugStreamEvent>` handler and `HandleDebugStreamEvent` method.
**Step 4: Delete DebugStreamEvent.cs**
Delete `src/ScadaLink.Commons/Messages/DebugView/DebugStreamEvent.cs`.
**Step 5: Update InstanceActorIntegrationTests**
Remove `DebugStreamEventForwarder` test helper. Update debug subscriber tests to verify events reach `SiteStreamManager` only.
**Step 6: Add architectural constraint test**
In `tests/ScadaLink.Commons.Tests/ArchitecturalConstraintTests.cs`, add test verifying `DebugStreamEvent` type no longer exists in the Commons assembly.
**Step 7: Run full test suite**
Run: `dotnet test tests/ScadaLink.SiteRuntime.Tests/ && dotnet test tests/ScadaLink.Communication.Tests/ && dotnet test tests/ScadaLink.Commons.Tests/ && dotnet test tests/ScadaLink.Host.Tests/`
Expected: All pass
**Step 8: Commit**
```bash
git add -A
git commit -m "refactor: remove ClusterClient streaming path (DebugStreamEvent), events flow via gRPC"
```
---
### Task 9: Docker & End-to-End Integration Test
**Files:**
- Modify: `docker/docker-compose.yml`
- Modify: `docker/deploy.sh` (if needed)
- Create: `tests/ScadaLink.IntegrationTests/Grpc/GrpcStreamIntegrationTests.cs`
**Step 1: Update docker-compose site appsettings**
Ensure site container configs include `GrpcPort: 8083` and gRPC ports are exposed.
**Step 2: Write integration test**
End-to-end: start in-process site gRPC server → central gRPC client → verify event delivery, cancellation cleanup.
**Step 3: Build and deploy cluster**
Run: `bash docker/deploy.sh`
Expected: All containers start, gRPC ports exposed
**Step 4: Manual end-to-end verification**
Run: `timeout 35 dotnet run --project src/ScadaLink.CLI -- --url http://localhost:9000 --username multi-role --password password debug stream --id 1 --format table`
Expected: Initial snapshot + streaming ATTR/ALARM rows via gRPC (not ClusterClient).
Write an OPC UA tag to verify:
```bash
python3 infra/tools/opcua_tool.py write --node "ns=3;s=JoeAppEngine.BTCS" --value "gRPC streaming test" --type String
```
**Step 5: Commit**
```bash
git add tests/ScadaLink.IntegrationTests/ docker/
git commit -m "test: add gRPC stream integration test and docker config"
```
---
### Task 10: Documentation Updates
**Files:**
- Modify: `docs/requirements/HighLevelReqs.md`
- Modify: `docs/requirements/Component-Communication.md`
- Modify: `docs/requirements/Component-SiteRuntime.md`
- Modify: `docs/requirements/Component-Host.md`
- Modify: `docs/requirements/Component-CentralUI.md`
- Modify: `docs/requirements/Component-CLI.md`
- Modify: `docs/requirements/Component-ConfigurationDatabase.md`
- Modify: `docs/requirements/Component-ClusterInfrastructure.md`
- Modify: `CLAUDE.md`
- Modify: `README.md`
- Modify: `docker/README.md`
**Step 1: Update HighLevelReqs.md**
Section 5 (Communication): Add gRPC streaming transport. ClusterClient for command/control, gRPC for real-time data.
**Step 2: Update Component-Communication.md**
Pattern 6: Replace ClusterClient streaming with gRPC. Add SiteStreamGrpcServer, SiteStreamGrpcClient, SiteStreamGrpcClientFactory. Add gRPC keepalive config.
**Step 3: Update remaining component docs**
Per the documentation update table in `docs/plans/grpc_streams.md` § Documentation Updates.
**Step 4: Update CLAUDE.md**
Add under Data & Communication: "gRPC streaming for site→central real-time data; ClusterClient for command/control only"
**Step 5: Update README.md architecture diagram**
Add gRPC streaming channel between site and central in the ASCII diagram.
**Step 6: Update docker/README.md**
Add gRPC ports to port allocation table.
**Step 7: Commit**
```bash
git add docs/ CLAUDE.md README.md docker/README.md
git commit -m "docs: update requirements and architecture for gRPC streaming channel"
```
---
### Task 11: Final Guardrail Tests
**Files:**
- Test: `tests/ScadaLink.Communication.Tests/Grpc/ProtoContractTests.cs`
- Test: `tests/ScadaLink.Communication.Tests/Grpc/CleanupVerificationTests.cs`
**Step 1: Proto contract test**
Verify all `oneof` variants in `SiteStreamEvent` have corresponding handlers in `StreamRelayActor` and `ConvertToDomainEvent`. If a new proto field is added without handlers, the test fails.
**Step 2: Cleanup verification test**
Verify that after gRPC stream cancellation, `SiteStreamManager.SubscriptionCount` returns to zero (no leaked subscriptions).
**Step 3: No ClusterClient streaming regression test**
Integration test that subscribes via gRPC, triggers changes, and verifies events arrive via gRPC — NOT via `DebugStreamEvent`.
**Step 4: Run full test suite**
Run: `dotnet test tests/ScadaLink.Host.Tests/ && dotnet test tests/ScadaLink.Communication.Tests/ && dotnet test tests/ScadaLink.SiteRuntime.Tests/ && dotnet test tests/ScadaLink.Commons.Tests/ && dotnet test tests/ScadaLink.CLI.Tests/ && dotnet test tests/ScadaLink.ManagementService.Tests/`
Expected: All pass, zero warnings
**Step 5: Commit**
```bash
git add tests/
git commit -m "test: add proto contract, cleanup verification, and regression guardrail tests"
```
@@ -0,0 +1,18 @@
{
"planPath": "docs/plans/2026-03-21-grpc-streaming-channel.md",
"tasks": [
{"id": 0, "taskId": "1", "subject": "Task 0: Proto Definition & Stub Generation", "status": "pending"},
{"id": 1, "taskId": "2", "subject": "Task 1: Site Config — GrpcPort in NodeOptions", "status": "pending", "blockedBy": [0]},
{"id": 2, "taskId": "3", "subject": "Task 2: Site Entity — gRPC Address Fields", "status": "pending", "blockedBy": [0]},
{"id": 3, "taskId": "4", "subject": "Task 3: Site-Side gRPC Server — StreamRelayActor", "status": "pending", "blockedBy": [0]},
{"id": 4, "taskId": "5", "subject": "Task 4: Site-Side gRPC Server — SiteStreamGrpcServer", "status": "pending", "blockedBy": [3]},
{"id": 5, "taskId": "6", "subject": "Task 5: Switch Site Host to WebApplicationBuilder + gRPC", "status": "pending", "blockedBy": [4]},
{"id": 6, "taskId": "7", "subject": "Task 6: Central-Side gRPC Client", "status": "pending", "blockedBy": [0]},
{"id": 7, "taskId": "8", "subject": "Task 7: Update DebugStreamBridgeActor to Use gRPC", "status": "pending", "blockedBy": [6, 5]},
{"id": 8, "taskId": "9", "subject": "Task 8: Remove ClusterClient Streaming Path", "status": "pending", "blockedBy": [7]},
{"id": 9, "taskId": "10", "subject": "Task 9: Docker & End-to-End Integration Test", "status": "pending", "blockedBy": [5, 1, 2]},
{"id": 10, "taskId": "11", "subject": "Task 10: Documentation Updates", "status": "pending", "blockedBy": [9]},
{"id": 11, "taskId": "12", "subject": "Task 11: Final Guardrail Tests", "status": "pending", "blockedBy": [9]}
],
"lastUpdated": "2026-03-21T14:15:00Z"
}
@@ -0,0 +1,151 @@
# Primary/Backup Data Connection Endpoints — Design
**Date:** 2026-03-22
**Status:** Approved
## Problem
Data connections currently support a single endpoint. If that endpoint goes down, the connection retries indefinitely at 5s intervals against the same address. When redundant infrastructure exists (e.g., two OPC UA servers), there is no way to automatically fail over to a backup.
## Design Decisions
| Decision | Choice |
|----------|--------|
| Failover mode | Automatic after N failed retries |
| Failback | No auto-failback; stay on active until it fails (round-robin) |
| Backup required? | Optional — single-endpoint connections work unchanged |
| Failover trigger | After configurable retry count (default 3) |
| Entity model | Separate `PrimaryConfiguration` and `BackupConfiguration` columns |
| UI approach | Two JSON text areas; backup collapsible |
| Failover logic location | DataConnectionActor (adapters stay single-endpoint) |
| Observability | Health reports + site event log entries |
## Entity Model
**`DataConnection` changes:**
| Field | Type | Notes |
|-------|------|-------|
| `PrimaryConfiguration` | string? (max 4000) | Renamed from `Configuration` |
| `BackupConfiguration` | string? (max 4000) | New. Null = no backup |
| `FailoverRetryCount` | int (default 3) | New. Retries before switching |
Both endpoints use the same `Protocol`. EF Core migration renames `Configuration` → `PrimaryConfiguration` (data-preserving).
**`DataConnectionArtifact` changes:**
- `ConfigurationJson` → `PrimaryConfigurationJson` + `BackupConfigurationJson`
## Failover State Machine
The `DataConnectionActor` Reconnecting state is extended:
```
Connected
│ disconnect detected
Push bad quality to all subscribers
Retry active endpoint (5s interval)
│ failure
_consecutiveFailures++
├─ < FailoverRetryCount → retry same endpoint
├─ ≥ FailoverRetryCount AND backup exists
│ → dispose adapter, switch _activeEndpoint, reset counter
│ → create fresh adapter with other config
│ → attempt connect
└─ ≥ FailoverRetryCount AND no backup
→ keep retrying indefinitely (current behavior)
```
**On successful reconnect (either endpoint):**
1. Reset `_consecutiveFailures = 0`
2. `ReSubscribeAll()` — re-create all subscriptions on the new adapter
3. Transition to Connected
4. Log failover event if endpoint changed
5. Report active endpoint in health metrics
**Round-robin on failure:** primary → backup → primary → backup...
**Adapter lifecycle on failover:** Actor disposes current `IDataConnection` adapter and creates a fresh one via `DataConnectionFactory.Create()` with the other endpoint's config. Clean slate — no stale state.
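The counting and round-robin rules above can be sketched as a small state holder, separate from the actor. `FailoverTracker` is a hypothetical name; adapter disposal, re-subscription, and logging are left to the caller.

```csharp
public enum ActiveEndpoint { Primary, Backup }

// Hypothetical helper for illustration; the design keeps this state inside DataConnectionActor.
public sealed class FailoverTracker
{
    private readonly bool _hasBackup;
    private readonly int _failoverRetryCount;
    private int _consecutiveFailures;

    public FailoverTracker(bool hasBackup, int failoverRetryCount = 3)
    {
        _hasBackup = hasBackup;
        _failoverRetryCount = failoverRetryCount;
    }

    public ActiveEndpoint Active { get; private set; } = ActiveEndpoint.Primary;

    // Records a failed connect attempt. Returns true when the caller should dispose
    // the adapter and create a fresh one for the (now switched) active endpoint.
    public bool RecordFailure()
    {
        _consecutiveFailures++;
        if (!_hasBackup || _consecutiveFailures < _failoverRetryCount)
            return false; // retry the same endpoint

        Active = Active == ActiveEndpoint.Primary ? ActiveEndpoint.Backup : ActiveEndpoint.Primary;
        _consecutiveFailures = 0;
        return true;
    }

    // Any successful connect resets the counter; there is no automatic failback.
    public void RecordSuccess() => _consecutiveFailures = 0;
}
```

With the default count of 3, the third consecutive failure triggers the switch, and a connection without a backup never switches no matter how many failures accumulate.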
## Actor State
New fields in `DataConnectionActor`:
- `IDictionary<string, string> _primaryConfig`
- `IDictionary<string, string>? _backupConfig`
- `ActiveEndpoint _activeEndpoint` (enum: Primary, Backup)
- `int _consecutiveFailures`
- `int _failoverRetryCount`
`CreateConnectionCommand` gains: `primaryConfig`, `backupConfig`, `failoverRetryCount`.
`DataConnectionFactory` is unchanged — still creates single-endpoint adapters.
## Health & Observability
**`DataConnectionHealthReport`** gains:
- `ActiveEndpoint` (string): `"Primary"`, `"Backup"`, or `"Primary (no backup)"`
**Site event log entries:**
- `DataConnectionFailover` — connection name, from-endpoint, to-endpoint, reason
- `DataConnectionRestored` — connection name, active endpoint
Uses existing `ISiteEventLogger`.
## Central UI
**List page:** Add `Active Endpoint` column from health reports.
**Form (Create/Edit):**
- "Primary Endpoint Configuration" label (renamed from "Configuration")
- "Add Backup Endpoint" button reveals second JSON text area
- "Remove Backup" button in edit mode when backup exists
- "Failover Retry Count" numeric input (default 3, min 1, max 20) — visible only when backup configured
- Vertical stacking, collapsible backup subsection
## CLI
- `--configuration` renamed to `--primary-config` (hidden alias for backwards compat)
- `--backup-config` (optional)
- `--failover-retry-count` (optional, default 3)
- `data-connection get` shows both configs and active endpoint
## Management API
- `CreateDataConnectionCommand` / `UpdateDataConnectionCommand` gain `PrimaryConfiguration`, `BackupConfiguration`, `FailoverRetryCount`
- Setting `BackupConfiguration` to null removes the backup
- `GetDataConnectionResponse` returns both configs
## Deployment Flow
`DataConnectionArtifact` carries `PrimaryConfigurationJson` and `BackupConfigurationJson`. Site-side deployment handler passes both to `CreateConnectionCommand`.
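The artifact carries JSON strings while `CreateConnectionCommand` takes string dictionaries, so the handler has to convert between the two. The sketch below shows one possible conversion. `EndpointConfigParser` is a hypothetical helper, and both the case-insensitive keys and the flattening of non-string values to their raw JSON text are assumptions rather than documented behaviour of the existing handler.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper: turns an endpoint configuration JSON object into the
// string dictionary shape used by CreateConnectionCommand.
public static class EndpointConfigParser
{
    // Returns null for null or blank input, which stands for "endpoint not configured".
    public static IDictionary<string, string>? Parse(string? json)
    {
        if (string.IsNullOrWhiteSpace(json)) return null;

        // JsonDocument.Parse throws JsonException when the text is not valid JSON.
        using var doc = JsonDocument.Parse(json);
        if (doc.RootElement.ValueKind != JsonValueKind.Object)
            throw new FormatException("Endpoint configuration must be a JSON object.");

        // Assumption: keys are compared case-insensitively.
        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (var property in doc.RootElement.EnumerateObject())
        {
            // Strings are unwrapped; numbers, booleans and nested values keep their raw JSON text.
            result[property.Name] = property.Value.ValueKind == JsonValueKind.String
                ? property.Value.GetString() ?? string.Empty
                : property.Value.GetRawText();
        }
        return result;
    }
}
```

For example, `Parse("{\"Host\":\"backup-host\",\"Port\":50101}")` yields `Host` mapped to `backup-host` and `Port` mapped to `50101`, while `Parse(null)` yields `null`, which maps naturally onto an absent backup endpoint.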
## Testing
**Unit tests:**
- Actor: failover after N failures, round-robin, single-endpoint retries forever, counter reset, ReSubscribeAll on failover
- Manager actor: updated CreateConnectionCommand
- Factory: unchanged registration
**Integration test (manual with test infra):**
1. Primary=`opc.tcp://localhost:50000`, backup=`opc.tcp://localhost:50010`
2. Subscribe to `Motor.Speed`
3. `docker compose stop opcua` → verify failover to opcua2 after 3 retries
4. `docker compose stop opcua2 && docker compose start opcua` → verify round-robin back
## Implementation Tasks
1. **#4** Entity model & database (foundation)
2. **#6** CreateConnectionCommand & DataConnectionManagerActor (blocked by #4)
3. **#5** DataConnectionActor failover state machine (blocked by #4, #6)
4. **#7** Health reporting & site event log (blocked by #5)
5. **#8** Central UI (blocked by #4)
6. **#9** CLI, Management API, deployment (blocked by #4)
7. **#10** Documentation (blocked by #5)
8. **#11** Tests (blocked by #5)
@@ -0,0 +1,695 @@
# Primary/Backup Data Connection Endpoints — Implementation Plan
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers-extended-cc:executing-plans to implement this plan task-by-task.
**Goal:** Add optional backup endpoints to data connections with automatic failover after configurable retry count.
**Architecture:** The `DataConnectionActor` gains failover logic in its Reconnecting state — after N failed retries on the active endpoint, it disposes the adapter and creates a fresh one with the other endpoint's config. Adapters remain single-endpoint. Entity model splits `Configuration` into `PrimaryConfiguration` + `BackupConfiguration`.
**Tech Stack:** C# / .NET 10, Akka.NET, EF Core, Blazor Server, System.CommandLine
**Design doc:** `docs/plans/2026-03-22-primary-backup-data-connections-design.md`
---
## Task 1: Entity Model & Database Migration
**Files:**
- Modify: `src/ScadaLink.Commons/Entities/Sites/DataConnection.cs`
- Modify: `src/ScadaLink.ConfigurationDatabase/Configurations/SiteConfiguration.cs` (lines 32-56)
- Modify: `src/ScadaLink.Commons/Messages/Artifacts/DataConnectionArtifact.cs`
### Step 1: Update DataConnection entity
In `DataConnection.cs`, rename `Configuration` to `PrimaryConfiguration`, add `BackupConfiguration` and `FailoverRetryCount`:
```csharp
public class DataConnection
{
    public int Id { get; set; }
    public int SiteId { get; set; }
    public string Name { get; set; }
    public string Protocol { get; set; }
    public string? PrimaryConfiguration { get; set; }
    public string? BackupConfiguration { get; set; }
    public int FailoverRetryCount { get; set; } = 3;

    public DataConnection(int siteId, string name, string protocol)
    {
        SiteId = siteId;
        Name = name ?? throw new ArgumentNullException(nameof(name));
        Protocol = protocol ?? throw new ArgumentNullException(nameof(protocol));
    }
}
```
### Step 2: Update EF Core mapping
In `SiteConfiguration.cs`, update the DataConnection mapping (around lines 46-47):
- Rename `Configuration` property mapping to `PrimaryConfiguration` (MaxLength 4000)
- Add `BackupConfiguration` property (optional, MaxLength 4000)
- Add `FailoverRetryCount` property (required, default 3)
```csharp
builder.Property(d => d.PrimaryConfiguration).HasMaxLength(4000);
builder.Property(d => d.BackupConfiguration).HasMaxLength(4000);
builder.Property(d => d.FailoverRetryCount).HasDefaultValue(3);
```
### Step 3: Create EF Core migration
Run:
```bash
cd src/ScadaLink.ConfigurationDatabase
dotnet ef migrations add AddDataConnectionBackupEndpoint \
--startup-project ../ScadaLink.Host
```
Verify that the migration renames `Configuration` to `PrimaryConfiguration` with `RenameColumn` rather than drop and add, which would discard the existing configuration values. If the scaffolded migration drops and recreates the column, fix it manually:
```csharp
migrationBuilder.RenameColumn(
    name: "Configuration",
    table: "DataConnections",
    newName: "PrimaryConfiguration");

migrationBuilder.AddColumn<string>(
    name: "BackupConfiguration",
    table: "DataConnections",
    maxLength: 4000,
    nullable: true);

migrationBuilder.AddColumn<int>(
    name: "FailoverRetryCount",
    table: "DataConnections",
    nullable: false,
    defaultValue: 3);
```
### Step 4: Update DataConnectionArtifact
In `DataConnectionArtifact.cs`, replace single `ConfigurationJson` with both:
```csharp
public record DataConnectionArtifact(
    string Name,
    string Protocol,
    string? PrimaryConfigurationJson,
    string? BackupConfigurationJson,
    int FailoverRetryCount = 3);
```
### Step 5: Build and fix compile errors
Run: `dotnet build ScadaLink.slnx`
This will surface all references to the old `Configuration` and `ConfigurationJson` fields across the codebase. Fix each one — this includes:
- ManagementActor handlers
- CLI commands
- UI pages
- Deployment/flattening code
- Tests
Fix only the field name renames in this step (use `PrimaryConfiguration` where `Configuration` was). Don't add backup logic yet — just make it compile.
### Step 6: Run tests, fix failures
Run: `dotnet test ScadaLink.slnx`
Fix any test failures caused by the rename.
### Step 7: Commit
```bash
git add -A
git commit -m "feat(dcl): rename Configuration to PrimaryConfiguration, add BackupConfiguration and FailoverRetryCount"
```
---
## Task 2: Update CreateConnectionCommand & Manager Actor
**Files:**
- Modify: `src/ScadaLink.Commons/Messages/DataConnection/CreateConnectionCommand.cs`
- Modify: `src/ScadaLink.DataConnectionLayer/Actors/DataConnectionManagerActor.cs` (lines 39-62)
### Step 1: Update CreateConnectionCommand message
```csharp
public record CreateConnectionCommand(
    string ConnectionName,
    string ProtocolType,
    IDictionary<string, string> PrimaryConnectionDetails,
    IDictionary<string, string>? BackupConnectionDetails = null,
    int FailoverRetryCount = 3);
```
### Step 2: Update DataConnectionManagerActor.HandleCreateConnection
Update the handler (around lines 39-62) to pass both configs to `DataConnectionActor`:
```csharp
private void HandleCreateConnection(CreateConnectionCommand command)
{
    if (_connectionActors.ContainsKey(command.ConnectionName))
    {
        _log.Warning("Connection {0} already exists", command.ConnectionName);
        return;
    }

    var adapter = _factory.Create(command.ProtocolType, command.PrimaryConnectionDetails);

    var props = Props.Create(() => new DataConnectionActor(
        command.ConnectionName,
        adapter,
        _options,
        _healthCollector,
        command.ProtocolType,
        command.PrimaryConnectionDetails,
        command.BackupConnectionDetails,
        command.FailoverRetryCount));

    // Replace characters that are not valid in an actor name with '-'.
    var actorName = new string(command.ConnectionName
        .Select(c => char.IsLetterOrDigit(c) || "-_.*$+:@&=,!~';()".Contains(c) ? c : '-')
        .ToArray());

    var actorRef = Context.ActorOf(props, actorName);
    _connectionActors[command.ConnectionName] = actorRef;

    _log.Info("Created DataConnectionActor for {0} (protocol={1}, backup={2})",
        command.ConnectionName, command.ProtocolType, command.BackupConnectionDetails != null ? "yes" : "none");
}
```
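The actor-name sanitization in the handler above is easier to check in isolation. This is a minimal sketch that copies the same character whitelist into a standalone function. `ActorNames.Sanitize` is a hypothetical name introduced only for illustration.

```csharp
using System;
using System.Linq;

// Hypothetical extraction of the sanitization expression used in
// HandleCreateConnection, so its behaviour can be tested without an actor system.
public static class ActorNames
{
    // Same symbol whitelist as the handler; every other non-alphanumeric
    // character is replaced with '-'.
    private const string AllowedSymbols = "-_.*$+:@&=,!~';()";

    public static string Sanitize(string connectionName) =>
        new string(connectionName
            .Select(c => char.IsLetterOrDigit(c) || AllowedSymbols.Contains(c) ? c : '-')
            .ToArray());
}
```

`ActorNames.Sanitize("Plant A/Line 1")` returns `Plant-A-Line-1`. The mapping is lossy: `Line 1` and `Line/1` both become `Line-1`, so two distinct connection names could map to the same actor name. The `_connectionActors` check is keyed by the original connection name and would not catch that collision, which may be worth a guard if such names are possible.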
### Step 3: Update all callers of CreateConnectionCommand
Search for all places that construct `CreateConnectionCommand` and update them to use the new signature. The primary caller is the site-side deployment handler.
### Step 4: Build and test
Run: `dotnet build ScadaLink.slnx && dotnet test tests/ScadaLink.DataConnectionLayer.Tests`
### Step 5: Commit
```bash
git add -A
git commit -m "feat(dcl): extend CreateConnectionCommand with backup config and failover retry count"
```
---
## Task 3: DataConnectionActor Failover State Machine
**Files:**
- Modify: `src/ScadaLink.DataConnectionLayer/Actors/DataConnectionActor.cs`
- Modify: `src/ScadaLink.DataConnectionLayer/DataConnectionFactory.cs`
This is the core change. The actor gains failover logic in its Reconnecting state.
### Step 1: Add new state fields to DataConnectionActor
Add these fields alongside the existing ones (around line 30):
```csharp
private readonly string _protocolType;
private readonly IDictionary<string, string> _primaryConfig;
private readonly IDictionary<string, string>? _backupConfig;
private readonly int _failoverRetryCount;
private readonly IDataConnectionFactory _factory;
private ActiveEndpoint _activeEndpoint = ActiveEndpoint.Primary;
private int _consecutiveFailures;
public enum ActiveEndpoint { Primary, Backup }
```
### Step 2: Update constructor
Extend the constructor to accept both configs and the factory:
```csharp
public DataConnectionActor(
    string connectionName,
    IDataConnection adapter,
    DataConnectionOptions options,
    ISiteHealthCollector healthCollector,
    IDataConnectionFactory factory,
    string protocolType,
    IDictionary<string, string> primaryConfig,
    IDictionary<string, string>? backupConfig = null,
    int failoverRetryCount = 3)
{
    _connectionName = connectionName;
    _adapter = adapter;
    _options = options;
    _healthCollector = healthCollector;
    _factory = factory; // used to create a fresh adapter on failover
    _protocolType = protocolType;
    _primaryConfig = primaryConfig;
    _backupConfig = backupConfig;
    _failoverRetryCount = failoverRetryCount;
    _connectionDetails = primaryConfig; // start with primary
}
```
Note: The actor needs `IDataConnectionFactory` to create new adapters on failover. The `DataConnectionManagerActor` already holds the factory, so it passes it through the actor constructor (see Step 4). `_adapter` and `_connectionDetails` are reassigned on failover, so neither field can be `readonly`.
### Step 3: Extend HandleReconnectResult with failover logic
Replace the reconnect failure handling (around lines 279-296) with a version that adds failover:
```csharp
private void HandleReconnectResult(ConnectResult result)
{
    if (result.Success)
    {
        _consecutiveFailures = 0;
        _log.Info("Reconnected {0} on {1} endpoint", _connectionName, _activeEndpoint);
        ReSubscribeAll();
        BecomeConnected();
        return;
    }

    _consecutiveFailures++;
    _log.Warning("Reconnect attempt {0}/{1} failed for {2} on {3}: {4}",
        _consecutiveFailures, _failoverRetryCount, _connectionName, _activeEndpoint, result.Error);

    if (_consecutiveFailures >= _failoverRetryCount && _backupConfig != null)
    {
        // Switch endpoint
        var previousEndpoint = _activeEndpoint;
        _activeEndpoint = _activeEndpoint == ActiveEndpoint.Primary
            ? ActiveEndpoint.Backup
            : ActiveEndpoint.Primary;
        _consecutiveFailures = 0;

        var newConfig = _activeEndpoint == ActiveEndpoint.Primary ? _primaryConfig : _backupConfig;
        _log.Warning("Failing over {0} from {1} to {2}", _connectionName, previousEndpoint, _activeEndpoint);

        // Dispose old adapter, create new one
        _ = _adapter.DisposeAsync();
        _adapter = _factory.Create(_protocolType, newConfig);
        _connectionDetails = newConfig;

        // Wire up disconnect handler on new adapter
        _adapter.Disconnected += () => _self.Tell(new AdapterDisconnected());
    }

    // Schedule next retry
    Context.System.Scheduler.ScheduleTellOnce(
        _options.ReconnectInterval, Self, AttemptConnect.Instance, ActorRefs.NoSender);
}
```
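The handler above discards the result of `_adapter.DisposeAsync()`, so a failure while tearing down the old adapter would go unnoticed. One option, sketched here under assumptions, is to observe that failure through a callback. `AdapterDisposal.DisposeInBackground` is a hypothetical helper, and the sketch assumes the adapter implements `IAsyncDisposable`, which the plan does not state explicitly.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper: fire-and-forget disposal that still reports failures.
public static class AdapterDisposal
{
    // Starts disposal and returns a task that never faults; any exception
    // thrown by DisposeAsync is handed to onError instead.
    public static Task DisposeInBackground(IAsyncDisposable adapter, Action<Exception> onError)
    {
        if (adapter is null) throw new ArgumentNullException(nameof(adapter));
        if (onError is null) throw new ArgumentNullException(nameof(onError));
        return Run();

        async Task Run()
        {
            try
            {
                await adapter.DisposeAsync().ConfigureAwait(false);
            }
            catch (Exception ex)
            {
                // Runs off the actor's message loop: only log or Tell here,
                // never touch actor state directly.
                onError(ex);
            }
        }
    }
}
```

Inside the actor, the call would replace the discard with something like `_ = AdapterDisposal.DisposeInBackground(oldAdapter, ex => ...)`, where the callback limits itself to logging or sending a message to the actor.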
### Step 4: Pass IDataConnectionFactory to DataConnectionActor
Update `DataConnectionManagerActor.HandleCreateConnection` to pass the factory:
```csharp
var props = Props.Create(() => new DataConnectionActor(
    command.ConnectionName, adapter, _options, _healthCollector,
    _factory, // pass factory for failover adapter creation
    command.ProtocolType, command.PrimaryConnectionDetails,
    command.BackupConnectionDetails, command.FailoverRetryCount));
```
And update the DataConnectionActor constructor to store `_factory`.
### Step 5: Build and run existing tests
Run: `dotnet build ScadaLink.slnx && dotnet test tests/ScadaLink.DataConnectionLayer.Tests`
Existing tests must pass (they use single-endpoint configs, so no failover triggered).
### Step 6: Commit
```bash
git add -A
git commit -m "feat(dcl): add failover state machine to DataConnectionActor with round-robin endpoint switching"
```
---
## Task 4: Failover Tests
**Files:**
- Modify: `tests/ScadaLink.DataConnectionLayer.Tests/DataConnectionActorTests.cs`
### Step 1: Write test — failover after N retries
```csharp
[Fact]
public async Task Reconnecting_AfterFailoverRetryCount_SwitchesToBackup()
{
    // Arrange: create actor with primary + backup, failoverRetryCount = 2
    var primaryAdapter = Substitute.For<IDataConnection>();
    var backupAdapter = Substitute.For<IDataConnection>();
    var factory = Substitute.For<IDataConnectionFactory>();
    factory.Create("OpcUa", Arg.Is<IDictionary<string, string>>(d => d["endpoint"] == "backup"))
        .Returns(backupAdapter);

    // Primary connects then disconnects
    primaryAdapter.ConnectAsync(Arg.Any<IDictionary<string, string>>(), Arg.Any<CancellationToken>())
        .Returns(Task.CompletedTask);
    primaryAdapter.Status.Returns(ConnectionHealth.Connected);

    var primaryConfig = new Dictionary<string, string> { ["endpoint"] = "primary" };
    var backupConfig = new Dictionary<string, string> { ["endpoint"] = "backup" };

    // Create actor, connect on primary
    // ... (use test kit patterns from existing tests)
    // Simulate disconnect, verify 2 failures then factory.Create called with backup config
}
```
### Step 2: Write test — single endpoint retries forever
```csharp
[Fact]
public async Task Reconnecting_NoBackup_RetriesIndefinitely()
{
    // Arrange: create actor with primary only, no backup
    // Simulate 10 reconnect failures
    // Verify: factory.Create never called with backup, just keeps retrying
}
```
### Step 3: Write test — round-robin back to primary after backup fails
```csharp
[Fact]
public async Task Reconnecting_BackupFails_SwitchesBackToPrimary()
{
    // Arrange: primary + backup, failoverRetryCount = 1
    // Simulate: primary fails 1x → switch to backup → backup fails 1x → switch to primary
    // Verify: round-robin pattern
}
```
### Step 4: Write test — successful reconnect resets counter
```csharp
[Fact]
public async Task Reconnecting_SuccessfulConnect_ResetsConsecutiveFailures()
{
    // Arrange: failoverRetryCount = 3
    // Simulate: 2 failures on primary, then success
    // Verify: no failover, counter reset
}
```
### Step 5: Write test — ReSubscribeAll called after failover
```csharp
[Fact]
public async Task Failover_ReSubscribesAllTagsOnNewAdapter()
{
    // Arrange: actor with subscriptions, then failover
    // Verify: new adapter receives SubscribeAsync calls for all previously subscribed tags
}
```
### Step 6: Run all tests
Run: `dotnet test tests/ScadaLink.DataConnectionLayer.Tests -v normal` (the `-v` option requires a verbosity level)
### Step 7: Commit
```bash
git add -A
git commit -m "test(dcl): add failover state machine tests for DataConnectionActor"
```
---
## Task 5: Health Reporting & Site Event Logging
**Files:**
- Modify: `src/ScadaLink.Commons/Messages/DataConnection/DataConnectionHealthReport.cs`
- Modify: `src/ScadaLink.DataConnectionLayer/Actors/DataConnectionActor.cs` (ReplyWithHealthReport, HandleReconnectResult)
### Step 1: Add ActiveEndpoint to health report
```csharp
public record DataConnectionHealthReport(
    string ConnectionName,
    ConnectionHealth Status,
    int TotalSubscribedTags,
    int ResolvedTags,
    string ActiveEndpoint,
    DateTimeOffset Timestamp);
```
### Step 2: Update ReplyWithHealthReport in DataConnectionActor
Update the health report method (around line 516) to include the active endpoint:
```csharp
private void ReplyWithHealthReport()
{
    var endpointLabel = _backupConfig == null
        ? "Primary (no backup)"
        : _activeEndpoint.ToString();

    Sender.Tell(new DataConnectionHealthReport(
        _connectionName, _adapter.Status,
        _subscriptionsByInstance.Values.Sum(s => s.Count),
        _resolvedTags,
        endpointLabel,
        DateTimeOffset.UtcNow));
}
```
### Step 3: Add site event logging on failover
In `HandleReconnectResult`, after switching endpoints, log a site event:
```csharp
if (_siteEventLogger != null)
{
    _ = _siteEventLogger.LogEventAsync(
        "connection", "Warning", null, _connectionName,
        $"Failover from {previousEndpoint} to {_activeEndpoint}",
        $"After {_failoverRetryCount} consecutive failures");
}
```
Note: The actor needs `ISiteEventLogger` injected. Add it as an optional constructor parameter.
### Step 4: Add site event logging on successful reconnect after failover
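In the `HandleReconnectResult` success path, log a restore event when the connection comes back on a different endpoint than the last known-good one. Tracking that endpoint needs an extra field, which the snippet below omits: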
```csharp
if (_siteEventLogger != null)
{
    _ = _siteEventLogger.LogEventAsync(
        "connection", "Info", null, _connectionName,
        $"Connection restored on {_activeEndpoint} endpoint", null);
}
```
### Step 5: Build and test
Run: `dotnet build ScadaLink.slnx && dotnet test tests/ScadaLink.DataConnectionLayer.Tests`
### Step 6: Commit
```bash
git add -A
git commit -m "feat(dcl): add active endpoint to health reports and log failover events"
```
---
## Task 6: Central UI Changes
**Files:**
- Modify: `src/ScadaLink.CentralUI/Components/Pages/Admin/DataConnections.razor`
- Modify: `src/ScadaLink.CentralUI/Components/Pages/Admin/DataConnectionForm.razor`
### Step 1: Update DataConnections list page
Add `Active Endpoint` column to the table (around line 28-64). Insert after the Protocol column:
```html
<th>Active Endpoint</th>
```
And in the row template:
```html
<td>@connection.ActiveEndpoint</td>
```
This requires the list page to fetch health data alongside the connection list. Add a health status lookup or include `ActiveEndpoint` in the data connection response.
### Step 2: Update DataConnectionForm — rename Configuration label
Change the "Configuration" label to "Primary Endpoint Configuration" (around line 44-61).
### Step 3: Add backup endpoint section
Below the primary config field, add:
```html
@if (!_showBackup)
{
    <button type="button" class="btn btn-outline-secondary btn-sm mt-2"
            @onclick="() => _showBackup = true">
        Add Backup Endpoint
    </button>
}
else
{
    <div class="mt-3">
        <div class="d-flex justify-content-between align-items-center">
            <label class="form-label">Backup Endpoint Configuration</label>
            <button type="button" class="btn btn-outline-danger btn-sm"
                    @onclick="RemoveBackup">
                Remove Backup
            </button>
        </div>
        <textarea class="form-control" rows="4"
                  @bind="_model.BackupConfiguration"
                  placeholder='{"Host": "backup-host", "Port": 50101}' />
    </div>
    <div class="mt-3">
        <label class="form-label">Failover Retry Count</label>
        <input type="number" class="form-control" min="1" max="20"
               @bind="_model.FailoverRetryCount" />
        <small class="text-muted">Retries before switching to backup (default: 3)</small>
    </div>
}
```
### Step 4: Update form model and save logic
Add `BackupConfiguration` and `FailoverRetryCount` to the form model. Update the save method to pass both configs to the management API.
In edit mode, set `_showBackup = true` if `BackupConfiguration` is not null.
### Step 5: Build and verify visually
Run: `dotnet build ScadaLink.slnx`
Visual verification requires running the cluster — document as manual test.
### Step 6: Commit
```bash
git add -A
git commit -m "feat(ui): add primary/backup endpoint fields to data connection form"
```
---
## Task 7: CLI, Management API, and Deployment
**Files:**
- Modify: `src/ScadaLink.Commons/Messages/Management/DataConnectionCommands.cs`
- Modify: `src/ScadaLink.CLI/Commands/DataConnectionCommands.cs`
- Modify: `src/ScadaLink.ManagementService/ManagementActor.cs` (lines 689-711)
- Modify: Deployment/flattening code that creates DataConnectionArtifact
### Step 1: Update management command messages
```csharp
public record CreateDataConnectionCommand(
    int SiteId, string Name, string Protocol,
    string? PrimaryConfiguration,
    string? BackupConfiguration = null,
    int FailoverRetryCount = 3);

public record UpdateDataConnectionCommand(
    int DataConnectionId, string Name, string Protocol,
    string? PrimaryConfiguration,
    string? BackupConfiguration = null,
    int FailoverRetryCount = 3);
```
### Step 2: Update ManagementActor handlers
In `HandleCreateDataConnection` (around line 689): set `PrimaryConfiguration`, `BackupConfiguration`, `FailoverRetryCount` from command.
In `HandleUpdateDataConnection` (around line 699): same fields.
### Step 3: Update CLI commands
In `BuildCreate` (around line 75-98):
- Rename `--configuration` to `--primary-config`
- Add hidden alias `--configuration` pointing to same option
- Add `--backup-config` option (optional)
- Add `--failover-retry-count` option (optional, default 3)
In `BuildUpdate` (around line 36-59): same changes.
In `BuildGet` (around line 22-34): update output to show both configs.
### Step 4: Update deployment artifact creation
Find where `DataConnectionArtifact` is constructed (in deployment/flattening code). Update to pass `PrimaryConfigurationJson` and `BackupConfigurationJson` from the entity.
### Step 5: Build and test CLI
Run: `dotnet build ScadaLink.slnx`
Test CLI manually:
```bash
scadalink data-connection create --site-id 1 --name "Test" --protocol OpcUa \
--primary-config '{"endpoint":"opc.tcp://localhost:50000"}' \
--backup-config '{"endpoint":"opc.tcp://localhost:50010"}' \
--failover-retry-count 3
```
### Step 6: Commit
```bash
git add -A
git commit -m "feat(cli): add --primary-config, --backup-config, --failover-retry-count to data connection commands"
```
---
## Task 8: Documentation Updates
**Files:**
- Modify: `docs/requirements/Component-DataConnectionLayer.md`
- Modify: `docs/requirements/HighLevelReqs.md`
- Modify: `docs/requirements/Component-CentralUI.md`
- Modify: `docs/test_infra/test_infra.md`
### Step 1: Update Component-DataConnectionLayer.md
Add new section "Endpoint Redundancy" covering:
- Optional backup endpoints
- Failover state machine (include ASCII diagram from design doc)
- Configuration model (PrimaryConfiguration + BackupConfiguration)
- Failover retry count and round-robin behavior
- Subscription re-creation on failover
- Health reporting (ActiveEndpoint field)
- Site event logging (DataConnectionFailover, DataConnectionRestored)
Update the configuration reference tables to show the new entity fields.
### Step 2: Update HighLevelReqs.md
Add requirement: "Data connections support optional backup endpoints with automatic failover after configurable retry count. On failover, all subscriptions are transparently re-created on the new endpoint."
### Step 3: Update Component-CentralUI.md
Update the Data Connections workflow section to describe:
- Primary/backup config fields on the form
- Collapsible backup section
- Failover retry count field
- Active endpoint column on list page
### Step 4: Update test_infra.md
Add a note in the Remote Test Infrastructure section that the dual OPC UA servers (50000/50010) enable primary/backup testing.
### Step 5: Commit
```bash
git add -A
git commit -m "docs(dcl): document primary/backup endpoint redundancy across requirements and test infra"
```
@@ -0,0 +1,14 @@
{
"planPath": "docs/plans/2026-03-22-primary-backup-data-connections.md",
"tasks": [
{"id": 1, "subject": "Task 1: Entity Model & Database Migration", "status": "pending"},
{"id": 2, "subject": "Task 2: Update CreateConnectionCommand & Manager Actor", "status": "pending", "blockedBy": [1]},
{"id": 3, "subject": "Task 3: DataConnectionActor Failover State Machine", "status": "pending", "blockedBy": [1, 2]},
{"id": 4, "subject": "Task 4: Failover Tests", "status": "pending", "blockedBy": [3]},
{"id": 5, "subject": "Task 5: Health Reporting & Site Event Logging", "status": "pending", "blockedBy": [3]},
{"id": 6, "subject": "Task 6: Central UI Changes", "status": "pending", "blockedBy": [1]},
{"id": 7, "subject": "Task 7: CLI, Management API, and Deployment", "status": "pending", "blockedBy": [1]},
{"id": 8, "subject": "Task 8: Documentation Updates", "status": "pending", "blockedBy": [3]}
],
"lastUpdated": "2026-03-22T12:00:00Z"
}
File diff suppressed because it is too large
@@ -0,0 +1,16 @@
{
"planPath": "docs/plans/2026-03-23-treeview-component.md",
"tasks": [
{"id": 22, "subject": "Task 1: Create TreeView.razor — Core Rendering (R1-R4, R14)", "status": "pending"},
{"id": 23, "subject": "Task 2: Add Selection Support (R5)", "status": "pending", "blockedBy": [22]},
{"id": 24, "subject": "Task 3: Add Session Storage Persistence (R11)", "status": "pending", "blockedBy": [23]},
{"id": 25, "subject": "Task 4: Add ExpandAll, CollapseAll, RevealNode (R12, R13)", "status": "pending", "blockedBy": [24]},
{"id": 26, "subject": "Task 5: Add Context Menu (R15)", "status": "pending", "blockedBy": [25]},
{"id": 27, "subject": "Task 6: Add External Filtering Tests (R8)", "status": "pending", "blockedBy": [26]},
{"id": 28, "subject": "Task 7: Integrate TreeView into Data Connections Page", "status": "pending", "blockedBy": [27]},
{"id": 29, "subject": "Task 8: Integrate TreeView into Areas Page", "status": "pending", "blockedBy": [27]},
{"id": 30, "subject": "Task 9: Integrate TreeView into Instances Page", "status": "pending", "blockedBy": [27]},
{"id": 31, "subject": "Task 10: Full Build Verification", "status": "pending", "blockedBy": [28, 29, 30]}
],
"lastUpdated": "2026-03-23T00:00:00Z"
}
@@ -0,0 +1,126 @@
# Data Connections page — Topology-style refresh
Date: 2026-05-11
Status: Design
## Goal
Bring the Data Connections admin page up to the same UX standard as the new Topology page (`/deployment/topology`). The page already uses TreeView and the form already navigates as a separate page, so the refresh is a layered enhancement, not a rewrite.
## Decisions (captured from Q&A)
1. **Features to add** (others explicitly excluded):
- Search with dim non-matches (opacity 0.4, shape preserved — Topology behavior)
- Toolbar: **+ Connection**, **Refresh**, **Expand**, **Collapse**
- **No** per-node icons / protocol badges beyond what's already rendered
- **No** selection persistence via sessionStorage (selection is in-memory only)
2. **Site context menu** gains an "Add Connection here" item that navigates to the create form with `?siteId=N` preselecting and locking the Site field.
3. **+ Connection toolbar button** is **disabled until a site is selected**. Selecting either a site node or one of its connection nodes resolves to that site; the create form then preselects and locks Site.
4. **No move support** — moving a connection between sites is out of scope (would require a net-new service method and has knock-on effects on `InstanceConnectionBinding`).
5. **Empty sites still appear** at the top level (so they can be right-clicked to add a connection).
6. **URL renames**:
- List page: `/admin/connections` (primary) + `/admin/data-connections` (legacy secondary).
- Form: `/admin/connections/create` and `/admin/connections/{Id}/edit` (primary) + `/admin/data-connections/create` and `/admin/data-connections/{Id}/edit` (legacy secondaries).
- Nav menu label changes from "Data Connections" to **"Connections"**.
7. **Form cleanup** to match the canonical `SiteForm.razor` style (per `feedback_form_layout` memory):
- Add explicit `<h6 class="text-muted border-bottom pb-1">` subsection headers: **Primary Endpoint** and **Backup Endpoint**.
- Move Failover Retry Count inside the Backup subsection (it only applies when backup is enabled).
- Site field stays first; read-only in edit mode; preselected & disabled when `?siteId=` is passed on create.
## Files to modify
### `src/ScadaLink.CentralUI/Components/Pages/Admin/DataConnections.razor`
- Add primary route `@page "/admin/connections"` and secondary legacy `@page "/admin/data-connections"`.
- Inject `IJSRuntime` only if needed (search doesn't need it; no sessionStorage).
- Add toolbar row above the tree:
- Search input (`@bind="_searchText" @bind:event="oninput" @bind:after="OnSearchChanged"`)
- btn-group with: **+ Connection** (disabled-bind to `!HasSiteSelected`), **Refresh**, **Expand**, **Collapse**.
- TreeView wiring:
- Add `@ref="_tree"` and use `_tree?.ExpandAll()` / `CollapseAll()`.
- Set `Selectable="true"` and `SelectedKeyChanged="OnTreeNodeSelected"`. Keep selected key in `_selectedKey` (in-memory only).
- Search dim:
- Recompute a `HashSet<string> _matchKeys` of keys whose own label or any descendant's label contains the search text.
- In `NodeContent`, wrap the label `<span>` with `style="opacity: 0.4"` if a search is active and the node is not in `_matchKeys`.
- Always-show-empty sites: current code already creates a Site node per Site regardless of children — keep as-is.
- Site context menu: add an item **"Add Connection here"** that navigates to `/admin/connections/create?siteId=@node.SiteId`.
- Connection context menu: keep Edit + Delete; update the Edit href to the new `/admin/connections/{id}/edit` path.
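The search-dim rule above (a node matches when its own label or any descendant's label contains the search text) can be sketched as a small recursive pass. `TreeNode` and `TreeSearch` are hypothetical names for illustration, and the page's real tree model may differ. The case-insensitive comparison follows the verification notes in this document.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical node shape; the real page builds its own tree model.
public sealed record TreeNode(string Key, string Label, IReadOnlyList<TreeNode> Children);

public static class TreeSearch
{
    // Returns the keys of every node whose own label, or any descendant's label,
    // contains the search text. A blank search returns an empty set, which the
    // caller treats as "no search active".
    public static HashSet<string> ComputeMatchKeys(IEnumerable<TreeNode> roots, string? searchText)
    {
        var matches = new HashSet<string>();
        if (string.IsNullOrWhiteSpace(searchText)) return matches;

        var needle = searchText.Trim();
        foreach (var root in roots) Visit(root, needle, matches);
        return matches;
    }

    private static bool Visit(TreeNode node, string needle, HashSet<string> matches)
    {
        var isMatch = node.Label.Contains(needle, StringComparison.OrdinalIgnoreCase);
        foreach (var child in node.Children)
        {
            // Visit every child (no short-circuit) so each matching child records its own key.
            if (Visit(child, needle, matches)) isMatch = true;
        }
        if (isMatch) matches.Add(node.Key);
        return isMatch;
    }
}
```

A node is then dimmed when a search is active and its key is absent from the returned set. A site stays fully opaque whenever one of its connections matches, which preserves the tree shape.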
### `src/ScadaLink.CentralUI/Components/Pages/Admin/DataConnectionForm.razor`
- Add primary routes:
```razor
@page "/admin/connections/create"
@page "/admin/connections/{Id:int}/edit"
@page "/admin/data-connections/create"
@page "/admin/data-connections/{Id:int}/edit"
```
- Add `[SupplyParameterFromQuery] public int? SiteId { get; set; }`.
- On `OnInitializedAsync`, if `Id` is null and `SiteId` has a value, set `_formSiteId = SiteId.Value` and render the Site field as a disabled `<input>` (same pattern as edit mode) — also set `_siteName` for display.
- Reorganize fields to subsections per `SiteForm.razor` reference:
- Site (already first), Name, Protocol.
- `<h6 class="text-muted border-bottom pb-1">Primary Endpoint</h6>` then Primary Endpoint Configuration.
- `<h6 class="text-muted border-bottom pb-1">Backup Endpoint</h6>` — collapsed (Add Backup Endpoint button) by default; when toggled on, render: Backup Configuration, Failover Retry Count, Remove Backup button.
- `GoBack()` → `NavigationManager.NavigateTo("/admin/connections")`.
### `src/ScadaLink.CentralUI/Components/Layout/NavMenu.razor`
- Change `<NavLink class="nav-link" href="/admin/data-connections">Data Connections</NavLink>` to:
```razor
<NavLink class="nav-link" href="/admin/connections">Connections</NavLink>
```
### `tests/ScadaLink.CentralUI.PlaywrightTests/NavigationTests.cs`
- Update the AdminNavLinks theory: `[InlineData("Data Connections", "/admin/data-connections")]` → `[InlineData("Connections", "/admin/connections")]`.
## New tests
### `tests/ScadaLink.CentralUI.Tests/DataConnectionsPageTests.cs` (new)
bUnit rendering tests, modeled after `TopologyPageTests`:
1. `Renders_EmptyState_WhenNoSites` — no sites configured.
2. `Renders_EmptySite_AsTopLevelNode` — site with no connections still appears.
3. `Renders_SiteConnection_Nesting` — connection nested under site after click-expand.
4. `Search_DimsNonMatches_PreservesShape` — typing in search dims unmatched siblings.
5. `AddConnectionButton_DisabledUntilSiteSelected` — toolbar `+ Connection` is `disabled` initially, becomes enabled after clicking a site row.
6. `LegacyDataConnectionsRoute_IsDeclaredOnListPage` — both `/admin/connections` and `/admin/data-connections` routes are present (reflection check).
JSInterop stubs (TreeView calls `treeviewStorage.load`/`save` even when `StorageKey` isn't supplied — verify):
- `JSInterop.Setup<string?>("treeviewStorage.load", _ => true).SetResult(null);`
- `JSInterop.SetupVoid("treeviewStorage.save", _ => true);`
## Out of scope
- Moving connections between sites (would require new service method + binding consequences).
- Connection status indicators (live state) — DCL connection state isn't surfaced in this page; deferred.
- Drag-and-drop reorder.
- Selection persistence across page reloads.
## Verification
1. `dotnet build` clean.
2. `dotnet test tests/ScadaLink.CentralUI.Tests/ScadaLink.CentralUI.Tests.csproj` — all green incl. new tests.
3. Existing Playwright NavigationTests pass with the updated label/URL.
4. Browser smoke (after `bash docker/deploy.sh`):
- `/admin/data-connections` (legacy bookmark) loads the same page as `/admin/connections`.
- + Connection disabled until a site is selected; then navigates with `?siteId=N`; Site field is locked in the form.
- Right-click on an empty site → "Add Connection here" works.
- Search "OPC" dims non-matching connections (label-based search, case-insensitive).
- Expand / Collapse buttons work; Refresh re-fetches from repos.
- Form sections "Primary Endpoint" / "Backup Endpoint" render with the SiteForm-style headers; Failover Retry Count appears inside the Backup section only when backup is enabled.
## Critical files
- `src/ScadaLink.CentralUI/Components/Pages/Admin/DataConnections.razor`
- `src/ScadaLink.CentralUI/Components/Pages/Admin/DataConnectionForm.razor`
- `src/ScadaLink.CentralUI/Components/Layout/NavMenu.razor`
- `tests/ScadaLink.CentralUI.PlaywrightTests/NavigationTests.cs`
- `tests/ScadaLink.CentralUI.Tests/DataConnectionsPageTests.cs` (new)
## Reference patterns
- TreeView usage with toolbar/search: `src/ScadaLink.CentralUI/Components/Pages/Deployment/Topology.razor`
- Form layout convention: `src/ScadaLink.CentralUI/Components/Pages/Admin/SiteForm.razor`
- bUnit harness for tree page: `tests/ScadaLink.CentralUI.Tests/TopologyPageTests.cs`
@@ -0,0 +1,266 @@
# Deployment Topology Page — Design
A single page under `/deployment` that owns the Site → Area → Instance hierarchy: structural management (create, rename, move, delete) and instance lifecycle (deploy, enable/disable, configure, diff), built on the existing `TreeView` component with the same V1–V7 visual identity as the templates page.
This page **replaces** both `/deployment/instances` (current read-mostly tree) and `/admin/areas*` (current flat-list CRUD for areas).
## Decisions
| Question | Decision |
|---|---|
| Page identity | Replace both `/deployment/instances` and `/admin/areas*` with one new page |
| Route | `/deployment/topology` |
| Empty containers | Always shown (so they're valid move/create targets) |
| Instance configuration | Stays on dedicated `/deployment/instances/{id}/configure` page |
| Filters | Search-only (single input above the tree) |
| Search semantics | Dim non-matches (reduced opacity; exact value under "Visual identity"), preserve tree shape |
| Single-click behavior | Select-only; nothing navigates |
| Rename UX | Inline (F2 / double-click) for areas only. Instance rename is out of scope (see "Instance rename" below). |
| Site-node menu | Add Area, Create Instance here |
| Area-node menu | Add Sub-area, Create Instance here, Move to Area…, Rename…, Delete |
| Instance-node menu | Deploy/Redeploy, Enable/Disable, Configure, Diff, Move to Area…, Delete |
| Delete-area cascade | Keep server semantics — block on any non-empty subtree |
| Top-of-page buttons | Create Area, Create Instance, Refresh |
| Move structural scope | Same-site only (instance↔area, area↔area). Cross-site moves out of scope. |
| Backend area re-parenting | New `AreaService.MoveAreaAsync(int areaId, int? newParentAreaId, string user)` |
| State persistence | Expanded nodes + selected key, both in sessionStorage |
| Glyphs | Site `bi-building`, Area `bi-diagram-3`, Instance `bi-box` |
## Layout
```
┌─────────────────────────────────────────────────────────────┐
│ Topology │
├─────────────────────────────────────────────────────────────┤
│ [Search box ............................. ] │
│ [Create Area] [Create Instance] [Refresh] │
├─────────────────────────────────────────────────────────────┤
│ ▾ 🏢 Plant-A │
│ ▾ ▦ Line-1 │
│ ▸ ▦ Station-3 │
│ □ Pump-001 [Enabled] [Current] │
│ □ Pump-002 [Disabled] │
│ ▾ ▦ Line-2 │
│ □ Conveyor-01 [NotDeployed] │
│ ▾ 🏢 Plant-B │
│ ▸ ▦ (empty area, still shown) │
└─────────────────────────────────────────────────────────────┘
```
## Visual identity
Follows the existing `Component-TreeView.md` V1–V7 guide. Glyphs adopted:
| Node | Glyph | Color hook |
|---|---|---|
| Site | `bi-building` | default |
| Area | `bi-diagram-3` | default |
| Instance | `bi-box` | default; state badge to the right |
Instance state badges (kept from current page):
| State | Badge |
|---|---|
| Enabled | `bg-success` |
| Disabled | `bg-secondary` |
| NotDeployed | `bg-light text-dark` |
| Stale (deployed but template revision drifted) | `bg-warning text-dark` |
| Current | `bg-light text-dark` |
Search dimming: non-matches receive `opacity: 0.4`. Matches keep full opacity. Tree shape is preserved; ancestors of matches are auto-expanded on first keystroke.
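The dimming rule can be sketched as a pure function over the tree. This is a minimal illustration, not the page's actual code: `Node`, `SearchResult`, and `TopologySearch.Evaluate` are hypothetical names, the match is assumed to be a case-insensitive substring test on the label, and ancestors that do not match themselves are assumed to be dimmed like any other non-match.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal node shape; the real page uses its own node type.
public sealed record Node(string Key, string Label, List<Node> Children);

// Dimmed: keys rendered at reduced opacity. Expand: ancestor keys to auto-expand.
public sealed record SearchResult(HashSet<string> Dimmed, HashSet<string> Expand);

public static class TopologySearch
{
    // Computes which nodes to dim and which ancestors to expand.
    // No node is ever removed, so the tree shape is preserved.
    public static SearchResult Evaluate(IEnumerable<Node> roots, string term)
    {
        var dimmed = new HashSet<string>();
        var expand = new HashSet<string>();
        if (string.IsNullOrWhiteSpace(term))
            return new SearchResult(dimmed, expand); // empty search: nothing dimmed

        foreach (var root in roots)
            Visit(root, term.Trim(), dimmed, expand);
        return new SearchResult(dimmed, expand);
    }

    // Returns true when the node or any of its descendants matches.
    private static bool Visit(Node node, string term, HashSet<string> dimmed, HashSet<string> expand)
    {
        bool selfMatch = node.Label.Contains(term, StringComparison.OrdinalIgnoreCase);
        bool childMatch = false;
        foreach (var child in node.Children)
            childMatch |= Visit(child, term, dimmed, expand); // always visits every child

        if (!selfMatch) dimmed.Add(node.Key);
        if (childMatch) expand.Add(node.Key);
        return selfMatch || childMatch;
    }
}
```

The page would then map `Dimmed` to the reduced-opacity style and merge `Expand` into the tree's expanded-key set.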
## Context menus
### Site
- **Add Area** → opens "Create Area" dialog with this site pre-selected (parent = root)
- **Create Instance here** → navigates `/deployment/instances/create?siteId={id}`
### Area
- **Add Sub-area** → "Create Area" dialog with this area pre-selected as parent
- **Create Instance here** → navigates `/deployment/instances/create?siteId={siteId}&areaId={id}`
- **Move to Area…** → opens `MoveAreaDialog`. Destination list = areas in the same site, excluding self and descendants. Plus "(root of site)" option.
- divider
- **Rename…** → opens `RenameAreaDialog` (also reachable via F2 / double-click for inline edit)
- **Delete** → calls `DeleteAreaAsync`; server rejects if non-empty, error surfaced via toast
### Instance
- **Deploy** / **Redeploy** (label depends on `IsStale`)
- **Enable** / **Disable** (state-dependent)
- **Configure** → navigates `/deployment/instances/{id}/configure`
- **Diff** → opens the existing diff modal (ported from current Instances page)
- **Move to Area…** → opens `MoveInstanceDialog`. Destination list = areas in the same site + "(no area, site root)".
- divider
- **Delete**
## Inline rename
Applies to **Area rows only**. Instance rows do not support rename on this page (see "Instance rename" below).
- `F2` or double-click on the label of an Area row replaces the label span with an `<input>` bound to a local edit buffer.
- `Enter` commits via `AreaService.UpdateAreaAsync(areaId, name, user)`.
- `Escape` cancels.
- On commit failure (e.g., name collision at the same level), the toast shows the server error and the input stays open with the bad value highlighted.
## Instance rename
**Out of scope for this page.** `InstanceService` currently has no rename method. Adding one is non-trivial:
- `Instance.UniqueName` is also the identity of the site-side `InstanceActor` (Akka actor name).
- It appears in deployment records, audit history, and deploy paths.
- Renaming a deployed instance would require coordinated site-side actor stop/restart, deployment-record rebinding, and potentially redeployment.
This warrants its own design pass. For now: an instance row's label is read-only on the topology page. If a rename is needed, the user can delete + recreate (with the limitation that deployment history is lost).
The Area-rename context-menu item ("Rename…") is **not** added to the instance menu.
## Backend changes
### `AreaService.MoveAreaAsync(int areaId, int? newParentAreaId, string user)` — NEW
Parallel to `InstanceService.AssignToAreaAsync`. Validates:
1. Area exists.
2. `newParentAreaId` is null OR refers to an area in the **same site** as the area being moved.
3. `newParentAreaId != areaId` (not self).
4. The new parent is not a descendant of the area being moved (cycle prevention) — reuse the existing descendant-walking helper that `DeleteAreaAsync` uses.
5. No sibling area at the new level has the same name (case-insensitive).
On success: updates `ParentAreaId`, persists, audits as `"Move"` on entity `"Area"`.
`UpdateAreaAsync` stays name-only.
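The five validation rules can be sketched against an in-memory list, which is also a convenient shape for the unit tests. This is an illustration under assumptions: the `Area` record, `MoveAreaRules.Validate`, and the error strings are hypothetical, and the real service would load areas from the repository and reuse the existing descendant helper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory shape; the real Area entity has more fields.
public sealed record Area(int Id, int SiteId, string Name, int? ParentAreaId);

public static class MoveAreaRules
{
    // Returns null when the move is valid, otherwise an error message.
    public static string? Validate(IReadOnlyCollection<Area> areas, int areaId, int? newParentAreaId)
    {
        var area = areas.FirstOrDefault(a => a.Id == areaId);
        if (area is null) return "Area not found.";                              // rule 1

        if (newParentAreaId is int parentId)
        {
            if (parentId == areaId) return "An area cannot be its own parent.";  // rule 3
            var parent = areas.FirstOrDefault(a => a.Id == parentId);
            if (parent is null || parent.SiteId != area.SiteId)
                return "Target parent must be an area in the same site.";        // rule 2
            if (Descendants(areas, areaId).Contains(parentId))
                return "Target parent is a descendant of the area (cycle).";     // rule 4
        }

        // Rule 5: case-insensitive name collision among the new siblings.
        bool collision = areas.Any(a =>
            a.Id != areaId &&
            a.SiteId == area.SiteId &&
            a.ParentAreaId == newParentAreaId &&
            string.Equals(a.Name, area.Name, StringComparison.OrdinalIgnoreCase));
        return collision ? "A sibling area with the same name already exists." : null;
    }

    // Collects the ids of all areas beneath rootId (breadth-first).
    private static HashSet<int> Descendants(IReadOnlyCollection<Area> areas, int rootId)
    {
        var found = new HashSet<int>();
        var queue = new Queue<int>();
        queue.Enqueue(rootId);
        while (queue.Count > 0)
        {
            int current = queue.Dequeue();
            foreach (var child in areas.Where(a => a.ParentAreaId == current))
                if (found.Add(child.Id)) queue.Enqueue(child.Id);
        }
        return found;
    }
}
```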
### `Templates.razor` parent-immutability pattern is **not** repeated here
Areas can be moved freely (subject to validation). Templates are different because re-parenting changes inheritance semantics; areas are pure organizational containers.
### No change to:
- `InstanceService.AssignToAreaAsync` (already supports re-parenting; will be called by `MoveInstanceDialog`)
- `AreaService.DeleteAreaAsync` (keep current block-on-non-empty semantics)
- `AreaService.UpdateAreaAsync` (stays name-only)
- `InstanceService` lifecycle methods (already used by current Instances page)
### CLI / ManagementService parity (optional follow-up)
- Add `MoveAreaCommand` message + `ManagementService` handler that wraps `MoveAreaAsync`.
- Add CLI: `cli area move --id X --parent-id Y --username … --password …` (omit `--parent-id` to move to site root).
Not strictly required to ship the UI page, but worth doing for parity with how the rest of the app exposes admin ops.
## Routes affected
| Route | Before | After |
|---|---|---|
| `/deployment/topology` | — | **NEW** (this page — canonical route) |
| `/deployment/instances` | tree + lifecycle page | **secondary `@page` directive on `Topology.razor`** — old bookmarks continue to work. NavMenu and all internal back-navs retarget to `/deployment/topology`. |
| `/admin/areas` | flat list | **removed** |
| `/admin/areas/add` | dialog page | **removed** (Create Area dialog lives on topology page) |
| `/admin/areas/edit/{id}` | edit page | **removed** (rename via inline / context menu) |
| `/admin/areas/delete/{id}` | confirm page | **removed** (confirm via shared `ConfirmDialog`) |
| `/deployment/instances/create` | unchanged | accepts new `?siteId=` and `?areaId=` query params for preselection |
| `/deployment/instances/{id}/configure` | unchanged | unchanged |
The admin nav entry for "Areas" gets removed; "Topology" goes under the Deployment nav group.
## Files to add
```
src/ScadaLink.CentralUI/Components/Pages/Deployment/Topology.razor (~500 lines)
src/ScadaLink.CentralUI/Components/Pages/Deployment/MoveInstanceDialog.razor (~50 lines)
src/ScadaLink.CentralUI/Components/Pages/Deployment/MoveAreaDialog.razor (~55 lines)
src/ScadaLink.CentralUI/Components/Pages/Deployment/CreateAreaDialog.razor (~60 lines)
src/ScadaLink.CentralUI/Components/Pages/Deployment/RenameAreaDialog.razor (~45 lines) (optional if inline-only)
```
## Files to modify
```
src/ScadaLink.TemplateEngine/Services/AreaService.cs (+ MoveAreaAsync, ~40 lines)
src/ScadaLink.Commons/Interfaces/... (interface for AreaService if exposed)
src/ScadaLink.CentralUI/Components/Pages/Deployment/InstanceCreate.razor
(+ SiteId, AreaId query-param SupplyParameterFromQuery;
retarget back-nav to /deployment/topology — 3 sites)
src/ScadaLink.CentralUI/Components/Pages/Deployment/InstanceConfigure.razor
(retarget back-nav to /deployment/topology — 1 site)
src/ScadaLink.CentralUI/Components/Layout/NavMenu.razor (replace 'Instances' nav with 'Topology' at /deployment/topology;
remove 'Areas' nav under Admin)
tests/ScadaLink.CentralUI.PlaywrightTests/NavigationTests.cs
(update InlineData: 'Instances' → 'Topology', '/deployment/instances' → '/deployment/topology')
docs/requirements/Component-TreeView.md (rewrite §1 'Instances Page' → 'Topology Page' with new route;
remove §3 'Areas Page')
```
Note: `CLAUDE.md` does **not** reference `/deployment/instances` today, so no edit required there.
## Files to remove
```
src/ScadaLink.CentralUI/Components/Pages/Deployment/Instances.razor (replaced by Topology.razor; old route preserved as secondary @page)
src/ScadaLink.CentralUI/Components/Pages/Admin/Areas.razor
src/ScadaLink.CentralUI/Components/Pages/Admin/AreaAdd.razor
src/ScadaLink.CentralUI/Components/Pages/Admin/AreaEdit.razor
src/ScadaLink.CentralUI/Components/Pages/Admin/AreaDelete.razor
tests/ScadaLink.CentralUI.Tests/InstancesPageTests.cs (if it exists)
tests/ScadaLink.CentralUI.Tests/AreaPageTests.cs (if it exists)
```
Verified there are no other references to `/admin/areas*` in CLI, ManagementService, requirement docs (other than `Component-TreeView.md` §3, which is updated above), or tests.
## State persistence
- `topology-tree` (sessionStorage) — expansion state (Set of node keys), already supported by `TreeView.StorageKey`.
- `topology-tree-selected` (sessionStorage) — selected node key. New; the `TreeView` already exposes `SelectedKey` two-way binding, but the page is responsible for persisting it. Pattern: write in `SelectedKeyChanged`, read on `OnAfterRenderAsync` after data load.
## Tests
### Unit (`tests/ScadaLink.TemplateEngine.Tests/AreaServiceTests.cs`)
- `MoveArea_ToOtherArea_Succeeds`
- `MoveArea_ToSiteRoot_Succeeds` (newParentAreaId = null)
- `MoveArea_ToSelf_Fails`
- `MoveArea_ToDescendant_FailsWithCycleError`
- `MoveArea_DifferentSite_Fails`
- `MoveArea_NameCollidesAtNewParent_Fails`
- `MoveArea_NameUniqueAtNewParent_Succeeds`
- `MoveArea_AuditLogged`
### bUnit (`tests/ScadaLink.CentralUI.Tests/TopologyPageTests.cs`)
- `Renders_EmptyState_WhenNoSites`
- `Renders_EmptySite_WhenSiteHasNoAreasOrInstances` (empty containers visible)
- `Renders_SiteAreaInstance_Nesting`
- `Search_DimsNonMatches_PreservesShape`
- `F2_OnAreaRow_EntersRenameMode`
- `F2_OnInstanceRow_DoesNothing` (rename out of scope)
- `EscapeDuringInlineRename_Cancels`
- `ContextMenu_AreaMove_OpensDialogWithCycleFreeOptions`
- `ContextMenu_InstanceMove_OpensDialogWithSameSiteAreasOnly`
- `ContextMenu_SiteCreateInstance_NavigatesWithSiteIdQuery`
- `LegacyInstancesRoute_RoutesToTopologyPage` (visiting `/deployment/instances` resolves to the same component)
### Removal cleanup
- Drop `InstancesPageTests` and any `AreaPageTests` along with the source files.
## Edge cases
- **Two sites with the same area name at root** — fine. Same-site uniqueness is the rule; areas in different sites are independent.
- **Move an area while it has an instance assigned at its root** — allowed. The instance keeps the same `AreaId`; the area's new parent doesn't affect it.
- **Site with no areas, just root instances** — instance rows render directly under the site node.
- **Concurrent rename of a node by another user** — last-write-wins (consistent with template policy).
- **Search match inside a collapsed branch** — auto-expand the ancestor chain so the highlighted match is visible.
- **Network failure during inline rename** — leave the input open with the pending value; show the error in a toast; user can retry or Escape.
- **Deleting an area, then immediately Ctrl+Z** — not supported (no undo); destructive actions are confirmed via `ConfirmDialog` and audited.
## Out of scope
- Cross-site moves (would need new `Instance.SiteId` rebinding semantics, deployment-record handling, name-collision check at new site).
- Drag-and-drop reordering of areas (no ordinal column today; arbitrary alpha-sort).
- Bulk operations (select multiple instances and move/deploy together).
- Search across templates / sites / instances from the same input (the search is scoped to this page's tree).
- **Instance rename.** No `RenameInstanceAsync` in `InstanceService` today; adding one requires a separate design pass (site-side actor identity, deployment-record rebinding, audit history continuity). Users wanting to rename should delete + recreate.
## Out-of-band consistency tasks
When this lands, the following docs need a touch-up:
- `README.md` — component table; verify no reference to the removed Instances/Areas pages remains.
- `docs/requirements/Component-CentralUI.md` (or the routing section if one exists) — route table.
- `src/ScadaLink.CLI/README.md` — if existing CLI examples reference `area` subcommands, align with the optional CLI `area move` addition.
Confirmed clean (no edit needed):
- `CLAUDE.md` does not reference `/deployment/instances` or `/admin/areas` today.
@@ -0,0 +1,248 @@
# Templates Page — Folder & Hierarchy Reorganization
**Date:** 2026-05-11
**Status:** Design approved, ready for implementation planning
**Scope:** `/design/templates` page in Central UI, plus supporting data model, services, message contracts, and migration.
## Goal
Replace the current single-list view at `/design/templates` with a tree-organized browser modeled on the Wonderware ArchestrA Template Toolbox. Users organize templates into nested folders, see composition children inline under their owning template, and navigate to a dedicated edit page (`/design/templates/{id}`) when authoring a specific template. The tree page itself does not host the editor.
## Reference
The reference image (Wonderware Template Toolbox) shows three distinct concepts that this design carries over:
- **Folders** (yellow folder glyphs) — purely organizational, can be nested arbitrarily deep.
- **Templates** (`$Name`) — placed inside folders or at the tree root.
- **Composition children** — rendered inline under their owning template (e.g., `$TestMachine` shows `DelmiaReceiver` and `MESReceiver`).
Inheritance is **not** rendered as tree nesting in the image, and it is not rendered as tree nesting in this design. It is not shown on the template node label either (see "Locked decisions"); inheritance is visible on the TemplateEdit page.
## Locked decisions
| Decision | Choice |
|---|---|
| Inheritance in tree | Not shown as nesting; **not shown on the node label either** (label is name only). Inheritance is visible in the TemplateEdit page when a template is selected. |
| Folder model | New `TemplateFolder` entity with self-referencing `ParentFolderId`. `Template.FolderId` nullable. |
| Reorganization UX | **Right-click context menus only** (no drag-drop). Modal dialog pickers for move targets. |
| Composition rendering | Read-only leaves with navigation; right-click → Open composed template / Remove composition. |
| Root-level templates | Allowed (`FolderId` nullable). Existing templates migrate with `FolderId = null`. |
| Folder delete with contents | Blocked; structured error lists child counts. |
| Page layout | **Tree browser only** — no split-pane editor. Selecting a template navigates to `/design/templates/{id}` (TemplateEdit page); creating navigates to `/design/templates/create`. |
| Tree node visuals | Per `Component-TreeView.md` Visual Design Guide V7: Bootstrap Icons (`bi-folder` / `bi-folder2-open` / `bi-file-earmark-text` / `bi-arrow-return-right`), name-only labels (no count/inherit badges on template nodes; composition rows also name-only — the glyph signals the kind), folder child-count pill. |
## Data model
**New entity** in `src/ScadaLink.Commons/Entities/Templates/TemplateFolder.cs`:
```csharp
public class TemplateFolder
{
    public int Id { get; set; }
    public string Name { get; set; }           // unique among siblings of the same parent (case-insensitive)
    public int? ParentFolderId { get; set; }   // null = root
    public int SortOrder { get; set; }         // reserved for future manual ordering; defaults to 0

    // Audit fields follow existing entity conventions.

    public TemplateFolder(string name)
    {
        Name = name ?? throw new ArgumentNullException(nameof(name));
    }
}
```
**Modification to `Template`:**
```csharp
public int? FolderId { get; set; } // null = root
```
**Invariants (server-enforced):**
- Folder name unique among siblings of the same parent (case-insensitive).
- `ParentFolderId` graph is acyclic.
- A folder cannot be deleted if it has any child folders or child templates.
- Moving a template into a folder is a single FK update; folders carry no semantic meaning to the template engine.
**Repository surface** (in `ITemplateEngineRepository` or a new `ITemplateFolderRepository`):
- `GetAllFoldersAsync()`
- `GetFolderAsync(int id)`
- `AddFolderAsync(TemplateFolder)`
- `UpdateFolderAsync(TemplateFolder)`
- `DeleteFolderAsync(int id)`
- `MoveFolderAsync(int folderId, int? newParentId)`
- `MoveTemplateAsync(int templateId, int? newFolderId)`
**Migration:** EF Core migration adds a `TemplateFolders` table and a nullable `FolderId` column on `Templates`. Existing templates retain `FolderId = null` (root). No data movement.
**Audit:** All folder mutations and template-folder moves go through `IAuditService` with the same conventions as existing template operations.
## Server-side service
**`TemplateFolderService`** (new, in `src/ScadaLink.TemplateEngine/`), mirroring `TemplateService`:
- `CreateFolderAsync(name, parentFolderId?, user) → Result<TemplateFolder>`
- `RenameFolderAsync(id, newName, user) → Result<TemplateFolder>`
- `MoveFolderAsync(id, newParentId?, user) → Result<TemplateFolder>` — cycle check: walk parent chain from `newParentId` upward, reject if `id` appears.
- `DeleteFolderAsync(id, user) → Result<Unit>` — structured failure with `(childFolderCount, childTemplateCount)` when non-empty.
- `MoveTemplateAsync(templateId, newFolderId?, user) → Result<Template>` — also accessible from `TemplateService`.
Validations on all paths: non-empty name, name unique among siblings, parent exists (when not null).
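The cycle check described for `MoveFolderAsync` (walk the parent chain from `newParentId` upward, reject if `id` appears) could look like the sketch below. `FolderCycleCheck` and the `parentOf` dictionary are hypothetical; the real service would read parent links through the repository.

```csharp
using System;
using System.Collections.Generic;

public static class FolderCycleCheck
{
    // parentOf maps folder id -> ParentFolderId (null = root).
    // Returns true when moving folderId under newParentId would create a cycle.
    public static bool WouldCreateCycle(
        IReadOnlyDictionary<int, int?> parentOf, int folderId, int? newParentId)
    {
        var visited = new HashSet<int>();
        int? current = newParentId;
        while (current is int id)
        {
            if (id == folderId) return true;    // target is the folder itself or lies beneath it
            if (!visited.Add(id)) return true;  // stored data is already cyclic: refuse the move
            current = parentOf.TryGetValue(id, out var parent) ? parent : null;
        }
        return false; // reached the root without meeting folderId
    }
}
```

Walking upward touches only the target's ancestors, so the cost is bounded by tree depth rather than by the total number of folders.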
## Management Service contracts
In `src/ScadaLink.Commons/Messages/Management/`:
- `CreateTemplateFolderRequest` / `Response`
- `RenameTemplateFolderRequest` / `Response`
- `MoveTemplateFolderRequest` / `Response`
- `DeleteTemplateFolderRequest` / `Response`
- `ListTemplateFoldersRequest` / `Response`
- `MoveTemplateToFolderRequest` / `Response`
Additive-only evolution rules apply. Management actor handlers delegate to `TemplateFolderService`. These contracts are required for parity with the rest of the management API, and they make future CLI support cheap to add (CLI is out of scope here).
**Authorization:** All folder operations require the `Design` policy.
## Tree model
Page-level tree node (`TmplNode`) consolidates all three node kinds into one structure for the generic `TreeView`:
```csharp
private enum TmplNodeKind { Folder, Template, Composition }

private record TmplNode(
    string Key,                        // "f:{id}" | "t:{id}" | "c:{id}": unique across kinds
    TmplNodeKind Kind,
    int EntityId,                      // FolderId, TemplateId, or CompositionId
    string Label,
    int? ParentFolderId,               // folders + templates
    int? OwnerTemplateId,              // composition leaves: the template that owns this composition
    Template? Template,                // populated for Template nodes (for inline metadata)
    TemplateComposition? Composition,  // populated for Composition nodes
    List<TmplNode> Children);
```
**Build order in `LoadTreeAsync()`:**
1. `GetAllFoldersAsync()` + `GetAllTemplatesAsync()` (and `GetAllCompositionsAsync()` if compositions aren't eager-loaded by the list call).
2. Build folder nodes keyed `f:{id}`, attach by `ParentFolderId`.
3. For each template, build a Template node and attach its compositions as `c:{compositionId}` leaves.
4. Attach each template to its `FolderId` folder, or to `_roots` if `FolderId == null`.
5. Sort siblings: folders first (alphabetical by name), then templates (alphabetical by name). Compositions sort alphabetical by `InstanceName`.
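The build order above can be sketched with simplified types. Everything here is illustrative: `FolderDto`, `TemplateDto`, `CompositionDto`, `TreeNode`, and `TemplateTreeBuilder` are hypothetical stand-ins for the real entities and `TmplNode`, and attaching a node with a missing parent folder to the root is an assumption, not a documented rule.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified, hypothetical DTOs standing in for the real entities.
public sealed record FolderDto(int Id, string Name, int? ParentFolderId);
public sealed record CompositionDto(int Id, string InstanceName);
public sealed record TemplateDto(int Id, string Name, int? FolderId, List<CompositionDto> Compositions);

public enum NodeKind { Folder, Template, Composition }

public sealed record TreeNode(string Key, NodeKind Kind, string Label, List<TreeNode> Children);

public static class TemplateTreeBuilder
{
    public static List<TreeNode> Build(IEnumerable<FolderDto> folders, IEnumerable<TemplateDto> templates)
    {
        var roots = new List<TreeNode>();
        var folderList = folders.ToList();

        // Step 2: one node per folder, then attach each by ParentFolderId.
        var folderNodes = folderList.ToDictionary(
            f => f.Id,
            f => new TreeNode($"f:{f.Id}", NodeKind.Folder, f.Name, new List<TreeNode>()));
        foreach (var f in folderList)
            ParentList(f.ParentFolderId).Add(folderNodes[f.Id]);

        // Steps 3-4: template nodes with composition leaves, attached to a folder or the root.
        foreach (var t in templates)
        {
            var leaves = t.Compositions
                .OrderBy(c => c.InstanceName, StringComparer.OrdinalIgnoreCase)
                .Select(c => new TreeNode($"c:{c.Id}", NodeKind.Composition, c.InstanceName, new List<TreeNode>()))
                .ToList();
            ParentList(t.FolderId).Add(new TreeNode($"t:{t.Id}", NodeKind.Template, t.Name, leaves));
        }

        // Step 5: folders first, then templates, each group alphabetical.
        SortSiblings(roots);
        return roots;

        // Assumption: a null or unknown parent folder falls back to the root list.
        List<TreeNode> ParentList(int? folderId) =>
            folderId is int id && folderNodes.TryGetValue(id, out var parent) ? parent.Children : roots;
    }

    private static void SortSiblings(List<TreeNode> nodes)
    {
        nodes.Sort((a, b) =>
        {
            int byKind = a.Kind.CompareTo(b.Kind); // enum order puts Folder before Template
            return byKind != 0 ? byKind : StringComparer.OrdinalIgnoreCase.Compare(a.Label, b.Label);
        });
        foreach (var n in nodes.Where(n => n.Kind == NodeKind.Folder))
            SortSiblings(n.Children);
    }
}
```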
**`TreeView` wiring:**
| Param | Value |
|---|---|
| `Items` | `_roots` |
| `ChildrenSelector` | `n => n.Children` |
| `HasChildrenSelector` | `n => n.Kind != TmplNodeKind.Composition && n.Children.Count > 0` |
| `KeySelector` | `n => (object)n.Key` |
| `StorageKey` | `"templates-tree"` (preserved from current usage) |
| `Selectable` | `true` |
| `SelectedKeyChanged` | dispatch on key prefix: `t:` → `NavigationManager.NavigateTo($"/design/templates/{id}")` (TemplateEdit page); `f:` → no-op; `c:` → `NavigateTo` the composed template's edit page |
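The key-prefix dispatch in `SelectedKeyChanged` can be isolated into a small pure function that the handler passes to `NavigateTo`. This is a sketch under assumptions: `TreeKeyRouter` and the `composedTemplateIdOf` lookup are hypothetical names, and a `null` result stands for "select only, do not navigate".

```csharp
using System;

public static class TreeKeyRouter
{
    // Maps a selected tree key to a navigation target, or null for no navigation.
    // composedTemplateIdOf: hypothetical lookup from composition id to composed template id.
    public static string? ResolveTarget(string? key, Func<int, int?> composedTemplateIdOf)
    {
        if (string.IsNullOrEmpty(key)) return null;

        var parts = key.Split(':', 2);
        if (parts.Length != 2 || !int.TryParse(parts[1], out int id)) return null;

        switch (parts[0])
        {
            case "t": // template node: open its edit page
                return $"/design/templates/{id}";
            case "c": // composition leaf: open the composed template's edit page
                return composedTemplateIdOf(id) is int target ? $"/design/templates/{target}" : null;
            default:  // "f" (folder) and unknown prefixes: selection only
                return null;
        }
    }
}
```

Keeping the mapping free of `NavigationManager` makes the folder no-op and the composition lookup testable without a rendered component.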
**Inline node labels** (see `Component-TreeView.md` V7 for the canonical recipe):
- Folder: `<i class="bi bi-folder">` (closed) or `<i class="bi bi-folder2-open">` (expanded) + name (semibold when has children) + count-pill badge of direct children.
- Template: `<i class="bi bi-file-earmark-text">` + `$Name` (semibold when has compositions). **No** inheritance hint, **no** attr/alarm/script count, **no** composition count on the node.
- Composition: `<i class="bi bi-arrow-return-right">` + composition instance name only. The composed template name is intentionally omitted from the tree — open the owning template's edit page to see/manage compositions.
**Search/filter:** out of scope for v1; the underlying component supports external filtering (per `Component-TreeView.md` R8) so it can be added later without component changes.
## Page layout
`/design/templates` is a **single-column tree browser** — no inline editor, no split pane.
```
+--------------------------------------------+
| Templates |
| [+Folder] [+Template] [Expand] [Collapse] |
| |
| ▶ 📁 _Default Templates |
| ▼ 📂 Dev |
| 📄 $TestMachine |
| ↪ DelmiaReceiver |
| ↪ MESReceiver |
| 📄 $TestObject |
| ▶ 📁 System |
| 📄 $UnfiledTemplate |
+--------------------------------------------+
```
- Tree scrollable region: `max-height: calc(100vh - 160px); overflow-y: auto`. The 25–33% sidebar width constraint is removed; the tree uses the page's main container width.
- Selecting a template node navigates to `/design/templates/{id}` (TemplateEdit page).
- Selecting a composition node navigates to the composed template's edit page.
- Selecting a folder node is a no-op (still allowed; expansion and context-menu still work).
- Creating a template: toolbar "+ Template" button (or folder context-menu "New Template") navigates to `/design/templates/create?folderId={id}`. After successful create, the create page navigates to `/design/templates/{newId}`.
- URL contract for deep links: `/design/templates/{id}` resolves to the TemplateEdit page directly — the browser doesn't need to be on the tree page first.
## Context menus
The context menu is the **only** reorganization mechanism. Per-node-kind `ContextMenu` fragment driven by `node.Kind`:
**Folder:** New Folder · New Template · Rename · Move to Folder… · Delete
**Template:** Edit · Move to Folder… · Delete
**Composition:** Open composed template · Remove composition
- **Move to Folder…** opens a modal (`MoveFolderDialog` / `MoveTemplateDialog`) with a flat folder picker. The list includes "(Root)" as the first entry. For folder-move, the dialog client-side prunes the folder being moved and its descendants from the candidate list to prevent obvious cycles; the server still validates (authoritative). For template-move, all folders are valid targets.
- **Edit** on a template navigates to `/design/templates/{id}` (TemplateEdit page) — equivalent to clicking the node, kept in the menu for discoverability.
- Root-level "+ Folder" and "+ Template" buttons live in the toolbar above the tree.
**Server-side validation (authoritative)**:
- Folder onto descendant → reject (cycle).
- Folder onto itself → no-op (client prunes).
- Template-onto-template → not a valid target (templates aren't shown in the folder picker).
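The client-side pruning for the folder-move picker might be implemented along these lines. `FolderItem`, `MoveTarget`, and `MoveFolderCandidates` are hypothetical names, and the alphabetical ordering of the flat list is an assumption; the server check remains authoritative either way.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed record FolderItem(int Id, string Name, int? ParentFolderId);

// FolderId == null represents the "(Root)" entry.
public sealed record MoveTarget(int? FolderId, string Label);

public static class MoveFolderCandidates
{
    // Builds the flat picker list for moving movingFolderId:
    // "(Root)" first, then every folder except the moved one and its descendants.
    public static List<MoveTarget> Build(IReadOnlyCollection<FolderItem> folders, int movingFolderId)
    {
        var excluded = new HashSet<int> { movingFolderId };
        bool grew = true;
        while (grew) // repeat until a pass finds no new descendants
        {
            grew = false;
            foreach (var f in folders)
                if (f.ParentFolderId is int p && excluded.Contains(p) && excluded.Add(f.Id))
                    grew = true;
        }

        var targets = new List<MoveTarget> { new MoveTarget(null, "(Root)") };
        targets.AddRange(folders
            .Where(f => !excluded.Contains(f.Id))
            .OrderBy(f => f.Name, StringComparer.OrdinalIgnoreCase)
            .Select(f => new MoveTarget(f.Id, f.Name)));
        return targets;
    }
}
```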
## Edge cases
- Deep-link route `/design/templates/{id}` resolves directly to the TemplateEdit page; the tree page is not involved. If the user navigates back, the tree's sessionStorage-persisted expansion state is restored.
- Stale `f:{id}` keys in `sessionStorage` after folder delete are harmless (ignored on next render).
- Selected template moved to another folder → tree rebuilds; selection preserved by stable key.
- Template deleted from the TemplateEdit page → page navigates back to `/design/templates`; the tree rebuilds without the deleted node.
- Last-write-wins on concurrent folder edits, matching existing template policy.
- Tree fully rebuilt on every CRUD; expected scale (dozens to low hundreds) makes this trivially cheap.
## Validation summary
| Operation | Check | Failure mode |
|---|---|---|
| Create folder | name non-empty, unique among siblings | structured error |
| Rename folder | same as create | structured error |
| Move folder | parent exists or null; no cycle; name still unique in new parent | structured error |
| Delete folder | no child folders, no child templates | error with counts |
| Move template | target folder exists or null | structured error |
## Testing
**Unit (`tests/ScadaLink.TemplateEngine.Tests/`):**
- `TemplateFolderServiceTests` — create / rename / move (happy + cycle + duplicate) / delete (happy + non-empty).
- `TemplateServiceTests`: `MoveTemplateAsync` happy + missing target.
- Migration test confirming nullable `FolderId` and existing templates retaining null.
**bUnit (`tests/ScadaLink.CentralUI.Tests/`):**
- Tree renders folders / templates / compositions in correct nesting.
- Empty state when no roots exist (no folders, no root templates).
- Selecting a template node invokes `NavigationManager.NavigateTo($"/design/templates/{id}")`.
- Selecting a composition node invokes `NavigateTo` for the composed template's edit page.
- Selecting a folder node is a no-op (no navigation).
- Right-click menus differ by node kind (Folder / Template / Composition each have distinct items).
- Folder context menu includes "Move to Folder…"; the dialog excludes the folder being moved and its descendants from candidates.
- Folder-delete-non-empty surfaces a structured error toast.
- Bootstrap Icons render in the glyph slot for each node kind (`bi-folder` / `bi-folder2-open` / `bi-file-earmark-text` / `bi-arrow-return-right`).
**Manual smoke (per `CLAUDE.md`):** nested folder creation, context-menu reorg (folder + template Move-to-Folder dialogs), cycle rejection, refresh persistence, composition navigation, navigation from tree to TemplateEdit and back.
## Documentation updates
- `docs/requirements/Component-CentralUI.md` — describe the templates page tree layout.
- `docs/requirements/Component-TemplateEngine.md` — add `TemplateFolder` entity + folder operations.
- `docs/requirements/Component-ConfigurationDatabase.md` — add `TemplateFolders` table + `Templates.FolderId` column.
- `docs/requirements/Component-ManagementService.md` — add new message contracts.
- `README.md` — note folder organization in the Template Engine row's responsibilities.
## Out of scope (for v1)
- Tree search / filter input (component already supports it; add when needed).
- CLI commands for folder operations (message contracts make this trivial later).
- Sibling reorder (sort stays alphabetical).
- Root context menu (right-click in empty tree area).
- (Removed from out-of-scope.) Bootstrap Icons are now adopted (static files at `wwwroot/lib/bootstrap-icons/`) — see `Component-TreeView.md` V4.
@@ -0,0 +1,29 @@
{
"planPath": "docs/plans/2026-05-11-templates-folder-hierarchy.md",
"tasks": [
{"id": 7, "subject": "Task 0: Confirm baseline + create work branch", "status": "pending"},
{"id": 8, "subject": "Task 1: Add TemplateFolder entity + Template.FolderId", "status": "pending", "blockedBy": [7]},
{"id": 9, "subject": "Task 2: EF configuration for TemplateFolder + Template.FolderId", "status": "pending", "blockedBy": [8]},
{"id": 10, "subject": "Task 3: Generate EF migration AddTemplateFolders", "status": "pending", "blockedBy": [9]},
{"id": 11, "subject": "Task 4: Repository methods for TemplateFolder", "status": "pending", "blockedBy": [10]},
{"id": 12, "subject": "Task 5: TemplateFolderService.CreateFolderAsync (TDD)", "status": "pending", "blockedBy": [11]},
{"id": 13, "subject": "Task 6: TemplateFolderService.RenameFolderAsync", "status": "pending", "blockedBy": [12]},
{"id": 14, "subject": "Task 7: TemplateFolderService.MoveFolderAsync with cycle detection", "status": "pending", "blockedBy": [13]},
{"id": 15, "subject": "Task 8: TemplateFolderService.DeleteFolderAsync (non-empty check)", "status": "pending", "blockedBy": [14]},
{"id": 16, "subject": "Task 9: TemplateService.MoveTemplateAsync", "status": "pending", "blockedBy": [11]},
{"id": 17, "subject": "Task 10: DI registration for TemplateFolderService", "status": "pending", "blockedBy": [15, 16]},
{"id": 18, "subject": "Task 11: Management command records for TemplateFolder", "status": "pending", "blockedBy": [17]},
{"id": 19, "subject": "Task 12: ManagementActor authorization + handlers", "status": "pending", "blockedBy": [18]},
{"id": 20, "subject": "Task 13: Templates.razor — load folders alongside templates", "status": "pending", "blockedBy": [17]},
{"id": 21, "subject": "Task 14: Build new TmplNode tree model", "status": "pending", "blockedBy": [20]},
{"id": 22, "subject": "Task 15: Split-pane layout + new TreeView wiring", "status": "pending", "blockedBy": [21]},
{"id": 23, "subject": "Task 16: Per-kind context menus", "status": "pending", "blockedBy": [22]},
{"id": 24, "subject": "Task 17: New-folder, new-template, move-template dialogs", "status": "pending", "blockedBy": [23]},
{"id": 25, "subject": "Task 18: Drag-drop reorganization", "status": "pending", "blockedBy": [24]},
{"id": 26, "subject": "Task 19: Deep-link reveal on load", "status": "pending", "blockedBy": [22]},
{"id": 27, "subject": "Task 20: bUnit tests for the new page", "status": "pending", "blockedBy": [22]},
{"id": 28, "subject": "Task 21: Documentation updates", "status": "pending", "blockedBy": [25, 26, 27]},
{"id": 29, "subject": "Task 22: Final smoke + green-suite check", "status": "pending", "blockedBy": [25, 26, 27, 28]}
],
"lastUpdated": "2026-05-11"
}
@@ -0,0 +1,272 @@
# Derive-on-compose template specialization
## Goal
Match Aveva System Platform's composition model: composing template
`$Sensor` into template `$Pump` no longer references `$Sensor` directly. Instead
the system creates a derived template that **inherits** from `$Sensor`, then the
composition references the derived template. The derived template lives under
the owning parent and can:
- override attribute default values
- override script bodies
- add new attributes / scripts the base doesn't have
- be prevented from overriding fields the base marks as locked
This is the user-selected approach (Option C "Always-derive") from the
brainstorming session, with all four customization scopes enabled.
## Why
- Per-composition customization is a real SCADA use case (Pump's TempSensor
needs different alarm thresholds from Motor's TempSensor).
- Single parent always at design time: removes the multi-parent picker we just
added.
- Industry-standard mental model for users coming from Aveva / Wonderware.
## Non-goals
- Replacing the existing `ParentTemplateId` inheritance chain — we reuse it.
- Versioning of base templates separately from derived (out of scope; can layer
later).
- Cross-template attribute references (already covered by Children/Parent).
## Data model changes
`Template` gains:
```csharp
public bool IsDerived { get; set; } // hides from main tree
public int? OwnerCompositionId { get; set; } // back-ref to composition
```
`TemplateAttribute` gains:
```csharp
public bool IsInherited { get; set; } // value came from base
public bool LockedInDerived { get; set; } // base marks "no override"
```
`TemplateScript` gains the same `IsInherited` / `LockedInDerived` pair.
`TemplateComposition` is unchanged in shape — `ComposedTemplateId` now points
at the **derived** template, not the base. The base is reachable via
`derived.ParentTemplateId`.
**Why a separate `IsDerived` flag rather than just "has a parent and is composed
once":** explicit marker keeps the tree-view filtering trivial and signals
intent independent of current composition state.
**Why `OwnerCompositionId` instead of inferring from `TemplateComposition`
back-pointers:** O(1) lookup for cascade-delete and forbid-direct-edit paths.
## Lifecycle
```
Compose "$Sensor" into "$Pump" as instance "TempSensor":
  1. Create new template { Name: "Pump.TempSensor", ParentTemplateId: $Sensor.Id,
                           IsDerived: true, Description: from $Sensor }
  2. Copy $Sensor.Attributes into the new template marked IsInherited=true
  3. Copy $Sensor.Scripts into the new template marked IsInherited=true
  4. Create TemplateComposition { TemplateId: $Pump.Id,
                                  ComposedTemplateId: newTemplate.Id,
                                  InstanceName: "TempSensor" }
  5. Set newTemplate.OwnerCompositionId = the new composition's Id
Delete composition or owning parent → cascade-delete the derived template.
Rename composition InstanceName → rename the derived template (`Pump.NewName`).
Edit base attribute that is `IsInherited=true` on derivatives → the derivatives
pick up the change *if* they haven't overridden that field. Override sets
`IsInherited=false`.
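The compose steps above can be sketched in memory as follows. This is a minimal illustration, not the `TemplateService` implementation: the `Tmpl`, `Attr`, and `Composition` classes, the `DeriveOnCompose` helper, and the id counter are all hypothetical stand-ins for the real entities and the database, and scripts (step 3) are omitted because they would follow the same pattern as attributes.

```csharp
using System.Collections.Generic;
using System.Linq;

// Simplified stand-ins for the real entities (assumed shapes, illustration only).
public sealed class Attr
{
    public string Name = "";
    public string Value = "";
    public bool IsInherited;
}

public sealed class Tmpl
{
    public int Id;
    public string Name = "";
    public int? ParentTemplateId;
    public bool IsDerived;
    public int? OwnerCompositionId;
    public List<Attr> Attributes = new();
}

public sealed class Composition
{
    public int Id;
    public int TemplateId;
    public int ComposedTemplateId;
    public string InstanceName = "";
}

public static class DeriveOnCompose
{
    private static int _nextId = 100; // hypothetical id source standing in for the database

    public static (Tmpl Derived, Composition Slot) Compose(Tmpl parent, Tmpl baseTemplate, string instanceName)
    {
        // Steps 1-2: new template inheriting from the base, with copied attributes marked inherited.
        var derived = new Tmpl
        {
            Id = _nextId++,
            Name = $"{parent.Name}.{instanceName}",
            ParentTemplateId = baseTemplate.Id,
            IsDerived = true,
            Attributes = baseTemplate.Attributes
                .Select(a => new Attr { Name = a.Name, Value = a.Value, IsInherited = true })
                .ToList(),
        };
        // Step 4: the composition points at the derived template, not the base.
        var slot = new Composition
        {
            Id = _nextId++,
            TemplateId = parent.Id,
            ComposedTemplateId = derived.Id,
            InstanceName = instanceName,
        };
        // Step 5: back-reference, set once the composition's Id is known.
        derived.OwnerCompositionId = slot.Id;
        return (derived, slot);
    }
}
```

Composing the same base twice under different instance names would simply call `Compose` twice and yield two independent derived templates.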
## Lock semantics
`IsLocked` on `TemplateAttribute` already exists with the meaning
"this attribute on this template is locked for editing." Add a second flag
`LockedInDerived` meaning "derived templates may not override the value
inherited from this attribute." These compose:
| State on base | What derived can do |
|---|---|
| neither flag set | Override value freely |
| `LockedInDerived` only | Cannot override; inherited value is final |
| `IsLocked` only | Base itself can't be edited; derived can still override |
| both | Locked everywhere |
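The table reads as two independent checks, one per flag. A minimal sketch, assuming a hypothetical `LockRules` helper that is not part of the codebase:

```csharp
// Hypothetical helper mirroring the lock table; the names are assumptions for illustration.
public static class LockRules
{
    // A derived template may override an inherited member unless the base set LockedInDerived.
    public static bool CanDerivedOverride(bool isLocked, bool lockedInDerived) => !lockedInDerived;

    // The base's own copy stays editable unless IsLocked is set on it.
    public static bool CanEditOnBase(bool isLocked, bool lockedInDerived) => !isLocked;
}
```

Both methods take both flags so that call sites read the same way for every row of the table, even though each check only depends on one of them.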
## Flattening implications
`FlatteningService.ResolveInheritedScripts` already walks a template chain via
`ParentTemplateId`. That logic already handles "child overrides parent;
parent's `IsLocked` blocks override." We extend the same with
`LockedInDerived` for both attributes and scripts.
`ResolveComposedScripts` walks compositions → composed templates. Today the
prefix is the `InstanceName`. With derived templates the prefix is still the
`InstanceName` (the derived template's name `Pump.TempSensor` doesn't show up
in canonical paths — paths use the slot name, not the template name).
The `ResolvedScript.Scope` we landed for Phase 2 of the previous design still
applies: `SelfPath = "TempSensor"`, `ParentPath = ""`. No change.
## UI changes
### Template tree
Hide `IsDerived` templates from the main list. They're reachable via:
- the Compositions tab on the parent template (click the row → opens the
derived template's edit page)
- a "Show derived templates" toggle on the tree page (off by default)
### TemplateEdit for a derived template
Top banner: *"Derived from `$Sensor` — composed inside `$Pump` as `TempSensor`."*
Attributes table renders per-row state:
- **Override / Inherited** badge per row
- Locked-from-base attributes render readonly with a 🔒 icon and tooltip
*"Locked by base — cannot override."*
Scripts table same treatment.
Adding a new attribute or script on the derived template is allowed (creates
a row with `IsInherited = false`).
Removing an inherited row reverts it to the base value (the row goes back to
inherited state). Removing an own-added row deletes it.
### TemplateEdit for a base template
One extra column on each of the attribute / script tables:
- 🔒 toggle for `LockedInDerived` — "Lock this against per-slot override"
### Compositions tab
Today: lists composition rows with InstanceName + ComposedTemplate name.
After: each row links to *its derived template* (not the base). InstanceName
becomes the visible label.
Renaming a composition renames the derived template too.
### Composition picker (when adding a composition)
Today: pick a template + provide an instance name.
After: pick a **base** template + provide an instance name. The system creates
the derived template behind the scenes.
The picker filters out `IsDerived` templates — you can only compose bases.
## Editor metadata implications
The multi-parent picker becomes mostly irrelevant:
- **Derived template**: always single parent (the composition it's owned by).
`Parent.*` resolves to that one. No picker.
- **Base template**: still has no direct parent (it's a library entry).
`Parent.*` autocompletion is suppressed. Scripts on bases that use
`Parent.*` get a warning *"Parent access on a base template is ambiguous —
override this script in the derived template instead."*
`TemplateEdit.BuildParentContextsAsync` simplifies to: "if derived, return the
single owning parent; else return null."
`GetTemplatesComposingAsync` repository method still useful (e.g., for "find
all uses of this base"), but the editor metadata path doesn't need it.
## Migration
One-shot for existing data:
```sql
-- pseudo-SQL describing intent
FOREACH composition IN TemplateComposition:
    derived := INSERT INTO Templates (
        Name = parent.Name + "." + composition.InstanceName,
        ParentTemplateId = composition.ComposedTemplateId,
        IsDerived = true,
        OwnerCompositionId = composition.Id
    )
    -- Copy attributes from base, mark IsInherited=true
    INSERT INTO TemplateAttributes
        SELECT @derived.Id, Name, Value, DataType, true, ... FROM base.Attributes
    -- Same for scripts
    UPDATE TemplateComposition SET ComposedTemplateId = derived.Id WHERE Id = composition.Id
```
EF Core migration in `ScadaLink.ConfigurationDatabase/Migrations/`.
Rollback strategy: the migration is one-way for new derivations, but old
composition data can be reconstructed from `IsDerived` templates' `ParentTemplateId`.
## Phased rollout
Each phase is independently shippable and reviewable.
1. **Schema + entities.** Add the new fields, the EF mappings, and an additive migration (new columns only, no data transform).
No behavior changes. Existing data unaffected.
2. **Composition flow change.** Modify `TemplateService.AddCompositionAsync`
to derive on compose for *new* compositions. Existing data still has direct
compositions and continues to work. Two modes coexist during the cutover.
3. **Migration.** EF Core migration script that walks existing compositions
and creates the derived templates retroactively. After this all
compositions are derived.
4. **Inherit/override resolution.** Update `FlatteningService` to merge
inherited and overridden fields. Tests for the override semantics.
5. **Lock semantics.** Wire `LockedInDerived` through `TemplateService`
update paths. Tests.
6. **Template tree UI.** Hide derived templates from the main listing;
surface them through the parent's Compositions tab.
7. **Derived TemplateEdit UI.** Banner, inherited/override badges,
readonly-when-locked, override/revert actions.
8. **Base TemplateEdit UI.** Add the LockedInDerived toggle column.
9. **Editor metadata simplification.** Replace the multi-parent picker with
the single-parent resolver. Base templates suppress `Parent.*` assistance
and warn on use.
## Out of scope (for now)
- Versioning of base templates with explicit "update derived templates to
base v2" workflow.
- Reverse-flow: editing a derived value and asking "promote to base."
- Multiple inheritance levels for derivation (e.g., `$Sensor → $Sensor.Pump →
$Sensor.Pump.HighTemp`) — the data model supports it via
`ParentTemplateId`, but the UX hasn't been designed.
- Cross-tenant template libraries.
## Decisions
- **Naming**: dot-separated (`Pump.TempSensor`). Matches the canonical-path
format used in flattening. Visible in audit logs / error messages.
- **Delete base with derivatives**: block the delete and list the derivatives.
User must remove or repoint them first.
- **Migration of existing data**: EF Core migration on next startup
auto-derives every existing composition. After deploy all compositions are
derived; no mixed-mode code paths.
- **Tree UX**: derived templates hidden by default. "Show derived templates"
toggle on the tree page reveals them indented under their base. Always
reachable from the parent's Compositions tab.
## Confirmed semantics
- **Re-composing the same base on the same parent in two slots** (e.g. Pump
composes Sensor twice as `IntakeSensor` and `OutletSensor`) produces two
derived templates: `Pump.IntakeSensor` and `Pump.OutletSensor`, both
inheriting from `Sensor`.
- **Inheritance updates flow downward**: if a base attribute changes value
later and the derivative has `IsInherited = true` for that attribute, the
derived value updates. Once overridden (`IsInherited = false`), changes to
the base no longer affect that field.
- **Subsequent `LockedInDerived` after overrides exist**: surface as a
validation error at deploy time; do not force-revert silently.
@@ -0,0 +1,184 @@
# Derive-on-compose: implementation status
> **For Claude resuming later:** All nine phases are implemented. This
> file is the change-record for the work, not a plan. See the companion
> design doc `2026-05-12-derive-on-compose-design.md` for rationale.
## Where we are
**Branch**: `feature/templates-folder-hierarchy`.
**Last commit on this feature**: `a965d4a` — *Phase 9 complete,
single-parent editor context*.
**All nine phases done**. Live verification against SQL Server (phase-3
migration shape) and a UI smoke test are still recommended before merge.
**All test suites currently green**:
- `tests/ScadaLink.CentralUI.Tests` — 159 passing
- `tests/ScadaLink.SiteRuntime.Tests` — 129 passing
- `tests/ScadaLink.TemplateEngine.Tests` — 212 passing (+13 derive-on-compose tests)
## Design decisions already made (from the brainstorm)
User picked the **full Aveva model** with all four customization scopes:
- **Naming**: dot-separated → `Pump.TempSensor`
- **Delete base with derivatives**: block with a list of the dependents
- **Migration of existing compositions**: auto-migrate all on the EF Core
migration step in Phase 3
- **Tree UX**: derived templates hidden by default; toggle to reveal
- **Customization scope**: override attribute values, override script bodies,
add new attrs/scripts per slot, lock fields against override
## Done — Phase 1: Additive schema
Commits: `6854843` (design doc) + `a968cef` (decisions recorded) + `5615f3d`.
## Done — Phase 2: Compose flow change
Commit: `fa86750`.
- `TemplateService.AddCompositionAsync` builds a derived template
(`"<parent>.<slot>"`), copies base attributes/scripts with
`IsInherited=true`, then composes the derived (not the base). Sets
`OwnerCompositionId` back-ref after the composition's Id is known.
- Composing a derived template is rejected — only bases can be composed.
- `DeleteCompositionAsync` cascade-deletes the slot-owned derived
template (`IsDerived=true` and `OwnerCompositionId==compositionId`).
- `DeleteTemplateAsync` blocks direct deletion of derived templates and
splits the inheritor check into regular children vs. derivatives — the
derivative branch labels each by `'OwnerName' (as 'SlotName')`.
- `TemplateDeletionService.CanDeleteTemplateAsync` mirrors the same
derivative-aware checks.
## Done — Phase 3: Migration of existing compositions
Commit: `03a8c4a`. Migration `20260512122746_MigrateCompositionsToDerived`.
- Pre-flight aborts with a descriptive error if any
`<parent>.<slot>` derived name would collide.
- Cursor-walks every `TemplateComposition` whose target is `IsDerived=0`,
inserts a derived template, copies attributes/scripts with
`IsInherited=1`, then repoints `ComposedTemplateId`.
- Idempotent (only touches non-derived targets), so re-runs are safe.
- `Down()` reverses by repointing compositions to `ParentTemplateId` and
dropping the derived templates.
The migration was NOT verified against a live SQL Server in this
session — run `bash docker/deploy.sh` (or `dotnet ef database update`)
once with seeded test data to confirm shape.
Files touched in `5615f3d` (Phase 1, additive schema):
- `src/ScadaLink.Commons/Entities/Templates/Template.cs`
  - Added `IsDerived: bool`
  - Added `OwnerCompositionId: int?` (plain nullable int, not an EF nav prop)
- `src/ScadaLink.Commons/Entities/Templates/TemplateAttribute.cs`
  - Added `IsInherited: bool`
  - Added `LockedInDerived: bool`
- `src/ScadaLink.Commons/Entities/Templates/TemplateScript.cs`
  - Same two fields
- `src/ScadaLink.ConfigurationDatabase/Migrations/20260512121446_AddDerivedTemplateFields.cs`
  - EF Core migration. Six new columns: five boolean flags (NOT NULL DEFAULT 0) and one
    nullable int (`OwnerCompositionId`). No data transform; existing rows get defaults.
- `ScadaLinkDbContextModelSnapshot.cs` regenerated.
**No behavior changes** in Phase 1. The new fields are not read or written yet.
## Done — Phase 4+5: Flattening + lock enforcement
Commit: `f599809`.
- `FlatteningService.ResolveInheritedAttributes` / `ResolveInheritedScripts`
treat `IsInherited=true` rows as placeholders that don't shadow the
resolved base value. Override (`IsInherited=false`) wins as before.
- `ValidateLockedInDerived` runs once per chain (main + every composed
chain) and returns a flatten-time failure if a derived row overrides
a `LockedInDerived` base member.
- `TemplateService.UpdateAttributeAsync` / `UpdateScriptAsync` reject
derived-side overrides of `LockedInDerived` base members, and now
persist `IsInherited` (on derived) / `LockedInDerived` (on base) from
the proposed payload so the UI can drive override state.
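The placeholder rule can be sketched as below. This is an illustration of the merge order only, assuming a two-level chain and hypothetical `Row` / `EffectiveValues` names; it is not the `FlatteningService` code and it ignores lock validation.

```csharp
using System.Collections.Generic;

// Stand-in for an attribute or script row (assumed shape, illustration only).
public sealed record Row(string Name, string Value, bool IsInherited);

public static class EffectiveValues
{
    // Base values are resolved first; a derived row replaces one only when it is an override.
    public static Dictionary<string, string> Resolve(IEnumerable<Row> baseRows, IEnumerable<Row> derivedRows)
    {
        var resolved = new Dictionary<string, string>();
        foreach (var b in baseRows)
            resolved[b.Name] = b.Value;
        foreach (var d in derivedRows)
        {
            // Inherited rows are placeholders: their copied value may be stale, so skip them.
            if (d.IsInherited)
                continue;
            // Overrides and rows added on the derived template win.
            resolved[d.Name] = d.Value;
        }
        return resolved;
    }
}
```

Under this sketch a later edit to the base value is visible through every derivative that still carries `IsInherited=true` for that member, which matches the "inheritance updates flow downward" rule in the design doc.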
## Done — Phase 6: Template tree hides derived
Commit: `f05b03f` (combined with phases 7+8).
`Templates.razor` filters `t.IsDerived` from the main tree. A "Show
derived" form-switch in the page header flips the filter — derived
templates surface in the flat list so users can still reach them.
## Done — Phase 7+8: Derived/base TemplateEdit UI
Commit: `f05b03f`.
- Derived banner: links to base + slot owner / instance name from
`OwnerCompositionId`.
- Attributes / Scripts tables grew a context-aware column:
* Derived: Source badge (Inherited / Override / Local), plus a
"🔒 Base-locked" badge when `LockedInDerived`.
* Base: a form-switch that flips `LockedInDerived` through
`UpdateAttribute` / `UpdateScript`.
- Effective Value / Code resolves from the base when the derived row
carries an inherited (potentially stale) copy — matches the runtime
flatten behavior so the UI doesn't lie.
- Override and Revert-to-base actions live on the row kebab. Delete is
hidden on inherited rows (the base owns those).
- "When a base toggles LockedInDerived while derivatives override the
field, warn via toast" is NOT implemented — kept out of scope; flatten
validation already surfaces it at deploy time.
## Done — Phase 9: Single-parent editor context
Commit: `a965d4a`.
- `BuildParentContextsAsync` resolves the editor's `Parent.*` context
to exactly one entry for derived templates (via `OwnerCompositionId`)
and to an empty list for base templates.
- Multi-parent `<select>` dropdown removed from the Add Script form.
- `_selectedParentIndex` / `OnParentContextChanged` deleted;
`ActiveEditorParent` collapses to `_editorParents.FirstOrDefault()`.
- The SCADA008 hint diagnostic on `Parent.*` use within base templates
was NOT added in this pass — the analyzer simply emits no completions
when the parent context is empty. Add it later if users want a
positive nudge.
## Still to verify
- Apply the Phase-3 migration against a real SQL Server (run
`bash docker/deploy.sh` or `dotnet ef database update`) with seeded
data to confirm `MigrateCompositionsToDerived` produces the right
shape and respects the collision pre-check.
- Smoke-test the UI flows: add a composition, override an attribute,
revert, toggle `LockedInDerived` on a base, edit a script on a
derived template (single-parent context).
## How to resume
A future session should:
1. Read this file and the design doc.
2. Run `git log --oneline -15` to confirm the branch is at `a965d4a` or
later.
3. Run the three test suites named above.
4. Ask the user whether to ship or to address one of the deferred items
("when base toggles LockedInDerived while derivatives override",
SCADA008 base-Parent hint, or the live-DB / UI smoke verifications).
## Quick sanity script
```bash
git status --short # should be clean
git log --oneline -10 # top should include a965d4a
dotnet build src/ScadaLink.CentralUI src/ScadaLink.TemplateEngine src/ScadaLink.ConfigurationDatabase
dotnet test tests/ScadaLink.TemplateEngine.Tests/ScadaLink.TemplateEngine.Tests.csproj
dotnet test tests/ScadaLink.CentralUI.Tests/ScadaLink.CentralUI.Tests.csproj
dotnet test tests/ScadaLink.SiteRuntime.Tests/ScadaLink.SiteRuntime.Tests.csproj
```
Note: the full `dotnet build` of the solution fails with NU1608 in
`ScadaLink.IntegrationTests` and `ScadaLink.Host.Tests` due to a
pre-existing `Microsoft.CodeAnalysis.Common` 4.13 vs 5.0 mismatch — not
related to the derive-on-compose work. Build the three suites listed in
"Where we are" individually.
@@ -0,0 +1,293 @@
# OPC UA Endpoint Config Model & Form Refactor — Design
**Date**: 2026-05-12
**Branch**: `feature/templates-folder-hierarchy` (and successors)
**Status**: Design approved, ready for implementation planning
## Problem
`DataConnection.PrimaryConfiguration` and `BackupConfiguration` are free-form JSON strings. Today:
- The site-side runtime (`OpcUaDataConnection.cs:44-90`) parses them as a flat `IDictionary<string,string>` and string-fishes ~12 keys (`endpoint` / `EndpointUrl`, `SessionTimeoutMs`, `SecurityMode`, `AutoAcceptUntrustedCerts`, etc.).
- The Central UI form (`DataConnectionForm.razor`) edits them as plain textareas. Its placeholder hints are inconsistent: `{"endpoint":"opc.tcp://..."}` for primary but `{"Host":"backup-host","Port":50101}` for backup — the latter is **not** actually parsed by the runtime.
- There is no schema, no validator, no documentation that's actually checked by code.
- The form's Protocol dropdown still offers "Custom" although no backend adapter exists — selecting it produces a deploy-time `"Unknown protocol type: Custom"` failure.
We want a strongly-typed model for OPC UA endpoint configuration, a validator that's the single source of truth for what's legal, and a form that renders typed controls per field instead of a JSON blob.
## Decision summary
| # | Decision | Choice |
|---|----------|--------|
| 1 | Scope of the model | **Single source of truth** — used by both UI and runtime. Drops the dictionary-key string-fishing in `OpcUaDataConnection.cs`. |
| 2 | Field coverage in the form | **All fields, grouped**: Connection / Timing / Subscription / Heartbeat. Sensible defaults pre-filled. |
| 3 | Custom protocol option | **Remove from dropdown**. OPC UA is the only supported protocol today. |
| 4 | Storage format | **Typed nested JSON** via System.Text.Json with camelCase + `JsonStringEnumConverter`. |
| 5 | Model location | **`ScadaLink.Commons/Types/DataConnections/`** plus sibling `Validators` and `Serialization` namespaces. |
| 6 | Validator return type | **`ValidationResult` + `ValidationEntry`** — matches `SemanticValidator` convention. |
| 7 | Form structure | **Shared `<OpcUaEndpointEditor>` Blazor component**, used twice (primary + backup). |
| 8 | Protocol field in UI | **Hidden**; entity field set to `"OpcUa"` implicitly on save. |
| 9 | Validation timing | **On Save click only**. No live per-field validation. |
| 10 | Legacy-row handling | **Best-effort parse + warning banner**. Save rewrites to the new shape. |
## Architecture
```
          ┌────────────────────────────────────────┐
          │  ScadaLink.Commons                     │
          │    Types/DataConnections/              │
          │      OpcUaEndpointConfig.cs    (POCO)  │
          │      OpcUaHeartbeatConfig.cs   (POCO)  │
          │      OpcUaSecurityMode.cs      (enum)  │
          │    Validators/                         │
          │      OpcUaEndpointConfigValidator.cs   │
          │    Serialization/                      │
          │      OpcUaEndpointConfigSerializer.cs  │
          └────────────────────────────────────────┘
                     │ (referenced by both)
             ┌───────┴────────────────────────┐
             ▼                                ▼
┌──────────────────────────┐    ┌──────────────────────────────┐
│  ScadaLink.CentralUI     │    │  ScadaLink.SiteRuntime       │
│    Components/Forms/     │    │    Actors/                   │
│      OpcUaEndpointEditor │    │      DeploymentManagerActor  │
│        .razor (shared)   │    │       (passes raw JSON to    │
│                          │    │        DataConnectionFactory)│
│    Pages/Admin/          │    │                              │
│      DataConnectionForm  │    │    DataConnections.OpcUa/    │
│        .razor            │    │      OpcUaDataConnection.cs  │
└──────────────────────────┘    │       (consumes typed model) │
                                └──────────────────────────────┘
```
Both sides deserialize from `DataConnection.PrimaryConfiguration` / `BackupConfiguration` strings into the same `OpcUaEndpointConfig` instance. The DB column type does not change.
## The model
```csharp
// ScadaLink.Commons/Types/DataConnections/OpcUaEndpointConfig.cs
namespace ScadaLink.Commons.Types.DataConnections;
public sealed class OpcUaEndpointConfig
{
    // Connection
    public string EndpointUrl { get; set; } = "";
    public OpcUaSecurityMode SecurityMode { get; set; } = OpcUaSecurityMode.None;
    public bool AutoAcceptUntrustedCerts { get; set; } = true;
    // Timing
    public int SessionTimeoutMs { get; set; } = 60000;
    public int OperationTimeoutMs { get; set; } = 15000;
    // Subscription
    public int PublishingIntervalMs { get; set; } = 1000;
    public int SamplingIntervalMs { get; set; } = 1000;
    public int QueueSize { get; set; } = 10;
    public int KeepAliveCount { get; set; } = 10;
    public int LifetimeCount { get; set; } = 30;
    public int MaxNotificationsPerPublish { get; set; } = 100;
    // Heartbeat (optional)
    public OpcUaHeartbeatConfig? Heartbeat { get; set; }
}
public sealed class OpcUaHeartbeatConfig
{
    public string TagPath { get; set; } = "";
    public int MaxSilenceSeconds { get; set; } = 30;
}
public enum OpcUaSecurityMode { None, Sign, SignAndEncrypt }
```
Defaults match the runtime's current fallbacks so a default-constructed config equals the empty/missing-JSON case. Settable properties (not `init`) so the form can `@bind` directly.
## The validator
```csharp
// ScadaLink.Commons/Validators/OpcUaEndpointConfigValidator.cs
public static class OpcUaEndpointConfigValidator
{
    public static ValidationResult Validate(OpcUaEndpointConfig config, string fieldPrefix = "")
    {
        var errors = new List<ValidationEntry>();
        if (string.IsNullOrWhiteSpace(config.EndpointUrl))
            errors.Add(Err("EndpointUrl", "Endpoint URL is required."));
        else if (!Uri.TryCreate(config.EndpointUrl, UriKind.Absolute, out var uri)
                 || uri.Scheme != "opc.tcp")
            errors.Add(Err("EndpointUrl",
                "Endpoint URL must be a valid opc.tcp:// URI."));
        if (config.SessionTimeoutMs <= 0)
            errors.Add(Err("SessionTimeoutMs", "Must be > 0."));
        if (config.OperationTimeoutMs <= 0)
            errors.Add(Err("OperationTimeoutMs", "Must be > 0."));
        if (config.PublishingIntervalMs <= 0)
            errors.Add(Err("PublishingIntervalMs", "Must be > 0."));
        if (config.SamplingIntervalMs <= 0)
            errors.Add(Err("SamplingIntervalMs", "Must be > 0."));
        if (config.QueueSize < 1)
            errors.Add(Err("QueueSize", "Must be ≥ 1."));
        if (config.KeepAliveCount < 1)
            errors.Add(Err("KeepAliveCount", "Must be ≥ 1."));
        if (config.LifetimeCount < config.KeepAliveCount * 3)
            errors.Add(Err("LifetimeCount",
                "Must be at least 3× KeepAliveCount per OPC UA spec."));
        if (config.MaxNotificationsPerPublish < 1)
            errors.Add(Err("MaxNotificationsPerPublish", "Must be ≥ 1."));
        if (config.Heartbeat is { } hb)
        {
            if (string.IsNullOrWhiteSpace(hb.TagPath))
                errors.Add(Err("Heartbeat.TagPath",
                    "Tag path is required when heartbeat is enabled."));
            if (hb.MaxSilenceSeconds <= 0)
                errors.Add(Err("Heartbeat.MaxSilenceSeconds", "Must be > 0."));
        }
        return errors.Count == 0
            ? ValidationResult.Success()
            : ValidationResult.FromErrors(errors);
        ValidationEntry Err(string field, string msg) =>
            new(Field: $"{fieldPrefix}{field}",
                Message: msg,
                Category: ValidationCategory.Schema);
    }
}
```
Key points:
- `fieldPrefix` parameter — form passes `"Primary."` / `"Backup."` so error messages disambiguate.
- `LifetimeCount ≥ 3 × KeepAliveCount` is an actual OPC UA spec constraint and exemplifies the "domain knowledge in the validator" win.
- Static, pure, no DI — trivial to unit-test.
## Serialization & legacy fallback
```csharp
// ScadaLink.Commons/Serialization/OpcUaEndpointConfigSerializer.cs
using System.Text.Json;
using System.Text.Json.Serialization;
using ScadaLink.Commons.Types.DataConnections;
public static class OpcUaEndpointConfigSerializer
{
    private static readonly JsonSerializerOptions JsonOpts = new()
    {
        PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
        WriteIndented = false,
        Converters = { new JsonStringEnumConverter(JsonNamingPolicy.CamelCase) }
    };
    public static string Serialize(OpcUaEndpointConfig config)
        => JsonSerializer.Serialize(config, JsonOpts);
    public static (OpcUaEndpointConfig Config, bool IsLegacy) Deserialize(string? json)
    {
        if (string.IsNullOrWhiteSpace(json))
            return (new OpcUaEndpointConfig(), false);
        try
        {
            using var doc = JsonDocument.Parse(json);
            // TryGetProperty throws on a non-object root, so check the kind first.
            if (doc.RootElement.ValueKind == JsonValueKind.Object
                && doc.RootElement.TryGetProperty("endpointUrl", out _))
                return (JsonSerializer.Deserialize<OpcUaEndpointConfig>(json, JsonOpts)!, false);
            return (LoadLegacy(json), IsLegacy: true);
        }
        catch (JsonException)
        {
            // Malformed JSON or an unrecognized shape: fall back to defaults and flag
            // the row as legacy so the form shows the warning banner.
            return (new OpcUaEndpointConfig(), IsLegacy: true);
        }
    }
    private static OpcUaEndpointConfig LoadLegacy(string json)
    {
        var dict = JsonSerializer.Deserialize<Dictionary<string, string>>(json)
                   ?? new();
        var c = new OpcUaEndpointConfig
        {
            EndpointUrl = dict.GetValueOrDefault("endpoint")
                          ?? dict.GetValueOrDefault("EndpointUrl") ?? "",
            SecurityMode = Enum.TryParse<OpcUaSecurityMode>(
                dict.GetValueOrDefault("SecurityMode"), out var sm) ? sm : OpcUaSecurityMode.None,
            AutoAcceptUntrustedCerts = ParseBool(dict, "AutoAcceptUntrustedCerts", true),
            SessionTimeoutMs = ParseInt(dict, "SessionTimeoutMs", 60000),
            OperationTimeoutMs = ParseInt(dict, "OperationTimeoutMs", 15000),
            PublishingIntervalMs = ParseInt(dict, "PublishingIntervalMs", 1000),
            SamplingIntervalMs = ParseInt(dict, "SamplingIntervalMs", 1000),
            QueueSize = ParseInt(dict, "QueueSize", 10),
            KeepAliveCount = ParseInt(dict, "KeepAliveCount", 10),
            LifetimeCount = ParseInt(dict, "LifetimeCount", 30),
            MaxNotificationsPerPublish = ParseInt(dict, "MaxNotificationsPerPublish", 100)
        };
        var hbPath = dict.GetValueOrDefault("HeartbeatTagPath");
        if (!string.IsNullOrWhiteSpace(hbPath))
            c.Heartbeat = new OpcUaHeartbeatConfig
            {
                TagPath = hbPath,
                MaxSilenceSeconds = ParseInt(dict, "HeartbeatMaxSilence", 30)
            };
        return c;
    }
    // Assumed helper shapes: legacy values are strings, and a missing or unparsable
    // entry falls back to the supplied default.
    private static int ParseInt(Dictionary<string, string> dict, string key, int fallback)
        => dict.TryGetValue(key, out var s) && int.TryParse(s, out var v) ? v : fallback;
    private static bool ParseBool(Dictionary<string, string> dict, string key, bool fallback)
        => dict.TryGetValue(key, out var s) && bool.TryParse(s, out var v) ? v : fallback;
}
```
`Deserialize` returns `(Config, IsLegacy)`. The form raises a Bootstrap warning banner when `IsLegacy=true`. On Save we always `Serialize` — the row gets rewritten to the new shape and the banner disappears on next edit.
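The typed-versus-legacy decision hinges on a single property probe. A minimal sketch of that probe in isolation, using a hypothetical `ShapeProbe` helper that is not part of the design: `JsonElement.TryGetProperty` compares names case-sensitively, so a legacy row keyed `EndpointUrl` (PascalCase) or `endpoint` is not mistaken for the new camelCase shape.

```csharp
using System.Text.Json;

// Hypothetical helper isolating the shape check; illustration only.
public static class ShapeProbe
{
    public static bool IsTypedShape(string json)
    {
        using var doc = JsonDocument.Parse(json);
        // Only an object root carrying the camelCase key counts as the new shape.
        return doc.RootElement.ValueKind == JsonValueKind.Object
               && doc.RootElement.TryGetProperty("endpointUrl", out _);
    }
}
```

A consequence worth keeping in mind: any row written by the new serializer always contains `endpointUrl`, even when the URL is empty, so the probe stays reliable after the first save.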
## The shared Blazor component
`src/ScadaLink.CentralUI/Components/Forms/OpcUaEndpointEditor.razor`
Parameters:
- `Config` (`[EditorRequired]`) — bound by reference; parent owns the instance.
- `Title` — header text (e.g. "Primary Endpoint").
- `IdPrefix` — disambiguates `for=` attributes when the component appears twice.
- `IsLegacy` — toggles the warning banner.
- `Errors` (`ValidationResult?`) — drives per-field red text via `EndsWith("." + field)` match against `ValidationEntry.Field`.
Rendering: four section labels (Connection, Timing, Subscription, Heartbeat) with Bootstrap `row g-2` grids. Heartbeat starts collapsed behind an "Enable Heartbeat" button; once shown it has a "Remove Heartbeat" button. Per-field error text appears immediately below each control.
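The `EndsWith("." + field)` match for per-field errors can be sketched as below. `Entry` is a hypothetical stand-in for `ValidationEntry` and `FieldErrors` is an assumed helper name; only the matching rule comes from the design.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for ValidationEntry (assumed shape, illustration only).
public sealed record Entry(string Field, string Message);

public static class FieldErrors
{
    // Returns the messages whose Field ends with ".<field>", e.g. "Primary.EndpointUrl".
    public static IEnumerable<string> For(IEnumerable<Entry> entries, string field) =>
        entries.Where(e => e.Field.EndsWith("." + field, StringComparison.Ordinal))
               .Select(e => e.Message);
}
```

One thing this sketch makes visible: the match requires a dot before the field name, so an entry produced with an empty `fieldPrefix` (plain `EndpointUrl`) would not match. That is fine as long as the form always passes `"Primary."` or `"Backup."`, as described in the validator section.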
## DataConnectionForm changes
- **Removed**: Protocol `<select>`, the JSON `<textarea>` for primary, the JSON `<textarea>` for backup.
- **Added**: Two `<OpcUaEndpointEditor>` instances. The backup one is still gated behind "Add Backup Endpoint" / "Remove Backup" buttons, and `Failover Retry Count` stays in the backup subsection.
- **Code-behind**: `_primaryConfig` and `_backupConfig` (`OpcUaEndpointConfig` instances), `_primaryIsLegacy`/`_backupIsLegacy` flags, `_primaryErrors`/`_backupErrors` (`ValidationResult?`). Save runs the validator on both, bails out on failure, serializes via `OpcUaEndpointConfigSerializer.Serialize`.
- **Protocol on the entity** is set to the literal `"OpcUa"` on create. The column stays so the runtime's protocol-dispatch (`DataConnectionFactory`) is untouched.
## Runtime parser swap
`src/ScadaLink.SiteRuntime/Actors/DeploymentManagerActor.cs:426-456` — today this code parses both JSON strings into `Dictionary<string, string>` and hands the dict to `DataConnectionFactory`.
After the change:
- `DeploymentManagerActor` no longer parses JSON. It passes the raw `PrimaryConfiguration` / `BackupConfiguration` strings straight to the factory.
- `DataConnectionFactory.Create` (OPC UA branch) calls `OpcUaEndpointConfigSerializer.Deserialize(...)`, gets the typed model, and constructs `OpcUaDataConnection` with it.
- `OpcUaDataConnection.cs:44-90` is rewritten to take `OpcUaEndpointConfig` directly. The `connectionDetails.TryGetValue(...)` ladder and the `ParseInt` / `ParseBool` helpers go away. Heartbeat becomes `if (cfg.Heartbeat is { } hb) { ... }`.
Pre-refactor deployment artifacts still load: the serializer's legacy-dict fallback handles them. `IsLegacy` is discarded by the runtime (only the form cares).
## Tests
| Project | New / changed tests |
|---|---|
| `ScadaLink.Commons.Tests` | `OpcUaEndpointConfigSerializerTests`: typed-JSON roundtrip preserves all fields; legacy flat-dict deserializes correctly and sets `IsLegacy=true`; empty/null JSON returns defaults; unknown JSON shape falls back cleanly. |
| `ScadaLink.Commons.Tests` | `OpcUaEndpointConfigValidatorTests`: missing URL → error; bad scheme → error; `LifetimeCount < 3×KeepAliveCount` → error; heartbeat-enabled-but-no-tag-path → error; valid config → `IsValid=true`; `fieldPrefix` applied to every error's `Field`. |
| `ScadaLink.CentralUI.Tests` | `OpcUaEndpointEditorTests` (bUnit): renders all grouped sections; binding mutates the passed `Config`; Enable/Remove Heartbeat toggles the sub-object; passing `Errors` renders per-field red text; `IsLegacy=true` shows the warning banner. |
| `ScadaLink.CentralUI.Tests` | `DataConnectionsFormTests` (bUnit, add if missing): Save with invalid primary URL → no navigation, validator error shown; Save with valid config → repo `AddDataConnectionAsync` called with `Protocol="OpcUa"` and JSON containing `"endpointUrl"` in camelCase. |
| Site/DCL test project | Update existing tests to construct `OpcUaDataConnection` from `OpcUaEndpointConfig` instead of `IDictionary<string,string>`. |
## Out of scope
- No EF Core migration; the legacy-parse path handles pre-existing rows.
- No new protocols. The Custom dropdown option is removed. If/when a second protocol lands, the form re-introduces a protocol dropdown and an `if (Protocol == "OpcUa")` branch around the editor component.
- No live (debounced) validation.
- No certificate management UI beyond `AutoAcceptUntrustedCerts`.
- No "Verify endpoint" button.
- No rewrite of `docs/requirements/Component-DataConnectionLayer.md` — a short note pointing at `OpcUaEndpointConfig` as the canonical schema is enough.
## Verification
1. `dotnet build` clean.
2. `dotnet test` for Commons + CentralUI + SiteRuntime/DCL — all green, including new tests.
3. `bash docker/deploy.sh` — rebuild cluster.
4. Browser smoke at `http://localhost:9000/admin/connections`:
- New connection via site context menu → form shows the OPC UA endpoint editor; no Protocol dropdown.
- Bad URL → Save → red error under Endpoint URL.
- Valid config, toggle heartbeat, set timing knobs → Save → row created; reload → fields round-trip.
- Edit a pre-refactor row → warning banner appears, fields populated from legacy dict; Save rewrites; second edit no banner.
- Add backup endpoint, save, deploy a template that uses the connection → site logs show primary online and failover settings honored.
@@ -0,0 +1,20 @@
{
"planPath": "docs/plans/2026-05-12-opcua-config-model.md",
"tasks": [
{"id": 45, "subject": "Task 1: Create OPC UA config POCOs + ValidationCategory.ConnectionConfig", "status": "pending"},
{"id": 46, "subject": "Task 2: TDD failing tests for OpcUaEndpointConfigSerializer", "status": "pending", "blockedBy": [45]},
{"id": 47, "subject": "Task 3: Implement OpcUaEndpointConfigSerializer", "status": "pending", "blockedBy": [46]},
{"id": 48, "subject": "Task 4: TDD failing tests for OpcUaEndpointConfigValidator", "status": "pending", "blockedBy": [45]},
{"id": 49, "subject": "Task 5: Implement OpcUaEndpointConfigValidator", "status": "pending", "blockedBy": [48]},
{"id": 50, "subject": "Task 6: Refactor OpcUaDataConnection.ConnectAsync to use FromFlatDict", "status": "pending", "blockedBy": [47]},
{"id": 51, "subject": "Task 7: Refactor DeploymentManagerActor.EnsureDclConnections", "status": "pending", "blockedBy": [47]},
{"id": 52, "subject": "Task 8: TDD failing bUnit tests for OpcUaEndpointEditor", "status": "pending", "blockedBy": [45, 49]},
{"id": 53, "subject": "Task 9: Implement OpcUaEndpointEditor.razor", "status": "pending", "blockedBy": [52]},
{"id": 54, "subject": "Task 10: TDD failing bUnit tests for DataConnectionForm refactor", "status": "pending", "blockedBy": [47, 49]},
{"id": 55, "subject": "Task 11: Refactor DataConnectionForm.razor", "status": "pending", "blockedBy": [53, 54]},
{"id": 56, "subject": "Task 12: Solution build + all test suites green", "status": "pending", "blockedBy": [50, 51, 55]},
{"id": 57, "subject": "Task 13: Docker deploy + browser smoke", "status": "pending", "blockedBy": [56]},
{"id": 58, "subject": "Task 14: Push to origin", "status": "pending", "blockedBy": [57]}
],
"lastUpdated": "2026-05-12T04:33:33Z"
}
@@ -0,0 +1,177 @@
# Script parameter / return: JSON Schema + JSONJoy editor
**Date:** 2026-05-12
**Status:** Superseded — see "Reversal: native Blazor SchemaBuilder" below.
## Decision
Replace the custom `ParameterListEditor` / `ReturnTypeEditor` Blazor components
with [`jsonjoy-builder`](https://github.com/lovasoa/jsonjoy-builder) (`SchemaVisualEditor`),
embedded as a React island. The on-disk format for `TemplateScript.ParameterDefinitions`
and `TemplateScript.ReturnDefinition` changes from the project-local flat shape
(`[{name,type,required,itemType?}]` / `{type,itemType?}`) to standard JSON Schema.
## Rationale
The existing flat shape lacked descriptions, defaults, enums, nested objects,
and arrays of structured items. JSON Schema covers all of that, is the
industry vocabulary other tooling already speaks (OpenAPI 3.1, function-calling
APIs, validators), and `jsonjoy-builder` is a polished pre-built visual editor
for it.
## Trade-offs
- **Breaks the no-UI-framework rule for this feature.** `jsonjoy-builder` is
React 19 + Radix UI + Tailwind. Accepted: the island is isolated to one
modal panel, Tailwind is shipped pre-built (no toolchain shared with the
Blazor side), and the visual delta is contained.
- **New build pipeline.** A small Vite project under `src/ScadaLink.CentralUI/Schema.Editor/`
builds a single IIFE bundle into `wwwroot/lib/schema-editor/`. Output is
committed so `dotnet build` doesn't require Node.
- **Monaco overlap.** `jsonjoy-builder` depends on `@monaco-editor/react`,
which depends on `monaco-editor`. We already load Monaco globally for the
script code editor. The island calls `@monaco-editor/react`'s `loader.config({ monaco: window.monaco })`
at boot to reuse the same instance — no duplicate Monaco download.
## Storage format change
| Field | Before | After |
| ---------------------- | --------------------------------- | ----------------------------------------------------------- |
| `ParameterDefinitions` | `[{name,type,required,itemType?}]` | `{"type":"object","properties":{...},"required":[...]}` |
| `ReturnDefinition` | `{type,itemType?}` | Any JSON Schema (root `type` describes the returned value) |
Per the chosen rollout: **one-shot migration** rewrites all existing rows on
deploy. After the migration, the analysis pipeline reads JSON Schema only —
no dual-format support code.
Type mapping (flat → JSON Schema):
| Flat type | JSON Schema |
| --------- | ----------- |
| `Boolean` | `{"type":"boolean"}` |
| `Integer` | `{"type":"integer"}` |
| `Float` | `{"type":"number"}` |
| `String` | `{"type":"string"}` |
| `Object` | `{"type":"object"}` |
| `List` of X | `{"type":"array","items":{"type":<X>}}` |
`required: false` ⇒ name omitted from the `required` array.
`required: true` (default) ⇒ name added to `required`.
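
The mapping table and the `required` rule above can be sketched as a small conversion function. This is an illustrative TypeScript version, not the project's migration code; the interface names and the handling of a `List` with no `itemType` are assumptions.

```ts
type ScalarType = "Boolean" | "Integer" | "Float" | "String" | "Object";
type FlatType = ScalarType | "List";

// Legacy flat shape: [{name,type,required,itemType?}]
interface FlatParameter { name: string; type: FlatType; required?: boolean; itemType?: ScalarType }
interface JsonSchema {
  type: string;
  items?: JsonSchema;
  properties?: Record<string, JsonSchema>;
  required?: string[];
}

const SCALAR_TYPES: Record<ScalarType, string> = {
  Boolean: "boolean",
  Integer: "integer",
  Float: "number",
  String: "string",
  Object: "object",
};

function toSchema(type: FlatType, itemType?: ScalarType): JsonSchema {
  if (type !== "List") return { type: SCALAR_TYPES[type] };
  // Assumption: a List without an itemType becomes an untyped array.
  return itemType === undefined
    ? { type: "array" }
    : { type: "array", items: { type: SCALAR_TYPES[itemType] } };
}

function parametersToSchema(params: FlatParameter[]): JsonSchema {
  const properties: Record<string, JsonSchema> = {};
  const required: string[] = [];
  for (const p of params) {
    properties[p.name] = toSchema(p.type, p.itemType);
    // `required` defaults to true; only an explicit false omits the name.
    if (p.required !== false) required.push(p.name);
  }
  return { type: "object", properties, required };
}
```

For example, a required `Integer` parameter `count` plus an optional `List` of `String` named `tags` yields an object schema whose `required` array contains only `count`.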
## Component layout
```
src/ScadaLink.CentralUI/Schema.Editor/ ← new Vite project (committed)
package.json
vite.config.ts
tsconfig.json
src/main.tsx ← exposes window.ScadaSchemaEditor
src/SchemaEditorApp.tsx
src/index.css
.gitignore ← node_modules only
dist/ ← (Vite outputs to wwwroot, not here)
src/ScadaLink.CentralUI/wwwroot/lib/schema-editor/
schema-editor.js ← built IIFE, committed
schema-editor.css
src/ScadaLink.CentralUI/Components/Shared/
SchemaEditor.razor ← Blazor wrapper; mirrors MonacoEditor.razor
src/ScadaLink.CentralUI/ScriptAnalysis/
ScriptShapeParser.cs ← rewrite to read JSON Schema
src/ScadaLink.CentralUI/Components/Shared/
ScriptParameterNames.cs ← rewrite to read JSON Schema
```
Removed after rollout: `ParameterListEditor.razor`, `ReturnTypeEditor.razor`.
## JS interop contract
```ts
window.ScadaSchemaEditor = {
mount(id: string, host: HTMLElement, options: {
value: string; // current schema JSON (may be empty)
mode: 'parameters' | 'return';
readOnly?: boolean;
}, dotNetRef: { invokeMethodAsync(name: 'OnValueChanged', json: string): Promise<void> }): void;
setValue(id: string, value: string): void;
dispose(id: string): void;
}
```
## Migration
An EF Core migration in `ScadaLink.ConfigurationDatabase` reads
`TemplateScripts.ParameterDefinitions` and `ReturnDefinition` from every row,
sniffs the format (array vs object), translates it if legacy, and writes it back.
Idempotent: re-running it on a row already in JSON Schema is a no-op. It runs once
at deploy via the existing auto-apply path.
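
The sniff-and-translate step for `ParameterDefinitions` might look like the sketch below. It is written in TypeScript for illustration, whereas the real migration is C# inside EF Core. The function name, the `translateLegacy` callback, and the choice to leave unparseable rows untouched are assumptions. `ReturnDefinition` is not covered: both its legacy and new forms are objects, so it would need a different check.

```ts
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Returns the JSON text that should be stored for one column value.
// `translateLegacy` stands in for the flat-to-JSON-Schema mapping.
function migrateParameterDefinitions(
  raw: string | null,
  translateLegacy: (legacy: Json[]) => Json,
): string | null {
  // Nothing stored: nothing to migrate.
  if (raw === null || raw.trim() === "") return raw;

  let parsed: Json;
  try {
    parsed = JSON.parse(raw) as Json;
  } catch {
    // Assumption: rows that are not valid JSON are left as they are.
    return raw;
  }

  // Legacy parameter lists are arrays; JSON Schema roots are objects.
  if (!Array.isArray(parsed)) return raw;

  return JSON.stringify(translateLegacy(parsed));
}
```

Because a migrated value is an object rather than an array, feeding the output back into the function returns it unchanged, which is what makes a second run a no-op.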
## Out of scope (deferred)
- Schema-driven value-entry forms (e.g. Inbound API tester) — would also use
`jsonjoy-builder`'s value-editor mode, but no caller surface needs it today.
- Hover/completion enhancements derived from JSON Schema descriptions or
defaults. Today's pipeline only needs name + type + required.
- Reuse of JSON Schema `$ref` across templates — could be a future template-level
schema library.
---
## Reversal: native Blazor SchemaBuilder (2026-05-12, same day)
JSONJoy worked but felt heavy for the actual data we author here. Specifically:
- The "Add Field" modal flow is two clicks per parameter where the legacy
inline-row editor was zero. For the common 1-3 scalar-param case, a visible
modal dialog every time is friction.
- JSONJoy's value-mode UX is awkward — it always renders an "Add Field" button
even when the schema's root type is `string` / `integer` / etc., so the
Return-type tab is mismatched to the underlying single-value model.
- React 19 + Radix + Tailwind for one form field is a lot of build pipeline
surface to maintain.
**Decision:** replace JSONJoy with a Bootstrap-only Blazor component
(`SchemaBuilder.razor`) that recurses through its own render methods.
Storage format unchanged — still JSON Schema. The migration, parser, and
downstream analysis code are untouched.
**Scope decisions (from refinement session):**
- Type set: only six JSON Schema types
(`string · integer · number · boolean · object · array`); `null` is not offered. No `date-time` /
`format`, no `enum` / `pattern` / `min/max`, no `$ref` / `oneOf` /
`anyOf` / `allOf`, no `additionalProperties`. Power-user expansion can
come later behind a per-row "more options" toggle.
- No description support per property. The row stays a single horizontal
line: name + type + (items: type if array) + required + remove.
- Nested objects and arrays-of-objects recurse — same editor renders at any
depth.
**Files added:**
- `src/ScadaLink.CentralUI/Components/Shared/SchemaBuilderModel.cs`
in-memory `SchemaNode` / `SchemaProperty` tree plus pure-static
parse / serialize. Round-trips through the canonical JSON Schema text and
tolerates legacy flat-array shape as a parse fallback.
- `src/ScadaLink.CentralUI/Components/Shared/SchemaBuilder.razor`
recursive renderer driven by `Mode="object"` (parameter list) or
`Mode="value"` (single value, with object/array falling back to the
property editor).
- `tests/ScadaLink.CentralUI.Tests/Shared/SchemaBuilderModelTests.cs`
parse / serialize / round-trip / legacy-array coverage.
**Files removed:**
- `src/ScadaLink.CentralUI/Schema.Editor/` (Vite project, node_modules, etc.)
- `src/ScadaLink.CentralUI/wwwroot/lib/schema-editor/` (built bundle)
- `src/ScadaLink.CentralUI/Components/Shared/SchemaEditor.razor` (Blazor wrapper)
- `<script>` / `<link>` references to schema-editor in `App.razor`
- `<DefaultItemExcludes>Schema.Editor/**` from CentralUI csproj
**Forms updated:** `TemplateEdit.razor`, `SharedScriptForm.razor`,
`ApiMethodForm.razor` now use `<SchemaBuilder>` directly.
The original `jsonjoy-builder` integration sections above are kept for
historical context but no longer reflect what's in the codebase.
@@ -0,0 +1,184 @@
# Script scope access: self / child / parent
## Goal
Template scripts get an ergonomic read/write API for:
- The current template's attributes (`Attributes["X"]`).
- Child composition attributes (`Children["TempSensor"].Attributes["Temperature"]`).
- Child composition scripts (`Children["TempSensor"].CallScript("Sample")`).
- The parent composition (when this template is composed inside another):
`Parent.Attributes["SpeedRPM"]`, `Parent.CallScript("Trip")`.
Editor (Monaco) provides completion, hover, and diagnostics on all the above.
## What already exists
- Each `Template` has `Attributes`, `Compositions` (named sub-template references),
`Scripts`, `Alarms`.
- Flattening produces `ResolvedAttribute.CanonicalName` as the path-qualified name:
direct attrs are bare, composed attrs are `"InstanceName.MemberName"`.
- `InstanceActor` stores `_attributes[canonicalName]` — flat dict keyed by the
fully composed canonical name.
- `ScriptRuntimeContext.GetAttribute(name)` does a flat lookup. So
`GetAttribute("TempSensor.Temperature")` already works if the canonical name
is in the dict. **What's missing is scope-relative access** — a script on
`TempSensor` cannot say "my Temperature" without knowing it's composed under
some parent path.
- `ScriptRuntimeContext.CallScript(name)` Ask-pattern-routes to a Script Actor.
Cross-composition / parent routing is **not** implemented.
- The actor topology is one Instance Actor per top-level instance — composed
sub-templates are **flattened into the parent's actor state**, not separate
actors. This is good news: parent/child access is path arithmetic, not
ActorRef hopping.
## Runtime API (new)
Three accessors layered on `ScriptGlobals` (in addition to the existing
`Instance.*`, `Parameters`, `Scripts.CallShared`, etc.):
```csharp
Attributes["X"] // read; missing-key behaviour is still listed under "Open questions"
Attributes["X"] = value // write
Attributes.TryGet<T>("X", out v) // typed read with fallback
Children["TempSensor"].Attributes["Temperature"]
Children["TempSensor"].CallScript("Sample", new { count = 3 })
Parent.Attributes["SpeedRPM"] // null check: Parent is null at the root
Parent.CallScript("Trip")
```
Internally each is a thin wrapper holding a `ScopePath` (string) plus a
reference to `ScriptRuntimeContext`. The indexer / `CallScript` prepend the
scope path to the key and delegate to the existing `Instance.GetAttribute` /
`Instance.SetAttribute` / `Instance.CallScript`. No new actor messages, no
new lookup pathway.
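`Children["X"]` returns a new accessor with prefix `SelfPath + "." + X` (just `X` when `SelfPath` is empty, so root-level children keep canonical names such as `TempSensor.Temperature`).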
`Parent` returns an accessor with the parent prefix (`null` if no parent).
Chained child/parent navigation works naturally because each accessor is the
same type returning the same type.
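
The path arithmetic described above can be sketched as follows. This is a TypeScript illustration, not the C# runtime code: the class name `ScopedAccessor`, the `Map`-backed store standing in for the instance actor's flat dictionary, and the assumption that names never contain `.` are all hypothetical.

```ts
// Stands in for the instance actor's flat dictionary keyed by canonical name.
type AttributeStore = Map<string, unknown>;

class ScopedAccessor {
  private readonly store: AttributeStore;
  readonly scopePath: string;

  constructor(store: AttributeStore, scopePath: string) {
    this.store = store;
    this.scopePath = scopePath;
  }

  // Prepend the scope path; the root scope ("") adds no prefix.
  private qualify(name: string): string {
    return this.scopePath === "" ? name : `${this.scopePath}.${name}`;
  }

  get(name: string): unknown {
    return this.store.get(this.qualify(name));
  }

  set(name: string, value: unknown): void {
    this.store.set(this.qualify(name), value);
  }

  // Children["X"]: same type, longer prefix.
  child(instanceName: string): ScopedAccessor {
    return new ScopedAccessor(this.store, this.qualify(instanceName));
  }

  // Parent: drop the last path segment; null at the root.
  get parent(): ScopedAccessor | null {
    if (this.scopePath === "") return null;
    const cut = this.scopePath.lastIndexOf(".");
    return new ScopedAccessor(this.store, cut < 0 ? "" : this.scopePath.slice(0, cut));
  }
}
```

With a store holding `SpeedRPM` and `TempSensor.Temperature`, an accessor scoped to `TempSensor` reads `Temperature` through the key `TempSensor.Temperature`, and its `parent` reads `SpeedRPM` with no prefix.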
## Compile-time scope injection
Every compiled script needs to know its own `ScopePath`. That's captured by
the flattening pipeline and passed into `ScriptGlobals` at execution time:
```csharp
public record ScriptScope(
string SelfPath, // "" for root, "TempSensor" for composed
string? ParentPath, // null if SelfPath == ""
IReadOnlyList<string> ChildInstanceNames);
```
`ResolvedScript` gains a `Scope: ScriptScope` field. The flattening service
already walks the composition tree to compute canonical names — extending it
to emit the scope per script is mechanical.
`ScriptCompilationService.Compile` reads the scope and seeds
`ScriptGlobals.Attributes`, `Children`, `Parent` before the script runs. No
code-generation; the accessors close over the scope path at construction time.
## Editor surface
The editor side carries the same metadata that the runtime gets:
- The current template's attribute set (names + types).
- Each composition: instance name → resolved child template's attribute set
AND script list. The form already loads compositions in `TemplateEdit`.
- The parent template's attribute and script lists, ONLY when the open
template is composed inside another. We surface this as `null` otherwise.
New completion contexts:
| In code | Suggests |
|---|---|
| `Attributes["X"]` | declared attribute names of current template |
| `Children["X"]` | composition instance names |
| `Children["X"].Attributes["Y"]` | attribute names of the resolved child template |
| `Children["X"].CallScript("Y"` | script names of the resolved child template |
| `Parent.Attributes["X"]` | parent template's attribute names |
| `Parent.CallScript("X"` | parent template's script names |
New diagnostics:
- **SCADA006**: unknown attribute name on the appropriate scope.
- **SCADA007**: unknown child composition name in `Children["X"]`.
Existing `Instance.GetAttribute("X")` / `Instance.CallScript("X")` keep working
unchanged. Editor support for those can fall out of the same metadata if we
want it.
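
The two diagnostics can be illustrated with a deliberately simplified, regex-based check. This is an assumption-heavy TypeScript sketch: the actual Diagnose path presumably works on parsed C# rather than regular expressions, `EditorScope` is a hypothetical metadata shape, only one level of `Children[...]` is handled, and `Parent.` accesses are skipped because parent metadata is not modelled here.

```ts
interface Diagnostic { code: "SCADA006" | "SCADA007"; message: string; index: number }

// Hypothetical editor metadata for the open template.
interface EditorScope {
  attributes: Set<string>;
  children: Map<string, Set<string>>; // composition name -> child attribute names
}

function diagnose(code: string, scope: EditorScope): Diagnostic[] {
  const found: Diagnostic[] = [];
  let m: RegExpExecArray | null;

  // SCADA007: Children["X"] where X is not a composition of this template.
  const childRef = /Children\["([^"]*)"\]/g;
  while ((m = childRef.exec(code)) !== null) {
    const child = m[1] ?? "";
    if (!scope.children.has(child)) {
      found.push({ code: "SCADA007", message: `Unknown child composition '${child}'.`, index: m.index });
    }
  }

  // SCADA006: Attributes["Y"] not declared on the scope it is read from.
  const attrRef = /(Parent\.|Children\["([^"]*)"\]\.)?Attributes\["([^"]*)"\]/g;
  while ((m = attrRef.exec(code)) !== null) {
    const prefix = m[1];
    const child = m[2];
    const attr = m[3] ?? "";
    if (prefix === "Parent.") continue; // parent scope is not checked in this sketch
    let known: Set<string> | undefined = scope.attributes;
    if (child !== undefined) known = scope.children.get(child);
    if (known === undefined) continue; // unknown child was already reported as SCADA007
    if (!known.has(attr)) {
      found.push({ code: "SCADA006", message: `Unknown attribute '${attr}'.`, index: m.index });
    }
  }

  return found;
}
```

Reporting an unknown child once as SCADA007, and not again as SCADA006 for the attribute behind it, keeps a single typo from producing two diagnostics.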
## Hover + signature help
- Hover `Attributes["X"]` → `attribute X: <Type> on <TemplateName>`.
- Hover `Children["X"]` → `composition X: <ChildTemplateName>`.
- Signature help for `Children["X"].CallScript(...)` reuses the existing
shape pipeline once the child template's scripts are reachable as
`ScriptShape[]`.
## What needs to be passed from the form
`TemplateEdit` already loads the open template's attributes and scripts.
Two new pieces:
1. **Resolved child compositions**: for each `Composition` row, fetch the
composed template's `Attributes` and `Scripts`. The repository already
has `GetTemplateByIdAsync` — call it for each composition.
2. **Parent template (if any)**: query the repository for templates that
compose this one. If exactly one, pass its shape. If multiple or none,
pass `null` and leave `Parent.X` accesses unchecked in the editor, because
the parent context is ambiguous at design time: the runtime knows the parent
because it runs inside one specific deployment, but the editor does not.
Edge case: a template composed into multiple parents has no single parent at
edit time. Acceptable behaviour: `Parent` autocompletion is suppressed; using
it still compiles but emits a warning at deploy time. Document this clearly.
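
The parent-resolution rule can be sketched as a single function. This is an illustrative TypeScript version; `TemplateShape` and the `kind` discriminator are hypothetical additions that make the "none" and "ambiguous" cases visible, whereas the design itself only requires passing `null` in both.

```ts
// Hypothetical shape handed to the editor for a parent template.
interface TemplateShape { name: string; attributes: string[]; scripts: string[] }

type ParentResolution =
  | { kind: "single"; parent: TemplateShape }
  | { kind: "none" | "ambiguous"; parent: null };

// `composers` is the list of templates that compose the open template.
function resolveEditorParent(composers: TemplateShape[]): ParentResolution {
  const [only] = composers;
  if (composers.length === 1 && only !== undefined) {
    return { kind: "single", parent: only };
  }
  // Zero composers: the template is only used at the root.
  // Several composers: no single parent exists at edit time.
  return { kind: composers.length === 0 ? "none" : "ambiguous", parent: null };
}
```

The editor would suppress `Parent` completion whenever `parent` is `null`; the `kind` value could additionally drive a hint explaining why.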
## Phased rollout
1. **Runtime first**. Add `Attributes` / `Children` / `Parent` accessors.
Wire scope into `ResolvedScript` and the flattening pipeline. Test with
existing flat templates (Scope.SelfPath = ""). Verify a composed script's
`Attributes["Temperature"]` reads through correctly. Site-runtime tests.
2. **Flattening + deployment**. Verify the deployed artifact carries the new
`Scope` field through `ResolvedScript` → `FlattenedScript` → site-side.
Run a round-trip deploy + execute.
3. **Editor metadata**. `TemplateEdit` fetches child template shapes for each
composition and optionally the parent. New Monaco context fields.
4. **Editor completion + diagnostics**. New string-literal completion
contexts. SCADA006 / SCADA007 diagnostics on the Diagnose path. Hover.
Each phase is a separate commit and independently shippable.
## Out of scope
- Composing the same template multiple times under different names on the
same parent (already supported by the data model; the editor just lists
each composition).
- Sibling-of-sibling access (`Children["A"].Parent.Children["B"]`). The
accessor API supports it naturally but we don't actively suggest it.
- Locking-aware writes (`Attributes["X"] = v` when X is locked). The
attribute lock is enforced at deployment validation, not at script-write
time; runtime writes that hit a locked attribute should be rejected. This is
out of scope for this design and is covered by the existing lock-enforcement pass.
- A formal type for `Children` / `Parent` in the editor's Roslyn analysis
(the strict Roslyn route would auto-generate per-template accessor types).
We use a dictionary-style indexer for both runtime and editor, with editor
awareness coming from the metadata pipeline, not from per-template C#
types.
## Open questions
- Should `Attributes["X"]` throw or return `null` on unknown key? The
existing `GetAttribute` logs a warning and returns `null`. Same here for
consistency.
- ~~Async vs sync indexer?~~ **Decided: both.** Sync `Attributes["X"]` /
`Attributes["X"] = v` indexer for ergonomics, plus
`Attributes.GetAsync("X")` / `Attributes.SetAsync("X", v)` for callers
that want to be explicit about the actor Ask. The sync path internally
blocks on `.GetAwaiter().GetResult()` — acceptable because all script
bodies already run on a dedicated blocking-I/O dispatcher per the project
conventions in CLAUDE.md.
@@ -0,0 +1,554 @@
# ScadaLink Central UI — Design & UX Audit
**Date:** 2026-05-12
**Branch at audit time:** `feature/templates-folder-hierarchy` (after `Sites.razor` redesign, commit `0805e18`)
**Scope:** All Razor pages, layout, and shared components in `src/ScadaLink.CentralUI`.
**Reference pattern:** `src/ScadaLink.CentralUI/Components/Pages/Admin/Sites.razor` — 2-column responsive card grid, header flex row, kebab menus, search filter, Bootstrap collapse for noisy details, `@key=` on iterated cards, "No X match the filter." and empty-state CTAs.
## Constraints (recap)
- Blazor Server + Bootstrap 5 only. **No third-party component frameworks** (no MudBlazor / Radzen / Blazorise / Syncfusion).
- Clean, corporate, internal-use aesthetic. Not flashy.
- Form pages: vertical stacking; read-only fields first; subsections stacked; buttons at bottom.
- Accessibility: aria-labels on icon buttons; labels paired with inputs; semantic headings; never use color as the only state cue.
---
## Severity summary
| Severity | Count | Pages |
|---|---|---|
| **High** | 7 | LdapMappingForm · DataConnections (header/a11y) · SharedScripts · ExternalSystems · TemplateEdit · DebugView · EventLogs |
| **Medium** | 11 | LdapMappings · ApiKeys · DataConnections · DataConnectionForm · ApiKeyForm (partial) · Templates · Topology · Deployments · Dashboard · Health · ParkedMessages · AuditLog · MainLayout / NavMenu · ConfirmDialog · Toast · global CSS |
| **Low** | 7+ | Most form pages (TemplateCreate, ExternalSystemForm, SharedScriptForm, DbConnectionForm, ApiMethodForm, NotificationListForm) · Login error feedback · NotAuthorizedView · LoadingSpinner contrast · DataTable clear-button |
**Suggested implementation order** (high impact / low risk first):
1. **Shared shell fixes** (ConfirmDialog scroll-lock + Escape + default button color, Toast `aria-live` + custom delay, NavMenu scroll container, login vertical centering) — these unblock everything else and are mostly small.
2. **List-page pattern roll-out:** apply the Sites.razor card grid + search + kebab template to LdapMappings, ApiKeys, SharedScripts. These are mechanical.
3. **DebugView guardrails:** scroll-lock, max-row cap, `aria-live`, filter — this is high-severity and isolated.
4. **EventLogs:** message expand, pagination clarity, filter accessibility.
5. **ExternalSystems + TemplateEdit refactors** — biggest scope, leave for last because they need design discussion before implementation.
---
## Cross-cutting findings (apply to many pages)
These show up everywhere. Fix at the pattern level first, then tour every page once to apply:
1. **`<h4>` page title in a flex header.** Sites.razor sets the standard at line 16. Currently Templates (`<h6>`), Topology (`<h6>`), Dashboard (`<h3>`), and most form pages mix levels. Adopt `<h4 class="mb-0">` inside `d-flex justify-content-between align-items-center mb-3`.
2. **Search input above the list.** `max-width: 320px`, bound to `_search` with `@bind:event="oninput"`, plus the "No X match the filter." inline message. Missing on: LdapMappings, ApiKeys, SharedScripts, EventLogs, ParkedMessages (per-site only), AuditLog.
3. **Kebab (⋮) menu for less-frequent actions.** Edit stays as a primary button; Delete/Disable/Deploy move into the dropdown. Missing on: LdapMappings, ApiKeys, SharedScripts, TemplateEdit member rows, ParkedMessages.
4. **`@key="entity.Id"` on iterated rows / cards.** Prevents Bootstrap collapse state leaks (the bug caught in smoke on Sites). Apply anywhere `@foreach` renders elements with Bootstrap stateful classes (`show`, `collapsed`, `active`).
5. **State badges must not rely on color alone.** Add either icon + text or `aria-label="State: …"`. Affected: Health node Online/Offline, Topology Stale, Deployments row colors, DebugView Quality / Alarm State, AuditLog action badges.
6. **`TimestampDisplay` component consistency.** EventLogs / ParkedMessages / AuditLog use it; Health and DebugView format inline. Pick the component, give it a single rendering of "HH:mm:ss UTC" or relative+absolute, retrofit everywhere.
7. **Empty-state CTA when count is 0.** Sites.razor lines 53-60 are the template. Missing on: SharedScripts, Templates (tree), ExternalSystems tabs, ParkedMessages, AuditLog.
8. **`aria-label` on icon-only buttons** (`⋮`, `📋`, copy, expand/collapse). Almost universally missing today.
9. **Truncate-and-expand pattern.** AuditLog has the cleanest pattern (`View` toggle for state JSON). Apply to long message strings (EventLogs, ParkedMessages, Deployments errors) instead of mid-string CSS truncation.
---
## Admin section
### LdapMappings.razor — `/admin/ldap-mappings` — **Medium**
**File:** `src/ScadaLink.CentralUI/Components/Pages/Admin/LdapMappings.razor`
**What it does:** Lists LDAP group → role mappings with inline Edit/Delete and Site Scope hints.
**Issues**
1. *Consistency:* Header (line 12) lacks the Sites flex layout + Bulk actions dropdown next to the primary Add button.
2. *Density:* 5-column table; "Site Scope Rules" cell jams multiple badges into a narrow column.
3. *Consistency:* No search filter. Sites uses one at lines 67-69.
4. *Consistency:* Edit + Delete rendered as twin buttons in the row; Sites uses kebab.
5. *Other:* "Site Scope Rules" preview in the row + the "(manage on edit page)" hint creates a confusing duality — the list page promises something it can't deliver.
**Recommendations**
1. Add header flex layout + search input.
2. Replace Edit/Delete pair with `Edit` button + `⋮` dropdown containing Delete.
3. Either drop the Site Scope column from the list entirely (show a `n rule(s)` badge instead) or expand it into a collapse panel on the row.
4. If keeping table layout, add `@key="m.Id"`.
---
### LdapMappingForm.razor — `/admin/ldap-mappings/create` and `/{Id}/edit` — **High**
**File:** `src/ScadaLink.CentralUI/Components/Pages/Admin/LdapMappingForm.razor`
**What it does:** Create/edit a single mapping, plus a secondary panel for Site Scope Rules in edit mode.
**Issues**
1. *Form-layout:* Two distinct sub-forms on one page (mapping basics + scope rules) with no visual separation. Scope rules only become editable after Save, but the UI doesn't communicate that workflow.
2. *Hierarchy:* Both sections use `<h6>` inside `card-title`; no primary/secondary hierarchy.
3. *Form-layout:* Scope-rule entry uses a nested table inside the card; visually heavy.
4. *Accessibility:* Role `<select>` has no `aria-describedby` / help text explaining why "Deployment" surfaces the scope rules section.
**Recommendations**
1. Restructure: top card "Mapping" stacked vertically (Name, LDAP Group, Role, [Save]); below it, a card "Site Scope Rules" that's disabled-with-explanation in create mode and editable in edit mode.
2. Replace the nested scope-rule table with a tag-style chip list: each scope rule renders as a removable chip; an inline "Add scope rule" form sits below.
3. Add `form-text` under Role: "Deployment role: configure site scope below after saving."
---
### DataConnections.razor — `/admin/connections` — **High** for header / a11y, **Medium** overall
**File:** `src/ScadaLink.CentralUI/Components/Pages/Admin/DataConnections.razor`
**What it does:** Treeview of sites and their data connections with context menu CRUD.
**Issues**
1. *Hierarchy:* Page title is `<h6>` (line 24). Promote to `<h4>` with flex header to match Sites.
2. *Consistency:* Inline `btn-group` with Refresh / Expand / Collapse buttons next to search; visually busy. Sites uses Bulk actions dropdown + Add button only.
3. *Accessibility:* Tree node kebab toggles lack `aria-label="More actions for {name}"`.
4. *Other:* Right-click context menu has no visible hover affordance — easy to miss.
5. *Other:* When search returns no matches, the tree silently collapses; no empty-state message.
**Recommendations**
1. Promote heading, adopt flex header. Move Expand/Collapse into a Bulk actions dropdown; drop Refresh (navigation reload covers it).
2. Add visible kebab on tree-node hover so the context menu is discoverable.
3. Add `aria-label` to every kebab toggle (interpolate the node name).
4. Add "No connections match the filter." inline when search clears the tree.
---
### DataConnectionForm.razor — `/admin/connections/create` and `/{Id}/edit` — **Medium**
**File:** `src/ScadaLink.CentralUI/Components/Pages/Admin/DataConnectionForm.razor`
**What it does:** Create/edit a connection with primary + optional backup endpoint editors (OPC UA only today).
**Issues**
1. *Form-layout:* Site field is disabled in edit mode but rendered as a disabled `<select>` with no read-only styling cue beyond gray.
2. *Hierarchy:* "Backup endpoint" `<h6>` uses `border-bottom`; primary endpoint has no parallel heading. Hierarchy is one-sided.
3. *Density:* "Add Backup Endpoint" button buried inside the card with no signposting that backup is optional.
4. *Accessibility:* No `form-text` on Primary Endpoint / Site / failover knobs.
**Recommendations**
1. Use `<input class="form-control-plaintext" readonly>` for the Site field in edit mode and add a small explanatory line ("Site is locked after creation").
2. Mirror the heading pattern: both Primary and Backup get `<h6>` headers; Backup also gets a clear "Optional" badge.
3. Add `form-text` help under each tuning knob (PublishingIntervalMs, SamplingIntervalMs, FailoverRetryCount, etc.).
---
### ApiKeys.razor — `/admin/api-keys` — **Medium**
**File:** `src/ScadaLink.CentralUI/Components/Pages/Admin/ApiKeys.razor`
**What it does:** Lists API keys with Edit / Disable-Enable / Delete actions; masked key value.
**Issues**
1. *Consistency:* No search filter.
2. *Density:* 5-column table; Status column is redundant with the Disable/Enable button.
3. *Consistency:* Three buttons in the Actions cell (Edit / Disable / Delete) — should be Edit + kebab.
4. *Other:* No `@key="k.Id"` on rows.
**Recommendations**
1. Add search filter and `@key`.
2. Drop the Status column; let the kebab item read "Disable" or "Enable" depending on state.
3. Either keep the table and adopt the kebab pattern, or move to the Sites card grid — for ~5 keys per environment the table is fine; for 50+ the card grid would scan better.
---
### ApiKeyForm.razor — `/admin/api-keys/create` and `/{Id}/edit` — **Low**
**File:** `src/ScadaLink.CentralUI/Components/Pages/Admin/ApiKeyForm.razor`
**What it does:** Create an API key (showing the secret once) or rename an existing one.
**Issues**
1. *Form-layout:* Header has conditional "Back to API Keys" vs "Back" text.
2. *Other:* Copy button on the one-shot secret reveal is wired to a comment / no-op.
3. *Density:* Form is one field but wrapped in card-inside-card.
**Recommendations**
1. Fixed header: `← Back · Add / Edit API Key`.
2. Implement the copy via `IJSRuntime` + `navigator.clipboard.writeText` (mirror Sites.razor's `CopyAsync`).
3. Remove redundant card nesting; render the input + buttons directly in `<div class="container-fluid mt-3">`.
---
## Design section
Files discovered:
```
Components/Pages/Design/Templates.razor @page /design/templates
Components/Pages/Design/TemplateCreate.razor @page /design/templates/create
Components/Pages/Design/TemplateEdit.razor @page /design/templates/{Id:int}
Components/Pages/Design/SharedScripts.razor @page /design/shared-scripts
Components/Pages/Design/SharedScriptForm.razor @page /design/shared-scripts/{create|edit}
Components/Pages/Design/ExternalSystems.razor @page /design/external-systems
Components/Pages/Design/ExternalSystemForm.razor @page /design/external-systems/{create|edit}
Components/Pages/Design/DbConnectionForm.razor @page /design/db-connections/{create|edit}
Components/Pages/Design/ApiMethodForm.razor @page /design/api-methods/{create|edit}
Components/Pages/Design/NotificationListForm.razor @page /design/notification-lists/{create|edit}
```
### Templates.razor — **Medium**
**What it does:** Folder-tree view of templates with context-menu CRUD.
**Issues**
1. *Hierarchy:* Page title is `<h6>` (line 53) — should be `<h4>` in flex header.
2. *Consistency:* `btn-group-sm` of outline buttons for Expand/Collapse — push these into a Bulk actions dropdown.
3. *Accessibility:* Context-menu buttons (lines 271-288) lack `aria-label`.
4. *Density:* Treeview height is hardcoded `calc(100vh - 160px)` with no scroll affordance.
5. *Other:* No breadcrumb when an edit page navigates away from the tree context.
**Recommendations**
1. Promote heading, adopt flex header pattern.
2. Move Expand/Collapse into the Bulk actions dropdown.
3. Add aria-labels on every context-menu button (interpolate node name).
4. Add a top breadcrumb on TemplateEdit so users know which folder they're editing inside.
---
### SharedScripts.razor — **High**
**What it does:** Table of shared scripts with name, code preview, parameters, returns.
**Issues**
1. *Consistency:* Table instead of card grid — and code preview is rendered as truncated monospace inline, which is unreadable beyond ~40 chars.
2. *Density:* 6 columns (ID, Name, Code preview, Parameters, Returns, Actions). ID is internal-only.
3. *Consistency:* No search, no empty-state CTA.
4. *Accessibility:* Truncated code preview has no `title=` tooltip.
**Recommendations**
1. Migrate to a card grid (col-lg-6) mirroring Sites: title = Name, body = small code snippet (first 80 chars) + parameter/return counts as chips, footer = Edit + ⋮ Delete.
2. Drop ID column entirely.
3. Add search by name + code substring.
4. Add "No shared scripts configured. Create your first script." CTA.
---
### ExternalSystems.razor — **High**
**What it does:** Tabbed hub for External Systems, DB Connections, Notification Lists, Inbound API Methods, SMTP Config, API Keys.
**Issues**
1. *Density:* Six subsections on one page with no search per tab; SMTP form crams 6+ inputs in one `row g-2 align-items-end` flex row.
2. *Consistency:* Tabs use mixed renderings — External Systems / DB / API Methods use tables; Notification Lists and SMTP use cards. Same-level data, inconsistent shape.
3. *Form-layout:* SMTP form violates the vertical-stacking rule.
4. *Hierarchy:* Subsection headings are `<h6>` with badge counts — heading level is too small.
5. *Accessibility:* Tab buttons lack `role="tab"` / `aria-selected`.
6. *Other:* No per-tab empty state.
**Recommendations**
1. Split SMTP off as a standalone `/admin/smtp` (it's a single-row global config, not list data).
2. Unify all tabs on the same card-grid pattern.
3. Reformat the remaining SMTP page to vertical-stacked fields per `feedback_form_layout`.
4. Add `role="tablist"` / `role="tab"` / `aria-selected` and `aria-controls` on the tab nav.
5. Add per-tab search + empty-state CTAs.
---
### TemplateEdit.razor — **High**
**What it does:** Edit a template's properties plus Attributes / Alarms / Scripts / Compositions in tabs.
**Issues**
1. *Density:* Template Properties card uses a 4-column row; Parent Template renders as `form-control-plaintext` next to live inputs, then a Save button at col-md-2. Save ends up mid-row instead of at the bottom.
2. *Form-layout:* "Add Attribute / Alarm / Script" inline forms use `row g-2 align-items-end` — the Scripts row stuffs 4 inputs + a textarea horizontally.
3. *Consistency:* Card headers inconsistent — some "card-title" h6 inside `card-body`, some bare h6 above a section.
4. *Hierarchy:* Validation result alerts mix strong-heading + bare `<li>` items.
5. *Accessibility:* Lock-state badges render as cryptic single letters "L"/"U" with no `aria-label`. Tabs lack `role="tab"` / `aria-selected`.
6. *Other:* Per-row Delete buttons are scattered across many tables.
**Recommendations**
1. Reflow Template Properties to vertical-stack (col-12 each), put Save at the bottom following the form-layout rule.
2. Reformat add-forms into a card with stacked col-12 inputs; Scripts gets a full-width Monaco-ish textarea (rows≥10) below the metadata fields.
3. Replace L/U badges with full text + `aria-label`: `<span class="badge bg-light text-dark" aria-label="Unlocked">Unlocked</span>`.
4. Replace the per-row Delete button with a kebab menu (leaving room for future Duplicate / Move options).
5. Add `role`/`aria-selected` to all tab buttons.
---
### TemplateCreate.razor — **Low**
1. Use `form-control` not `form-control-sm` for the primary Name field.
2. Replace the `&larr;` arrow on the Back button with text `← Back` and add `aria-label="Back to Templates"`.
---
### ExternalSystemForm.razor — **Low**
1. Auth Config field: add a JSON example placeholder matching the chosen AuthType.
---
### SharedScriptForm.razor — **Low**
1. Add a small `bi-question-circle` icon next to Parameters / Return Definition linking to a tooltip with schema reference.
2. When syntax check fails, surface line/column position in the error message.
---
### DbConnectionForm.razor — **Low**
1. Add reassurance text under Connection String: "Stored encrypted; not displayed after save." (only if the back end actually does this; otherwise drop the claim.)
---
### ApiMethodForm.razor — **Low**
1. Bump the Script textarea from rows=5 to rows≥10.
2. Add JSON example placeholders for Params and Returns.
---
### NotificationListForm.razor — **Low**
1. Resize the Name input to `form-control` (not `form-control-sm`).
2. Recipients `<thead class="table-dark">` → `table-light` for consistency.
---
## Deployment section
Files discovered:
```
Components/Pages/Deployment/Topology.razor @page /deployment/topology (and /deployment/instances)
Components/Pages/Deployment/Deployments.razor @page /deployment/deployments
Components/Pages/Deployment/DebugView.razor @page /deployment/debug-view
(+ InstanceCreate, InstanceConfigure, CreateAreaDialog, MoveAreaDialog, MoveInstanceDialog)
```
### Topology.razor — **Medium**
1. *Hierarchy:* `<h6>` page title (line 63) — promote to `<h4>` in flex header.
2. *Accessibility:* Expand / Collapse / Refresh / Search / tree-kebab buttons all lack `aria-label`. Inline rename input has no label.
3. *Live-data UX:* No "pause live updates" toggle; tree can repaint while user is renaming or moving a node.
4. *Density:* Instance counts footer text — could be a summary card above the tree.
5. *State cues:* Stale badge is yellow-only; pair with text or icon.
6. *Consistency:* Diff modal is hand-rolled Bootstrap modal markup — should be a reusable `<DiffDialog>` mirroring `<ConfirmDialog>`.
**Recommendations**
1. Promote heading, adopt flex header.
2. Add aria-labels everywhere (treat the kebab and rename input as the priority).
3. Add a "Live updates: on/off" toggle button next to Refresh; pause auto-refresh during edits.
4. Move counts to a small summary card above the tree.
5. Pair Stale badge with `aria-label="State: Stale"` and a 🟡 dot or "STALE" text.
6. Extract `<DiffDialog>` into `Components/Shared/`.
---
### Deployments.razor — **Medium**
1. *Density:* 8 columns (Deployment ID, Instance, Status, Deployed By, Started, Completed, Revision, Error). Both Deployment ID and Revision are truncated hashes; Error can be a stack trace.
2. *Live-data UX:* Auto-refresh runs every 10s with no pause control — if a user is reading an error message, the row can swap underneath them.
3. *Consistency:* Summary cards use `col-md-3` only (no `col-sm-6` fallback for tablet); cards are styled differently from Sites.
4. *Accessibility:* Spinner inside the status badge has no `role="status"` / `aria-label`. "Auto-refresh: 10s" text is decorative, not a control.
5. *State cues:* Row colors (`table-danger`, `table-info`) without an icon or stripe.
6. *Other:* Empty state is a single line of text.
**Recommendations**
1. Collapse Error column into a `View error` button that pops a `<DiffDialog>`-style modal (or inline collapse row).
2. Add `Live updates: 10s [pause]` toggle.
3. Make summary cards `col-lg-3 col-md-6 col-12`.
4. Add aria-labels on the spinner and the toggle.
5. Add `border-start border-3 border-danger` or icon to failed rows.
6. Either fold Deployment ID + Revision into one cell or hide one behind the detail modal.
---
### DebugView.razor — **High**
1. *Live-data UX:* No scroll-lock on the streaming tables. Auto-scroll behavior is implicit. No max-row cap → tab can balloon in memory.
2. *Live-data UX:* Timestamps shown to milliseconds; noisy at sustained update rates.
3. *Live-data UX:* No stream filter (e.g., "only alarms with state=Active") — once subscribed, you watch everything.
4. *Accessibility:* Quality / Alarm State badges are color-only. No `aria-live="polite"` on the streaming table bodies.
5. *Consistency:* "Snapshot received at …" is a tiny muted footer; should be a header-level status strip.
6. *UX risk:* Page persists session in `localStorage` and auto-reconnects on refresh, with no user-visible notice.
**Recommendations**
1. Add per-table `🔒 Lock scroll` toggle.
2. Cap rows at e.g. 200; add a `Clear` button.
3. Add per-table filter input.
4. Display timestamps as `HH:mm:ss` by default; `.fff` only inside an "Expanded row" view.
5. Add `aria-live="polite" aria-atomic="false"` on the streaming table bodies.
6. Pair every Quality and Alarm State badge with `aria-label`.
7. Replace the snapshot footer with a status strip: instance · connection state · last snapshot time.
8. On auto-reconnect, toast "Auto-reconnected to {instance}" with a `Start fresh` button.
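
The row cap and `Clear` button from recommendation 2 could be backed by a small bounded buffer. This is a minimal sketch, not existing code: `CappedRowBuffer<T>` is a hypothetical name, and the default of 200 simply mirrors the example figure above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the existing codebase): keeps only the
// newest `capacity` rows so a long-running stream cannot grow without bound.
public sealed class CappedRowBuffer<T>
{
    private readonly Queue<T> _rows = new();
    private readonly int _capacity;

    public CappedRowBuffer(int capacity = 200)
    {
        if (capacity <= 0)
            throw new ArgumentOutOfRangeException(nameof(capacity));
        _capacity = capacity;
    }

    // Rows in arrival order, oldest first.
    public IReadOnlyCollection<T> Rows => _rows;

    // Appends a row, evicting the oldest one once the cap is reached.
    public void Add(T row)
    {
        if (_rows.Count == _capacity)
            _rows.Dequeue();
        _rows.Enqueue(row);
    }

    // Backs the proposed `Clear` button.
    public void Clear() => _rows.Clear();
}
```

Evicting on insert keeps memory flat regardless of how long the debug session stays subscribed.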
---
## Monitoring section + Dashboard
Files discovered:
```
Components/Pages/Dashboard.razor @page /
Components/Pages/Monitoring/Health.razor @page /monitoring/health
Components/Pages/Monitoring/EventLogs.razor @page /monitoring/event-logs
Components/Pages/Monitoring/ParkedMessages.razor @page /monitoring/parked-messages
Components/Pages/Monitoring/AuditLog.razor @page /monitoring/audit-log
```
### Dashboard.razor — **Medium**
1. *Dashboard UX:* It is currently just a user-info card. For a central SCADA console the landing page should show system KPIs first (sites online/offline, errors, queue depths, parked-message count) — the things you'd want to see in <5 seconds.
2. *Hierarchy:* `<h3>` heading; rest of the site is `<h4>`.
3. *Consistency:* Inline `style="max-width:500px"` instead of Bootstrap utilities.
**Recommendations**
1. Repurpose as a "Glance" page: KPI cards across the top (Sites, Errors, Parked Messages, Latest deployments status), a sites-by-health small list, recent audit events.
2. Move the user-info card to a secondary panel or drop it (it's already in the top-right of the layout).
3. `<h3>` → `<h4>` for site-wide consistency; replace inline styles with utility classes.
---
### Health.razor — **Medium**
1. *KPI choices:* Sites Online + Sites Offline + Total Sites is redundant; Total Script Errors is global and not actionable. Promote "Sites with active errors" / "Cluster degraded" instead.
2. *Hierarchy:* Header is `<h4>` left-aligned with no flex header; doesn't match Sites.
3. *Density:* Per-site cards use a 4-column inner grid that breaks on narrow viewports.
4. *Time format:* `HH:mm:ss` only, no timezone, no relative.
5. *State cues:* Online/Offline / Primary/Standby badges are color-only.
**Recommendations**
1. Replace "Total Sites" KPI with "Sites with active errors" or "Cluster health %".
2. Adopt flex header layout.
3. Reduce per-site card to 2 columns (col-md-6) or wrap each subsection in a collapse à la Sites.razor "Cluster nodes".
4. Use `TimestampDisplay` with UTC suffix; consider adding a relative time hint ("3 minutes ago").
5. Add `aria-label` and an icon to every Online/Offline/Primary/Standby badge.
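
The UTC suffix plus relative hint from recommendation 4 might look like the sketch below. `RelativeTime.Format` is a hypothetical helper written for illustration; the real `TimestampDisplay` component may already cover part of this, and the bucket boundaries are assumptions.

```csharp
using System;
using System.Globalization;

// Hypothetical helper (not the actual TimestampDisplay implementation).
public static class RelativeTime
{
    // Formats a UTC timestamp as e.g. "12:00:00 UTC (3 minutes ago)".
    public static string Format(DateTime utcTimestamp, DateTime utcNow)
    {
        var age = utcNow - utcTimestamp;
        string relative;
        if (age < TimeSpan.Zero) relative = "in the future";
        else if (age < TimeSpan.FromMinutes(1)) relative = "just now";
        else if (age < TimeSpan.FromHours(1)) relative = Plural((int)age.TotalMinutes, "minute");
        else if (age < TimeSpan.FromDays(1)) relative = Plural((int)age.TotalHours, "hour");
        else relative = Plural((int)age.TotalDays, "day");

        // Invariant culture keeps ':' as the separator on every server locale.
        var clock = utcTimestamp.ToString("HH:mm:ss", CultureInfo.InvariantCulture);
        return $"{clock} UTC ({relative})";
    }

    private static string Plural(int n, string unit) =>
        n == 1 ? $"1 {unit} ago" : $"{n} {unit}s ago";
}
```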
---
### EventLogs.razor — **High**
1. *Density:* "Message" column truncates long error strings mid-string with no expand.
2. *Pagination:* "Load more" + continuation token, no total count shown.
3. *Filter affordance:* 7 filter inputs in one row; "Keyword" label is vague.
4. *Accessibility:* Labels are not linked to inputs via `for`/`id`; row colors are the primary severity cue.
5. *Time:* Uses `<TimestampDisplay>` — confirm it standardises with the other log pages.
**Recommendations**
1. Apply AuditLog's `View` / `Hide` toggle pattern for the Message cell.
2. Switch to numeric pagination ("Page X of Y, N total") or surface a total count next to the Load More button.
3. Move the filter row into a Bootstrap collapse with label `Filter options (n active)`.
4. Add `id`/`for` pairings, `aria-label`s, and pair the row color with an icon stripe.
5. Standardise on `TimestampDisplay` across all log pages.
---
### ParkedMessages.razor — **Medium**
1. *Density:* Message ID is truncated to 12 chars with no copy or expand affordance.
2. *Density:* Error message field can be long; no expand.
3. *Accessibility:* Retry / Discard buttons have `title=` only, no `aria-label`.
4. *State:* No spinner / disabled affordance while a Retry is in flight.
**Recommendations**
1. Render Message ID as a `<code>` with a `📋 Copy` button or expand row showing the full ID + error.
2. Apply AuditLog's expand toggle for error messages.
3. Add `aria-label="Retry message {id}"` and `aria-label="Discard message {id}"`.
4. While an action is in flight, disable its button and show a small spinner inside it.
---
### AuditLog.razor — **Medium**
1. *Pagination bug:* `Next` is disabled only when `_entries.Count < _pageSize`. When the last page holds exactly `_pageSize` rows, `Next` stays enabled and clicking it loads an empty page.
2. *Filter affordance:* 5 filter inputs in one row; no `Clear filters` button.
3. *Density:* Entity ID is a full GUID with no copy / expand.
4. *State expansion:* JSON detail has `max-height: 200px` with no "expand to full size" affordance.
5. *Accessibility:* `View`/`Hide` button has no `aria-label`.
**Recommendations**
1. Fix the pagination logic: rely on a "has more" flag from the API, not a length compare.
2. Add a `Clear filters` button next to the filter row.
3. Add a copy button or expand-on-click for Entity ID.
4. Make the JSON detail panel resizable, or open in a `<DiffDialog>`-style modal when content exceeds 1 KB.
5. Add `aria-label` to the toggle (interpolate entry id).
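
One common way to produce the "has more" flag from recommendation 1 is to request one row beyond the page size and use the extra row only as a signal. The sketch below assumes an in-memory `IEnumerable<T>` as a stand-in for the real audit query; `Page<T>` and `Paging.Fetch` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical result type: the visible rows plus an explicit "has more" flag.
public sealed record Page<T>(IReadOnlyList<T> Items, bool HasMore);

public static class Paging
{
    // Requests pageSize + 1 rows; the extra row is never shown and only
    // indicates that another page exists.
    public static Page<T> Fetch<T>(IEnumerable<T> source, int pageIndex, int pageSize)
    {
        var rows = source.Skip(pageIndex * pageSize).Take(pageSize + 1).ToList();
        var hasMore = rows.Count > pageSize;
        return new Page<T>(rows.Take(pageSize).ToList(), hasMore);
    }
}
```

With 20 rows and a page size of 10, the second page reports `HasMore == false` even though it is full, which is exactly the case the length comparison gets wrong.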
---
## Layout, shared components, global CSS
### MainLayout.razor / NavMenu.razor / App.razor
**Issues**
1. *Responsive:* Sidebar is fixed `min-width: 220px / max-width: 220px` in `App.razor` lines 13-14. No `d-none d-lg-flex` or hamburger toggle for narrow viewports. **High.**
2. *Scrolling:* `<ul class="nav flex-column flex-grow-1">` has no overflow boundary. If role-driven nav becomes long enough, the footer (username + Sign Out) will scroll off-screen. **Medium.**
3. *Semantics:* Section headers (Admin, Design, …) render as bare `<li class="nav-section-header">` — not focusable / not semantic. **Medium.**
4. *Active state:* Active blue (#0d6efd) and hover gray (#343a40) are similar enough to confuse — pair active with a left border or underline. **Low.**
**Recommendations**
1. Wrap the sidebar in `d-none d-lg-flex` + add a hamburger button in the top bar for `<lg` viewports. Replace fixed widths with `flex-basis: 220px` and let it collapse off-canvas on mobile.
2. Wrap `<ul>` in `<div style="overflow-y:auto; flex:1 1 auto;">` so the footer is always anchored.
3. Convert section headers to `<li role="presentation"><span class="nav-section-header">Admin</span></li>` or just `<div role="separator" aria-label="Admin section">`.
4. Add `border-left: 3px solid var(--bs-primary)` to `.nav-link.active`.
---
### Login.razor — **Medium** / **Low**
**Issues**
1. *Centering:* `margin-top: 10vh;` on the container — on short viewports the card pushes below the fold. **Medium.**
2. *Validation:* No client-side validation feedback for empty fields; only server-side via `?error=` query param. **Low.**
**Recommendations**
1. Wrap in `<div class="d-flex align-items-center justify-content-center min-vh-100">` for true vertical centering.
2. Add HTML5 `required` and `:invalid` styling; keep the server-side error banner for actual auth failures.
---
### NotAuthorizedView.razor — **Low**
1. Wrap in the same centered layout as Login, with the "ScadaLink" brand heading on top — currently feels orphaned.
---
### ToastNotification.razor — **Medium**
**Issues**
1. *z-index:* Toasts are at `z-index: 1090`; Bootstrap modal backdrop defaults to 1050 and the modal element itself to 1055. Currently OK, but ConfirmDialog markup doesn't set an explicit z-index on the modal element, so either document the hierarchy or set explicit values.
2. *Auto-dismiss:* Hardcoded 5 s. No way to extend for important messages.
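3. *Accessibility:* `role="alert"` is set, which already implies `aria-live="assertive"` and `aria-atomic="true"`, so every toast interrupts screen-reader users; there is no polite variant for routine messages.
**Recommendations**
1. Document the z-index ladder in a comment at the top of the component; set explicit z-index in `ConfirmDialog` too.
2. Add `[Parameter] public int AutoDismissMs { get; set; } = 6000;`.
3. Keep `role="alert"` for errors only; render routine toasts with `role="status"` (implicitly `aria-live="polite"` and `aria-atomic="true"`) rather than adding `aria-live="polite"` on top of `role="alert"`.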
---
### ConfirmDialog.razor — **High** / **Medium**
**Issues**
1. *Scroll:* Backdrop doesn't add `overflow: hidden` to `<body>` — the page behind scrolls under the dialog. **High.**
2. *Keyboard:* No `Escape`-to-close handler. No focus trap. **Medium.**
3. *Defaults:* `ConfirmButtonClass` defaults to `btn-danger` — wrong for non-destructive confirms. **Medium.**
**Recommendations**
1. On `ShowAsync`, JS-interop add `overflow:hidden` to `body`; remove on close.
2. Add `@onkeydown="..."` for Escape → Cancel; on show, focus the cancel button (or the safer button) and on close return focus to the trigger.
3. Default `ConfirmButtonClass` to `btn-primary`; explicit `btn-danger` on destructive call sites only.
---
### LoadingSpinner.razor — **Low**
1. `text-muted` on a light background may not meet the 4.5:1 contrast ratio required for normal text. Switch to `text-secondary`.
---
### DataTable.razor — **Low**
1. Search input has no clear (✕) button.
2. Pagination disabled state is on the parent `<li>` not the button — apply `disabled` directly + `aria-disabled="true"`.
---
### NewFolderDialog.razor — **Low**
1. Uses combined modal + inline background style instead of a separate `<div class="modal-backdrop fade show">` like ConfirmDialog. Refactor to match.
---
### TreeView.razor / TreeView.razor.css
1. Reliance on `var(--bs-*)` is good; no change.
2. Same a11y caveats as Topology — hover/focus visuals must reach kebab toggles.
---
### Global CSS — **Medium**
**Issues**
1. *Inline:* ~60 lines of `<style>` are inline in `App.razor` instead of in a `wwwroot/css/site.css` file.
2. *Theming:* Sidebar uses hardcoded hex colors (#212529, #343a40, #adb5bd, #fff); blocks any future light-mode / brand variation work.
3. *Reconnect modal:* Uses ad-hoc flex centering; could just be `.modal-dialog-centered`.
**Recommendations**
1. Move inline styles to `wwwroot/css/site.css` and link in `App.razor`.
2. Replace hex with `var(--bs-dark)` / `var(--bs-light)` etc.
3. Use Bootstrap's `.modal-dialog-centered` for the reconnect overlay.
---
## Cross-cutting strategic recommendations
These are bigger investments that pay back across many pages:
1. **Dialog/Modal service.** A single `IDialogService` that owns z-index stacking, body scroll lock, focus trap, Escape-to-close. Replace per-component ad-hoc backdrops. Fixes ConfirmDialog scroll-lock, focus-trap, and z-index collisions in one stroke; also unblocks the planned `<DiffDialog>` for Topology and the error-detail modal for Deployments.
2. **Accessibility pass.** Adopt a single rule: every icon-only button has `aria-label`; every state badge is colour + text + icon; every form input has linked label and optional `aria-describedby`. Most pages need ~5 minutes of edits to comply.
3. **Design tokens via CSS variables.** Pull the sidebar palette + the few custom colors into `:root` custom properties. Adopt Bootstrap's CSS variables (`--bs-*`) for everything else. Unblocks light/dark mode and any future rebrand.
4. **Pagination + filter component.** EventLogs / ParkedMessages / AuditLog / Deployments all roll their own. Extract one `<PagedTable TItem>` or at least a `<Paginator>` that takes (page, pageSize, total) and emits standard events.
5. **`TimestampDisplay` audit.** Make sure every consumer goes through it; standardise on UTC display + tooltip with relative time. Eliminate inline `.ToString("HH:mm:ss")` calls.
6. **One reference page for list patterns.** Use `Sites.razor` as the reference; add a comment at the top of it pointing future implementers at it (or extract its skeleton into a snippet under `docs/`).
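
For item 4, the state behind a shared `<Paginator>` could be as small as the sketch below. `PagerState` is a hypothetical type, pages are assumed to be 1-based, and `PageSize` is assumed to be positive; none of this is taken from existing code.

```csharp
using System;

// Hypothetical state object for a shared <Paginator>: (page, pageSize, total).
// Assumes 1-based page numbers and PageSize > 0.
public readonly record struct PagerState(int Page, int PageSize, int Total)
{
    // Ceiling division; an empty result set still counts as one (empty) page.
    public int TotalPages => Math.Max(1, (Total + PageSize - 1) / PageSize);

    public bool HasPrevious => Page > 1;
    public bool HasNext => Page < TotalPages;

    // Text for the "Page X of Y, N total" label.
    public string Summary => $"Page {Page} of {TotalPages}, {Total} total";
}
```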
---
## Out of scope / decisions to defer
- Whether to migrate any list page from table-only to card grid (most should, but each is a separate ticket).
- Dark-mode / theming work.
- A real dashboard (KPI page) replacement.
- Replacing the SignalR debug-view streaming model.
+8 -8
@@ -13,8 +13,8 @@ This document defines the phased implementation strategy for the ScadaLink SCADA
1. **Each phase produces a testable, working increment** — no phase ends with unverifiable work.
2. **Dependencies are respected** — no component is built before its dependencies.
-3. **Requirements traceability at bullet level** — every individual requirement (each bullet point, sub-bullet, and constraint) in HighLevelReqs.md must map to at least one work package. Section-level mapping is insufficient — a section like "4.4 Script Capabilities" contains ~8 distinct requirements that may land in different phases. See `docs/plans/requirements-traceability.md` for the matrix.
-4. **Design decision traceability** — the Key Design Decisions in CLAUDE.md and detailed design in Component-*.md documents contain implementation constraints not present in HighLevelReqs.md (e.g., Become/Stash pattern, staggered startup, Tell vs Ask conventions, forbidden script APIs). Each must trace to a work package.
+3. **Requirements traceability at bullet level** — every individual requirement (each bullet point, sub-bullet, and constraint) in docs/requirements/HighLevelReqs.md must map to at least one work package. Section-level mapping is insufficient — a section like "4.4 Script Capabilities" contains ~8 distinct requirements that may land in different phases. See `docs/plans/requirements-traceability.md` for the matrix.
+4. **Design decision traceability** — the Key Design Decisions in CLAUDE.md and detailed design in docs/requirements/Component-*.md documents contain implementation constraints not present in docs/requirements/HighLevelReqs.md (e.g., Become/Stash pattern, staggered startup, Tell vs Ask conventions, forbidden script APIs). Each must trace to a work package.
5. **Split-section completeness** — when a HighLevelReqs section spans multiple phases, each phase's plan must explicitly list which bullets from that section it covers. The union across all phases must be the complete section with no gaps.
6. **Questions are tracked, not blocking** — any ambiguity discovered during plan generation is logged in `docs/plans/questions.md` and generation continues. Do not stop or wait for user input during plan generation.
7. **Codex MCP is best-effort** — if the Codex MCP tool is unavailable or errors during verification, note the skip in the plan document and continue. Do not block on external tool availability.
@@ -445,8 +445,8 @@ For each phase, the implementation plan document must contain:
1. **Scope** — Components and features included
2. **Prerequisites** — Which phases/components must be complete
-3. **Requirements Checklist** — A bullet-level checklist extracted from HighLevelReqs.md for every section this phase covers (see Bullet-Level Extraction below). Each bullet is a checkbox that must map to a work package.
-4. **Design Constraints Checklist** — Applicable constraints from CLAUDE.md Key Design Decisions and Component-*.md documents, each mapped to a work package.
+3. **Requirements Checklist** — A bullet-level checklist extracted from docs/requirements/HighLevelReqs.md for every section this phase covers (see Bullet-Level Extraction below). Each bullet is a checkbox that must map to a work package.
+4. **Design Constraints Checklist** — Applicable constraints from CLAUDE.md Key Design Decisions and docs/requirements/Component-*.md documents, each mapped to a work package.
5. **Work Packages** — Numbered tasks with:
- Description
- Acceptance criteria (must cover every checklist bullet mapped to this work package)
@@ -489,8 +489,8 @@ These are mapped to work packages and verified in acceptance criteria just like
### Generation Steps
1. Read the phase definition in this document
-2. Read all referenced Component-*.md documents
-3. Read referenced HighLevelReqs.md sections **line by line** — extract every bullet, sub-bullet, and constraint as a numbered requirement
+2. Read all referenced docs/requirements/Component-*.md documents
+3. Read referenced docs/requirements/HighLevelReqs.md sections **line by line** — extract every bullet, sub-bullet, and constraint as a numbered requirement
4. Read CLAUDE.md Key Design Decisions — extract constraints relevant to this phase's components
5. Build the Requirements Checklist and Design Constraints Checklist
6. Break sub-tasks into concrete work packages with acceptance criteria, mapping every checklist item
@@ -516,8 +516,8 @@ After the orphan check passes, submit the plan to the Codex MCP tool (model: `gp
**Step 1 — Requirements coverage review**: Submit the following as a single Codex prompt:
- The complete phase plan document
-- The full text of every HighLevelReqs.md section this phase covers
-- The full text of every Component-*.md document referenced by this phase
+- The full text of every docs/requirements/HighLevelReqs.md section this phase covers
+- The full text of every docs/requirements/Component-*.md document referenced by this phase
- The relevant Key Design Decisions from CLAUDE.md
Ask Codex: *"Review this implementation plan against the provided requirements, component designs, and design constraints. Identify: (1) any requirement bullet, sub-bullet, or constraint from the source documents that is not covered by a work package or acceptance criterion in the plan, (2) any acceptance criterion that does not actually verify its linked requirement, (3) any contradictions between the plan and the source documents. List each finding with the specific source text and what is missing or wrong."*
+967
@@ -0,0 +1,967 @@
# gRPC Streaming Channel: Site → Central Real-Time Data
## Context
Debug streaming events currently flow through Akka.NET ClusterClient (`InstanceActor → SiteCommunicationActor → ClusterClient.Send → CentralCommunicationActor → bridge actor`). ClusterClient wasn't built for high-throughput value streaming — it's a cluster coordination tool with gossip-based routing. As we scale beyond debug view to health streaming, alarm feeds, or future live dashboards, pushing all real-time data through ClusterClient will become a bottleneck.
**Goal**: Add a dedicated gRPC server-streaming channel on each site node. Central subscribes to sites over gRPC for real-time data. ClusterClient continues to handle command/control (subscribe, unsubscribe, deploy, lifecycle) but all streaming values flow through the gRPC channel.
**Scope**: General-purpose site→central streaming transport. Debug view is the first consumer, but the proto and server are designed so future features (health streaming, alarm feeds, live dashboards) can subscribe with different event types and filters.
## Why gRPC Streaming Instead of ClusterClient
| Concern | ClusterClient | gRPC Server Streaming |
|---------|---------------|----------------------|
| **Purpose** | Cluster coordination, service discovery, request/response | High-throughput data streaming |
| **Sender preservation** | Temporary proxy ref — breaks for stored future Tells | N/A — callback-based, no actor refs cross boundary |
| **Flow control** | None (fire-and-forget Tell) | HTTP/2 flow control + Channel backpressure |
| **Scalability** | Gossip-based routing, single receptionist | Direct TCP/HTTP2 per-site, multiplexed streams |
| **Reconnection** | ClusterClient auto-reconnect (coarse, cluster-level) | gRPC channel-level reconnect per subscription |
| **Serialization** | Akka.NET Hyperion (runtime IL, fragile across versions) | Protocol Buffers (schema-driven, cross-platform) |
gRPC server-streaming is an established pattern for real-time tag value updates; this plan applies the same pattern to site→central communication.
## Architecture
```
Central Cluster Site Cluster
───────────── ────────────
DebugStreamBridgeActor InstanceActor
│ │
│── SubscribeDebugView ──► │ (ClusterClient: command/control)
│◄── DebugViewSnapshot ── │
│ │
│ │ publishes AttributeValueChanged
│ │ publishes AlarmStateChanged
│ ▼
SiteStreamGrpcClient ◄──── gRPC stream ───── SiteStreamGrpcServer
(per-site, on central) (HTTP/2) (Kestrel, on site)
│ │
│ reads from gRPC stream │ receives from SiteStreamManager
│ routes by correlationId │ filters by instance name
▼ │
DebugStreamBridgeActor │
│ │
▼ │
SignalR Hub / Blazor UI │
```
**Key separation**: ClusterClient handles subscribe/unsubscribe/snapshot (request-response). gRPC handles the ongoing value stream (server-streaming).
## Port & Address Configuration
### Site-Side (appsettings)
`ScadaLink:Node:GrpcPort` — explicit config setting, not derived from `RemotingPort`:
```json
"Node": {
"Role": "Site",
"NodeHostname": "scadalink-site-a-a",
"RemotingPort": 8082,
"GrpcPort": 8083
}
```
**Why explicit, not offset**: `RemotingPort` is itself a config value (8081 central, 8082 sites). A rigid offset silently breaks if someone changes `RemotingPort` to a non-standard value. Explicit ports are visible and independently configurable.
Add `GrpcPort` to `NodeOptions` (`src/ScadaLink.Host/NodeOptions.cs`):
```csharp
public int GrpcPort { get; set; } = 8083;
```
Add validation in `StartupValidator` (site role only — central doesn't host a gRPC streaming server).
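
The site-only validation could take roughly the following shape. This is a sketch under assumptions: the real `StartupValidator` API is not shown in this plan, so `GrpcPortValidation.Validate` is a hypothetical standalone function, and the check that `GrpcPort` differs from `RemotingPort` is an added assumption rather than a stated requirement.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for the GrpcPort rule inside StartupValidator.
public static class GrpcPortValidation
{
    // Returns validation errors; an empty list means the configuration is acceptable.
    public static IReadOnlyList<string> Validate(string role, int remotingPort, int grpcPort)
    {
        var errors = new List<string>();

        // Central does not host a gRPC streaming server, so GrpcPort is ignored there.
        if (role != "Site")
            return errors;

        if (grpcPort is < 1 or > 65535)
            errors.Add($"ScadaLink:Node:GrpcPort must be between 1 and 65535 (was {grpcPort}).");

        // Assumption: both listeners bind on the same host, so the ports must differ.
        if (grpcPort == remotingPort)
            errors.Add("ScadaLink:Node:GrpcPort must differ from RemotingPort.");

        return errors;
    }
}
```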
### Central-Side (Database — Site Entity)
Central needs to know each site node's gRPC endpoint. Add two fields to the `Site` entity:
**Modify**: `src/ScadaLink.Commons/Entities/Sites/Site.cs`
```csharp
public class Site
{
public int Id { get; set; }
public string Name { get; set; }
public string SiteIdentifier { get; set; }
public string? Description { get; set; }
public string? NodeAAddress { get; set; } // Akka: "akka.tcp://scadalink@host:8082"
public string? NodeBAddress { get; set; } // Akka: "akka.tcp://scadalink@host:8082"
public string? GrpcNodeAAddress { get; set; } // gRPC: "http://host:8083"
public string? GrpcNodeBAddress { get; set; } // gRPC: "http://host:8083"
}
```
### Database Migration
Add `GrpcNodeAAddress` and `GrpcNodeBAddress` nullable string columns to the `Sites` table. Existing sites get `NULL` (gRPC streaming unavailable until configured).
### Management Commands
**Modify**: `src/ScadaLink.Commons/Messages/Management/SiteCommands.cs`
```csharp
public record CreateSiteCommand(
string Name, string SiteIdentifier, string? Description,
string? NodeAAddress = null, string? NodeBAddress = null,
string? GrpcNodeAAddress = null, string? GrpcNodeBAddress = null);
public record UpdateSiteCommand(
int SiteId, string Name, string? Description,
string? NodeAAddress = null, string? NodeBAddress = null,
string? GrpcNodeAAddress = null, string? GrpcNodeBAddress = null);
```
### ManagementActor Handlers
**Modify**: `src/ScadaLink.ManagementService/ManagementActor.cs`
Update `HandleCreateSite` and `HandleUpdateSite` to pass gRPC addresses through to the repository.
### CLI
**Modify**: `src/ScadaLink.CLI/Commands/SiteCommands.cs`
Add `--grpc-node-a-address` and `--grpc-node-b-address` options to `site create` and `site update`:
```sh
scadalink site create --name "Site A" --identifier site-a \
--node-a-address "akka.tcp://scadalink@site-a-a:8082" \
--node-b-address "akka.tcp://scadalink@site-a-b:8082" \
--grpc-node-a-address "http://site-a-a:8083" \
--grpc-node-b-address "http://site-a-b:8083"
```
### Central UI
**Modify**: `src/ScadaLink.CentralUI/Components/Pages/Admin/Sites.razor`
Add two form fields below the existing Node A / Node B address inputs in the site create/edit form:
```html
<label class="form-label small">gRPC Node A Address</label>
<input type="text" class="form-control form-control-sm" @bind="_formGrpcNodeAAddress"
placeholder="http://host:8083" />
<label class="form-label small">gRPC Node B Address</label>
<input type="text" class="form-control form-control-sm" @bind="_formGrpcNodeBAddress"
placeholder="http://host:8083" />
```
Add corresponding columns to the sites list table. Wire `_formGrpcNodeAAddress` / `_formGrpcNodeBAddress` into `CreateSiteCommand` / `UpdateSiteCommand` in the save handler.
### SiteStreamGrpcClientFactory
Reads `GrpcNodeAAddress` / `GrpcNodeBAddress` from the `Site` entity (loaded by `CentralCommunicationActor.LoadSiteAddressesFromDb()`) when creating per-site gRPC channels. Falls back to NodeB if NodeA connection fails (same pattern as ClusterClient dual-contact-point failover).
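
The NodeA-then-NodeB fallback can be sketched independently of the gRPC client itself. In the snippet below, `GrpcEndpointFailover` and the `tryConnect` delegate are hypothetical; the delegate stands in for whatever channel creation and connectivity probe the factory ends up using.

```csharp
#nullable enable
using System;
using System.Threading.Tasks;

// Hypothetical helper illustrating the failover order only.
public static class GrpcEndpointFailover
{
    // Tries NodeA first, then NodeB; null or blank addresses are skipped.
    // Returns the first address that connects, or null if neither does.
    public static async Task<string?> ConnectFirstAvailableAsync(
        string? grpcNodeAAddress,
        string? grpcNodeBAddress,
        Func<string, Task<bool>> tryConnect)
    {
        foreach (var address in new[] { grpcNodeAAddress, grpcNodeBAddress })
        {
            if (string.IsNullOrWhiteSpace(address))
                continue; // site not yet configured for gRPC on this node

            if (await tryConnect(address))
                return address;
        }

        return null; // streaming unavailable for this site
    }
}
```

Returning `null` rather than throwing lets the caller treat sites whose gRPC columns are still `NULL` as "streaming unavailable".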
### Docker Compose Port Allocation
**Modify**: `docker/docker-compose.yml`
Expose gRPC ports for each site node (internal 8083):
- Site-A: `9023:8083` / `9024:8083` (nodes A/B)
- Site-B: `9033:8083` / `9034:8083`
- Site-C: `9043:8083` / `9044:8083`
### Files Affected by Port & Address Configuration
| File | Change |
|------|--------|
| `src/ScadaLink.Host/NodeOptions.cs` | Add `GrpcPort` property |
| `src/ScadaLink.Host/StartupValidator.cs` | Validate `GrpcPort` for site role |
| `src/ScadaLink.Host/appsettings.Site.json` | Add `GrpcPort: 8083` |
| `src/ScadaLink.Commons/Entities/Sites/Site.cs` | Add `GrpcNodeAAddress`, `GrpcNodeBAddress` |
| `src/ScadaLink.Commons/Messages/Management/SiteCommands.cs` | Add gRPC address params |
| `src/ScadaLink.ConfigurationDatabase/` | EF migration for new columns |
| `src/ScadaLink.ManagementService/ManagementActor.cs` | Pass gRPC addresses in handlers |
| `src/ScadaLink.CLI/Commands/SiteCommands.cs` | Add `--grpc-node-a-address` / `--grpc-node-b-address` |
| `src/ScadaLink.CentralUI/Components/Pages/Admin/Sites.razor` | Add gRPC address form fields + table columns |
| `docker/docker-compose.yml` | Expose gRPC ports |
## Proto Definition
**File**: `src/ScadaLink.Communication/Protos/sitestream.proto`
The `oneof event` pattern is extensible — future event types (health metrics, connection state changes, etc.) are added as new fields without breaking existing consumers.
```protobuf
syntax = "proto3";
option csharp_namespace = "ScadaLink.Communication.Grpc";
package sitestream;
service SiteStreamService {
// Subscribe to real-time events filtered by instance.
// Server streams events until the client cancels or the site shuts down.
rpc SubscribeInstance(InstanceStreamRequest) returns (stream SiteStreamEvent);
}
message InstanceStreamRequest {
string correlation_id = 1;
string instance_unique_name = 2;
}
message SiteStreamEvent {
string correlation_id = 1;
oneof event {
AttributeValueUpdate attribute_changed = 2;
AlarmStateUpdate alarm_changed = 3;
// Future: HealthMetricUpdate health_metric = 4;
// Future: ConnectionStateUpdate connection_state = 5;
}
}
message AttributeValueUpdate {
string instance_unique_name = 1;
string attribute_path = 2;
string attribute_name = 3;
string value = 4; // string-encoded
string quality = 5; // "Good", "Uncertain", "Bad"
int64 timestamp_utc_ticks = 6;
}
message AlarmStateUpdate {
string instance_unique_name = 1;
string alarm_name = 2;
int32 state = 3; // 0=Normal, 1=Active (maps to AlarmState enum)
int32 priority = 4;
int64 timestamp_utc_ticks = 5;
}
```
Pre-generate C# stubs and check into `src/ScadaLink.Communication/SiteStreamGrpc/` — no `protoc` in Docker for ARM64 compatibility.
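As a small illustration of the `timestamp_utc_ticks` convention, the round trip between a `DateTimeOffset` and UTC ticks might look like this. The sample instant is arbitrary; only the `UtcTicks` property and the `DateTimeOffset(long, TimeSpan)` constructor matter.

```csharp
using System;

// Sample instant with a non-UTC offset (illustrative value only).
var original = new DateTimeOffset(2026, 5, 16, 4, 3, 42, TimeSpan.FromHours(-4));

// Site side: the value that would be written to timestamp_utc_ticks.
long ticks = original.UtcTicks;

// Central side: rebuild the instant; the offset is always UTC after the round trip.
var restored = new DateTimeOffset(ticks, TimeSpan.Zero);

Console.WriteLine(restored == original); // True: same instant
Console.WriteLine(restored.Offset);      // 00:00:00
```

The original local offset is not preserved, so consumers that need local time have to apply their own time zone.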
## Server-Streaming Pattern (Site Side)
### gRPC Server Implementation
`SiteStreamGrpcServer` inherits from `SiteStreamService.SiteStreamServiceBase`:
```csharp
public override async Task SubscribeInstance(
InstanceStreamRequest request,
IServerStreamWriter<SiteStreamEvent> responseStream,
ServerCallContext context)
{
var channel = Channel.CreateBounded<SiteStreamEvent>(
new BoundedChannelOptions(1000) { FullMode = BoundedChannelFullMode.DropOldest });
// Local actor subscribes to SiteStreamManager, writes to channel
var relayActor = actorSystem.ActorOf(
Props.Create(() => new StreamRelayActor(request, channel.Writer)));
streamManager.Subscribe(request.InstanceUniqueName, relayActor);
try
{
await foreach (var evt in channel.Reader.ReadAllAsync(context.CancellationToken))
{
await responseStream.WriteAsync(evt, context.CancellationToken);
}
}
finally
{
streamManager.RemoveSubscriber(relayActor);
actorSystem.Stop(relayActor);
}
}
```
### Channel\<T\> Bridging Pattern
`IServerStreamWriter<T>` is **not thread-safe**. Multiple Akka actors may publish events concurrently. The `Channel<SiteStreamEvent>` bridges these worlds:
```
Akka Actor Thread(s)           gRPC Response Stream
        │                                ▲
        │ channel.Writer.TryWrite(evt)   │ await responseStream.WriteAsync(evt)
        ▼                                │
┌─────────────────────────────────────────┐
│        Channel<SiteStreamEvent>         │
│       BoundedChannelOptions(1000)       │
│          FullMode = DropOldest          │
└─────────────────────────────────────────┘
```
- **Bounded capacity** (1000): prevents unbounded memory growth if the gRPC client is slow
- **DropOldest**: matches the existing `SiteStreamManager` overflow strategy
- **ReadAllAsync**: yields items as they arrive, naturally async
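The `DropOldest` behavior can be seen in isolation with a small channel. This is a sketch, not the plan's implementation: the capacity of 3 and the `dropped` list are illustrative, and the `itemDropped` callback overload assumes .NET 6 or later. That callback is one possible place to count dropped events.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Channels;

var dropped = new List<int>();

// Capacity 3 keeps the demo small; the plan uses 1000.
var channel = Channel.CreateBounded<int>(
    new BoundedChannelOptions(3)
    {
        FullMode = BoundedChannelFullMode.DropOldest,
        SingleReader = true // a single gRPC write loop drains the channel
    },
    itemDropped: item => dropped.Add(item)); // hook for a drop counter

// Producers never wait: TryWrite succeeds even when the channel is full,
// evicting the oldest buffered item instead.
for (var i = 1; i <= 5; i++)
{
    channel.Writer.TryWrite(i);
}
channel.Writer.Complete();

var received = new List<int>();
await foreach (var item in channel.Reader.ReadAllAsync())
{
    received.Add(item);
}

Console.WriteLine(string.Join(",", received)); // 3,4,5
Console.WriteLine(string.Join(",", dropped));  // 1,2
```

Because `TryWrite` never blocks in this mode, actor threads are not stalled by a slow gRPC client; the cost is that the oldest events are lost first.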
### Kestrel HTTP/2 Setup
Site hosts switch from `Host.CreateDefaultBuilder()` to `WebApplicationBuilder` with Kestrel configured for a dedicated gRPC port:
```csharp
builder.WebHost.ConfigureKestrel(options =>
{
options.ListenAnyIP(grpcPort, listenOptions =>
{
listenOptions.Protocols = HttpProtocols.Http2; // gRPC requires HTTP/2
});
});
builder.Services.AddGrpc();
// ... existing site services ...
var app = builder.Build();
app.MapGrpcService<SiteStreamGrpcServer>();
```
## Client-Streaming Pattern (Central Side)
### gRPC Client Implementation
`SiteStreamGrpcClient` manages per-site gRPC channels and streaming subscriptions:
```csharp
public async Task<StreamSubscription> SubscribeAsync(
string correlationId, string instanceUniqueName,
Action<object> onEvent, CancellationToken ct)
{
var request = new InstanceStreamRequest
{
CorrelationId = correlationId,
InstanceUniqueName = instanceUniqueName
};
var call = _client.SubscribeInstance(request, cancellationToken: ct);
// Background task reads from the gRPC response stream
_ = Task.Run(async () =>
{
try
{
await foreach (var evt in call.ResponseStream.ReadAllAsync(ct))
{
var domainEvent = ConvertToDomainEvent(evt);
onEvent(domainEvent);
}
}
catch (RpcException ex) when (ex.StatusCode == StatusCode.Cancelled)
{
// Normal cancellation
}
}, ct);
return new StreamSubscription(correlationId, call);
}
```
### Port Resolution
### Client Factory
`SiteStreamGrpcClientFactory` caches per-site `GrpcChannel` instances (same pattern as `CentralCommunicationActor._siteClients` caching per-site ClusterClient instances).
## Failover & Reconnection
Four failure scenarios to handle, each with different behavior:
### 1. Site Node Failover (Active → Standby)
**What happens**: The active site node goes down. The site's Akka cluster promotes the standby to active. The Deployment Manager singleton moves to the new node, recreating Instance Actors from persisted config.
**gRPC impact**: The gRPC stream on the old node breaks — `ResponseStream.MoveNext()` throws `RpcException` on the central client.
**Central response** (`DebugStreamBridgeActor`):
1. gRPC stream breaks → `onStreamError` callback fires
2. Bridge actor receives the error, enters **reconnecting** state
3. Attempts to open a new gRPC stream to the site's **NodeB** address (via `SiteStreamGrpcClientFactory` failover)
4. If NodeB succeeds → stream resumes, new events flow. The consumer (SignalR/Blazor) sees a brief gap but does not need to take any action.
5. If both nodes unreachable → `onTerminated` callback fires, session ends, consumer notified
**Data gap**: Events that occurred between the old node dying and the new stream connecting are lost. This is acceptable for debug view (real-time monitoring, not historical replay). If needed, the consumer can request a fresh snapshot via ClusterClient after reconnection to re-sync state.
**Reconnection timing**: The bridge actor should retry with backoff:
- Immediate retry to NodeB (site failover is fast, ~25s for Akka singleton handover)
- If NodeB fails, retry NodeA after 5s (original node may have restarted)
- Max 3 retries, then give up and terminate the session
```csharp
// In DebugStreamBridgeActor, on gRPC stream error:
private void HandleGrpcStreamError(Exception ex)
{
_log.Warning("gRPC stream broke for {0}: {1}", _instanceUniqueName, ex.Message);
if (_retryCount >= MaxRetries)
{
_onTerminated();
Context.Stop(Self);
return;
}
_retryCount++;
// Try the other node, then cycle back
_currentEndpoint = _currentEndpoint == _grpcNodeA ? _grpcNodeB : _grpcNodeA;
Context.System.Scheduler.ScheduleTellOnce(
TimeSpan.FromSeconds(_retryCount > 1 ? 5 : 0),
Self, new ReconnectGrpcStream(), ActorRefs.NoSender);
}
```
### 2. Central Node Failover (Active → Standby)
**What happens**: The active central node goes down. The standby becomes leader within ~25s (Akka failover). Traefik detects via `/health/active` and routes new traffic to the new leader.
**gRPC impact**: All `SiteStreamGrpcClient` instances, `GrpcChannel`s, and `DebugStreamBridgeActor`s on the old central node are destroyed. The site-side gRPC server detects dead clients via keepalive (see Connection Keepalive section) and cleans up.
**Central response**: Nothing to do — the old node is dead. On the new active node:
- Users reconnect to the debug view (Blazor circuit was lost with the old node)
- CLI clients reconnect via `.WithAutomaticReconnect()` (SignalR)
- Fresh `DebugStreamBridgeActor` + gRPC stream created on demand
**No automatic session migration**: Debug sessions are not persisted. When central fails over, active debug/stream sessions end. Users re-subscribe. This is consistent with how the system already handles central failover for all stateful sessions (Blazor circuits, SignalR connections).
### 3. Network Partition (Central ↔ Site Temporarily Unreachable)
**What happens**: Network between central and site drops but both clusters are running fine.
**gRPC impact**: gRPC keepalive pings fail on both sides:
- **Site side**: Detects dead client within ~25s, tears down subscription (see Keepalive section)
- **Central side**: `ResponseStream.MoveNext()` throws `RpcException` after keepalive timeout
**Central response**: Same as site node failover — bridge actor enters reconnecting state, retries with backoff. When network recovers, reconnection succeeds and streaming resumes.
**ClusterClient behavior**: ClusterClient also detects the partition independently (transport heartbeat failure, 10s threshold). `CentralCommunicationActor` fires `ConnectionStateChanged(isConnected: false)` and sends `DebugStreamTerminated` to the bridge actor. The bridge actor may receive both the gRPC error and the `DebugStreamTerminated` — it should handle both idempotently (first one triggers reconnect/terminate, second is ignored).
### 4. Site Node Restart (Same Node Comes Back)
**What happens**: A site node restarts (e.g., Windows Service restart, container recreation). The Akka cluster reforms, Instance Actors are recreated.
**gRPC impact**: Same as site node failover — the gRPC stream on that node was broken when the process died. The gRPC server starts fresh on the restarted node.
**Central response**: Bridge actor reconnects (same retry logic as scenario 1). The new gRPC stream connects to the restarted node's fresh `SiteStreamGrpcServer`, which subscribes to the newly recreated `SiteStreamManager`.
### Reconnection State Machine (DebugStreamBridgeActor)
```
      ┌──────────────────┐
      │    Streaming     │ ◄── Normal state: gRPC stream active
      └────────┬─────────┘
               │ gRPC stream error / keepalive timeout
      ┌──────────────────┐
 ┌──► │   Reconnecting   │ ── try other node endpoint
 │    └────────┬─────────┘
 │             │
 │    ┌────────┴─────────┐
 │    │                  │
 │  success     failure (retry < max)
 │    │                  │
 │    ▼                  │
 │ Streaming    schedule retry (5s backoff)
 │                       │
 └───────────────────────┘
      failure (retry >= max)
      ┌──────────────────┐
      │    Terminated    │ ── notify consumer, stop actor
      └──────────────────┘
```
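The state machine above can be expressed as a pure transition function, which also makes the idempotency requirement from scenario 3 easy to check. This is a sketch under assumptions: `BridgeState`, `BridgeEvent`, and `BridgeTransitions` are hypothetical names, and resetting the retry count on a successful reconnect is an assumption the plan does not state.

```csharp
using System;

var state = BridgeState.Streaming;
var retries = 0;

// A gRPC error and a second failure signal for the same outage:
(state, retries) = BridgeTransitions.Next(state, BridgeEvent.StreamError, retries);
(state, retries) = BridgeTransitions.Next(state, BridgeEvent.StreamError, retries);
Console.WriteLine($"{state} after {retries} retry"); // Reconnecting after 1 retry

public enum BridgeState { Streaming, Reconnecting, Terminated }
public enum BridgeEvent { StreamError, ReconnectSucceeded, ReconnectFailed }

public static class BridgeTransitions
{
    public const int MaxRetries = 3; // matches "Max 3 retries" in the plan

    public static (BridgeState State, int RetryCount) Next(
        BridgeState state, BridgeEvent evt, int retryCount) => (state, evt) switch
    {
        (BridgeState.Streaming, BridgeEvent.StreamError)
            => (BridgeState.Reconnecting, retryCount + 1),
        (BridgeState.Reconnecting, BridgeEvent.ReconnectSucceeded)
            => (BridgeState.Streaming, 0), // assumption: success resets the counter
        (BridgeState.Reconnecting, BridgeEvent.ReconnectFailed) when retryCount >= MaxRetries
            => (BridgeState.Terminated, retryCount),
        (BridgeState.Reconnecting, BridgeEvent.ReconnectFailed)
            => (BridgeState.Reconnecting, retryCount + 1),
        _ => (state, retryCount) // duplicates and late events are ignored
    };
}
```

In the actor, the same table would drive `Become(...)` calls or a state field; keeping it as a pure function makes the retry limit and the duplicate-signal case straightforward to unit test.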
### Summary
| Scenario | Site Cleanup | Central Response | Data Gap |
|----------|-------------|-----------------|----------|
| Site failover | Automatic (process death) | Reconnect to NodeB, retry NodeA | Yes (~30s) |
| Central failover | Keepalive timeout (~25s) | Sessions lost, user re-subscribes | Session ends |
| Network partition | Keepalive timeout (~25s) | Reconnect with backoff | Yes (partition duration) |
| Site restart | Automatic (process death) | Reconnect to restarted node | Yes (~30s) |
## Backpressure and Flow Control
Three layers of flow control:
1. **gRPC/HTTP2**: Built-in HTTP/2 flow control (per-stream and per-connection windows, on top of TCP). If the central client is slow reading, the site's `WriteAsync` eventually stops completing until the client catches up.
2. **Channel\<T\>**: Bounded at 1000 with DropOldest. If gRPC is backpressured AND the channel fills, oldest events are dropped (consistent with SiteStreamManager's existing overflow strategy).
3. **SiteStreamManager**: Akka.Streams source with per-subscriber bounded buffer (configurable via `SiteRuntimeOptions.StreamBufferSize`, default 1000, DropHead).
## Connection Keepalive & Orphan Stream Prevention
If the central cluster crashes, loses network, or fails over without unsubscribing, the site must detect the dead client and tear down the streaming subscription. Three complementary layers handle this:
### 1. TCP-Level Detection (seconds — clean disconnect)
When the central process dies or the TCP connection resets cleanly, the site's `IServerStreamWriter.WriteAsync()` throws `RpcException` with `StatusCode.Cancelled`. The gRPC method's `finally` block runs, cleaning up the SiteStreamManager subscription. This fires within seconds on a clean TCP RST.
### 2. gRPC Keepalive Pings (10-25s — network partition / silent failure)
gRPC supports HTTP/2 PING frames for proactive liveness detection. Configure on both sides:
**Site server (Kestrel):**
```csharp
// The gRPC service options carry no keepalive settings; server-side keepalive
// is configured on Kestrel's HTTP/2 limits below. Those settings catch silent
// client death (crash without TCP RST, network partition).
builder.Services.AddGrpc();
// Kestrel HTTP/2 keep-alive settings
builder.WebHost.ConfigureKestrel(options =>
{
options.ListenAnyIP(grpcPort, listenOptions =>
{
listenOptions.Protocols = HttpProtocols.Http2;
});
// Kestrel sends an HTTP/2 PING when no frames have been received for this long
options.Limits.Http2.KeepAlivePingDelay = TimeSpan.FromSeconds(15);
// Close connection if PING ACK not received within this timeout
options.Limits.Http2.KeepAlivePingTimeout = TimeSpan.FromSeconds(10);
});
```
**Central client (GrpcChannel):**
```csharp
var channel = GrpcChannel.ForAddress(endpoint, new GrpcChannelOptions
{
HttpHandler = new SocketsHttpHandler
{
// Client sends PING frames at this interval to keep the connection alive
// and detect server-side death
KeepAlivePingDelay = TimeSpan.FromSeconds(15),
// Close connection if no PING ACK within this timeout
KeepAlivePingTimeout = TimeSpan.FromSeconds(10),
// Send pings even when no active streams (keeps channel warm for fast reconnect)
KeepAlivePingPolicy = HttpKeepAlivePingPolicy.Always
}
});
```
**Detection timeline**: If central dies silently (no TCP RST), the site detects within `KeepAlivePingDelay + KeepAlivePingTimeout` = ~25 seconds. The `ServerCallContext.CancellationToken` fires, unblocking the `ReadAllAsync` loop, and the `finally` block cleans up.
### 3. Server-Side Stream Timeout (safety net — defense in depth)
As a final safety net, the gRPC server method can enforce a maximum stream duration or idle timeout. This catches edge cases where keepalive pings are disabled or misconfigured:
```csharp
// In SiteStreamGrpcServer.SubscribeInstance():
// Link the gRPC cancellation token with a maximum session timeout
using var sessionTimeout = CancellationTokenSource.CreateLinkedTokenSource(
context.CancellationToken);
sessionTimeout.CancelAfter(TimeSpan.FromHours(4)); // max stream lifetime
await foreach (var evt in channel.Reader.ReadAllAsync(sessionTimeout.Token))
{
await responseStream.WriteAsync(evt, sessionTimeout.Token);
}
```
The 4-hour lifetime is a safety net, not the primary detection mechanism. Normal cleanup happens via gRPC keepalive (25s) or TCP reset (seconds).
### Summary of Detection Layers
| Layer | Detects | Timeline | Mechanism |
|-------|---------|----------|-----------|
| TCP RST | Clean process death, connection close | 1-5s | OS-level TCP, `WriteAsync` throws |
| gRPC keepalive PING | Network partition, silent crash, firewall drop | ~25s | HTTP/2 PING frames, `CancellationToken` fires |
| Session timeout | Misconfigured keepalive, long-lived zombie streams | 4 hours | `CancellationTokenSource.CancelAfter` |
All three trigger the same cleanup path: `CancellationToken` cancels → `ReadAllAsync` exits → `finally` block removes SiteStreamManager subscription and stops relay actor.
### Configuration Defaults
Add to `CommunicationOptions` (`src/ScadaLink.Communication/CommunicationOptions.cs`):
```csharp
public TimeSpan GrpcKeepAlivePingDelay { get; set; } = TimeSpan.FromSeconds(15);
public TimeSpan GrpcKeepAlivePingTimeout { get; set; } = TimeSpan.FromSeconds(10);
public TimeSpan GrpcMaxStreamLifetime { get; set; } = TimeSpan.FromHours(4);
```
Bind from `appsettings.json` under `ScadaLink:Communication` (existing options section).
## Security Considerations
- **Internal Docker network**: Plain HTTP/2 (no TLS) — all site↔central traffic flows within `scadalink-net` Docker bridge network. Same as current Akka.NET remoting (unencrypted by default).
- **Production**: Enable TLS on the Kestrel gRPC endpoint via HTTPS certificate configuration. The `GrpcChannel` on central switches to `https://` scheme.
- **No authentication on the gRPC channel**: The channel is internal infrastructure (site↔central only). Authentication happens at the user-facing boundaries (LDAP/JWT for UI, Basic Auth for CLI/Management API).
## Adding New Event Types
The streaming channel is designed to carry any event type that can be scoped to an instance. Adding a new event type requires changes at four layers: proto, SiteStreamManager, gRPC server relay, and gRPC client conversion.
### Requirements for a New Event Type
1. **Must carry `InstanceUniqueName`**: `SiteStreamManager.ForwardToSubscribers()` filters by instance name. Without it, the event can't be routed to the correct subscriber.
2. **Must be published to `SiteStreamManager`** — the gRPC server subscribes to `SiteStreamManager` via a relay actor. Any event that reaches the relay actor gets written to the gRPC stream.
3. **Must have a proto message** — the event crosses the gRPC wire boundary, so it needs a protobuf representation in `sitestream.proto`.
4. **Must be added to the `oneof event` in `SiteStreamEvent`** — this is how the client knows which event type arrived. Existing field numbers are never reused.
### Step-by-Step: Adding a New Event Type
Example: adding `ScriptErrorEvent` to the stream.
#### 1. Define the domain record
**File**: `src/ScadaLink.Commons/Messages/Streaming/ScriptErrorEvent.cs`
```csharp
namespace ScadaLink.Commons.Messages.Streaming;
public record ScriptErrorEvent(
string InstanceUniqueName, // Required for filtering
string ScriptName,
string ErrorMessage,
DateTimeOffset Timestamp);
```
#### 2. Add the proto message
**File**: `src/ScadaLink.Communication/Protos/sitestream.proto`
Add the message definition and a new field to the `oneof`:
```protobuf
message SiteStreamEvent {
string correlation_id = 1;
oneof event {
AttributeValueUpdate attribute_changed = 2;
AlarmStateUpdate alarm_changed = 3;
ScriptErrorUpdate script_error = 4; // ← new field, next available number
}
}
message ScriptErrorUpdate {
string instance_unique_name = 1;
string script_name = 2;
string error_message = 3;
int64 timestamp_utc_ticks = 4;
}
```
Re-generate the C# stubs and check them into `SiteStreamGrpc/`.
#### 3. Publish to SiteStreamManager
**File**: `src/ScadaLink.SiteRuntime/Streaming/SiteStreamManager.cs`
Add a publish method (follows the existing pattern exactly):
```csharp
public void PublishScriptError(ScriptErrorEvent error)
{
_sourceActor?.Tell(error);
ForwardToSubscribers(error.InstanceUniqueName, error);
}
```
`ForwardToSubscribers` already accepts `object message` — no changes needed to the forwarding infrastructure. It filters by `instanceName` and Tells the message to all matching subscribers.
#### 4. Call the publish method from the source actor
**File**: wherever the event originates (e.g., `ScriptExecutionActor.cs`)
```csharp
_streamManager?.PublishScriptError(new ScriptErrorEvent(
_instanceUniqueName, _scriptName, ex.Message, DateTimeOffset.UtcNow));
```
#### 5. Handle in the gRPC server relay actor
**File**: `src/ScadaLink.Communication/Grpc/SiteStreamGrpcServer.cs` (the `StreamRelayActor`)
The relay actor receives events from `SiteStreamManager` and writes them to the `Channel<SiteStreamEvent>`. Add a `Receive<ScriptErrorEvent>` handler that converts to the proto message:
```csharp
Receive<ScriptErrorEvent>(evt =>
{
var protoEvt = new SiteStreamEvent
{
CorrelationId = _correlationId,
ScriptError = new ScriptErrorUpdate
{
InstanceUniqueName = evt.InstanceUniqueName,
ScriptName = evt.ScriptName,
ErrorMessage = evt.ErrorMessage,
TimestampUtcTicks = evt.Timestamp.UtcTicks
}
};
_channel.TryWrite(protoEvt);
});
```
#### 6. Handle in the gRPC client converter
**File**: `src/ScadaLink.Communication/Grpc/SiteStreamGrpcClient.cs` (the `ConvertToDomainEvent` method)
Add a case for the new proto `oneof` variant:
```csharp
private static object ConvertToDomainEvent(SiteStreamEvent evt) => evt.EventCase switch
{
SiteStreamEvent.EventOneofCase.AttributeChanged => /* existing */,
SiteStreamEvent.EventOneofCase.AlarmChanged => /* existing */,
SiteStreamEvent.EventOneofCase.ScriptError => new ScriptErrorEvent(
evt.ScriptError.InstanceUniqueName,
evt.ScriptError.ScriptName,
evt.ScriptError.ErrorMessage,
new DateTimeOffset(evt.ScriptError.TimestampUtcTicks, TimeSpan.Zero)),
_ => evt // Unknown types pass through as raw proto
};
```
#### 7. Handle in the consumer (if needed)
The `DebugStreamBridgeActor` already passes events to its `_onEvent` callback as `object`. Consumers that care about the new type add a pattern match:
```csharp
// In SignalR hub callback:
ScriptErrorEvent error =>
hubClients.Client(connectionId).SendAsync("OnScriptError", error),
// In Blazor component:
case ScriptErrorEvent error:
_scriptErrors.Add(error);
_ = InvokeAsync(StateHasChanged);
break;
```
### Checklist for New Event Types
| Step | File(s) | What to Do |
|------|---------|------------|
| Domain record | `Commons/Messages/Streaming/` | Create record with `InstanceUniqueName` field |
| Proto message | `Communication/Protos/sitestream.proto` | Add message + `oneof` field (next number) |
| Regenerate stubs | `Communication/SiteStreamGrpc/` | Run `protoc`, check in generated files |
| Publish method | `SiteRuntime/Streaming/SiteStreamManager.cs` | Add `PublishXxx()` method |
| Source actor | Wherever event originates | Call `_streamManager?.PublishXxx(...)` |
| Server relay | `Communication/Grpc/SiteStreamGrpcServer.cs` | Add `Receive<T>` → proto conversion |
| Client converter | `Communication/Grpc/SiteStreamGrpcClient.cs` | Add `EventOneofCase` → domain conversion |
| Consumer (optional) | Hub, Blazor, CLI | Add pattern match for new type |
### Proto Versioning Rules
- **Never reuse field numbers** — deleted fields' numbers are reserved forever
- **Add new `oneof` variants with the next available field number** — old clients ignore unknown fields
- **Never change existing field types or numbers** — this breaks wire compatibility
- **The `oneof` pattern guarantees forward compatibility** — a client that doesn't know about `ScriptErrorUpdate` simply sees `EventCase == None` and can skip it
## Existing Codebase References
| Pattern | File | Relevance |
|---------|------|-----------|
| SiteStreamManager | `src/ScadaLink.SiteRuntime/Streaming/SiteStreamManager.cs` | Subscribe/filter by instance, per-subscriber buffer, DropHead overflow |
| Per-site client caching | `CentralCommunicationActor._siteClients` dictionary | One client per site, refresh on address change |
| Bridge actor pattern | `src/ScadaLink.Communication/Actors/DebugStreamBridgeActor.cs` | Per-session actor with callbacks, adapted to use gRPC instead of Akka messages |
## Implementation Summary
### New Files
| File | Purpose |
|------|---------|
| `src/ScadaLink.Communication/Protos/sitestream.proto` | Proto definition |
| `src/ScadaLink.Communication/SiteStreamGrpc/` | Pre-generated C# stubs |
| `src/ScadaLink.Communication/Grpc/SiteStreamGrpcServer.cs` | Site-side gRPC streaming server |
| `src/ScadaLink.Communication/Grpc/SiteStreamGrpcClient.cs` | Central-side gRPC streaming client |
| `src/ScadaLink.Communication/Grpc/SiteStreamGrpcClientFactory.cs` | Per-site gRPC client factory/cache; reads `GrpcNodeAAddress` / `GrpcNodeBAddress` from the `Site` entity |
### Modified Files
| File | Change |
|------|--------|
| `src/ScadaLink.Communication/ScadaLink.Communication.csproj` | Add `Grpc.AspNetCore` + `Grpc.Net.Client` packages |
| `src/ScadaLink.Host/Program.cs` | Site: switch to WebApplicationBuilder + Kestrel gRPC |
| `src/ScadaLink.Communication/Actors/DebugStreamBridgeActor.cs` | Use gRPC client for streaming (ClusterClient for snapshot only) |
| `src/ScadaLink.Communication/DebugStreamService.cs` | Inject `SiteStreamGrpcClientFactory` |
| `src/ScadaLink.SiteRuntime/Actors/InstanceActor.cs` | Remove `DebugStreamEvent` forwarding (publish to SiteStreamManager only) |
| `src/ScadaLink.Communication/Actors/SiteCommunicationActor.cs` | Remove `Receive<DebugStreamEvent>` handler |
| `src/ScadaLink.Communication/Actors/CentralCommunicationActor.cs` | Remove `HandleDebugStreamEvent` |
| `docker/docker-compose.yml` | Expose gRPC ports for site nodes |
### Deleted Files
| File | Reason |
|------|--------|
| `src/ScadaLink.Commons/Messages/DebugView/DebugStreamEvent.cs` | No longer needed — events flow via gRPC, not ClusterClient |
## Design Review Notes
The following concerns were identified during external review. Items marked **[V1]** should be addressed in the initial implementation. Items marked **[Future]** are noted for consideration as the transport scales beyond debug view.
### [V1] Snapshot-to-Stream Handoff Race
The initial snapshot arrives via ClusterClient, then the gRPC stream opens separately. Events between snapshot generation and stream establishment can be missed or duplicated.
**Mitigation**: Open the gRPC stream **first**, then request the snapshot via ClusterClient. The gRPC stream buffers events from the moment it connects. The consumer applies the snapshot as the baseline, then replays any buffered gRPC events with timestamps newer than the snapshot. This is a simple timestamp-based dedup — no sequence numbers needed for V1.
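A minimal sketch of the timestamp-based replay, assuming events are buffered in a list until the snapshot arrives. `StreamedChange` and `SnapshotHandoff` are hypothetical, simplified shapes used only for illustration, and the values are sample data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Events buffered from the gRPC stream while the snapshot was in flight.
var buffered = new List<StreamedChange>
{
    new("Pump1.Speed", "40", new DateTimeOffset(2026, 1, 1, 12, 0, 0, TimeSpan.Zero)),
    new("Pump1.Speed", "55", new DateTimeOffset(2026, 1, 1, 12, 0, 2, TimeSpan.Zero)),
};
var snapshotTakenAt = new DateTimeOffset(2026, 1, 1, 12, 0, 1, TimeSpan.Zero);
var state = new Dictionary<string, string> { ["Pump1.Speed"] = "42" }; // snapshot baseline

SnapshotHandoff.ReplayNewerThan(state, snapshotTakenAt, buffered);
Console.WriteLine(state["Pump1.Speed"]); // 55

// Hypothetical, simplified event shape for the sketch.
public record StreamedChange(string Key, string Value, DateTimeOffset Timestamp);

public static class SnapshotHandoff
{
    // Applies only the buffered events that are strictly newer than the snapshot.
    public static void ReplayNewerThan(
        IDictionary<string, string> state,
        DateTimeOffset snapshotTimestamp,
        IEnumerable<StreamedChange> bufferedEvents)
    {
        foreach (var change in bufferedEvents
                     .Where(e => e.Timestamp > snapshotTimestamp)
                     .OrderBy(e => e.Timestamp))
        {
            state[change.Key] = change.Value;
        }
    }
}
```

The event stamped before the snapshot is discarded because the snapshot already reflects it; the later one overwrites the baseline. This relies on the snapshot and the events being stamped by the same site clock.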
### [V1] Stream Authority — Which Site Node to Connect To
Both site nodes may be running, but only the active node (hosting the Deployment Manager singleton) has live Instance Actors and a populated SiteStreamManager.
**Rule**: Central connects to the site node whose Akka address matches the singleton owner. In practice, `CentralCommunicationActor` already tracks which site node is reachable via ClusterClient — the gRPC client factory should use the same node selection. Try `GrpcNodeAAddress` first; on failure, try `GrpcNodeBAddress`.
The standby site node's gRPC server will accept connections, but no Instance Actors publish to its SiteStreamManager, so it produces no events. A stream connected to the standby will simply be idle (no data). On site failover, the bridge actor reconnects and picks up the new active node.
### [V1] Startup/Shutdown Ordering
Switching site host to `WebApplicationBuilder` requires coordinating ASP.NET Core and Akka.NET lifecycles:
- **Startup**: Actor system and SiteStreamManager must be initialized before `MapGrpcService<SiteStreamGrpcServer>()` begins accepting connections. Gate gRPC readiness on actor system startup (reject streams with `StatusCode.Unavailable` until ready).
- **Shutdown**: On `CoordinatedShutdown`, stop accepting new gRPC streams first, cancel all active streams (triggering client reconnect), then tear down actors. Use `IHostApplicationLifetime.ApplicationStopping` to signal the gRPC server.
### [V1] Duplicate Stream Prevention
Central bugs or reconnect storms could create multiple gRPC streams for the same instance/correlationId.
**Rule**: `SiteStreamGrpcServer` tracks active subscriptions by `correlation_id`. If a new `SubscribeInstance` arrives with a `correlation_id` that's already active, cancel the old stream before starting the new one. One stream per correlation ID.
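One way to enforce this rule is a small registry keyed by correlation ID, sketched below. `ActiveStreamRegistry` is a hypothetical helper; in the server, the returned token would replace `context.CancellationToken` in the read loop, and `Unregister` would run in the `finally` block.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

var registry = new ActiveStreamRegistry();
var first = registry.Register("corr-1", CancellationToken.None);
var second = registry.Register("corr-1", CancellationToken.None);
Console.WriteLine(first.IsCancellationRequested);  // True
Console.WriteLine(second.IsCancellationRequested); // False

// Hypothetical helper: one active stream per correlation ID.
public sealed class ActiveStreamRegistry
{
    private readonly object _gate = new();
    private readonly Dictionary<string, CancellationTokenSource> _active = new();

    // Cancels any existing stream with the same ID and returns the token
    // that the new stream's read loop should observe.
    public CancellationToken Register(string correlationId, CancellationToken callToken)
    {
        var cts = CancellationTokenSource.CreateLinkedTokenSource(callToken);
        CancellationTokenSource? previous;
        lock (_gate)
        {
            _active.TryGetValue(correlationId, out previous);
            _active[correlationId] = cts;
        }
        // The old stream's loop exits and runs its own finally block.
        // Sketch only: disposal of the replaced source is omitted.
        previous?.Cancel();
        return cts.Token;
    }

    // A replaced stream must not remove its successor, so the entry is
    // removed only when the registered token still matches.
    public void Unregister(string correlationId, CancellationToken token)
    {
        lock (_gate)
        {
            if (_active.TryGetValue(correlationId, out var current) && current.Token == token)
            {
                _active.Remove(correlationId);
                current.Dispose();
            }
        }
    }
}
```

Linking to the call token means a client disconnect still cancels the stream as before; the registry only adds a second way to cancel it.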
### [V1] Observability
Add metrics for silent degradation detection:
| Metric | Source | Purpose |
|--------|--------|---------|
| `grpc_streams_active` | SiteStreamGrpcServer | Active stream count per site node |
| `grpc_streams_events_sent` | SiteStreamGrpcServer | Events written to gRPC, by type |
| `grpc_streams_events_dropped` | Channel writer | Events dropped due to bounded buffer overflow |
| `grpc_streams_reconnects` | DebugStreamBridgeActor | Reconnection count and reason |
| `grpc_streams_duration` | SiteStreamGrpcServer | Stream lifetime (histogram) |
Emit via Serilog structured logging (existing infrastructure). Consider Prometheus counters if metrics export is added later.
### [V1] Proto Improvements
Use protobuf-native types instead of .NET-specific scalars:
```protobuf
import "google/protobuf/timestamp.proto";
enum Quality {
QUALITY_UNSPECIFIED = 0;
QUALITY_GOOD = 1;
QUALITY_UNCERTAIN = 2;
QUALITY_BAD = 3;
}
enum AlarmState {
ALARM_STATE_UNSPECIFIED = 0;
ALARM_STATE_NORMAL = 1;
ALARM_STATE_ACTIVE = 2;
}
message AttributeValueUpdate {
string instance_unique_name = 1;
string attribute_path = 2;
string attribute_name = 3;
string value = 4;
Quality quality = 5;
google.protobuf.Timestamp timestamp = 6;
}
message AlarmStateUpdate {
string instance_unique_name = 1;
string alarm_name = 2;
AlarmState state = 3;
int32 priority = 4;
google.protobuf.Timestamp timestamp = 5;
}
```
Reserve enum zero as `UNSPECIFIED` per proto3 convention. Use `google.protobuf.Timestamp` instead of `int64` ticks for cross-platform compatibility.
### [V1] Max Concurrent Streams
Define limits to prevent resource exhaustion:
- Max 100 concurrent gRPC streams per site node (configurable via `CommunicationOptions`)
- Server rejects with `StatusCode.ResourceExhausted` when limit reached
- One stream per `correlation_id` (duplicate prevention above)
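The stream limit could be enforced with a lock-free counter along these lines. `StreamLimiter` is a hypothetical helper and the limit of 2 is only for the demo; when `TryAcquire` returns `false`, the server would fail the call with `StatusCode.ResourceExhausted`.

```csharp
using System;
using System.Threading;

var limiter = new StreamLimiter(maxStreams: 2);
Console.WriteLine(limiter.TryAcquire()); // True
Console.WriteLine(limiter.TryAcquire()); // True
Console.WriteLine(limiter.TryAcquire()); // False: maps to ResourceExhausted
limiter.Release();
Console.WriteLine(limiter.TryAcquire()); // True

// Hypothetical helper: counter capped at maxStreams.
public sealed class StreamLimiter
{
    private readonly int _maxStreams;
    private int _active;

    public StreamLimiter(int maxStreams) => _maxStreams = maxStreams;

    public bool TryAcquire()
    {
        // Optimistically take a slot, then give it back if over the limit.
        if (Interlocked.Increment(ref _active) <= _maxStreams)
        {
            return true;
        }
        Interlocked.Decrement(ref _active);
        return false;
    }

    // Call from the stream's finally block, once per successful TryAcquire.
    public void Release() => Interlocked.Decrement(ref _active);
}
```

The current value of the counter is also a natural source for the `grpc_streams_active` metric.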
## Documentation Updates
All documentation changes must be completed as part of the implementation — not deferred. The design docs are the source of truth for this system's architecture.
### High-Level Requirements
**Modify**: `docs/requirements/HighLevelReqs.md`
Update the following sections:
- **Section 5 (Central-Site Communication)**: Add gRPC streaming as a transport alongside ClusterClient. Clarify that ClusterClient handles command/control and gRPC handles real-time data streaming.
- **Section 13 (Non-Functional / Performance)**: Add gRPC streaming throughput expectations and backpressure behavior if applicable.
### Component-Level Requirements
| Document | Changes |
|----------|---------|
| `docs/requirements/Component-Communication.md` | Pattern 6 (Debug Streaming): Replace ClusterClient streaming path with gRPC. Add `SiteStreamGrpcServer`, `SiteStreamGrpcClient`, `SiteStreamGrpcClientFactory` to component responsibilities. Add gRPC port configuration to shared settings. Update dependencies/interactions. |
| `docs/requirements/Component-SiteRuntime.md` | Update SiteStreamManager section: note that gRPC server subscribes to the stream for cross-cluster delivery. InstanceActor no longer forwards `DebugStreamEvent` directly. |
| `docs/requirements/Component-Host.md` | Site host now uses `WebApplicationBuilder` with Kestrel HTTP/2 for gRPC. Document `GrpcPort` config, startup/shutdown ordering with Akka.NET. |
| `docs/requirements/Component-CentralUI.md` | Debug view streaming path updated (gRPC, not ClusterClient for events). |
| `docs/requirements/Component-CLI.md` | `site create` / `site update` commands updated with `--grpc-node-a-address` / `--grpc-node-b-address`. |
| `docs/requirements/Component-ConfigurationDatabase.md` | Migration for `GrpcNodeAAddress` / `GrpcNodeBAddress` on Sites table. |
| `docs/requirements/Component-ClusterInfrastructure.md` | Note gRPC port alongside Akka remoting port in node configuration. |
### CLAUDE.md
Update key design decisions:
- Add "gRPC streaming for site→central real-time data; ClusterClient for command/control only" under Data & Communication
- Add gRPC port convention under Architecture & Runtime
- Update current component count if a new component is introduced
### README.md
Update the architecture diagram to show the gRPC streaming channel between site and central clusters.
## Testing Strategy
### Tests to Update (Existing)
| Test File | Change |
|-----------|--------|
| `tests/ScadaLink.SiteRuntime.Tests/Actors/InstanceActorIntegrationTests.cs` | Remove `DebugStreamEventForwarder` test helper and `DebugStreamEvent` expectations. Debug subscriber tests should verify events reach SiteStreamManager only (not direct forwarding). |
| `tests/ScadaLink.Communication.Tests/` | Remove any tests for `DebugStreamEvent` routing through `CentralCommunicationActor` and `SiteCommunicationActor`. |
| `tests/ScadaLink.Host.Tests/HealthCheckTests.cs` | May need updates if site host builder changes affect test factory setup. |
### New Tests Required
| Test | Project | What to Verify |
|------|---------|----------------|
| `SiteStreamGrpcServerTests` | `ScadaLink.Communication.Tests` | Server accepts subscription, relays events from mock SiteStreamManager to gRPC stream, cleans up on cancellation, cancels the existing stream when a duplicate `correlation_id` arrives, enforces max concurrent streams limit, rejects before actor system ready. |
| `SiteStreamGrpcClientTests` | `ScadaLink.Communication.Tests` | Client connects, reads stream, converts proto→domain types, invokes callback, handles stream errors, reconnects to NodeB on failure. |
| `SiteStreamGrpcClientFactoryTests` | `ScadaLink.Communication.Tests` | Creates and caches per-site clients, derives endpoints from Site entity, disposes on site removal. |
| `DebugStreamBridgeActorTests` (update) | `ScadaLink.Communication.Tests` | Verify bridge actor opens gRPC stream after snapshot, receives events via gRPC callback (not Akka messages), reconnects on stream error with node failover, terminates after max retries. |
| `GrpcStreamIntegrationTest` | `ScadaLink.IntegrationTests` | End-to-end: site gRPC server → central gRPC client → bridge actor → callback. Use in-process test server (`WebApplicationFactory` or `TestServer`). Verify event delivery, cancellation cleanup, keepalive behavior. |
| `SiteHostStartupTests` | `ScadaLink.Host.Tests` | Site host starts with `WebApplicationBuilder`, gRPC port configured, `MapGrpcService` registered, rejects streams before the actor system is ready. |
| `ProtoRoundtripTests` | `ScadaLink.Communication.Tests` | Serialize/deserialize each proto message type, verify `oneof` discrimination, verify enum mappings, verify `google.protobuf.Timestamp` conversion. |
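
The admission rules that `SiteStreamGrpcServerTests` is expected to verify can be modelled without any gRPC machinery. The block below is a language-neutral sketch in Python, not the project's C# server: `StreamRegistry`, its method names, and the `ALREADY_EXISTS` / `RESOURCE_EXHAUSTED` status names are assumptions made for illustration. Only the `Unavailable` status for the not-ready case comes from this plan.

```python
class StreamRegistry:
    """Toy model of the server-side admission rules (hypothetical, not real gRPC code)."""

    def __init__(self, max_streams: int) -> None:
        self.max_streams = max_streams
        self.ready = False   # set to True once the actor system is up
        self.active = set()  # correlation IDs of currently open streams

    def subscribe(self, correlation_id: str) -> str:
        """Return a gRPC-style status name for a subscription attempt."""
        if not self.ready:
            return "UNAVAILABLE"          # from the plan: reject before ready
        if correlation_id in self.active:
            return "ALREADY_EXISTS"       # assumed status for duplicate IDs
        if len(self.active) >= self.max_streams:
            return "RESOURCE_EXHAUSTED"   # assumed status for the stream cap
        self.active.add(correlation_id)
        return "OK"

    def cancel(self, correlation_id: str) -> None:
        """Release the slot so no subscription leaks after cancellation."""
        self.active.discard(correlation_id)


registry = StreamRegistry(max_streams=1)
before_ready = registry.subscribe("a")
registry.ready = True
first = registry.subscribe("a")
duplicate = registry.subscribe("a")
over_limit = registry.subscribe("b")
registry.cancel("a")
after_cancel = registry.subscribe("b")
```

Each branch corresponds to one assertion in the test row above, so a real test class could plausibly mirror this order: not ready, accepted, duplicate, over limit, then cleanup.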
### Test Coverage Guardrails
To ensure the implementation stays compliant with this plan:
1. **Proto contract tests**: A test that loads `sitestream.proto` and verifies all `oneof` variants have handlers in both `StreamRelayActor` (server) and `ConvertToDomainEvent` (client). If a new proto field is added without handlers, the test fails. Prevents silent event type gaps.
2. **Architectural constraint test**: Add to `ScadaLink.Commons.Tests/ArchitecturalConstraintTests.cs` — verify that `DebugStreamEvent` type no longer exists in the assembly (ensures the ClusterClient streaming path is fully removed and doesn't creep back).
3. **Startup validation test**: Site host integration test that verifies gRPC server rejects `SubscribeInstance` calls with `StatusCode.Unavailable` before the actor system is ready, and accepts them after.
4. **Cleanup verification test**: gRPC server test that verifies after stream cancellation, the SiteStreamManager subscription count returns to its pre-test value (no leaked subscriptions).
5. **No ClusterClient streaming regression**: Integration test that subscribes via gRPC, triggers attribute changes, and verifies events arrive via gRPC callback — NOT via `DebugStreamEvent` through `CentralCommunicationActor`. This prevents accidental reintroduction of the ClusterClient streaming path.
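
The proto contract guardrail (item 1) amounts to a set comparison between the `oneof` variants and the handlers registered on each side. The following is a minimal sketch in Python rather than the project's C#, assuming the variant names have already been extracted from `sitestream.proto`. The variant names and handler tables are hypothetical stand-ins for `StreamRelayActor` and `ConvertToDomainEvent`.

```python
from typing import Callable, Dict, Iterable, Set


def missing_handlers(variants: Iterable[str], handlers: Dict[str, Callable]) -> Set[str]:
    """Return the oneof variant names that have no registered handler."""
    return set(variants) - set(handlers)


def check_contract(
    variants: Iterable[str],
    server_handlers: Dict[str, Callable],
    client_handlers: Dict[str, Callable],
) -> Dict[str, Set[str]]:
    """Collect gaps per side; an empty dict means every variant is handled."""
    variants = list(variants)
    gaps: Dict[str, Set[str]] = {}
    for side, handlers in (("server", server_handlers), ("client", client_handlers)):
        missing = missing_handlers(variants, handlers)
        if missing:
            gaps[side] = missing
    return gaps


# Hypothetical variant names; the real ones are defined in sitestream.proto.
VARIANTS = ["attribute_changed", "alarm_changed"]
# Hypothetical handler tables for the server and client sides.
SERVER = {"attribute_changed": lambda e: e, "alarm_changed": lambda e: e}
CLIENT = {"attribute_changed": lambda e: e}  # alarm_changed deliberately missing

gaps = check_contract(VARIANTS, SERVER, CLIENT)
```

A guardrail test would fail whenever `gaps` is non-empty, which is what catches a new proto field added without handlers on one side.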
## Implementation Plan Guardrails
When creating implementation plans (work packages) from this document, each plan must include:
1. **Pre-implementation checklist**:
- [ ] Identify all requirement docs affected by this work package
- [ ] Identify all existing tests that need updating
- [ ] List new tests required before marking the work package complete
2. **Per-step requirements**:
- Every new public class/interface must have covering unit tests
- Every modified actor must have its existing tests updated to reflect new behavior
- Every proto change must have roundtrip serialization tests
- Every config change must be validated in `StartupValidator` with a covering test
3. **Completion criteria** (a work package is not done until):
- [ ] All identified requirement docs are updated
- [ ] All existing tests pass (no regressions)
- [ ] All new tests listed in this document are implemented and passing
- [ ] `dotnet build` succeeds with zero warnings
- [ ] CLAUDE.md key design decisions are updated if architectural choices changed
- [ ] `docker/deploy.sh` succeeds and end-to-end streaming verified manually
4. **Review checkpoint**: After each implementation phase (site server, central client, cleanup, config), run the full test suite and verify the streaming path end-to-end before proceeding to the next phase. Do not batch all phases into a single commit.
+3 -3
@@ -191,13 +191,13 @@ No business logic, actor systems, database connectivity, or web endpoints are im
- `[KDD-code-5]` Per-component configuration via appsettings.json sections bound to options classes. → Directly maps to REQ-HOST-3.
- `[KDD-code-6]` Options classes owned by component projects, not Commons. → Directly maps to HOST-3-4.
### From Component-Commons.md
### From docs/requirements/Component-Commons.md
- `[CD-Commons-1]` Commons is referenced by all component libraries and the Host — project reference structure must reflect this.
- `[CD-Commons-2]` No EF navigation property annotations on POCOs (Fluent API only in Configuration Database).
- `[CD-Commons-3]` Configuration Database implements repository interfaces and maps POCOs — Phase 0 establishes the interface contract; implementation deferred.
### From Component-Host.md
### From docs/requirements/Component-Host.md
- `[CD-Host-1]` Host is the composition root — references every component project to call their extension methods.
- `[CD-Host-2]` Configuration Database registration (DbContext, repository wiring) is a Host responsibility — Phase 0 includes ConfigurationDatabase in Host's `AddXxx()` call chain (skeleton); full DbContext/repository wiring in Phase 1.
@@ -614,7 +614,7 @@ Phase 0 covers REQ-COM and REQ-HOST requirements. The following are split with o
| REQ ID | Phase 0 Scope | Other Phase(s) Scope |
|--------|---------------|---------------------|
| REQ-COM-2 | Interface definition only | Phase 3B: OPC UA and LmxProxy implementations |
| REQ-COM-2 | Interface definition only | Phase 3B: OPC UA implementation |
| REQ-COM-4a | Interface definition only | Phase 1: `IAuditService` implementation in Configuration Database |
| REQ-COM-5a-4 | Noted in plan; versioning rules documented | Phase 1/3A: Akka serialization binding configuration |
| REQ-HOST-2 | Skeleton role branching with stub `AddXxx()` calls | Phase 1: Full service registration with real implementations |
+5 -5
@@ -116,7 +116,7 @@
| KDD-code-7 | Host readiness gating: /health/ready endpoint, no traffic until operational. | WP-12 |
| KDD-code-8 | EF Core migrations: auto-apply in dev, manual SQL scripts for production. | WP-1 |
### From Component-ConfigurationDatabase.md
### From docs/requirements/Component-ConfigurationDatabase.md
| ID | Constraint | Work Package |
|----|-----------|--------------|
@@ -131,7 +131,7 @@
| CD-ConfigDB-9 | Connection strings from Host's DatabaseOptions (bound from appsettings.json). | WP-1 |
| CD-ConfigDB-10 | Production startup validates database schema version matches expected migration level; fail fast if not. | WP-1, WP-11 |
### From Component-Security.md
### From docs/requirements/Component-Security.md
| ID | Constraint | Work Package |
|----|-----------|--------------|
@@ -150,7 +150,7 @@
| CD-Security-13 | Unauthorized actions return appropriate error and are not logged as audit events. | WP-9 |
| CD-Security-14 | LDAP group mappings stored in configuration database, managed via Central UI (Admin role). | WP-2, WP-18 |
### From Component-CentralUI.md
### From docs/requirements/Component-CentralUI.md
| ID | Constraint | Work Package |
|----|-----------|--------------|
@@ -160,7 +160,7 @@
| CD-CentralUI-4 | Both nodes share ASP.NET Data Protection keys (config DB or shared config). | WP-10, WP-21 |
| CD-CentralUI-5 | Active debug view streams and in-progress subscriptions are lost on failover; user must re-open. | WP-21 |
### From Component-Host.md
### From docs/requirements/Component-Host.md
| ID | Constraint | Work Package |
|----|-----------|--------------|
@@ -901,7 +901,7 @@ Most findings arose because the Codex review operated on a condensed summary tha
25. **"Section 9.3 claimed as covered but permissions not verifiable"** — Dismissed. Phase 1 defines the authorization policies (WP-9). The actual permission checks are exercised when each component's workflows are built. The policies define the permission boundaries; enforcement is cross-cutting.
26. **"WP-4 conflicts with Component-ConfigurationDatabase (instance lifecycle missing)"** — Dismissed. WP-4 AC#2 includes instance lifecycle.
26. **"WP-4 conflicts with docs/requirements/Component-ConfigurationDatabase (instance lifecycle missing)"** — Dismissed. WP-4 AC#2 includes instance lifecycle.
### Conclusion
+12 -12
@@ -36,7 +36,7 @@ Specifically required from earlier phases:
## 3. Requirements Checklist
Each bullet extracted from HighLevelReqs.md sections covered by this phase. IDs follow the pattern `[section-N]`.
Each bullet extracted from docs/requirements/HighLevelReqs.md sections covered by this phase. IDs follow the pattern `[section-N]`.
### 3.1 Template Structure
- `[3.1-1]` Machines are modeled as instances of templates.
@@ -174,7 +174,7 @@ Each bullet extracted from HighLevelReqs.md sections covered by this phase. IDs
## 4. Design Constraints Checklist
Constraints from CLAUDE.md Key Design Decisions and Component-TemplateEngine.md / Component-ConfigurationDatabase.md.
Constraints from CLAUDE.md Key Design Decisions and docs/requirements/Component-TemplateEngine.md / docs/requirements/Component-ConfigurationDatabase.md.
### From CLAUDE.md Key Design Decisions
@@ -186,7 +186,7 @@ Constraints from CLAUDE.md Key Design Decisions and Component-TemplateEngine.md
- `[KDD-deploy-10]` Last-write-wins for concurrent template editing (no optimistic concurrency on templates).
- `[KDD-deploy-12]` Naming collisions in composed feature modules are design-time errors.
### From Component-TemplateEngine.md
### From docs/requirements/Component-TemplateEngine.md
- `[CD-TE-1]` Template has unique name/ID.
- `[CD-TE-2]` Template cannot be deleted if referenced by instances or child templates.
@@ -215,7 +215,7 @@ Constraints from CLAUDE.md Key Design Decisions and Component-TemplateEngine.md
- `[CD-TE-25]` Script has trigger configuration: Interval, Value Change, Conditional, or invoked by alarm/other script.
- `[CD-TE-26]` Script has optional minimum time between runs.
### From Component-ConfigurationDatabase.md
### From docs/requirements/Component-ConfigurationDatabase.md
- `[CD-CDB-1]` ITemplateEngineRepository covers: templates, attributes, alarms, scripts, compositions, instances, overrides, connection bindings, areas.
- `[CD-CDB-2]` IDeploymentManagerRepository covers: current deployment status per instance, deployed configuration snapshots, system-wide artifact deployment status per site.
@@ -488,11 +488,11 @@ Constraints from CLAUDE.md Key Design Decisions and Component-TemplateEngine.md
**Acceptance Criteria**:
- Resolution order: Instance → Child Template (most derived first) → Composing Template → Composed Module (recursively). (`[3.6R-1]`, `[3.6R-2]`)
- Walk inheritance chain applying overrides at each level, respecting locks. (Component-TE flattening step 2)
- Resolve composed feature modules, applying overrides from composing templates, respecting locks. (Component-TE flattening step 3)
- Apply instance-level overrides, respecting locks. (Component-TE flattening step 4)
- Walk inheritance chain applying overrides at each level, respecting locks. (docs/requirements/Component-TE flattening step 2)
- Resolve composed feature modules, applying overrides from composing templates, respecting locks. (docs/requirements/Component-TE flattening step 3)
- Apply instance-level overrides, respecting locks. (docs/requirements/Component-TE flattening step 4)
- Resolve data connection bindings — replace connection name references with concrete connection details from site. (`[3.3-8]`, `[CD-TE-11]`)
- Output a flat structure: list of attributes with resolved values and data source addresses, list of alarms with resolved trigger definitions, list of scripts with resolved code and triggers. (Component-TE flattening step 6)
- Output a flat structure: list of attributes with resolved values and data source addresses, list of alarms with resolved trigger definitions, list of scripts with resolved code and triggers. (docs/requirements/Component-TE flattening step 6)
- Flattening success is a pre-deployment validation check. (`[3.11-1]`)
- Test: multi-level inheritance with overrides and locks → correct resolution.
- Test: nested composition with overrides → correct canonical names.
@@ -547,7 +547,7 @@ Constraints from CLAUDE.md Key Design Decisions and Component-TemplateEngine.md
**Acceptance Criteria**:
- Deployment package format is explicitly defined and versioned.
- Package contains: all resolved attributes (values, data types, data source addresses with connection details), all resolved alarms (trigger definitions with resolved attribute references), all resolved scripts (source code, trigger configuration, parameter/return definitions). (Component-TE flattening step 6)
- Package contains: all resolved attributes (values, data types, data source addresses with connection details), all resolved alarms (trigger definitions with resolved attribute references), all resolved scripts (source code, trigger configuration, parameter/return definitions). (docs/requirements/Component-TE flattening step 6)
- Package includes the revision hash. (`[KDD-deploy-5]`)
- Scripts are included for deployment to sites as part of flattened config. (`[4.1-3]`)
- Pre-compilation validation occurs at central; actual compilation at site. (`[4.1-4]`)
@@ -671,7 +671,7 @@ Constraints from CLAUDE.md Key Design Decisions and Component-TemplateEngine.md
**Description**: Implement the EF Core repository for all template domain entities.
**Acceptance Criteria**:
- Repository covers: templates, attributes, alarms, scripts, compositions, instances, overrides, connection bindings, areas. (`[CD-CDB-1]`) Note: shared scripts (`[CD-CDB-7]`) may be added to this repository or served by a separate interface — decide during implementation. The Component-ConfigurationDatabase.md scope for ITemplateEngineRepository does not explicitly include shared scripts; if a separate interface is warranted, create one.
- Repository covers: templates, attributes, alarms, scripts, compositions, instances, overrides, connection bindings, areas. (`[CD-CDB-1]`) Note: shared scripts (`[CD-CDB-7]`) may be added to this repository or served by a separate interface — decide during implementation. The docs/requirements/Component-ConfigurationDatabase.md scope for ITemplateEngineRepository does not explicitly include shared scripts; if a separate interface is warranted, create one.
- Implementation uses DbContext internally with POCO entities from Commons. (`[CD-CDB-5]`)
- Consuming components depend only on Commons interfaces. (`[CD-CDB-6]`)
- Unit-of-work support: multiple operations commit in a single transaction.
@@ -1096,7 +1096,7 @@ Codex MCP external verification was performed using model `gpt-5.4`. The review
2. **3.9 individual-instance deployment bullet not extracted** — Added `[3.9-6]` to Requirements Checklist. Split-section check updated.
3. **System-wide artifact deployment status in WP-24** — WP-24 is a stub; system-wide artifact deployment status is Phase 3C scope. Accepted as-is (stub level sufficient for Phase 2).
4. **Instance lifecycle concurrency missing from WP-20** — Added optimistic concurrency via rowversion to WP-20 acceptance criteria. Updated `[CD-CDB-4]` forward trace.
5. **Shared scripts repository assignment** — Clarified in WP-5 and WP-23 that shared scripts may use ITemplateEngineRepository or a separate interface; the Component-ConfigurationDatabase.md ITemplateEngineRepository scope does not explicitly include shared scripts.
5. **Shared scripts repository assignment** — Clarified in WP-5 and WP-23 that shared scripts may use ITemplateEngineRepository or a separate interface; the docs/requirements/Component-ConfigurationDatabase.md ITemplateEngineRepository scope does not explicitly include shared scripts.
6. **"No rollback" vs deployment history records** — Clarified in WP-15 and `[3.9-4]` that "no rollback" means no rollback mechanism/operation, not absence of deployment history records. Deployment records exist for audit per Configuration Database schema.
7. **Composition deletion constraint in WP-25** — Added clarifying note that composed-template deletion constraint is a logical implication of `[CD-TE-2]` (stricter but consistent interpretation).
@@ -1112,6 +1112,6 @@ Codex MCP external verification was performed using model `gpt-5.4`. The review
12. **`[3.11-8]` Central UI / Design role not verified in WP-18** — Dismissed. Phase 2 provides the on-demand validation API. The Central UI integration and Design role enforcement for the validation UI are Phase 5 concerns. WP-18 correctly verifies the pipeline can be invoked without deployment; UI wiring is out of scope.
13. **`[4.5-3]` Design role gating** — Partially addressed: added Design role enforcement note to WP-5. Full UI-level role enforcement is Phase 5.
14. **`[CD-TE-9]` stream topics and UI display not verified** — Dismissed. Stream topics are Phase 3B (Akka stream); UI display is Phase 5. Phase 2 covers canonical names in triggers, scripts, and diffs which are the Phase 2 concern.
15. **Naming collision with canonical names contradicts HLR** — Dismissed. The HighLevelReqs statement "two feature modules that each define an attribute with the same name" is refined by Component-TemplateEngine.md which introduces canonical naming with module instance name prefixes. The component design is authoritative for implementation details; the HLR describes the user-facing intent (collisions are errors) while the component design specifies the mechanism (canonical names prevent false collisions). No contradiction — the component design is a refinement.
15. **Naming collision with canonical names contradicts HLR** — Dismissed. The HighLevelReqs statement "two feature modules that each define an attribute with the same name" is refined by docs/requirements/Component-TemplateEngine.md which introduces canonical naming with module instance name prefixes. The component design is authoritative for implementation details; the HLR describes the user-facing intent (collisions are errors) while the component design specifies the mechanism (canonical names prevent false collisions). No contradiction — the component design is a refinement.
**Status**: Pass with corrections. All findings either addressed in the plan or dismissed with rationale.
+4 -4
@@ -103,7 +103,7 @@
- [ ] `[KDD-cluster-4]` CoordinatedShutdown for graceful singleton handover.
- [ ] `[KDD-cluster-5]` Automatic dual-node recovery from persistent storage.
### From Component-ClusterInfrastructure.md
### From docs/requirements/Component-ClusterInfrastructure.md
- [ ] `[CD-CI-1]` Two-node cluster (active/standby) using Akka.NET Cluster.
- [ ] `[CD-CI-2]` Leader election and role assignment (active vs. standby).
@@ -120,7 +120,7 @@
- *Phase 3A scope*: Establish the pattern — no alarm persistence. Alarm Actors are Phase 3B, but the design must not persist alarm state.
- [ ] `[CD-CI-13]` Keep-oldest SBR rationale: with two nodes, quorum-based strategies cause total shutdown. Keep-oldest with `down-if-alone` ensures at most one node runs the singleton.
### From Component-SiteRuntime.md
### From docs/requirements/Component-SiteRuntime.md
- [ ] `[CD-SR-1]` Deployment Manager is an Akka.NET cluster singleton — guaranteed to run on exactly one node.
- [ ] `[CD-SR-2]` Startup behavior step 1: Read all deployed configurations from local SQLite.
@@ -133,7 +133,7 @@
- *Phase 3A scope*: Skeleton lifecycle — disable/enable/delete message handling in Deployment Manager. Full lifecycle with DCL/scripts is Phase 3B/3C.
- [ ] `[CD-SR-9]` When Instance Actor is stopped (disable, delete, redeployment), Akka.NET automatically stops all child actors.
### From Component-Host.md
### From docs/requirements/Component-Host.md
- [ ] `[CD-HOST-1]` REQ-HOST-6: Site-role Akka bootstrap with Remoting, Clustering, Persistence (SQLite), Split-Brain Resolver.
- [ ] `[CD-HOST-2]` REQ-HOST-7: Site nodes use `Host.CreateDefaultBuilder` — generic `IHost`, **not** `WebApplication`. No Kestrel, no HTTP port, no web endpoints.
@@ -418,7 +418,7 @@ Phase 3A is complete when **all** of the following pass:
| # | Question | Context | Impact | Status |
|---|----------|---------|--------|--------|
| Q-P3A-1 | What is the optimal batch size and delay for staggered Instance Actor startup? | Component-SiteRuntime.md suggests 20 with a "short delay." Actual values depend on OPC UA server capacity. | Performance tuning. Default to 20/100ms, make configurable. | Deferred — tune during Phase 3B when DCL is integrated. |
| Q-P3A-1 | What is the optimal batch size and delay for staggered Instance Actor startup? | docs/requirements/Component-SiteRuntime.md suggests 20 with a "short delay." Actual values depend on OPC UA server capacity. | Performance tuning. Default to 20/100ms, make configurable. | Deferred — tune during Phase 3B when DCL is integrated. |
| Q-P3A-2 | Should the SQLite schema use a single database file or separate files per concern (configs, overrides, S&F, events)? | Single file is simpler. Separate files isolate concerns and allow independent backup/maintenance. | Schema design. | Recommend single file with separate tables. Simpler transaction management. Final decision during implementation. |
| Q-P3A-3 | Should Akka.Persistence (event sourcing / snapshotting) be used for the Deployment Manager singleton, or is direct SQLite access sufficient? | Akka.Persistence adds complexity (journal, snapshots) but provides built-in recovery. Direct SQLite is simpler for this use case (singleton reads all configs on startup). | Architecture. | Recommend direct SQLite — Deployment Manager recovery is a full read-all-configs-and-rebuild pattern, not event replay. Akka.Persistence is overkill here. |
+8 -46
@@ -11,7 +11,7 @@
Phase 3B brings the site cluster to life as a fully operational data collection, scripting, alarm evaluation, and health reporting platform. Upon completion, a site can:
- Communicate bidirectionally with the central cluster using all 8 message patterns.
- Connect to OPC UA servers and LmxProxy endpoints, subscribe to tags, and deliver values to Instance Actors.
- Connect to OPC UA servers, subscribe to tags, and deliver values to Instance Actors.
- Execute scripts in response to triggers (interval, value change, conditional).
- Evaluate alarm conditions, manage alarm state, and execute on-trigger scripts.
- Compile and execute shared scripts inline.
@@ -25,7 +25,7 @@ Phase 3B brings the site cluster to life as a fully operational data collection,
| Component | Scope |
|-----------|-------|
| Central-Site Communication | Full — all 8 message patterns, correlation IDs, per-pattern timeouts, transport heartbeat |
| Data Connection Layer | Full — IDataConnection, OPC UA adapter, LmxProxy adapter, connection actor, auto-reconnect, write-back, tag path resolution, health reporting |
| Data Connection Layer | Full — IDataConnection, OPC UA adapter, connection actor, auto-reconnect, write-back, tag path resolution, health reporting |
| Site Runtime | Full runtime — Script Actor, Alarm Actor, shared scripts, Script Runtime API (core operations), script trust model, site-wide Akka stream |
| Health Monitoring | Site-side collection + central-side aggregation and offline detection |
| Site Event Logging | Event recording, retention/purge, remote query with pagination |
@@ -47,7 +47,7 @@ Phase 3B brings the site cluster to life as a fully operational data collection,
## Requirements Checklist
Each bullet extracted from HighLevelReqs.md at the individual requirement level. Checkbox items must each map to at least one work package.
Each bullet extracted from docs/requirements/HighLevelReqs.md at the individual requirement level. Checkbox items must each map to at least one work package.
### Section 2.2 — Communication: Central <-> Site
@@ -66,8 +66,8 @@ Each bullet extracted from HighLevelReqs.md at the individual requirement level.
### Section 2.4 — Data Connection Protocols
- [ ] `[2.4-1]` System supports OPC UA and LmxProxy (gRPC-based custom protocol with existing client SDK).
- [ ] `[2.4-2]` Both protocols implement a common interface supporting: connect, subscribe to tag paths, receive value updates, and write values.
- [ ] `[2.4-1]` System supports OPC UA.
- [ ] `[2.4-2]` Protocol adapters implement a common interface supporting: connect, subscribe to tag paths, receive value updates, and write values.
- [ ] `[2.4-3]` Additional protocols can be added by implementing the common interface.
- [ ] `[2.4-4]` Data Connection Layer is a clean data pipe — publishes tag value updates to Instance Actors but performs no evaluation of triggers or alarm conditions.
@@ -221,15 +221,6 @@ Constraints from CLAUDE.md Key Design Decisions (KDD) and Component-*.md (CD) th
- [ ] `[KDD-ui-4]` Dead letter monitoring as a health metric.
- [ ] `[KDD-ui-5]` Site Event Logging: 30-day retention, 1GB storage cap, daily purge, paginated queries with keyword search.
### LmxProxy Protocol Details
- [ ] `[CD-DCL-1]` LmxProxy: gRPC/HTTP/2 transport, protobuf-net code-first, port 5050.
- [ ] `[CD-DCL-2]` LmxProxy: API key auth, session-based (SessionId), 30s keep-alive heartbeat via `GetConnectionStateAsync`.
- [ ] `[CD-DCL-3]` LmxProxy: Server-streaming gRPC for subscriptions (`IAsyncEnumerable<VtqMessage>`), 1000ms default sampling, on-change with 0.
- [ ] `[CD-DCL-4]` LmxProxy: SDK retry policy (exponential backoff via Polly) complements DCL's fixed-interval reconnect. SDK handles operation-level transient failures; DCL handles connection-level recovery.
- [ ] `[CD-DCL-5]` LmxProxy: Batch read/write capabilities (ReadBatchAsync, WriteBatchAsync, WriteBatchAndWaitAsync).
- [ ] `[CD-DCL-6]` LmxProxy: TLS 1.2/1.3, mutual TLS (client cert + key PEM), custom CA trust, self-signed for dev.
### Communication Component Design
- [ ] `[CD-Comm-1]` 8 distinct message patterns: Deployment, Instance Lifecycle, System-Wide Artifact, Integration Routing, Recipe/Command Delivery, Debug Streaming, Health Reporting, Remote Queries.
@@ -282,7 +273,6 @@ Constraints from CLAUDE.md Key Design Decisions (KDD) and Component-*.md (CD) th
- [ ] `[CD-DCL-12]` Value update message format: tag path, value, quality (good/bad/uncertain), timestamp.
- [ ] `[CD-DCL-13]` When Instance Actor stopped, DCL cleans up associated subscriptions.
- [ ] `[CD-DCL-14]` On redeployment, subscriptions established fresh based on new configuration.
- [ ] `[CD-DCL-15]` LmxProxy connection actor holds SessionId, starts 30s keep-alive timer on Connected state. On keep-alive failure, transitions to Reconnecting, client disposes subscriptions.
---
@@ -411,30 +401,6 @@ Constraints from CLAUDE.md Key Design Decisions (KDD) and Component-*.md (CD) th
---
### WP-8: Data Connection Layer — LmxProxy Adapter
**Description**: Implement the LmxProxy adapter wrapping the existing `LmxProxyClient` SDK behind IDataConnection.
**Acceptance Criteria**:
- Implements all IDataConnection methods mapped per Component-DCL concrete type mappings.
- Connect: calls `ConnectAsync`, stores SessionId.
- Subscribe: calls `SubscribeAsync`, processes `IAsyncEnumerable<VtqMessage>` stream, forwards updates.
- Write: calls `WriteAsync`.
- Read: calls `ReadAsync`.
- Configurable sampling interval (default 1000ms, 0 = on-change).
- gRPC/HTTP/2 transport on configured port (default 5050).
- API key authentication passed in ConnectRequest.
- TLS support: TLS 1.2/1.3, mutual TLS, custom CA trust, self-signed for dev.
- 30s keep-alive heartbeat via `GetConnectionStateAsync`. On failure, marks disconnected, disposes subscriptions.
- SDK retry policy (Polly exponential backoff) retained for operation-level transient failures.
- Batch operations exposed (ReadBatchAsync, WriteBatchAsync) for future use.
**Estimated Complexity**: L
**Requirements Traced**: `[2.4-1]`, `[2.4-2]`, `[CD-DCL-1]`, `[CD-DCL-2]`, `[CD-DCL-3]`, `[CD-DCL-4]`, `[CD-DCL-5]`, `[CD-DCL-6]`, `[CD-DCL-15]`
---
### WP-9: Data Connection Layer — Auto-Reconnect & Bad Quality Propagation
**Description**: Implement auto-reconnection at fixed interval with immediate bad quality propagation on disconnect.
@@ -460,7 +426,6 @@ Constraints from CLAUDE.md Key Design Decisions (KDD) and Component-*.md (CD) th
**Acceptance Criteria**:
- After reconnection, all subscriptions that were active before disconnect are re-subscribed.
- Instance Actors require no action — they see quality return to good as fresh values arrive.
- LmxProxy adapter: new session established, new subscriptions created (old session/subscriptions were disposed on disconnect).
- OPC UA adapter: new session established, monitored items re-created.
- Test: disconnect OPC UA server, reconnect, verify values resume without Instance Actor intervention.
@@ -476,7 +441,7 @@ Constraints from CLAUDE.md Key Design Decisions (KDD) and Component-*.md (CD) th
**Acceptance Criteria**:
- Instance Actor sends write request to DCL when script calls SetAttribute for data-connected attribute.
- DCL writes value via appropriate protocol (OPC UA Write / LmxProxy WriteAsync).
- DCL writes value via the appropriate protocol (e.g., OPC UA Write).
- Write failure (connection down, device rejection, timeout) returned synchronously to calling script.
- Successful write: in-memory value NOT optimistically updated. Value updates only when device confirms via existing subscription.
- Write failures also logged to Site Event Logging.
@@ -531,7 +496,7 @@ Constraints from CLAUDE.md Key Design Decisions (KDD) and Component-*.md (CD) th
- Tag value updates delivered directly to requesting Instance Actor.
- When Instance Actor stopped (disable, delete, redeployment): DCL cleans up associated subscriptions.
- On redeployment: subscriptions established fresh based on new configuration.
- Protocol-agnostic — works for both OPC UA and LmxProxy.
- Protocol-agnostic — works for OPC UA and any future protocol adapter.
**Estimated Complexity**: M
@@ -896,7 +861,7 @@ Constraints from CLAUDE.md Key Design Decisions (KDD) and Component-*.md (CD) th
**Acceptance Criteria**:
- IDataConnection interface defined in Commons (Phase 0 — REQ-COM-2).
- OPC UA adapter and LmxProxy adapter both implement IDataConnection.
- The OPC UA adapter implements IDataConnection.
- Connection actor instantiates the correct adapter based on data connection protocol type from configuration.
- Adding a new protocol requires only implementing IDataConnection and registering the adapter — no changes to connection actor or Instance Actor.
@@ -933,7 +898,6 @@ Constraints from CLAUDE.md Key Design Decisions (KDD) and Component-*.md (CD) th
|------|---------------|
| Connection Actor | State machine transitions (Connecting -> Connected -> Reconnecting), stash/unstash behavior, bad quality propagation on disconnect |
| OPC UA Adapter | IDataConnection contract compliance, subscribe/unsubscribe, write |
| LmxProxy Adapter | IDataConnection contract compliance, SessionId management, keep-alive, subscription stream processing |
| Script Actor | Trigger evaluation (interval, value change, conditional), minimum time between runs, concurrent execution |
| Alarm Actor | Condition evaluation (Value Match, Range Violation, Rate of Change), state transitions (normal->active, active->normal), no script on clear |
| Script Runtime API | GetAttribute, SetAttribute (data-connected + static), CallScript, CallShared |
@@ -1003,7 +967,6 @@ Phase 3B is complete when ALL of the following pass:
| # | Question | Context | Impact | Status |
|---|----------|---------|--------|--------|
| Q-P3B-1 | What is the exact dedicated blocking I/O dispatcher configuration for Script Execution Actors? | KDD-runtime-3 says "dedicated blocking I/O dispatcher" — need Akka.NET HOCON config (thread pool size, throughput settings). | WP-15. Sensible defaults can be set; tuned in Phase 8. | Deferred — use Akka.NET default blocking-io-dispatcher config; tune during Phase 8 performance testing. |
| Q-P3B-2 | Should LmxProxy adapter expose WriteBatchAndWaitAsync (write-and-poll handshake) through IDataConnection or as a protocol-specific extension? | CD-DCL-5 lists WriteBatchAndWaitAsync but IDataConnection only defines simple Write. | WP-8. Does not block core functionality. | Deferred — expose as protocol-specific extension method; not part of IDataConnection core contract. |
| Q-P3B-3 | What is the Rate of Change alarm evaluation time window? | Section 3.4 says "changes faster than a defined threshold" but does not specify the time window (per-second? per-minute? configurable?). | WP-16. Needs a design decision for the evaluation algorithm. | Deferred — implement as configurable window (default: per-second rate). Document in alarm definition schema. |
| Q-P3B-4 | How does the health report sequence number behave across failover? | Sequence number is monotonic within a singleton lifecycle. After failover, the new singleton starts at 1. Central must handle this. | WP-27, WP-28. Central should accept any report from a site marked offline regardless of sequence number. | Resolved in design — central accepts report when site is offline; for online sites, requires seq > last. On failover, site goes offline first (missed reports), so the reset is naturally handled. |
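
The resolution recorded for Q-P3B-4 can be sketched as a small acceptance check. This is a minimal illustration in Python rather than the project's .NET code; `SiteState` and `accept_report` are hypothetical names, and the sketch assumes that accepting a report from an offline site marks it online again.

```python
from dataclasses import dataclass


@dataclass
class SiteState:
    """Hypothetical central-side record for one site (in-memory only)."""
    online: bool = False
    last_seq: int = 0


def accept_report(state: SiteState, seq: int) -> bool:
    """Return True if a health report with sequence number `seq` is accepted."""
    if not state.online:
        # Offline site: accept any sequence number. This covers the reset
        # to 1 after a site failover. Assumption: acceptance marks it online.
        state.online = True
        state.last_seq = seq
        return True
    if seq > state.last_seq:
        # Online site: only strictly newer reports are accepted.
        state.last_seq = seq
        return True
    return False
```

Under these assumptions a stale or duplicate report from an online site is dropped, while the first report after an offline period is accepted even though its sequence number restarted.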
@@ -1123,7 +1086,6 @@ Codex received work package titles (not full acceptance criteria due to prompt s
| 9 | UTC timestamps not covered | **False positive** — UTC is a Phase 0 convention (KDD-data-6). Message contracts in WP-1 specify "All timestamps in message contracts are UTC." Health report in WP-27 specifies "UTC from site clock." |
| 10 | Event log schema and active-node behavior uncovered | **False positive** — WP-29 acceptance criteria list full schema and "Only active node generates and stores events. Event logs not replicated to standby." |
| 11 | Remote query filters/pagination details uncovered | **False positive** — WP-31 acceptance criteria list all filter types, "default 500 events," and "continuation token." |
| 12 | LmxProxy details uncovered in WP-8 | **False positive** — WP-8 acceptance criteria explicitly cover port, API key, SessionId, keep-alive, TLS, batch ops, Polly retry. |
### Step 2 — Negative Requirement Review
@@ -35,7 +35,7 @@
## Requirements Checklist
Each bullet is extracted from the referenced HighLevelReqs.md sections. Items marked with a phase note indicate split-section bullets owned by another phase.
Each bullet is extracted from the referenced docs/requirements/HighLevelReqs.md sections. Items marked with a phase note indicate split-section bullets owned by another phase.
### Section 1.3 — Store-and-Forward Persistence (Site Clusters Only)
@@ -134,7 +134,7 @@ Constraints from CLAUDE.md Key Design Decisions and Component-*.md documents rel
- `[KDD-sf-3]` Messages not cleared on instance deletion.
- `[KDD-sf-4]` CachedCall idempotency is the caller's responsibility. *(Documented in Phase 3C; enforced in Phase 7 integration.)*
### Component Design Constraints (from Component-DeploymentManager.md)
### Component Design Constraints (from docs/requirements/Component-DeploymentManager.md)
- `[CD-DM-1]` Deployment flow: validate -> flatten -> send -> track. Validation failures stop the pipeline before anything is sent.
- `[CD-DM-2]` Site-side idempotency on deployment ID — duplicate deployment receives "already applied" response.
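
A rough sketch of the site-side idempotency described in `[CD-DM-2]`, written in Python for illustration only. `SiteDeploymentReceiver` and its `receive` method are hypothetical names; the real site runtime persists deployed configuration, which this in-memory sketch does not model.

```python
from typing import Callable, Set


class SiteDeploymentReceiver:
    """Hypothetical receiver that applies each deployment ID at most once."""

    def __init__(self) -> None:
        self._applied_ids: Set[str] = set()

    def receive(self, deployment_id: str, apply: Callable[[], None]) -> str:
        if deployment_id in self._applied_ids:
            # Duplicate delivery: do not re-apply, just report it.
            return "already applied"
        apply()
        # Record the ID only after apply() returns without raising,
        # so a failed attempt can be retried with the same ID.
        self._applied_ids.add(deployment_id)
        return "applied"
```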
@@ -155,7 +155,7 @@ Constraints from CLAUDE.md Key Design Decisions and Component-*.md documents rel
- `[CD-DM-17]` Enable: re-activates a disabled instance.
- `[CD-DM-18]` Delete: removes running config, destroys Instance Actor and children. S&F messages not cleared. Fails if site unreachable — central does not mark deleted until site confirms.
### Component Design Constraints (from Component-StoreAndForward.md)
### Component Design Constraints (from docs/requirements/Component-StoreAndForward.md)
- `[CD-SF-1]` Three message categories: external system calls, email notifications, cached database writes.
- `[CD-SF-2]` Retry settings defined on the source entity (external system def, SMTP config, DB connection def), not per-message.
@@ -170,7 +170,7 @@ Constraints from CLAUDE.md Key Design Decisions and Component-*.md documents rel
- `[CD-SF-11]` Message format stores: message ID, category, target, payload, retry count, created at, last attempt at, status (pending/retrying/parked).
- `[CD-SF-12]` Message lifecycle: attempt immediate delivery -> success removes; failure buffers -> retry loop -> success removes + notify standby; max retries exhausted -> park.
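
The lifecycle in `[CD-SF-12]` can be sketched as two small functions. This is an illustrative Python sketch, not the project's implementation: `Message`, `submit`, and `retry_once` are hypothetical names, the buffer is a plain dictionary, and `max_retries` is passed in directly, although per `[CD-SF-2]` it would come from the source entity. Whether the immediate attempt counts toward the retry limit is not stated; here it does not.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Message:
    message_id: str
    payload: str
    retry_count: int = 0
    status: str = "pending"  # pending / retrying / parked, per [CD-SF-11]


def submit(msg: Message, deliver: Callable[[Message], bool],
           buffer: Dict[str, Message]) -> bool:
    """Attempt immediate delivery; buffer the message on failure."""
    if deliver(msg):
        return True
    buffer[msg.message_id] = msg
    return False


def retry_once(msg: Message, deliver: Callable[[Message], bool],
               buffer: Dict[str, Message], max_retries: int,
               notify_standby: Callable[[str], None]) -> str:
    """Run one retry attempt and return the resulting outcome."""
    msg.retry_count += 1
    if deliver(msg):
        del buffer[msg.message_id]
        notify_standby(msg.message_id)  # success removes + notifies standby
        return "delivered"
    if msg.retry_count >= max_retries:
        msg.status = "parked"  # parked messages stay in the buffer
        return "parked"
    msg.status = "retrying"
    return "retrying"
```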
### Component Design Constraints (from Component-SiteRuntime.md — deployment-related)
### Component Design Constraints (from docs/requirements/Component-SiteRuntime.md — deployment-related)
- `[CD-SR-1]` Deployment handling: receive config -> store in SQLite -> compile scripts -> create/update Instance Actor -> report result.
- `[CD-SR-2]` For redeployments: existing Instance Actor and children stopped, then new Instance Actor created with updated config. Subscriptions re-established.
@@ -179,7 +179,7 @@ Constraints from CLAUDE.md Key Design Decisions and Component-*.md documents rel
- `[CD-SR-5]` Delete: stops Instance Actor and children, removes deployed config from SQLite. Does not clear S&F messages.
- `[CD-SR-6]` Script compilation failure during deployment rejects entire deployment. No partial state applied. Failure reported to central.
### Component Design Constraints (from Component-Communication.md — deployment-related)
### Component Design Constraints (from docs/requirements/Component-Communication.md — deployment-related)
- `[CD-COM-1]` Deployment pattern: request/response. No buffering at central. Unreachable site = immediate failure.
- `[CD-COM-2]` Instance lifecycle pattern: request/response. Unreachable site = immediate failure.
@@ -464,12 +464,12 @@ Constraints from CLAUDE.md Key Design Decisions and Component-*.md documents rel
**Acceptance Criteria**:
- S&F buffer depth reported as health metric (broken down by category) — integrates with Phase 3B Health Monitoring
- S&F activity logged to site event log: message queued, delivered, retried, parked (per Component-StoreAndForward.md Dependencies)
- S&F activity logged to site event log: message queued, delivered, retried, parked (per docs/requirements/Component-StoreAndForward.md Dependencies)
- S&F buffer depth visible in health reports sent to central
**Estimated Complexity**: S
**Requirements Traced**: `[CD-SF-1]` (categories), Component-StoreAndForward.md Dependencies (Site Event Logging, Health Monitoring)
**Requirements Traced**: `[CD-SF-1]` (categories), docs/requirements/Component-StoreAndForward.md Dependencies (Site Event Logging, Health Monitoring)
---
@@ -648,7 +648,7 @@ Every work package traces to at least one requirement or design constraint:
| WP-11 | `[1.3-2]`, `[1.3-4]`, `[1.3-5]`, `[KDD-sf-2]`, `[CD-SF-5]`, `[CD-SF-6]`, `[CD-SF-7]` |
| WP-12 | `[5.4-1]` through `[5.4-4]`, `[KDD-sf-3]`, `[CD-SF-8]`, `[CD-SF-9]`, `[CD-SF-10]`, `[CD-COM-8]`, `[3.8.1-6]` |
| WP-13 | `[3.8.1-4]`, `[3.8.1-6]`, `[KDD-sf-3]`, `[CD-SF-10]` |
| WP-14 | `[CD-SF-1]`, Component-StoreAndForward.md Dependencies |
| WP-14 | `[CD-SF-1]`, docs/requirements/Component-StoreAndForward.md Dependencies |
| WP-15 | `[KDD-sf-4]`, `[CD-SF-7]` |
| WP-16 | `[3.9-6]`, `[KDD-deploy-11]` |
+9 -9
@@ -128,7 +128,7 @@
- [ ] `[KDD-sec-4a]` Load balancer in front of central UI — UI must work behind load balancer (no sticky sessions, JWT-based).
- [ ] `[KDD-deploy-10a]` Deployment status view shows current status only (no deployment history table — audit log provides history).
### From Component-CentralUI.md
### From docs/requirements/Component-CentralUI.md
- [ ] `[CD-CentralUI-1]` No live machine data visualization — UI is focused on system management (except debug views, which are Phase 6).
- [ ] `[CD-CentralUI-2]` Role-based access control enforced in UI: Admin, Design, Deployment with site scoping.
@@ -136,7 +136,7 @@
- [ ] `[CD-CentralUI-4]` Central UI accesses configuration data via `ICentralUiRepository` (read-oriented queries).
- [ ] `[CD-CentralUI-5]` Health dashboard: no historical data — current/latest status only (in-memory at central).
### From Component-HealthMonitoring.md
### From docs/requirements/Component-HealthMonitoring.md
- [ ] `[CD-Health-1]` Health metrics held in memory at central — dashboard shows current/latest status only.
- [ ] `[CD-Health-2]` Online recovery: site automatically marked online when health report received after offline period.
@@ -145,19 +145,19 @@
- [ ] `[CD-Health-5]` No alerting — health monitoring is display-only.
- [ ] `[CD-Health-6]` Error rate metrics: script errors include unhandled exceptions, timeouts, recursion limit violations. Alarm evaluation errors include all failures during condition evaluation.
### From Component-Security.md
### From docs/requirements/Component-Security.md
- [ ] `[CD-Sec-1]` Admin role permissions include: manage sites, data connections, areas, LDAP mappings, API keys, system config, view audit logs. **Phase 4 covers**: sites, data connections, areas, LDAP mappings, API keys. **Phase 6 covers**: audit log viewer. System config is not a separate page — it is covered by the individual admin workflows.
- [ ] `[CD-Sec-2]` Deployment role permissions include: manage instances (lifecycle), deploy, view deployment status, debug view, parked messages, event logs. Site-scoped Deployment only sees their permitted sites. **Phase 4 covers**: instance lifecycle, instance list, deployment status. **Phase 5 covers**: instance create/overrides/binding. **Phase 6 covers**: deploy action, debug view, parked messages, event logs.
- [ ] `[CD-Sec-3]` Every UI action checks authenticated user's roles before proceeding.
- [ ] `[CD-Sec-4]` Site-scoped Deployment checks verify target site is within user's permitted sites.
### From Component-InboundAPI.md
### From docs/requirements/Component-InboundAPI.md
- [ ] `[CD-Inbound-1]` API key properties: name/label, key value, enabled/disabled flag.
- [ ] `[CD-Inbound-2]` All key changes (create, enable/disable, delete) are audit logged.
### From Component-DeploymentManager.md
### From docs/requirements/Component-DeploymentManager.md
- [ ] `[CD-Deploy-1]` Deployment status: pending, in-progress, success, failed — only current status stored.
- [ ] `[CD-Deploy-2]` Per-instance operation lock — UI must handle "operation in progress" error gracefully.
@@ -196,7 +196,7 @@
- Admin can assign/unassign data connections to/from sites.
- Admin can edit data connection details.
- Admin can delete a data connection (blocked if bound to any instance attribute).
- Protocol type selection (OPC UA, LmxProxy).
- Protocol type selection (OPC UA).
- Connection details form varies by protocol type.
- Non-Admin users cannot access data connection management.
- Data connection changes are audit logged.
@@ -491,9 +491,9 @@ The phase is complete when all of the following pass:
| # | Question | Context | Impact | Status |
|---|----------|---------|--------|--------|
| Q-P4-1 | Should the API key value be auto-generated (GUID/random) or allow user-provided values? | Component-InboundAPI.md says "key value" but does not specify generation. | Phase 4, WP-5. | Open — assume auto-generated with optional copy-to-clipboard; user can regenerate. |
| Q-P4-2 | Should the health dashboard support configurable refresh intervals or always use the 30s report interval? | Component-HealthMonitoring.md specifies 30s default interval. | Phase 4, WP-9. | Open — assume display updates on every report arrival (no UI-side polling); interval is server-side config. |
| Q-P4-3 | Should area deletion cascade to child areas or require bottom-up deletion? | HighLevelReqs 3.10 says "parent-child relationships" but does not specify cascade behavior. | Phase 4, WP-3. | Open — assume cascade delete of child areas (if no instances assigned to any area in the subtree). |
| Q-P4-1 | Should the API key value be auto-generated (GUID/random) or allow user-provided values? | docs/requirements/Component-InboundAPI.md says "key value" but does not specify generation. | Phase 4, WP-5. | Open — assume auto-generated with optional copy-to-clipboard; user can regenerate. |
| Q-P4-2 | Should the health dashboard support configurable refresh intervals or always use the 30s report interval? | docs/requirements/Component-HealthMonitoring.md specifies 30s default interval. | Phase 4, WP-9. | Open — assume display updates on every report arrival (no UI-side polling); interval is server-side config. |
| Q-P4-3 | Should area deletion cascade to child areas or require bottom-up deletion? | docs/requirements/HighLevelReqs 3.10 says "parent-child relationships" but does not specify cascade behavior. | Phase 4, WP-3. | Open — assume cascade delete of child areas (if no instances assigned to any area in the subtree). |
---
+3 -5
@@ -22,12 +22,11 @@
| Q2 | What Akka.NET version? | Latest stable 1.5.x (currently 1.5.62). | 2026-03-16 |
| Q3 | Monorepo or separate repos? | Single monorepo with SLNX solution file (`.slnx`, the new XML-based format default in .NET 10). | 2026-03-16 |
| Q4 | What CI/CD platform? | None for now. No CI/CD pipeline. | 2026-03-16 |
| Q5 | What LDAP server for dev/test? | GLAuth (lightweight LDAP) in Docker. See `infra/glauth/config.toml` and `test_infra_ldap.md`. | 2026-03-16 |
| Q6 | What MS SQL version and hosting? | SQL Server 2022 Developer Edition in Docker. See `infra/docker-compose.yml` and `test_infra_db.md`. | 2026-03-16 |
| Q5 | What LDAP server for dev/test? | GLAuth (lightweight LDAP) in Docker. See `infra/glauth/config.toml` and `docs/test_infra/test_infra_ldap.md`. | 2026-03-16 |
| Q6 | What MS SQL version and hosting? | SQL Server 2022 Developer Edition in Docker. See `infra/docker-compose.yml` and `docs/test_infra/test_infra_db.md`. | 2026-03-16 |
| Q7 | JWT signing key storage? | `appsettings.json` (per environment). | 2026-03-16 |
| Q8 | OPC UA server for dev/test? | Azure IoT OPC PLC simulator in Docker. See `infra/opcua/nodes.json` and `test_infra_opcua.md`. | 2026-03-16 |
| Q8 | OPC UA server for dev/test? | Azure IoT OPC PLC simulator in Docker. See `infra/opcua/nodes.json` and `docs/test_infra/test_infra_opcua.md`. | 2026-03-16 |
| Q10 | Target site hardware? | Windows Server 2022, 24 GB RAM, 1 TB drive, 16-core Xeon. | 2026-03-16 |
| Q9 | What is the custom protocol? Is there an existing specification or SDK? | LmxProxy — gRPC-based protocol (protobuf-net code-first, port 5050, API key auth). Client SDK: `LmxProxyClient` NuGet package. See Component-DataConnectionLayer.md for full API mapping and protocol details. | 2026-03-16 |
| Q11 | Are there specific external systems (MES, recipe manager) to integrate with for initial testing? | REST API test server (`infra/restapi/`) provides simulated external endpoints for External System Gateway and Inbound API testing. No real MES/recipe system needed for initial phases. | 2026-03-16 |
| Q15 | Should the Machine Data Database schema be designed in this project, or is it out of scope? | Out of scope — Machine Data Database is a pre-existing database at customer sites. Test infra seeds sample tables/data in `infra/mssql/machinedata_seed.sql`. | 2026-03-16 |
| Q13 | Who is the development team? | Solo developer with extensive Akka.NET experience and full availability. No parallelization constraints — phases are sequential. | 2026-03-16 |
@@ -45,7 +44,6 @@
| Q-P3A-2 | Single SQLite file or separate files per concern? | Single file with separate tables. Simpler transaction management. | 2026-03-16 |
| Q-P3A-3 | Akka.Persistence or direct SQLite for Deployment Manager singleton? | Direct SQLite. Recovery is full read-all-configs-and-rebuild, not event replay. | 2026-03-16 |
| Q-P3B-1 | Blocking I/O dispatcher config for Script Execution Actors? | Use Akka.NET default blocking-io-dispatcher config. Tune during Phase 8 performance testing. | 2026-03-16 |
| Q-P3B-2 | Should WriteBatchAndWaitAsync be on IDataConnection or protocol-specific? | Add to `IDataConnection` — both OPC UA and LmxProxy can implement it. | 2026-03-16 |
| Q-P3B-3 | Rate of Change alarm evaluation time window? | Configurable window, default per-second rate. Document in alarm definition schema. | 2026-03-16 |
| Q-P3B-4 | Health report sequence number across failover? | Resolved in design — offline detection handles the reset naturally. Central accepts lower seq after site goes offline/online. | 2026-03-16 |
| Q-P3C-1 | S&F retry timers on failover — reset or continue? | Continue from `last_attempt_at` to avoid burst retries. | 2026-03-16 |
+3 -14
@@ -1,6 +1,6 @@
# Requirements Traceability Matrix
**Purpose**: Ensures every requirement from HighLevelReqs.md, every REQ-* identifier, and every design constraint from CLAUDE.md and Component-*.md maps to at least one work package in an implementation phase plan. Updated as plan documents are generated.
**Purpose**: Ensures every requirement from docs/requirements/HighLevelReqs.md, every REQ-* identifier, and every design constraint from CLAUDE.md and docs/requirements/Component-*.md maps to at least one work package in an implementation phase plan. Updated as plan documents are generated.
**Traceability levels**:
- **Section-level** (this document): Maps HighLevelReqs sections, REQ-* IDs, and design constraints to phases. Serves as the index.
@@ -8,7 +8,7 @@
---
## HighLevelReqs.md Sections → Phase Mapping
## docs/requirements/HighLevelReqs.md Sections → Phase Mapping
| Section | Description | Phase(s) | Plan Document | Status |
|---------|-------------|----------|---------------|--------|
@@ -108,7 +108,7 @@
## Design Constraints → Phase Mapping
Design decisions from CLAUDE.md Key Design Decisions and Component-*.md documents that impose implementation constraints beyond what HighLevelReqs specifies. Each is tagged `[KDD-category-N]` (Key Design Decision) or `[CD-Component-N]` (Component Design). Bullet-level extraction happens in the phase plan documents.
Design decisions from CLAUDE.md Key Design Decisions and docs/requirements/Component-*.md documents that impose implementation constraints beyond what docs/requirements/HighLevelReqs specifies. Each is tagged `[KDD-category-N]` (Key Design Decision) or `[CD-Component-N]` (Component Design). Bullet-level extraction happens in the phase plan documents.
### Architecture & Runtime
@@ -216,17 +216,6 @@ Design decisions from CLAUDE.md Key Design Decisions and Component-*.md document
| KDD-code-8 | EF Core migrations: auto-apply in dev, manual SQL scripts for production | CLAUDE.md | 1 | Pending |
| KDD-code-9 | Script trust model: forbidden APIs (System.IO, Process, Threading, Reflection, raw network) | CLAUDE.md | 3B | Plan generated |
### LmxProxy Protocol (Component Design)
| ID | Constraint | Source | Phase(s) | Status |
|----|-----------|--------|----------|--------|
| CD-DCL-1 | LmxProxy: gRPC/HTTP/2 transport, protobuf-net code-first, port 5050 | Component-DCL | 3B | Plan generated |
| CD-DCL-2 | LmxProxy: API key auth, session-based (SessionId), 30s keep-alive heartbeat | Component-DCL | 3B | Plan generated |
| CD-DCL-3 | LmxProxy: Server-streaming gRPC for subscriptions, 1000ms default sampling | Component-DCL | 3B | Plan generated |
| CD-DCL-4 | LmxProxy: SDK retry policy (exponential backoff) complements DCL's fixed-interval reconnect | Component-DCL | 3B | Plan generated |
| CD-DCL-5 | LmxProxy: Batch read/write capabilities (ReadBatchAsync, WriteBatchAsync) | Component-DCL | 3B | Plan generated |
| CD-DCL-6 | LmxProxy: TLS 1.2/1.3, mutual TLS, self-signed for dev | Component-DCL | 3B | Plan generated |
---
## Split-Section Tracking
@@ -2,47 +2,43 @@
## Purpose
The CLI is a standalone command-line tool for scripting and automating administrative operations against the ScadaLink central cluster. It connects to the ManagementActor via Akka.NET ClusterClient — it does not join the cluster as a full member and does not use HTTP/REST. The CLI provides the same administrative capabilities as the Central UI, enabling automation, batch operations, and integration with CI/CD pipelines.
The CLI is a standalone command-line tool for scripting and automating administrative operations against the ScadaLink central cluster. It connects to the Central Host's HTTP Management API (`POST /management`), which dispatches commands to the ManagementActor. Authentication and role resolution are handled server-side — the CLI sends credentials via HTTP Basic Auth. The CLI provides the same administrative capabilities as the Central UI, enabling automation, batch operations, and integration with CI/CD pipelines.
## Location
Standalone executable, not part of the Host binary. Deployed on any Windows machine with network access to the central cluster.
Standalone executable, not part of the Host binary. Deployed on any machine with HTTP access to a central node.
`src/ScadaLink.CLI/`
## Responsibilities
- Parse command-line arguments and dispatch to the appropriate management operation.
- Authenticate the user via LDAP credentials and include identity in every message sent to the ManagementActor.
- Connect to the central cluster via Akka.NET ClusterClient using configured contact points.
- Send management messages to the ManagementActor and display structured responses.
- Send HTTP requests to the Central Host's Management API endpoint with Basic Auth credentials.
- Display structured responses from the Management API.
- Support both JSON and human-readable table output formats.
## Technology
- **Argument parsing**: `System.CommandLine` library for command/subcommand/option parsing with built-in help generation.
- **Transport**: Akka.NET `ClusterClient` connecting to `ClusterClientReceptionist` on the central cluster. The CLI does not join the cluster — it is a lightweight external client.
- **Serialization**: Message contracts from Commons (`Messages/Management/`), same as ManagementActor expects.
- **Transport**: HTTP client connecting to the Central Host's `POST /management` endpoint. Authentication is via HTTP Basic Auth — the server performs LDAP bind and role resolution.
- **Serialization**: Commands serialized as JSON with a type discriminator (`command` field). Message contracts from Commons define the command types.
## Authentication
The CLI authenticates the user against LDAP/AD before any operation:
The CLI sends user credentials to the Management API via HTTP Basic Auth:
1. The user provides credentials via `--username` / `--password` options, or is prompted interactively if omitted.
2. The CLI performs a direct LDAP bind against the configured LDAP server (same mechanism as the Central UI login).
3. On successful bind, the CLI queries group memberships to determine roles and permitted sites.
4. Every message sent to the ManagementActor includes the `AuthenticatedUser` envelope with the user's identity, roles, and site permissions.
5. Credentials are not stored or cached between invocations. Each CLI invocation requires fresh authentication.
LDAP connection settings are read from the CLI configuration (see Configuration section).
1. The user provides credentials via `--username` / `--password` options.
2. On each request, the CLI encodes credentials as a Basic Auth header and sends them with the command.
3. The server performs LDAP authentication, group lookup, and role resolution — the CLI does not communicate with LDAP directly.
4. Credentials are not stored or cached between invocations. Each CLI invocation requires fresh credentials.
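
As a rough illustration of the per-request flow above, the following Python sketch builds (but does not send) a request to `POST /management` with a Basic Auth header. The actual CLI is a .NET tool; `build_management_request` is a hypothetical helper, and the `payload` field and the `ListSites` command name are assumptions, since the source only specifies a `command` type discriminator.

```python
import base64
import json
import urllib.request


def build_management_request(base_url: str, username: str, password: str,
                             command: str, payload: dict) -> urllib.request.Request:
    """Build a POST /management request carrying Basic Auth credentials."""
    # Standard HTTP Basic Auth: base64("username:password").
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    # "command" is the type discriminator; "payload" is an assumed field name.
    body = json.dumps({"command": command, "payload": payload}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/management",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical usage; nothing is sent until urllib.request.urlopen(request) is called.
request = build_management_request(
    "http://localhost:9001", "alice", "s3cret", "ListSites", {}
)
```

Because the header is rebuilt on every invocation, nothing needs to be cached on the client, which matches the statement that credentials are not stored between invocations.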
## Connection
The CLI uses Akka.NET ClusterClient to connect to the central cluster:
The CLI connects to the Central Host via HTTP:
- **Contact points**: One or more seed node addresses for the ClusterClientReceptionist. The CLI sends an initial contact to these addresses; the receptionist responds with the current set of cluster nodes hosting the ManagementActor.
- **No cluster membership**: The CLI does not join the Akka.NET cluster. It is an external process that communicates via the ClusterClient protocol.
- **Failover**: If the active central node fails over, ClusterClient transparently reconnects to the new active node via the receptionist. In-flight commands may time out and need to be retried.
- **Management URL**: The URL of a central node's web server (e.g., `http://localhost:9001`). The management API is served at `POST /management` on the same host as the Central UI.
- **Failover**: For HA, use a load balancer URL in front of both central nodes. The management API is stateless (Basic Auth per request), so any central node can handle any request without sticky sessions.
- **No Akka.NET dependency**: The CLI is a pure HTTP client with no Akka.NET runtime.
## Command Structure
@@ -92,7 +88,7 @@ scadalink instance delete <code>
```
scadalink site list [--format json|table]
scadalink site get <site-id> [--format json|table]
scadalink site create --name <name> --id <site-id>
scadalink site create --name <name> --id <site-id> [--node-a-address <addr>] [--node-b-address <addr>] [--grpc-node-a-address <addr>] [--grpc-node-b-address <addr>]
scadalink site update <site-id> --file <path>
scadalink site delete <site-id>
scadalink site area list <site-id>
@@ -172,8 +168,21 @@ scadalink health parked-messages --site-identifier <site-id> [--page <n>] [--pag
### Debug Commands
```
scadalink debug snapshot --id <id> [--format json|table]
scadalink debug stream --id <instanceId> [--url ...] [--username ...] [--password ...]
```
The `debug snapshot` command retrieves a point-in-time snapshot via the HTTP Management API.
The `debug stream` command streams live attribute values and alarm state changes in real-time using a SignalR WebSocket connection. The CLI connects to the `/hubs/debug-stream` SignalR hub on the central server, authenticates with Basic Auth, and subscribes to the specified instance. Events are printed as they arrive — JSON format (default) outputs one NDJSON object per event; table format shows streaming rows. Press Ctrl+C to disconnect.
Key behaviors:
- **Automatic reconnection**: Uses SignalR's `.WithAutomaticReconnect()` to re-establish the connection on loss.
- **Re-subscription**: Automatically re-subscribes to the instance after reconnection.
- **Traefik compatible**: Works through the Traefik reverse proxy — WebSocket upgrade is proxied natively.
- **Required role**: `Deployment`.
Unlike `debug snapshot` (which uses the HTTP Management API), `debug stream` uses `Microsoft.AspNetCore.SignalR.Client` as a dependency for its WebSocket transport.
### Shared Script Commands
```
scadalink shared-script list [--format json|table]
@@ -205,28 +214,17 @@ scadalink api-method delete --id <id>
Configuration is resolved in the following priority order (highest wins):
1. **Command-line options**: `--contact-points`, `--username`, `--password`, `--format`.
1. **Command-line options**: `--url`, `--username`, `--password`, `--format`.
2. **Environment variables**:
- `SCADALINK_CONTACT_POINTS` — Comma-separated list of central cluster contact point addresses (e.g., `akka.tcp://ScadaLink@central1:8081,akka.tcp://ScadaLink@central2:8081`).
- `SCADALINK_LDAP_SERVER` — LDAP server address.
- `SCADALINK_LDAP_PORT` — LDAP port (default: 636 for LDAPS).
- `SCADALINK_MANAGEMENT_URL` — Management API URL (e.g., `http://central-host:5000`).
- `SCADALINK_FORMAT` — Default output format (`json` or `table`).
3. **Configuration file**: `~/.scadalink/config.json` — Persistent defaults for contact points, LDAP settings, and output format.
3. **Configuration file**: `~/.scadalink/config.json` — Persistent defaults for management URL and output format.
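
The priority order above can be sketched as follows. This is an illustrative Python sketch only (the CLI itself is a .NET tool); `load_config_file` and `resolve_management_url` are hypothetical helpers, and treating an empty environment variable as unset is an assumption.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional


def load_config_file(path: str) -> dict:
    """Read the JSON config file, returning {} if it does not exist."""
    config_path = Path(path).expanduser()
    if not config_path.is_file():
        return {}
    return json.loads(config_path.read_text(encoding="utf-8"))


def resolve_management_url(cli_value: Optional[str],
                           environ: Mapping[str, str],
                           file_config: Mapping[str, str]) -> Optional[str]:
    """Resolve the management URL: command line, then environment, then file."""
    if cli_value is not None:
        return cli_value  # 1. --url wins
    env_value = environ.get("SCADALINK_MANAGEMENT_URL")
    if env_value:
        return env_value  # 2. environment variable
    return file_config.get("managementUrl")  # 3. config file, or None


# Hypothetical usage with the documented default location.
url = resolve_management_url(
    None, os.environ, load_config_file("~/.scadalink/config.json")
)
```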
### Configuration File Format
```json
{
"contactPoints": [
"akka.tcp://ScadaLink@central1:8081",
"akka.tcp://ScadaLink@central2:8081"
],
"ldap": {
"server": "ad.example.com",
"port": 636,
"useTls": true
},
"defaultFormat": "json"
"managementUrl": "http://central-host:5000"
}
```
@@ -240,28 +238,24 @@ Configuration is resolved in the following priority order (highest wins):
| Code | Meaning |
|------|---------|
| 0 | Success |
| 1 | General error (command failed) |
| 2 | Authentication failure (LDAP bind failed) |
| 3 | Authorization failure (insufficient role) |
| 4 | Connection failure (cannot reach central cluster) |
| 5 | Validation failure (e.g., template validation errors) |
| 1 | General error (command failed, connection failure, or authentication failure) |
| 2 | Authorization failure (insufficient role) |
## Error Handling
- **Connection failure**: If the CLI cannot establish a ClusterClient connection within a timeout (default 10 seconds), it exits with code 4 and a descriptive error message.
- **Command timeout**: If the ManagementActor does not respond within 30 seconds (configurable), the command fails with a timeout error.
- **Authentication failure**: If the LDAP bind fails, the CLI exits with code 2 before sending any commands.
- **Authorization failure**: If the ManagementActor returns an Unauthorized response, the CLI exits with code 3.
- **Connection failure**: If the CLI cannot connect to the management URL (e.g., DNS failure, connection refused), it exits with code 1 and a descriptive error message.
- **Command timeout**: If the server does not respond within 30 seconds, the command fails with a timeout error (HTTP 504).
- **Authentication failure**: If the server returns HTTP 401 (LDAP bind failed), the CLI exits with code 1.
- **Authorization failure**: If the server returns HTTP 403, the CLI exits with code 2.
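
The mapping from server responses to exit codes can be sketched as a single function. This is a hedged Python illustration, not the CLI's code: `exit_code_for` is a hypothetical name, `None` stands in for a connection failure with no HTTP response, and treating every non-2xx status other than 403 (including the 504 timeout) as a general error is an assumption based on the exit code table.

```python
from typing import Optional


def exit_code_for(status: Optional[int]) -> int:
    """Map an HTTP status (or None for no response) to a CLI exit code."""
    if status is None:
        return 1  # connection failure: DNS failure, connection refused
    if 200 <= status < 300:
        return 0  # success
    if status == 403:
        return 2  # authorization failure (insufficient role)
    # 401 (authentication failure), 504 (timeout), and any other
    # failure are assumed to map to the general error code.
    return 1
```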
## Dependencies
- **Commons**: Message contracts (`Messages/Management/`), shared types.
- **Commons**: Message contracts (`Messages/Management/`) for command type definitions and registry.
- **System.CommandLine**: Command-line argument parsing.
- **Akka.NET (Akka.Cluster.Tools)**: ClusterClient for communication with the central cluster.
- **LDAP client library**: For direct LDAP bind authentication (same approach as Security & Auth component).
- **Microsoft.AspNetCore.SignalR.Client**: SignalR client for the `debug stream` command's WebSocket connection.
## Interactions
- **Management Service (ManagementActor)**: The CLI's sole runtime dependency. All operations are sent as messages to the ManagementActor via ClusterClient.
- **Security & Auth**: The CLI performs LDAP authentication independently (same LDAP server, same bind mechanism) and passes the authenticated identity to the ManagementActor. The ManagementActor enforces authorization.
- **LDAP/Active Directory**: Direct bind for user authentication before any operation.
- **Management Service (via HTTP)**: The primary runtime dependency. All operations except `debug stream` are sent as HTTP POST requests to the Management API endpoint on a central node, which dispatches to the ManagementActor.
- **Central Host**: Serves the Management API at `POST /management` and the debug stream SignalR hub at `/hubs/debug-stream`. Handles LDAP authentication, role resolution, and ManagementActor dispatch.
- **Debug Stream Hub (via SignalR WebSocket)**: The `debug stream` command connects to the `/hubs/debug-stream` hub on the central server for real-time event streaming. This is the only CLI command that uses a persistent connection rather than request/response.
@@ -18,19 +18,15 @@ Central cluster only. Sites have no user interface.
- A **load balancer** sits in front of the central cluster and routes to the active node.
- On central failover, the Blazor Server SignalR circuit is interrupted. The browser automatically attempts to reconnect via SignalR's built-in reconnection logic.
- Since sessions use **JWT tokens** (not server-side state), the user's authentication survives failover — the new active node validates the same JWT. No re-login required if the token is still valid.
- Active debug view streams and in-progress real-time subscriptions are lost on failover and must be re-opened by the user.
- Since sessions use **authentication cookies** carrying an embedded JWT (not server-side state), the user's authentication survives failover — the new active node validates the same cookie-embedded JWT. No re-login required if the token is still valid.
- Active debug view streams and in-progress deployment status subscriptions are lost on failover and must be re-opened by the user.
- Both central nodes share the same **ASP.NET Data Protection keys** (stored in the configuration database or shared configuration) so that tokens and anti-forgery tokens remain valid across failover.
## Real-Time Updates
All real-time features use **server push via SignalR** (built into Blazor Server):
- **Debug view**: Attribute value and alarm state changes streamed live from sites.
- **Health dashboard**: Site status, connection health, error rates, and buffer depths update automatically when new health reports arrive.
- **Deployment status**: Pending/in-progress/success/failed transitions push to the UI immediately.
No manual refresh or polling is required for any of these features.
- **Debug view**: Real-time display of attribute values and alarm states via **gRPC streaming**. When the user opens a debug view, a `DebugStreamBridgeActor` on the central side opens a gRPC server-streaming subscription to the site's `SiteStreamGrpcServer` for the selected instance, then requests an initial `DebugViewSnapshot` via ClusterClient. Ongoing `AttributeValueChanged` and `AlarmStateChanged` events flow via the gRPC stream (not through ClusterClient) to the bridge actor, which delivers them to the Blazor component via callbacks that call `InvokeAsync(StateHasChanged)` to push UI updates through the built-in SignalR circuit.
- **Health dashboard**: Site status, connection health, error rates, and buffer depths update via a **10-second auto-refresh timer**. Since health reports arrive from sites every 30 seconds, a 10s poll interval catches updates within one reporting cycle without unnecessary overhead.
- **Deployment status**: Pending/in-progress/success/failed transitions **push to the UI immediately** via SignalR (built into Blazor Server). No polling required for deployment tracking.
## Responsibilities
@@ -41,6 +37,9 @@ No manual refresh or polling is required for any of these features.
## Workflows / Pages
### Template Authoring (Design Role)
- The `/design/templates` page uses a **split-pane layout**: a folder/template tree sidebar on the left and the editor on the right.
- The tree shows nested `TemplateFolder` entities with their templates underneath; composition children render inline as leaf nodes beneath their owning template (right-click "Open composed template" reveals and selects the target).
- **Per-kind context menus** on folder, template, and composition nodes expose the relevant operations (new folder, new template, rename, move, delete, move to folder). Native HTML5 **drag-drop** reorganizes templates between folders and reparents folders; a drop that would create a folder cycle is detected and rejected with a toast. Tree expansion state persists in `sessionStorage`, and deep links (`/design/templates/{id}`) reveal and select the target node.
- Create, edit, and delete templates.
- **Template deletion** is blocked if any instances or child templates reference the template. The UI displays the references preventing deletion.
- Manage template hierarchy (inheritance) — visual tree of parent/child relationships.
@@ -70,8 +69,10 @@ No manual refresh or polling is required for any of these features.
- Configure SMTP settings.
### Site & Data Connection Management (Admin Role)
- Create, edit, and delete site definitions.
- Create, edit, and delete site definitions, including Akka node addresses (NodeA/NodeB) and gRPC node addresses (GrpcNodeA/GrpcNodeB).
- Define data connections and assign them to sites (name, protocol type, connection details).
- **Data connection form**: "Primary Endpoint Configuration" (required JSON text area) and optional "Backup Endpoint Configuration" (collapsible section, hidden by default, revealed via "Add Backup Endpoint" button; "Remove Backup" button when editing an existing backup). "Failover Retry Count" numeric input (default 3, min 1, max 20) is visible only when a backup endpoint is configured.
- **Data connection list page**: Shows Primary Config and Backup Config columns. Active Endpoint column populated from health reports.
### Area Management (Admin Role)
- Define hierarchical area structures per site.
@@ -105,7 +106,10 @@ No manual refresh or polling is required for any of these features.
### Debug View (Deployment Role)
- Select a deployed instance and open a live debug view.
- Real-time streaming of all attribute values (with quality and timestamp) and alarm states for that instance.
- Initial snapshot of current state followed by streaming updates via the site-wide Akka stream.
- The `DebugStreamService` creates a `DebugStreamBridgeActor` on the central side. The bridge actor opens a **gRPC server-streaming subscription** to the site's `SiteStreamGrpcServer` for the selected instance, then requests an initial `DebugViewSnapshot` via ClusterClient.
- Ongoing events (`AttributeValueChanged`, `AlarmStateChanged`) flow via the gRPC stream directly to the bridge actor — they do not pass through ClusterClient.
- Events are delivered to the Blazor component via callbacks, which call `InvokeAsync(StateHasChanged)` to push UI updates through the built-in SignalR circuit.
- A pulsing "Live" indicator replaces the static "Connected" badge when streaming is active.
- Stream includes attribute values formatted as `[InstanceUniqueName].[AttributePath].[AttributeName]` and alarm states formatted as `[InstanceUniqueName].[AlarmName]`.
- Subscribe-on-demand — stream starts when opened, stops when closed.
@@ -32,7 +32,7 @@ Both central and site clusters.
- The Site Runtime Deployment Manager runs as an **Akka.NET cluster singleton** on the active node, owning the full Instance Actor hierarchy.
- One standby node receives replicated store-and-forward data and is ready to take over.
- Connected to local SQLite databases (store-and-forward buffer, event logs, deployed configurations).
- Connected to machines via data connections (OPC UA, LmxProxy).
- Connected to machines via data connections (OPC UA).
## Failover Behavior
@@ -106,7 +106,8 @@ The Host component wires CoordinatedShutdown into the Windows Service lifecycle
Each node is configured with:
- **Cluster seed nodes**: **Both nodes** are seed nodes — each node lists both itself and its partner. Either node can start first and form the cluster; the other joins when it starts. No startup ordering dependency.
- **Cluster role**: Central or Site (plus site identifier for site clusters).
- **Akka.NET remoting**: Hostname/port for inter-node and inter-cluster communication.
- **Akka.NET remoting**: Hostname/port for inter-node and inter-cluster communication (default 8081 central, 8082 site).
- **gRPC port** (site nodes only): Dedicated HTTP/2 port for the SiteStreamGrpcServer (default 8083). Separate from the Akka remoting port — gRPC uses Kestrel, Akka uses its own TCP transport.
- **Local storage paths**: SQLite database locations (site nodes only).
## Windows Service
@@ -32,7 +32,8 @@ Commons must define shared primitive and utility types used across multiple comp
- **`InstanceState` enum**: Enabled, Disabled.
- **`DeploymentStatus` enum**: Pending, InProgress, Success, Failed.
- **`AlarmState` enum**: Active, Normal.
- **`AlarmTriggerType` enum**: ValueMatch, RangeViolation, RateOfChange.
- **`AlarmLevel` enum**: None, Low, LowLow, High, HighHigh. Severity level for an active alarm; always `None` for binary trigger types, set by `HiLo` triggers.
- **`AlarmTriggerType` enum**: ValueMatch, RangeViolation, RateOfChange, HiLo.
- **`ConnectionHealth` enum**: Connected, Disconnected, Connecting, Error.
Types defined here must be immutable and thread-safe.
@@ -2,7 +2,7 @@
## Purpose
The Communication component manages all messaging between the central cluster and site clusters using Akka.NET. It provides the transport layer for deployments, instance lifecycle commands, integration routing, debug streaming, health reporting, and remote queries (parked messages, event logs).
The Communication component manages all messaging between the central cluster and site clusters. It provides the transport layer for deployments, instance lifecycle commands, integration routing, debug streaming, health reporting, and remote queries (parked messages, event logs). Two transports are used: **Akka.NET ClusterClient** for command/control messaging and **gRPC server-streaming** for real-time data (attribute values, alarm states).
## Location
@@ -10,12 +10,15 @@ Both central and site clusters. Each side has communication actors that handle m
## Responsibilities
- Resolve site addresses from the configuration database and maintain a cached address map.
- Establish and maintain cross-cluster connections using Akka.NET ClusterClient/ClusterClientReceptionist.
- Resolve site addresses (Akka remoting and gRPC) from the configuration database and maintain a cached address map.
- Establish and maintain cross-cluster connections using Akka.NET ClusterClient/ClusterClientReceptionist for command/control.
- Establish and maintain per-site gRPC streaming connections for real-time data delivery (site→central).
- Route messages between central and site clusters in a hub-and-spoke topology.
- Broker requests from external systems (via central) to sites and return responses.
- Support multiple concurrent message patterns (request/response, fire-and-forget, streaming).
- Detect site connectivity status for health monitoring.
- Host the **SiteStreamGrpcServer** on site nodes (Kestrel HTTP/2) to serve real-time event streams.
- Manage per-site **SiteStreamGrpcClient** instances on central nodes via **SiteStreamGrpcClientFactory**.
## Communication Patterns
@@ -35,6 +38,7 @@ Both central and site clusters. Each side has communication actors that handle m
- **Pattern**: Broadcast with per-site acknowledgment (deploy to all sites), or targeted to a single site (per-site deployment).
- When shared scripts, external system definitions, database connections, data connections, notification lists, or SMTP configuration are explicitly deployed, central sends them to the target site(s).
- Each site acknowledges receipt and reports success/failure independently.
- **Shared script deployment triggers immediate recompilation on the site** — the site's `SharedScriptLibrary` replaces its in-memory compiled code, making updated shared scripts available to all running instances without redeployment. Other artifact types (external systems, database connections, etc.) are stored but do not require recompilation.
### 4. Integration Routing (External System → Central → Site → Central → External System)
- **Pattern**: Request/Response (brokered).
@@ -50,15 +54,55 @@ Both central and site clusters. Each side has communication actors that handle m
- Site applies and acknowledges.
### 6. Debug Streaming (Site → Central)
- **Pattern**: Subscribe/stream with initial snapshot.
- Central sends a subscribe request for a specific instance (identified by unique name).
- Site requests a **snapshot** of all current attribute values and alarm states from the Instance Actor and sends it to central.
- Site then subscribes to the **site-wide Akka stream** filtered by the instance's unique name and forwards attribute value changes and alarm state changes to central.
- **Pattern**: Subscribe/push with initial snapshot. Two transports: **ClusterClient** for the subscribe/unsubscribe handshake and initial snapshot, **gRPC server-streaming** for ongoing real-time events.
- A **DebugStreamBridgeActor** (one per active debug session) is created on the central cluster by the **DebugStreamService**. The bridge actor first opens a **gRPC server-streaming subscription** to the site via `SiteStreamGrpcClient`, then sends a `SubscribeDebugViewRequest` to the site via `CentralCommunicationActor` (ClusterClient). The site's `InstanceActor` replies with an initial snapshot via the ClusterClient reply path.
- **gRPC stream (real-time events)**: The site's **SiteStreamGrpcServer** receives the gRPC `SubscribeInstance` call and creates a **StreamRelayActor** that subscribes to **SiteStreamManager** for the requested instance. Events (`AttributeValueChanged`, `AlarmStateChanged`) flow from `SiteStreamManager` → `StreamRelayActor` → `Channel<SiteStreamEvent>` (bounded, 1000, DropOldest) → gRPC response stream → `SiteStreamGrpcClient` on central → `DebugStreamBridgeActor`.
- The `DebugStreamEvent` message type no longer exists — events are not routed through ClusterClient. `SiteCommunicationActor` and `CentralCommunicationActor` have no role in streaming event delivery.
- The bridge actor forwards received events to the consumer via callbacks (Blazor component or SignalR hub).
- **Snapshot-to-stream handoff**: The gRPC stream is opened **before** the snapshot request to avoid missing events. The consumer applies the snapshot as baseline, then replays buffered gRPC events with timestamps newer than the snapshot (timestamp-based dedup).
- Attribute value stream messages: `[InstanceUniqueName].[AttributePath].[AttributeName]`, value, quality, timestamp.
- Alarm state stream messages: `[InstanceUniqueName].[AlarmName]`, state (active/normal), priority, timestamp.
- Central sends an unsubscribe request when the debug view closes. The site removes its stream subscription.
- Central sends an unsubscribe request via ClusterClient when the debug session ends. The gRPC stream is cancelled. The site's `StreamRelayActor` is stopped and the SiteStreamManager subscription is removed.
- The stream is session-based and temporary.
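The snapshot-to-stream handoff above can be sketched as follows. This is a minimal, language-neutral illustration in Python, not the project's C# code: the `Event` shape, `apply_handoff`, and the sample keys are hypothetical, and it assumes every event carries a timestamp comparable with the snapshot's.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    key: str          # hypothetical attribute or alarm key
    value: object
    timestamp: float  # any comparable clock works for the sketch


def apply_handoff(snapshot_values, snapshot_time, buffered_events):
    """Use the snapshot as the baseline, then replay only newer buffered events."""
    state = dict(snapshot_values)
    for ev in sorted(buffered_events, key=lambda e: e.timestamp):
        # Events at or before the snapshot time are already reflected in it.
        if ev.timestamp > snapshot_time:
            state[ev.key] = ev.value
    return state


# The stream was opened first, so two events were buffered while the
# snapshot request was still in flight.
buffered = [Event("Pump1.Speed", 10, 99.0), Event("Pump1.Speed", 12, 101.0)]
state = apply_handoff({"Pump1.Speed": 11}, 100.0, buffered)
print(state)
```

Opening the stream before requesting the snapshot means an event can arrive twice (once in the buffer, once folded into the snapshot) but never be missed; the timestamp comparison removes the duplicates.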
#### Site-Side gRPC Streaming Components
- **SiteStreamGrpcServer**: gRPC service (`SiteStreamService.SiteStreamServiceBase`) hosted on each site node via Kestrel HTTP/2 on a dedicated port (default 8083). Implements the `SubscribeInstance` RPC. For each subscription, creates a `StreamRelayActor` that subscribes to `SiteStreamManager` and bridges events through a `Channel<SiteStreamEvent>` to the gRPC response stream. Tracks active subscriptions by `correlation_id`; a duplicate ID cancels the old stream. Enforces a max concurrent stream limit (default 100). Rejects streams with `StatusCode.Unavailable` before the actor system is ready.
- **StreamRelayActor**: Short-lived actor created per gRPC subscription. Receives domain events (`AttributeValueChanged`, `AlarmStateChanged`) from `SiteStreamManager`, converts them to protobuf `SiteStreamEvent` messages, and writes to the `Channel<SiteStreamEvent>` writer. Stopped when the gRPC stream is cancelled or the client disconnects.
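The bounded, drop-oldest behaviour of the relay channel can be illustrated with a small sketch. It is written in Python using `collections.deque` purely to show the semantics; `DropOldestBuffer` and its `dropped` counter are hypothetical and are not part of the codebase, which uses a `Channel<SiteStreamEvent>` with a capacity of 1000.

```python
from collections import deque


class DropOldestBuffer:
    """Bounded FIFO that discards the oldest item when full."""

    def __init__(self, capacity):
        self._items = deque(maxlen=capacity)
        self.dropped = 0

    def write(self, item):
        if len(self._items) == self._items.maxlen:
            self.dropped += 1  # deque evicts the leftmost (oldest) item on append
        self._items.append(item)

    def read_all(self):
        items = list(self._items)
        self._items.clear()
        return items


buf = DropOldestBuffer(capacity=3)  # the document's channel uses 1000
for n in range(5):
    buf.write(n)
drained = buf.read_all()
print(drained, buf.dropped)
```

With this policy a slow consumer never blocks the producer; it simply sees the most recent events, which suits a live debug view where stale values have little worth.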
#### Central-Side Debug Stream Components
- **DebugStreamService**: Singleton service that manages debug stream sessions. Resolves instance ID to unique name and site, creates and tears down `DebugStreamBridgeActor` instances, and provides a clean API for both Blazor components and the SignalR hub. Injects `SiteStreamGrpcClientFactory` for gRPC stream creation.
- **DebugStreamBridgeActor**: One per active debug session. Opens a gRPC streaming subscription via `SiteStreamGrpcClient` and receives real-time events via callback. Also receives the initial `DebugViewSnapshot` via ClusterClient. Forwards all events to the consumer via callbacks. Handles gRPC stream errors with reconnection logic: tries the other site node endpoint, retries with backoff (max 3 retries), terminates the session if all retries fail.
- **SiteStreamGrpcClient**: Per-site gRPC client that manages `GrpcChannel` instances and streaming subscriptions. Reads from the gRPC response stream in a background task, converts protobuf messages to domain events, and invokes the `onEvent` callback.
- **SiteStreamGrpcClientFactory**: Caches per-site `SiteStreamGrpcClient` instances. Reads `GrpcNodeAAddress` / `GrpcNodeBAddress` from the `Site` entity (loaded by `CentralCommunicationActor`). Falls back to NodeB if NodeA connection fails. Disposes clients on site removal or address change.
- **DebugStreamHub**: SignalR hub at `/hubs/debug-stream` for external consumers (e.g., CLI). Authenticates via Basic Auth + LDAP and requires the **Deployment** role. Server-to-client methods: `OnSnapshot`, `OnAttributeChanged`, `OnAlarmChanged`, `OnStreamTerminated`.
#### gRPC Proto Definition
The streaming protocol is defined in `sitestream.proto` (`src/ScadaLink.Communication/Protos/sitestream.proto`):
- **Service**: `SiteStreamService` with a single RPC `SubscribeInstance(InstanceStreamRequest) returns (stream SiteStreamEvent)`.
- **Messages**: `InstanceStreamRequest` (correlation_id, instance_unique_name), `SiteStreamEvent` (correlation_id, oneof event: `AttributeValueUpdate`, `AlarmStateUpdate`).
- The `oneof event` pattern is extensible — future event types (health metrics, connection state changes) are added as new fields without breaking existing consumers.
- Proto field numbers are never reused. Old clients ignore unknown `oneof` variants.
#### gRPC Connection Keepalive
Three layers of dead-client detection prevent orphan streams on site nodes:
| Layer | Detects | Timeline | Mechanism |
|-------|---------|----------|-----------|
| TCP RST | Clean process death, connection close | 15s | OS-level TCP, `WriteAsync` throws |
| gRPC keepalive PING | Network partition, silent crash, firewall drop | ~25s | HTTP/2 PING frames, `CancellationToken` fires |
| Session timeout | Misconfigured keepalive, long-lived zombie streams | 4 hours | `CancellationTokenSource.CancelAfter` |
Keepalive settings are configurable via `CommunicationOptions`:
- `GrpcKeepAlivePingDelay`: 15 seconds (default)
- `GrpcKeepAlivePingTimeout`: 10 seconds (default)
- `GrpcMaxStreamLifetime`: 4 hours (default)
- `GrpcMaxConcurrentStreams`: 100 (default)
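The ~25s figure in the table appears to be the sum of the two keepalive defaults. The short Python sketch below only checks that arithmetic; the variable names are illustrative, and the worst-case reasoning in the comment is an assumption about how the settings combine.

```python
from datetime import timedelta

ping_delay = timedelta(seconds=15)    # GrpcKeepAlivePingDelay default
ping_timeout = timedelta(seconds=10)  # GrpcKeepAlivePingTimeout default
max_lifetime = timedelta(hours=4)     # GrpcMaxStreamLifetime default

# Assumed worst case: the peer goes silent right after a PING was acknowledged,
# so a full delay passes before the next PING and a full timeout before the
# connection is declared dead.
keepalive_detection = ping_delay + ping_timeout
print(keepalive_detection.total_seconds())
```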
### 6a. Debug Snapshot (Central → Site)
- **Pattern**: Request/Response (one-shot, no subscription).
- Central sends a `DebugSnapshotRequest` (identified by instance unique name) to the site.
@@ -84,12 +128,17 @@ Both central and site clusters. Each side has communication actors that handle m
```
Central Cluster
├── ClusterClient → Site A Cluster (SiteCommunicationActor via Receptionist)
├── ClusterClient → Site B Cluster (SiteCommunicationActor via Receptionist)
└── ClusterClient → Site N Cluster (SiteCommunicationActor via Receptionist)
├── ClusterClient → Site A Cluster (SiteCommunicationActor via Receptionist) [command/control]
├── ClusterClient → Site B Cluster (SiteCommunicationActor via Receptionist) [command/control]
└── ClusterClient → Site N Cluster (SiteCommunicationActor via Receptionist) [command/control]
├── SiteStreamGrpcClient ◄── gRPC stream ── Site A (SiteStreamGrpcServer) [real-time data]
├── SiteStreamGrpcClient ◄── gRPC stream ── Site B (SiteStreamGrpcServer) [real-time data]
└── SiteStreamGrpcClient ◄── gRPC stream ── Site N (SiteStreamGrpcServer) [real-time data]
Site Clusters
└── ClusterClient → Central Cluster (CentralCommunicationActor via Receptionist)
└── ClusterClient → Central Cluster (CentralCommunicationActor via Receptionist) [command/control]
└── SiteStreamGrpcServer (Kestrel HTTP/2, port 8083) → serves gRPC streams [real-time data]
```
- Sites do **not** communicate with each other.
@@ -100,8 +149,8 @@ Site Clusters
Central discovers site addresses through the **configuration database**, not runtime registration:
- Each site record in the Sites table includes optional **NodeAAddress** and **NodeBAddress** fields containing base Akka addresses of the site's cluster nodes (e.g., `akka.tcp://scadalink@host:port`).
- The **CentralCommunicationActor** loads all site addresses from the database at startup and creates one **ClusterClient per site**, configured with both NodeA and NodeB as contact points.
- Each site record in the Sites table includes optional **NodeAAddress** and **NodeBAddress** fields containing base Akka addresses of the site's cluster nodes (e.g., `akka.tcp://scadalink@host:port`), and optional **GrpcNodeAAddress** and **GrpcNodeBAddress** fields containing gRPC endpoints (e.g., `http://host:8083`).
- The **CentralCommunicationActor** loads all site addresses from the database at startup and creates one **ClusterClient per site**, configured with both NodeA and NodeB as contact points. The **SiteStreamGrpcClientFactory** uses `GrpcNodeAAddress` / `GrpcNodeBAddress` to create per-site gRPC channels for streaming.
- The address cache is **refreshed every 60 seconds** and **on-demand** when site records are added, edited, or deleted via the Central UI or CLI. ClusterClient instances are recreated when contact points change.
- When routing a message to a site, central sends via `ClusterClient.Send("/user/site-communication", msg)`. **ClusterClient handles failover between NodeA and NodeB internally** — there is no application-level NodeA preference/NodeB fallback logic.
- **Heartbeats** from sites serve **health monitoring only** — they do not serve as a registration or address discovery mechanism.
@@ -159,7 +208,7 @@ The ManagementActor is registered at the well-known path `/user/management` on c
## Connection Failure Behavior
- **In-flight messages**: When a connection drops while a request is in flight (e.g., deployment sent but no response received), the Akka ask pattern times out and the caller receives a failure. There is **no automatic retry or buffering at central** — the engineer sees the failure in the UI and re-initiates the action. This is consistent with the design principle that central does not buffer messages.
- **Debug streams**: Any connection interruption (failover or network blip) kills the debug stream. The engineer must reopen the debug view in the Central UI to re-establish the subscription with a fresh snapshot. There is no auto-resume.
- **Debug streams**: Any gRPC stream interruption triggers reconnection logic in the `DebugStreamBridgeActor`. The bridge actor attempts to reconnect to the other site node endpoint (NodeB if NodeA failed, or vice versa), with up to 3 retries and 5-second backoff. If all retries fail, the consumer is notified via `OnStreamTerminated` and the bridge actor is stopped. Events during the reconnection gap are lost (acceptable for real-time debug view). On successful reconnection, the consumer can request a fresh snapshot to re-sync state.
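A minimal sketch of this reconnection policy, in Python for illustration only. The `reconnect` function, its parameters, and the node URLs are hypothetical. It assumes that retries alternate between the two nodes and that no wait precedes the first attempt; the text above does not specify either detail.

```python
import itertools


def reconnect(endpoints, failed_index, connect, sleep, max_retries=3, backoff_s=5.0):
    """Try the other node first, alternating between nodes on each retry.

    Returns the endpoint that connected, or None when all retries fail.
    """
    order = itertools.cycle(range(len(endpoints)))
    # Advance the cycle so the first attempt targets the node after the failed one.
    for _ in range(failed_index + 1):
        next(order)
    for attempt in range(max_retries):
        endpoint = endpoints[next(order)]
        if connect(endpoint):
            return endpoint
        if attempt < max_retries - 1:
            sleep(backoff_s)
    return None


nodes = ["http://site-a-node1:8083", "http://site-a-node2:8083"]  # hypothetical
attempts, waits = [], []


def fake_connect(endpoint):
    attempts.append(endpoint)
    return len(attempts) == 2  # the second attempt succeeds


result = reconnect(nodes, 0, fake_connect, waits.append)
print(result, attempts, waits)
```

Passing `connect` and `sleep` in as callables keeps the policy testable without real network calls or real delays. A `None` result corresponds to the point where the consumer would be notified via `OnStreamTerminated`.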
## Failover Behavior
@@ -168,9 +217,11 @@ The ManagementActor is registered at the well-known path `/user/management` on c
## Dependencies
- **Akka.NET Remoting + ClusterClient**: Provides the transport layer. ClusterClient/ClusterClientReceptionist used for all cross-cluster messaging.
- **Akka.NET Remoting + ClusterClient**: Provides the command/control transport layer. ClusterClient/ClusterClientReceptionist used for cross-cluster command/control messaging (deployments, lifecycle, subscribe/unsubscribe handshake, snapshots).
- **gRPC (Grpc.AspNetCore + Grpc.Net.Client)**: Provides the real-time data streaming transport. Site nodes host a gRPC server (SiteStreamGrpcServer); central nodes create per-site gRPC clients (SiteStreamGrpcClient).
- **Cluster Infrastructure**: Manages node roles and failover detection.
- **Configuration Database**: Provides site node addresses (NodeAAddress, NodeBAddress) for address resolution.
- **Configuration Database**: Provides site node addresses (NodeAAddress, NodeBAddress for Akka remoting; GrpcNodeAAddress, GrpcNodeBAddress for gRPC streaming) for address resolution.
- **Site Runtime (SiteStreamManager)**: The SiteStreamGrpcServer subscribes to SiteStreamManager to receive real-time events for gRPC delivery.
## Interactions
@@ -28,7 +28,8 @@ Central cluster only. Site clusters do not access the configuration database —
The configuration database stores all central system data, organized by domain area:
### Template & Modeling
- **Templates**: Template definitions (name, parent template reference, description).
- **Templates**: Template definitions (name, parent template reference, description, nullable `FolderId` FK to `TemplateFolders` — null means the template lives at the tree root).
- **TemplateFolders**: Hierarchical organizational folders for templates (`Id`, `Name`, nullable `ParentFolderId` self-reference, `SortOrder`). Unique index on `(ParentFolderId, Name)` enforces case-insensitive sibling uniqueness. Folders are UI-only — they have no effect on template resolution or flattening.
- **Template Attributes**: Attribute definitions per template (name, value, data type, lock flag, description, data source reference).
- **Template Alarms**: Alarm definitions per template (name, description, priority, lock flag, trigger type, trigger configuration, on-trigger script reference).
- **Template Scripts**: Script definitions per template (name, lock flag, C# source code, trigger type, trigger configuration, minimum time between runs, parameter definitions, return value definitions).
@@ -42,7 +43,7 @@ The configuration database stores all central system data, organized by domain a
- **Shared Scripts**: System-wide reusable script definitions (name, C# source code, parameter definitions, return value definitions).
### Sites & Data Connections
- **Sites**: Site definitions (name, identifier, description).
- **Sites**: Site definitions (name, identifier, description, NodeAAddress, NodeBAddress, GrpcNodeAAddress, GrpcNodeBAddress).
- **Data Connections**: Data connection definitions (name, protocol type, connection details) with site assignments.
### External Systems & Database Connections
@@ -0,0 +1,222 @@
# Component: Data Connection Layer
## Purpose
The Data Connection Layer provides a uniform interface for reading from and writing to physical machines at site clusters. It abstracts protocol-specific details behind a common interface, manages subscriptions, and delivers live tag value updates to Instance Actors. It is a **clean data pipe** — it performs no evaluation of triggers, alarm conditions, or business logic.
## Location
Site clusters only. Central does not interact with machines directly.
## Responsibilities
- Manage data connections defined centrally and deployed to sites as part of artifact deployment (OPC UA servers). Data connection definitions are stored in local SQLite after deployment.
- Establish and maintain connections to data sources based on deployed instance configurations.
- Subscribe to tag paths as requested by Instance Actors (based on attribute data source references in the flattened configuration).
- Deliver tag value updates to the requesting Instance Actors.
- Support writing values to machines (when Instance Actors forward `SetAttribute` write requests for data-connected attributes).
- Report data connection health status to the Health Monitoring component.
## Common Interface
All protocol adapters implement the same interface:
```
IDataConnection : IAsyncDisposable
├── Connect(connectionDetails) → void
├── Disconnect() → void
├── Subscribe(tagPath, callback) → subscriptionId
├── Unsubscribe(subscriptionId) → void
├── Read(tagPath) → value
├── ReadBatch(tagPaths) → values
├── Write(tagPath, value) → void
├── WriteBatch(values) → void
├── WriteBatchAndWait(values, flagPath, flagValue, responsePath, responseValue, timeout) → bool
├── Status → ConnectionHealth
└── Disconnected → event Action?
```
The `Disconnected` event is raised by an adapter when it detects an unexpected connection loss (server offline, network failure, keep-alive timeout). The `DataConnectionActor` subscribes to this event to trigger the reconnection state machine. Additional protocols can be added by implementing this interface.
### Common Value Type
All protocols produce the same value tuple consumed by Instance Actors. Before the first value update arrives from the DCL, data-sourced attributes are held at **uncertain** quality by the Instance Actor (see Site Runtime — Initialization):
| Concept | ScadaLink Design |
|---|---|
| Value container | `TagValue(Value, Quality, Timestamp)` |
| Quality | `QualityCode` enum: Good / Bad / Uncertain |
| Timestamp | `DateTimeOffset` (UTC) |
| Value type | `object?` |
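The value tuple in the table can be modelled as below. This is a Python rendering for illustration, not the C# definition; `initial_value` is a hypothetical helper, and the assumption that the placeholder value is empty before the first update is not stated in the text (only the uncertain quality is).

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Any, Optional


class QualityCode(Enum):
    GOOD = "Good"
    BAD = "Bad"
    UNCERTAIN = "Uncertain"


@dataclass(frozen=True)
class TagValue:
    value: Optional[Any]
    quality: QualityCode
    timestamp: datetime  # expected to be timezone-aware UTC


def initial_value():
    """Placeholder held before the first update: no value, Uncertain quality."""
    return TagValue(None, QualityCode.UNCERTAIN, datetime.now(timezone.utc))


tv = initial_value()
print(tv.quality.value)
```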
## Supported Protocols
### OPC UA
- Uses the **OPC Foundation .NET Standard Library** (`OPCFoundation.NetStandard.Opc.Ua.Client`).
- Session-based connection with endpoint discovery, certificate handling, and configurable security modes.
- Subscriptions via OPC UA Monitored Items with data change notifications (1000ms sampling, queue size 10, discard-oldest).
- Read/Write via OPC UA Read/Write services with StatusCode-based quality mapping.
- Disconnect detection via `Session.KeepAlive` event (see Disconnect Detection Pattern below).
## Endpoint Redundancy
Data connections support an optional backup endpoint for automatic failover when the active endpoint becomes unreachable. Both endpoints use the same protocol.
**Entity fields:**
| Field | Type | Notes |
|-------|------|-------|
| `PrimaryConfiguration` | string? (max 4000) | Required. Renamed from `Configuration` |
| `BackupConfiguration` | string? (max 4000) | Optional. Null = no backup |
| `FailoverRetryCount` | int (default 3) | Retries on active endpoint before switching |
**Failover state machine:**
```
Connected → disconnect → push bad quality → retry active endpoint (5s)
→ N failures (≥ FailoverRetryCount) → switch to other endpoint
→ dispose adapter, create fresh adapter with other config
→ reconnect → ReSubscribeAll → Connected
```
- **Round-robin**: primary → backup → primary → backup. No preferred endpoint after first failover — the connection stays on whichever endpoint is working.
- **No auto-failback**: The connection remains on the active endpoint until it fails.
- **Single-endpoint connections** (no backup): Retry indefinitely on the same endpoint, preserving existing behavior.
- **Adapter lifecycle on failover**: The actor disposes the current `IDataConnection` adapter and creates a fresh one via `DataConnectionFactory.Create()` with the other endpoint's configuration. Clean slate — no stale state.
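The endpoint selection rules above (retry the active endpoint, switch after `FailoverRetryCount` failures, no failback) can be sketched as a small state holder. This is an illustrative Python sketch; `EndpointSelector` and its method names are hypothetical, and adapter disposal and re-subscription are left out.

```python
class EndpointSelector:
    """Tracks the active endpoint and switches after repeated failures."""

    def __init__(self, primary, backup=None, failover_retry_count=3):
        self._endpoints = [primary] if backup is None else [primary, backup]
        self._active = 0
        self._failures = 0
        self._retry_count = failover_retry_count

    @property
    def active(self):
        return self._endpoints[self._active]

    def on_connect_failed(self):
        """Record a failed attempt; return True if the active endpoint switched."""
        self._failures += 1
        if len(self._endpoints) == 1 or self._failures < self._retry_count:
            return False  # keep retrying the same endpoint
        self._active = (self._active + 1) % len(self._endpoints)
        self._failures = 0
        return True

    def on_connected(self):
        self._failures = 0  # no failback: stay on whichever endpoint works


sel = EndpointSelector("opc.tcp://plc-a:4840", "opc.tcp://plc-b:4840")
switched = [sel.on_connect_failed() for _ in range(3)]
print(switched, sel.active)
```

A single-endpoint selector never switches, which matches the indefinite-retry behaviour described for connections without a backup.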
**Health reporting:**
- `DataConnectionHealthReport` includes `ActiveEndpoint`: `"Primary"`, `"Backup"`, or `"Primary (no backup)"`.
**Site event log entries:**
- `DataConnectionFailover` (Warning) — connection name, from-endpoint, to-endpoint, failure count.
- `DataConnectionRestored` (Info) — connection name, active endpoint.
See [`2026-03-22-primary-backup-data-connections-design.md`](../plans/2026-03-22-primary-backup-data-connections-design.md) for the full design.
## Connection Configuration Reference
All settings are parsed from the data connection's configuration JSON dictionaries (`PrimaryConfiguration` and optional `BackupConfiguration`, stored as `IDictionary<string, string>` connection details). Both endpoints use the same protocol-specific keys. Invalid numeric values fall back to defaults silently.
### OPC UA Settings
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `endpoint` / `EndpointUrl` | string | `opc.tcp://localhost:4840` | OPC UA server endpoint URL |
| `SessionTimeoutMs` | int | `60000` | OPC UA session timeout in milliseconds |
| `OperationTimeoutMs` | int | `15000` | Transport operation timeout in milliseconds |
| `PublishingIntervalMs` | int | `1000` | Subscription publishing interval in milliseconds |
| `KeepAliveCount` | int | `10` | Keep-alive frames before session timeout |
| `LifetimeCount` | int | `30` | Subscription lifetime in publish intervals |
| `MaxNotificationsPerPublish` | int | `100` | Max notifications batched per publish cycle |
| `SamplingIntervalMs` | int | `1000` | Per-item server sampling rate in milliseconds |
| `QueueSize` | int | `10` | Per-item notification buffer size |
| `SecurityMode` | string | `None` | Preferred endpoint security: `None`, `Sign`, or `SignAndEncrypt` |
| `AutoAcceptUntrustedCerts` | bool | `true` | Accept untrusted server certificates |
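The silent fallback for invalid numeric values might look like the following. It is a sketch in Python under stated assumptions: `parse_int_settings` is a hypothetical helper, only a few keys from the table are shown, and the real parser may treat edge cases (whitespace, negative numbers) differently.

```python
DEFAULTS = {
    "SessionTimeoutMs": 60000,
    "OperationTimeoutMs": 15000,
    "PublishingIntervalMs": 1000,
    "QueueSize": 10,
}


def parse_int_settings(details, defaults=DEFAULTS):
    """Return integer settings, using the default when a key is missing or invalid."""
    parsed = {}
    for key, default in defaults.items():
        try:
            parsed[key] = int(details[key])
        except (KeyError, ValueError, TypeError):
            parsed[key] = default  # silent fallback, as the text describes
    return parsed


settings = parse_int_settings({"SessionTimeoutMs": "30000", "QueueSize": "lots"})
print(settings)
```

Because the fallback is silent, a typo in a value such as `QueueSize` yields the default rather than an error, so it may be worth checking the effective settings when a connection behaves unexpectedly.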
### Shared Settings (appsettings.json)
These are configured via `DataConnectionOptions` in `appsettings.json`, not per-connection:
| Setting | Default | Description |
|---------|---------|-------------|
| `ReconnectInterval` | 5s | Fixed interval between reconnection attempts |
| `TagResolutionRetryInterval` | 10s | Retry interval for unresolved tag paths |
| `WriteTimeout` | 30s | Timeout for write operations |
## Subscription Management
- When an Instance Actor is created (as part of the Site Runtime actor hierarchy), it registers its data source references with the Data Connection Layer.
- The DCL subscribes to the tag paths using the concrete connection details from the flattened configuration.
- Tag value updates are delivered directly to the requesting Instance Actor.
- When an Instance Actor is stopped (due to disable, delete, or redeployment), the DCL cleans up the associated subscriptions.
- When a new Instance Actor is created for a redeployment, subscriptions are established fresh based on the new configuration.
## Write-Back Support
- When a script calls `Instance.SetAttribute` for an attribute with a data source reference, the Instance Actor sends a write request to the DCL.
- The DCL writes the value to the physical device via the appropriate protocol.
- The existing subscription picks up the confirmed new value from the device and delivers it back to the Instance Actor as a standard value update.
- The Instance Actor's in-memory value is **not** updated until the device confirms the write.
## Value Update Message Format
Each value update delivered to an Instance Actor includes:
- **Tag path**: The relative path of the attribute's data source reference.
- **Value**: The new value from the device.
- **Quality**: Data quality indicator (good, bad, uncertain).
- **Timestamp**: When the value was read from the device.
## Connection Actor Model
Each data connection is managed by a dedicated connection actor that uses the Akka.NET **Become/Stash** pattern to model its lifecycle as a state machine:
- **Connecting**: The actor attempts to establish the connection. Subscription requests and write commands received during this phase are **stashed** (buffered in the actor's stash).
- **Connected**: The actor is actively servicing subscriptions. On entering this state, all stashed messages are unstashed and processed.
- **Reconnecting**: The connection was lost. The actor transitions back to a connecting-like state, stashing new requests while it retries.
This pattern ensures no messages are lost during connection transitions and is the standard Akka.NET approach for actors with I/O lifecycle dependencies.
**OPC UA-specific notes**: The `RealOpcUaClient` uses the OPC Foundation SDK's `Session.KeepAlive` event for proactive disconnect detection. The SDK sends keep-alive requests at an interval of the subscription's `KeepAliveCount × PublishingIntervalMs` (default: 10 × 1000 ms = 10 s). When a keep-alive fails, the `ConnectionLost` event fires, triggering the same reconnection flow. On reconnection, the DCL re-creates the OPC UA session and subscription, then re-adds all monitored items.
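The Become/Stash lifecycle described in this section can be approximated in a few lines of Python. This is a language-neutral sketch, not Akka.NET code: `ConnectionActorSketch` and its method names are hypothetical, and a plain `deque` stands in for the actor's stash.

```python
from collections import deque


class ConnectionActorSketch:
    """Toy state machine: stash while not connected, replay on connect."""

    def __init__(self):
        self.state = "Connecting"
        self.stash = deque()
        self.processed = []

    def receive(self, message):
        if self.state == "Connected":
            self.processed.append(message)
        else:
            # Connecting / Reconnecting: buffer the request for later.
            self.stash.append(message)

    def on_connected(self):
        self.state = "Connected"
        # Unstash in arrival order so no request is lost or reordered.
        while self.stash:
            self.receive(self.stash.popleft())

    def on_connection_lost(self):
        self.state = "Reconnecting"


actor = ConnectionActorSketch()
actor.receive("subscribe Tank1.Level")
actor.receive("write Pump1.Speed")
actor.on_connected()
```

After `on_connected()`, both stashed messages appear in `actor.processed` in the order they arrived.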
## Connection Lifecycle & Reconnection
The DCL manages connection lifecycle automatically:
1. **Connection drop detection**: When a connection to a data source is lost, the DCL immediately pushes a value update with quality `bad` for **every tag subscribed on that connection**. Instance Actors and their downstream consumers (alarms, scripts checking quality) see the staleness immediately.
2. **Auto-reconnect with fixed interval**: The DCL retries the connection at a configurable fixed interval (e.g., every 5 seconds). The retry interval is defined **per data connection**. This is consistent with the fixed-interval retry philosophy used throughout the system. Individual gRPC/OPC UA operations (reads, writes) fail immediately to the caller on error; there is no operation-level retry within the adapter.
3. **Connection state transitions**: The DCL tracks each connection's state as `connected`, `disconnected`, or `reconnecting`. All transitions are logged to Site Event Logging.
4. **Transparent re-subscribe**: On successful reconnection, the DCL automatically re-establishes all previously active subscriptions for that connection. Instance Actors require no action — they simply see quality return to `good` as fresh values arrive from restored subscriptions.
### Disconnect Detection Pattern
Each adapter implements the `IDataConnection.Disconnected` event to proactively signal connection loss to the `DataConnectionActor`. Detection uses two complementary paths:
**Proactive detection** (server goes offline between operations):
- **OPC UA**: The OPC Foundation SDK fires `Session.KeepAlive` events at regular intervals. `RealOpcUaClient` hooks this event; when `ServiceResult.IsBad(e.Status)` (server unreachable, keep-alive timeout), it fires `ConnectionLost`. The `OpcUaDataConnection` adapter translates this into `IDataConnection.Disconnected`.
**Reactive detection** (failure discovered during an operation):
- Both adapters wrap `ReadAsync` (and by extension `ReadBatchAsync`) with exception handling. If a read throws a non-cancellation exception, the adapter calls `RaiseDisconnected()` and re-throws. The `DataConnectionActor`'s existing error handling catches the exception while the disconnect event triggers the reconnection state machine.
**Event marshalling**: The `DataConnectionActor` subscribes to `_adapter.Disconnected` in `PreStart()`. Since `Disconnected` may fire from a background thread (gRPC stream task, OPC UA keep-alive timer), the handler sends an `AdapterDisconnected` message to `Self`, marshalling the notification onto the actor's message loop. This triggers `BecomeReconnecting()` → bad quality push → retry timer.
**Once-only guard**: `OpcUaDataConnection` uses a `volatile bool _disconnectFired` flag to ensure `RaiseDisconnected()` fires exactly once per connection session. The flag resets on successful reconnection (`ConnectAsync`).
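A minimal Python sketch of the once-only guard follows. It swaps the `volatile bool` flag for a lock-protected flag, which is a different mechanism with the same intent; `DisconnectGuard` and its methods are hypothetical names.

```python
import threading


class DisconnectGuard:
    """Fire the disconnect callback at most once per connection session."""

    def __init__(self, on_disconnected):
        self._on_disconnected = on_disconnected
        self._fired = False
        self._lock = threading.Lock()

    def raise_disconnected(self):
        with self._lock:
            if self._fired:
                return  # already signalled for this session
            self._fired = True
        # Invoke outside the lock so the callback cannot deadlock on it.
        self._on_disconnected()

    def reset(self):
        """Call after a successful reconnect to re-arm the guard."""
        with self._lock:
            self._fired = False


events = []
guard = DisconnectGuard(lambda: events.append("AdapterDisconnected"))
guard.raise_disconnected()  # keep-alive failure
guard.raise_disconnected()  # failed read discovered moments later
```

Both detection paths may report the same outage; the guard collapses them into a single `AdapterDisconnected` notification.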
## Write Failure Handling
Writes to physical devices are **synchronous** from the script's perspective:
- If the write fails (connection down, device rejection, timeout), the error is **returned to the calling script**. Script authors can catch and handle write errors (log, notify, retry, etc.).
- Write failures are also logged to Site Event Logging.
- There is **no store-and-forward for device writes** — these are real-time control operations. Buffering stale setpoints for later application would be dangerous in an industrial context.
## Tag Path Resolution
When the DCL subscribes to a tag path from the flattened configuration but the path does not exist on the physical device (e.g., typo in the template, device firmware changed, device still booting):
1. The failure is **logged to Site Event Logging**.
2. The attribute is marked with quality `bad`.
3. The DCL **periodically retries resolution** at a configurable interval, accommodating devices that come online in stages or load modules after startup.
4. On successful resolution, the subscription activates normally and quality reflects the live value from the device.
Note: Pre-deployment validation at central does **not** verify that tag paths resolve to real tags on physical devices — that is a runtime concern handled here.
## Health Reporting
The DCL reports the following metrics to the Health Monitoring component via the existing periodic heartbeat:
- **Connection status**: `connected`, `disconnected`, or `reconnecting` per data connection.
- **Tag resolution counts**: Per connection, the number of total subscribed tags vs. successfully resolved tags. This gives operators visibility into misconfigured templates without needing to open the debug view for individual instances.
## Dependencies
- **Site Runtime (Instance Actors)**: Receives subscription registrations and delivers value updates. Receives write requests.
- **Health Monitoring**: Reports connection status.
- **Site Event Logging**: Logs connection status changes.
## Interactions
- **Site Runtime (Instance Actors)**: Bidirectional — delivers value updates, receives subscription registrations and write-back commands.
- **Health Monitoring**: Reports connection health periodically.
- **Site Event Logging**: Logs connection/disconnection events.
@@ -45,7 +45,7 @@ The Host must bind configuration sections from `appsettings.json` to strongly-ty
| Section | Options Class | Owner | Contents |
|---------|--------------|-------|----------|
| `ScadaLink:Node` | `NodeOptions` | Host | Role, NodeHostname, SiteId, RemotingPort |
| `ScadaLink:Node` | `NodeOptions` | Host | Role, NodeHostname, SiteId, RemotingPort, GrpcPort (site only, default 8083) |
| `ScadaLink:Cluster` | `ClusterOptions` | ClusterInfrastructure | SeedNodes, SplitBrainResolverStrategy, StableAfter, HeartbeatInterval, FailureDetectionThreshold, MinNrOfMembers |
| `ScadaLink:Database` | `DatabaseOptions` | Host | Central: ConfigurationDb, MachineDataDb connection strings; Site: SQLite paths |
@@ -79,6 +79,7 @@ Before the Akka.NET actor system is created, the Host must validate all required
- `NodeConfiguration.Role` must be a valid `NodeRole` value.
- `NodeConfiguration.NodeHostname` must not be null or empty.
- `NodeConfiguration.RemotingPort` must be in valid port range (1–65535).
- Site nodes must have `GrpcPort` in valid port range (1–65535) and different from `RemotingPort`.
- Site nodes must have a non-empty `SiteId`.
- Central nodes must have non-empty `ConfigurationDb` and `MachineDataDb` connection strings.
- Site nodes must have non-empty SQLite path values. Site nodes do **not** require a `ConfigurationDb` connection string — all configuration is received via artifact deployment and read from local SQLite.
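The port rules for site nodes can be sketched as a small validation function. This is an illustrative Python sketch, assuming the valid port range is 1 to 65535; `validate_site_ports` and the error strings are hypothetical and do not reflect the Host's actual validator.

```python
def validate_site_ports(remoting_port: int, grpc_port: int = 8083) -> list:
    """Collect every port violation instead of stopping at the first."""
    errors = []
    if not 1 <= remoting_port <= 65535:
        errors.append("RemotingPort must be in range 1-65535")
    if not 1 <= grpc_port <= 65535:
        errors.append("GrpcPort must be in range 1-65535")
    if grpc_port == remoting_port:
        errors.append("GrpcPort must differ from RemotingPort")
    return errors
```

Returning a list lets startup report all configuration problems in one pass before the actor system is created.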
@@ -112,14 +113,24 @@ The Host must configure the Akka.NET actor system using Akka.Hosting with:
On central nodes, the Host must configure the Akka.NET **ClusterClientReceptionist** and register the ManagementActor with it. This allows external processes (e.g., the CLI) to discover and communicate with the ManagementActor via ClusterClient without joining the cluster as full members. The receptionist is started as part of the Akka.NET bootstrap (REQ-HOST-6) on central nodes only.
### REQ-HOST-7: ASP.NET Web Endpoints (Central Only)
### REQ-HOST-7: ASP.NET Web Endpoints
On central nodes, the Host must use `WebApplication.CreateBuilder` to produce a full ASP.NET Core host with Kestrel, and must map web endpoints for:
- Central UI (via `MapCentralUI()` extension method).
- Inbound API (via `MapInboundAPI()` extension method).
On site nodes, the Host must use `Host.CreateDefaultBuilder` to produce a generic `IHost`, **not** a `WebApplication`. This ensures no Kestrel server is started, no HTTP port is opened, and no web endpoint or middleware pipeline is configured. Site nodes are headless and must never accept inbound HTTP connections.
On site nodes, the Host must also use `WebApplication.CreateBuilder` (not `Host.CreateDefaultBuilder`) to host the **SiteStreamGrpcServer** via Kestrel HTTP/2 on the configured `GrpcPort` (default 8083). Kestrel is configured with `HttpProtocols.Http2` on the gRPC port only — no HTTP/1.1 web endpoints are exposed. The gRPC service is mapped via `MapGrpcService<SiteStreamGrpcServer>()`.
**Startup ordering (site nodes)**:
1. Actor system and SiteStreamManager must be initialized before gRPC begins accepting connections.
2. The gRPC server rejects streams with `StatusCode.Unavailable` until the actor system is ready.
**Shutdown ordering (site nodes)**:
1. On `CoordinatedShutdown`, stop accepting new gRPC streams first.
2. Cancel all active gRPC streams (triggering client-side reconnect).
3. Tear down actors.
4. Use `IHostApplicationLifetime.ApplicationStopping` to signal the gRPC server.
### REQ-HOST-8: Structured Logging
@@ -99,6 +99,21 @@ Each API method definition includes:
- This allows complex request/response structures (e.g., an object containing properties and a list of nested objects).
- Template attributes retain the simpler four-type system. The extended types apply only to Inbound API method definitions and External System Gateway method definitions.
## Script Compilation & Hot-Reload
API method scripts are compiled at central startup — all method definitions are loaded from the configuration database and compiled into in-memory delegates.
### Update Workflow
- Updating a method via the CLI (`api-method update --id <N> --code '...'`) or Management API triggers immediate recompilation (`CompileAndRegister`). The updated script takes effect on the next API call — no node restart is required.
- Creating a new method after startup: if the method is created but not yet compiled, the first invocation triggers lazy (on-demand) compilation.
### Direct SQL Warning
> **Do not edit API method scripts via direct SQL.** The in-memory compiled script will not be updated until the next node restart. Always use the CLI, Management API, or Central UI to modify API method scripts.
---
## API Call Logging
- **Only failures are logged.** Script execution errors (500 responses) are logged centrally.
@@ -141,6 +156,10 @@ Inbound API scripts **cannot** call shared scripts directly — shared scripts a
- **Input parameters** are available as defined in the method definition.
- **Return value** construction matching the defined return structure.
#### Parameter Access
- `Parameters["key"]` — Raw dictionary access.
- `Parameters.Get<T>("key")` — Typed access (same API as site runtime scripts). See Site Runtime component for full type support.
#### Database Access
- `Database.Connection("connectionName")` — Obtain a raw MS SQL client connection for querying the configuration or machine data databases directly from central.
@@ -2,11 +2,11 @@
## Purpose
The Management Service is an Akka.NET actor on the central cluster that provides programmatic access to all administrative operations. It exposes the same capabilities as the Central UI but through an actor-based interface, enabling the CLI (and potentially other tooling) to interact with the system without going through the web UI. The ManagementActor registers with ClusterClientReceptionist so that external processes can reach it via ClusterClient without joining the cluster.
The Management Service is an Akka.NET actor on the central cluster that provides programmatic access to all administrative operations. It exposes the same capabilities as the Central UI but through an actor-based interface, enabling the CLI (and potentially other tooling) to interact with the system without going through the web UI. The ManagementActor registers with ClusterClientReceptionist for cross-cluster access and is also exposed via an HTTP Management API endpoint (`POST /management`) for external tools like the CLI.
## Location
Central cluster only (active node). The ManagementActor runs as a cluster singleton on the central cluster.
Central cluster only. The ManagementActor runs as a plain actor on **every** central node (not a cluster singleton). Because the actor is completely stateless — it holds no locks and no local state, delegating all work to repositories and services — running on all nodes improves availability without requiring coordination between instances. Either node can serve any request independently.
`src/ScadaLink.ManagementService/`
@@ -14,10 +14,11 @@ Central cluster only (active node). The ManagementActor runs as a cluster single
- Provide an actor-based interface to all administrative operations available in the Central UI.
- Register with Akka.NET ClusterClientReceptionist so external tools (CLI) can discover and communicate with it via ClusterClient.
- Expose an HTTP API endpoint (`POST /management`) that accepts JSON commands with Basic Auth, performs LDAP authentication and role resolution, and dispatches to the ManagementActor.
- Validate and authorize all incoming commands using the authenticated user identity carried in message envelopes.
- Delegate to the appropriate services and repositories for each operation.
- Return structured response messages for all commands and queries.
- Failover: The ManagementActor is available on the active central node and fails over with it. ClusterClient handles reconnection transparently.
- Failover: The ManagementActor runs on all central nodes, so no actor-level failover is needed. If one node goes down, the ClusterClient transparently routes to the ManagementActor on the remaining node.
## Key Classes
@@ -25,6 +26,18 @@ Central cluster only (active node). The ManagementActor runs as a cluster single
The central actor that receives and processes all management commands. Registered at a well-known actor path (`/user/management`) and with ClusterClientReceptionist.
### ManagementEndpoints
Minimal API endpoint (`POST /management`) that serves as the HTTP interface to the ManagementActor. Handles Basic Auth decoding, LDAP authentication via `LdapAuthService`, role resolution via `RoleMapper`, command deserialization via `ManagementCommandRegistry`, and ManagementActor dispatch.
### ManagementActorHolder
DI-registered singleton that holds the `IActorRef` for the ManagementActor. Set during actor registration in `AkkaHostedService` and injected into the HTTP endpoint handler.
### ManagementCommandRegistry
Static registry mapping command names (e.g., `"ListSites"`) to command types (e.g., `ListSitesCommand`). Built via reflection at startup. Used by the HTTP endpoint to deserialize JSON payloads into the correct command type.
### Message Contracts
All request/response messages are defined in **Commons** under `Messages/Management/`. Messages follow the existing additive-only evolution rules for version compatibility. Every request message includes:
@@ -36,6 +49,31 @@ All request/response messages are defined in **Commons** under `Messages/Managem
The ManagementActor registers itself with `ClusterClientReceptionist` at startup. This allows external processes using `ClusterClient` to send messages to the ManagementActor without joining the Akka.NET cluster as a full member. The receptionist advertises the actor under its well-known path.
## HTTP Management API
The Management Service also exposes a `POST /management` endpoint on the Central Host's web server. This provides an HTTP interface to the same ManagementActor, enabling the CLI (and other HTTP clients) to interact without Akka.NET dependencies.
**Request format:**
```http
POST /management
Authorization: Basic base64(username:password)
Content-Type: application/json

{
  "command": "ListSites",
  "payload": {}
}
```
**Response mapping:**
- `ManagementSuccess` → HTTP 200 with JSON body
- `ManagementError` → HTTP 400 with `{ "error": "...", "code": "..." }`
- `ManagementUnauthorized` → HTTP 403 with `{ "error": "...", "code": "UNAUTHORIZED" }`
- Authentication failure → HTTP 401
- Actor timeout → HTTP 504
The endpoint performs LDAP authentication and role resolution server-side, collapsing the CLI's previous two-step flow (ResolveRoles + actual command) into a single HTTP round-trip.
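As a rough client-side illustration, the request above can be assembled with the Python standard library. The base URL `https://central.example`, the credentials, and the helper name `build_management_request` are placeholders, not part of the system; the sketch only builds the request and does not send it.

```python
import base64
import json
import urllib.request

# Response mapping as listed above (result type -> HTTP status).
RESPONSE_STATUS = {
    "ManagementSuccess": 200,
    "ManagementError": 400,
    "ManagementUnauthorized": 403,
}


def build_management_request(base_url, username, password, command, payload=None):
    """Build a POST /management request with Basic Auth and a JSON body."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    body = json.dumps({"command": command, "payload": payload or {}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/management",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
    )


request = build_management_request(
    "https://central.example", "alice", "secret", "ListSites"
)
```

Passing the result to `urllib.request.urlopen` would perform the call; authentication and role resolution then happen server-side in the same round-trip.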
## Message Groups
### Templates
@@ -45,6 +83,15 @@ The ManagementActor registers itself with `ClusterClientReceptionist` at startup
- **ValidateTemplate**: Run on-demand pre-deployment validation (flattening, naming collisions, script compilation).
- **GetTemplateDiff**: Compare deployed vs. template-derived configuration for an instance.
### Template Folders
- **ListTemplateFolders**: List all template folders (read-only; any authenticated user).
- **CreateTemplateFolder** (`Name`, `ParentFolderId?`): Create a folder, optionally nested under a parent (Design role).
- **RenameTemplateFolder** (`FolderId`, `NewName`): Rename a folder; enforces sibling uniqueness (Design role).
- **MoveTemplateFolder** (`FolderId`, `NewParentFolderId?`): Move a folder to a new parent (or root); rejects cycles (Design role).
- **DeleteTemplateFolder** (`FolderId`): Delete a folder; blocked if the folder contains any subfolders or templates (Design role).
- **MoveTemplateToFolder** (`TemplateId`, `NewFolderId?`): Move a template into a folder, or to the root when null (Design role).
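The cycle check that `MoveTemplateFolder` must perform can be sketched as a walk up the ancestor chain of the proposed parent. This Python sketch assumes folders are held as a hypothetical `parent_of` mapping from folder ID to parent ID (`None` for root); the real implementation may query the repository instead.

```python
from typing import Dict, Optional


def would_create_cycle(
    parent_of: Dict[int, Optional[int]],
    folder_id: int,
    new_parent_id: Optional[int],
) -> bool:
    """True if moving folder_id under new_parent_id makes it its own ancestor."""
    current = new_parent_id
    while current is not None:
        if current == folder_id:
            return True
        current = parent_of.get(current)
    return False


# Root(1) -> Pumps(2) -> Centrifugal(3)
folders = {1: None, 2: 1, 3: 2}
```

Moving folder 1 under folder 3 is rejected because 3 already descends from 1, while moving 3 directly under 1 or to the root (`None`) is allowed.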
### Template Members
- **AddTemplateAttribute** / **UpdateTemplateAttribute** / **DeleteTemplateAttribute**: Manage attributes on a template.
@@ -168,14 +215,14 @@ The ManagementActor receives the following services and repositories via DI (inj
- **Configuration Database (via IAuditService)**: All mutating operations are audit logged through the existing transactional audit mechanism.
- **Communication Layer**: Deployment commands and remote queries (parked messages, event logs) are routed to sites via Communication.
- **Security & Auth**: Authorization rules are enforced on every command using the authenticated user identity from the message envelope.
- **Cluster Infrastructure**: ManagementActor runs on the active central node; ClusterClientReceptionist requires cluster membership.
- **Cluster Infrastructure**: ManagementActor runs on all central nodes; ClusterClientReceptionist requires cluster membership.
- **All service components**: The ManagementActor delegates to the same services used by the Central UI — Template Engine, Deployment Manager, etc.
## Interactions
- **CLI**: The primary consumer. Connects via Akka.NET ClusterClient and sends management messages to the ManagementActor.
- **CLI**: The primary consumer. Connects via the HTTP Management API (`POST /management`) and sends commands as JSON with Basic Auth credentials.
- **Host**: Registers the ManagementActor and ClusterClientReceptionist on central nodes during startup.
- **Central UI**: Shares the same underlying services and repositories. The ManagementActor and Central UI are parallel interfaces to the same operations.
- **Communication Layer**: Deployment commands and remote site queries flow through communication actors.
- **Configuration Database (via IAuditService)**: All configuration changes are audited.
- **Security & Auth**: The ManagementActor enforces authorization using user identity passed in messages. The CLI is responsible for authenticating the user and including their identity in every request.
- **Security & Auth**: The ManagementActor enforces authorization using user identity passed in messages. For HTTP API access, the Management endpoint authenticates the user via LDAP and resolves roles before dispatching to the ManagementActor.
@@ -24,17 +24,21 @@ Central cluster. Sites do not have user-facing interfaces and do not perform ind
## Session Management
### JWT Tokens
### Cookie + JWT Hybrid
- On successful authentication, the app issues a **JWT** signed with a shared symmetric key (HMAC-SHA256). Both central cluster nodes use the same signing key from configuration, so either node can issue and validate tokens.
- **JWT claims**: User display name, username, list of roles (Admin, Design, Deployment), and for site-scoped Deployment, the list of permitted site IDs. All authorization decisions are made from token claims without hitting the database.
- The JWT is embedded in an **authentication cookie** rather than being passed as a bearer token. This is the correct transport for Blazor Server, where persistent SignalR circuits do not carry Authorization headers — the browser automatically sends the cookie with every SignalR connection and HTTP request.
- The cookie is **HttpOnly** and **Secure** (requires HTTPS).
- On each request, the server extracts and validates the JWT from the cookie. All authorization decisions are made from the JWT claims without hitting the database.
- **JWT claims**: User display name, username, list of roles (Admin, Design, Deployment), and for site-scoped Deployment, the list of permitted site IDs.
### Token Lifecycle
- **JWT expiry**: 15 minutes. On each request, if the token is near expiry, the app re-queries LDAP for current group memberships and issues a fresh token with updated claims. Roles are never more than 15 minutes stale.
- **JWT expiry**: 15 minutes. On each request, if the cookie-embedded JWT is near expiry, the app re-queries LDAP for current group memberships and issues a fresh JWT, writing an updated cookie. Roles are never more than 15 minutes stale.
- **Idle timeout**: Configurable, default **30 minutes**. If no requests are made within the idle window, the token is not refreshed and the user must re-login. Tracked via a last-activity timestamp in the token.
- **Sliding refresh**: Active users stay logged in indefinitely — the token refreshes every 15 minutes as long as requests are made within the 30-minute idle window.
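One way to read the interaction between the 15-minute expiry and the 30-minute idle window is the decision function below. It is a Python sketch under stated assumptions: the 5-minute `REFRESH_WINDOW` for "near expiry" is invented for illustration, and treating an already expired token as refreshable while the user is still inside the idle window is an interpretation of the text, not confirmed behavior.

```python
from datetime import datetime, timedelta

JWT_LIFETIME = timedelta(minutes=15)
IDLE_TIMEOUT = timedelta(minutes=30)
REFRESH_WINDOW = timedelta(minutes=5)  # assumption: what counts as "near expiry"


def session_action(now, issued_at, last_activity):
    """Return 'relogin', 'refresh', or 'keep' for an incoming request."""
    if now - last_activity > IDLE_TIMEOUT:
        return "relogin"  # idle window exceeded, token is not refreshed
    remaining = issued_at + JWT_LIFETIME - now
    if remaining <= REFRESH_WINDOW:
        return "refresh"  # re-query LDAP and issue a fresh JWT cookie
    return "keep"


t0 = datetime(2026, 1, 1, 12, 0)
```

With these values, a request 12 minutes after issuance triggers a refresh, while a request after 35 minutes of inactivity forces a new login.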
### Load Balancer Compatibility
- JWT tokens are self-contained — no server-side session state. A load balancer in front of the central cluster can route requests to either node without sticky sessions or a shared session store. Central failover is transparent to users with valid tokens.
- The authentication cookie carries a self-contained JWT — no server-side session state. A load balancer in front of the central cluster can route requests to either node without sticky sessions or a shared session store.
- Since both central nodes share the same JWT signing key, either node can validate the cookie-embedded JWT. Central failover is transparent to users with valid cookies.
## LDAP Connection Failure
@@ -113,7 +113,7 @@ Deployment Manager Singleton (Cluster Singleton)
### Debug View Support
- On request from central (via Communication Layer), the Instance Actor provides a **snapshot** of all current attribute values and alarm states.
- Subsequent changes are delivered via the site-wide Akka stream, filtered by instance unique name.
- Subsequent changes are delivered via the **SiteStreamManager** → **SiteStreamGrpcServer** → gRPC stream to central. The Instance Actor publishes attribute value and alarm state changes to the SiteStreamManager; it does not forward events directly to the Communication Layer.
- The Instance Actor also handles one-shot `DebugSnapshotRequest` messages: it builds the same snapshot (attribute values and alarm states) and replies directly to the sender. Unlike `SubscribeDebugViewRequest`, no subscriber is registered and no stream is established.
### Supervision Strategy
@@ -176,19 +176,20 @@ When the Instance Actor is stopped (due to disable, delete, or redeployment), Ak
### Alarm Evaluation
- Subscribes to attribute change notifications from its parent Instance Actor for the attribute(s) referenced by its trigger definition.
- On each value update, evaluates the trigger condition:
- **Value Match**: Incoming value equals the predefined target.
- **Value Match**: Incoming value equals the predefined target. Supports `"!=X"` prefix for not-equals semantics.
- **Range Violation**: Value is outside the allowed min/max range.
- **Rate of Change**: Value change rate exceeds the defined threshold over time.
- When the condition is met and the alarm is currently in **normal** state, the alarm transitions to **active**:
- **Rate of Change**: Value change rate exceeds the defined threshold over a configurable time window. Direction filter (rising / falling / either) restricts which side of the rate triggers.
- **HiLo**: Multi-setpoint level alarm with up to four configurable setpoints (LoLo, Lo, Hi, HiHi). Any subset may be configured. Each setpoint may carry its own priority that overrides the alarm-level priority for that band.
- For binary trigger types (ValueMatch / RangeViolation / RateOfChange), when the condition is met and the alarm is currently in **normal** state, the alarm transitions to **active**:
- Updates the alarm state on the parent Instance Actor (which publishes to the Akka stream).
- If an on-trigger script is defined, spawns an Alarm Execution Actor to execute it.
- When the condition clears and the alarm is in **active** state, the alarm transitions to **normal**:
- Updates the alarm state on the parent Instance Actor.
- No script execution on clear.
- When the condition clears and the alarm is in **active** state, the alarm transitions to **normal**.
- For HiLo triggers, the actor tracks the current `AlarmLevel` (None / Low / LowLow / High / HighHigh). Each level transition emits a fresh `AlarmStateChanged` with the new level and its priority; level escalations (e.g., High → HighHigh) and de-escalations (HighHigh → High) both produce events. The on-trigger script fires only on the Normal → non-None edge, not on escalations between alarm bands.
- No script execution on clear in any trigger type.
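The HiLo band selection and the on-trigger edge rule can be sketched as two small functions. This is an illustrative Python sketch: the inclusive comparisons (`>=`, `<=`) and the function names are assumptions, and level names are plain strings rather than the runtime's `AlarmLevel` type.

```python
def evaluate_level(value, lolo=None, lo=None, hi=None, hihi=None):
    """Return the active band; unset setpoints (None) are skipped."""
    # Check the most severe bands first so HighHigh wins over High.
    if hihi is not None and value >= hihi:
        return "HighHigh"
    if lolo is not None and value <= lolo:
        return "LowLow"
    if hi is not None and value >= hi:
        return "High"
    if lo is not None and value <= lo:
        return "Low"
    return "None"


def should_run_on_trigger(previous_level, new_level):
    """The on-trigger script fires only on the None -> non-None edge."""
    return previous_level == "None" and new_level != "None"
```

An escalation such as High to HighHigh still produces a state-change event in the design above, but `should_run_on_trigger` returns `False` for it, so the script does not run again.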
### Alarm State
- Held **in memory** only — not persisted to SQLite.
- On restart (or failover), alarm states are re-evaluated from incoming values. All alarms start in normal state and transition to active when conditions are detected.
- Held **in memory** only — not persisted to SQLite. State comprises `AlarmState` (Active / Normal) and `AlarmLevel` (None for binary triggers; the active band for HiLo).
- On restart (or failover), alarm states are re-evaluated from incoming values. All alarms start in normal state with level None and transition based on incoming values.
### Alarm Execution Actor
- **Short-lived** child actor created when an on-trigger script needs to execute.
@@ -209,6 +210,32 @@ When the Instance Actor is stopped (due to disable, delete, or redeployment), Ak
---
## Script Lifecycle
All script types can be updated without restarting the cluster, but the mechanism differs per type.
### Instance Scripts and Alarm On-Trigger Scripts
- Compiled at deployment time when the Deployment Manager receives a flattened configuration and creates the Instance Actor hierarchy.
- **To update**: modify the script in the template, then redeploy the instance (`instance deploy --id <N>`).
- Redeployment stops the existing Instance Actor and all its children, creates a new Instance Actor, and recompiles all scripts and alarms from the updated configuration.
- There is no way to hot-reload a single instance script without redeploying the entire instance.
### Shared Scripts
- Compiled at the site when received from central via artifact deployment (`deploy artifacts`).
- The `SharedScriptLibrary` replaces its in-memory compiled code dictionary under a lock, making updated code immediately available to all Script Actors.
- **To update**: modify the shared script via the CLI or UI, then run `deploy artifacts` to push the change to sites.
- No instance redeployment is required — running instances pick up the new shared script code on the next `Scripts.CallShared()` invocation.
### Inbound API Method Scripts
- See Component-InboundAPI.md for the compilation and hot-reload lifecycle of API method scripts.
> **Warning**: Editing scripts via direct SQL does not trigger recompilation. The in-memory compiled script will remain stale until the next node restart or redeployment. Always use the CLI, Management API, or Central UI to modify scripts.
---
## Script Runtime API
Available to all Script Execution Actors and Alarm Execution Actors:
@@ -232,6 +259,14 @@ Available to all Script Execution Actors and Alarm Execution Actors:
- `Database.Connection("connectionName")` — Obtain a raw MS SQL client connection (ADO.NET) for synchronous read/write.
- `Database.CachedWrite("connectionName", "sql", parameters)` — Submit a write operation for store-and-forward delivery.
### Parameter Access
- `Parameters["key"]` — Raw dictionary access (returns `object?`, requires manual casting).
- `Parameters.Get<T>("key")` — Typed access with descriptive error messages. Throws `ScriptParameterException` if parameter is missing, null, or cannot be converted to `T`.
- `Parameters.Get<T?>("key")` — Nullable typed access. Returns `null` if parameter is missing, null, or cannot be converted.
- `Parameters.Get<T[]>("key")` — Array access. Converts a list parameter to a typed array. Throws on first unconvertible element.
- `Parameters.Get<List<T>>("key")` — List access. Converts a list parameter to a typed `List<T>`.
- Supported types: `bool`, `int`, `long`, `float`, `double`, `string`, `DateTime`.
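The difference between strict, nullable, and list access can be mimicked in Python. This is an analogue for illustration only, not the runtime's C# API: `Parameters`, `get_typed`, `get_optional`, `get_list`, and `ScriptParameterError` are hypothetical stand-ins, and only `int`, `float`, and `str` conversions are exercised.

```python
class ScriptParameterError(Exception):
    """Stand-in for the ScriptParameterException named above."""


class Parameters(dict):
    """Hypothetical Python analogue of the script Parameters object."""

    def _convert(self, key, value, target_type):
        try:
            return target_type(value)
        except (TypeError, ValueError) as exc:
            raise ScriptParameterError(
                f"Parameter '{key}': cannot convert {value!r} to {target_type.__name__}"
            ) from exc

    def get_typed(self, key, target_type):
        # Like Get<T>: missing, null, or unconvertible values raise.
        if self.get(key) is None:
            raise ScriptParameterError(f"Parameter '{key}' is missing or null")
        return self._convert(key, self[key], target_type)

    def get_optional(self, key, target_type):
        # Like Get<T?>: the same failures yield None instead of raising.
        try:
            return self.get_typed(key, target_type)
        except ScriptParameterError:
            return None

    def get_list(self, key, element_type):
        # Like Get<List<T>>: raises on the first unconvertible element.
        if self.get(key) is None:
            raise ScriptParameterError(f"Parameter '{key}' is missing or null")
        return [self._convert(key, item, element_type) for item in self[key]]


params = Parameters({"count": "42", "ratio": "abc", "ids": ["1", "2", "3"]})
```

With this data, `params.get_typed("count", int)` converts the string, `params.get_optional("ratio", float)` returns `None`, and `params.get_typed("ratio", float)` raises with a message naming the parameter.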
### Recursion Limit
- Every script call (`Instance.CallScript` and `Scripts.CallShared`) increments a call depth counter.
- If the counter exceeds the maximum recursion depth (default: 10), the call fails with an error.
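The call depth counter can be sketched by threading a depth value through each nested call. This Python sketch is illustrative: `call_script`, `ScriptRecursionError`, and the toy scripts are hypothetical, and only the default limit of 10 comes from the text.

```python
MAX_CALL_DEPTH = 10  # default maximum recursion depth


class ScriptRecursionError(Exception):
    """Hypothetical error raised when the call depth limit is exceeded."""


def call_script(script, call_depth=0):
    """Invoke `script`, handing it a `call` function that tracks depth."""
    call_depth += 1
    if call_depth > MAX_CALL_DEPTH:
        raise ScriptRecursionError(
            f"call depth {call_depth} exceeds limit {MAX_CALL_DEPTH}"
        )
    return script(lambda target: call_script(target, call_depth))


def leaf(call):
    return "done"


def parent(call):
    return call(leaf)  # depth 2, well within the limit


def runaway(call):
    return call(runaway)  # keeps calling itself until the limit trips
```

`call_script(parent)` completes normally, while `call_script(runaway)` fails with `ScriptRecursionError` once the eleventh nested call is attempted.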
@@ -280,10 +315,16 @@ Per Akka.NET best practices, internal actor communication uses **Tell** (fire-an
- Script Execution Actors may run concurrently, but all state mutations (attribute reads/writes, alarm state updates) are mediated through the parent Instance Actor's message queue.
- External side effects (external system calls, notifications, database writes) are not serialized — concurrent scripts may produce interleaved side effects. This is acceptable because each side effect is independent.
## SiteStreamManager and gRPC Integration
- The `SiteStreamManager` implements the `ISiteStreamSubscriber` interface, allowing the Communication Layer's `SiteStreamGrpcServer` to subscribe to the stream for cross-cluster delivery via gRPC.
- When a gRPC `SubscribeInstance` call arrives, the `SiteStreamGrpcServer` creates a `StreamRelayActor` and subscribes it to `SiteStreamManager` for the requested instance. Events flow from `SiteStreamManager` → `StreamRelayActor` → `Channel<SiteStreamEvent>` → gRPC response stream to central.
- The `SiteStreamManager` filters events by instance unique name and forwards matching events to all registered subscribers (both local debug consumers and gRPC relay actors).
## Site-Wide Stream Backpressure
- The site-wide Akka stream uses **per-subscriber buffering** with bounded buffers. Each subscriber (debug view, future consumers) gets an independent buffer.
- If a subscriber falls behind (e.g., slow network on debug view), its buffer fills and oldest events are dropped. This does not affect other subscribers or the publishing Instance Actors.
- The site-wide Akka stream uses **per-subscriber buffering** with bounded buffers. Each subscriber (gRPC relay actors, future consumers) gets an independent buffer.
- If a subscriber falls behind (e.g., slow network on gRPC stream), its buffer fills and oldest events are dropped. This does not affect other subscribers or the publishing Instance Actors.
- Instance Actors publish to the stream with **fire-and-forget** semantics — publishing never blocks the actor.
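The drop-oldest behaviour can be illustrated with `System.Threading.Channels`, used here as a stand-in for the Akka stream buffers rather than as a description of the actual implementation. The capacity of 3 and the integer events are arbitrary illustrative choices.

```csharp
using System.Collections.Generic;
using System.Threading.Channels;

// One bounded buffer per subscriber.
Channel<int> CreateSubscriberBuffer(int capacity) =>
    Channel.CreateBounded<int>(new BoundedChannelOptions(capacity)
    {
        FullMode = BoundedChannelFullMode.DropOldest, // a full buffer evicts its oldest item
        SingleReader = true,
    });

var fast = CreateSubscriberBuffer(3);
var slow = CreateSubscriberBuffer(3);

for (int evt = 1; evt <= 5; evt++)
{
    // TryWrite never blocks the publisher; in DropOldest mode it makes room by discarding.
    fast.Writer.TryWrite(evt);
    slow.Writer.TryWrite(evt);
    fast.Reader.TryRead(out _); // the fast subscriber keeps up with every event
}

// The slow subscriber only reads at the end and finds the most recent events.
var slowBacklog = new List<int>();
while (slow.Reader.TryRead(out var item)) slowBacklog.Add(item);
```

Because each subscriber owns its buffer, the slow reader loses events 1 and 2 while the fast reader sees all five and the publisher is never delayed.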
## Error Handling
@@ -23,6 +23,7 @@ Central cluster only. Sites receive flattened output and have no awareness of te
- Perform comprehensive pre-deployment validation (see Validation section).
- Provide on-demand validation for Design users during template authoring.
- Enforce template deletion constraints — templates cannot be deleted if any instances or child templates reference them.
- Organize templates into nested folders (`TemplateFolder` entity) and validate folder hierarchy invariants (acyclicity, sibling uniqueness, non-empty-on-delete).
## Key Entities
@@ -33,6 +34,14 @@ Central cluster only. Sites receive flattened output and have no awareness of te
- Defines attributes, alarms, and scripts as first-class members.
- Cannot be deleted if referenced by instances or child templates.
- Concurrent editing uses **last-write-wins** — no pessimistic locking or conflict detection.
- May belong to a `TemplateFolder` via nullable `FolderId`, or live at the tree root when null.
### TemplateFolder
- Hierarchical organizational entity with a self-referencing `ParentFolderId` (null at the root).
- Sibling folder names are unique (case-insensitive) within the same parent.
- Folders carry **no semantic meaning** for template resolution, flattening, validation, or inheritance — they exist purely for UI organization.
- Folder deletion is blocked if the folder contains any subfolders or templates.
- The folder graph is enforced acyclic on move (a folder cannot become its own descendant).
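The move check can be sketched as a walk up the `ParentFolderId` chain from the proposed new parent. The dictionary and `WouldCreateCycle` below are hypothetical illustrations, and the sketch assumes the stored graph is already acyclic.

```csharp
using System.Collections.Generic;

// Hypothetical in-memory view: folder id -> ParentFolderId (null at the root).
var parentOf = new Dictionary<int, int?>
{
    [1] = null, // A
    [2] = 1,    // A/B
    [3] = 2,    // A/B/C
    [4] = null, // D
};

// A move is rejected when the new parent is the folder itself or one of its descendants.
bool WouldCreateCycle(IReadOnlyDictionary<int, int?> parents, int folderId, int? newParentId)
{
    for (int? cursor = newParentId; cursor is not null; cursor = parents[cursor.Value])
    {
        if (cursor.Value == folderId) return true; // reached the moved folder while walking up
    }
    return false;
}

bool moveAUnderC = WouldCreateCycle(parentOf, 1, 3); // A would become its own descendant
bool moveCUnderD = WouldCreateCycle(parentOf, 3, 4); // unrelated subtree, allowed
```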
### Attribute
- Name, Value, Data Type (Boolean, Integer, Float, String), Lock Flag, Description.
@@ -0,0 +1,138 @@
# Component: Traefik Proxy
## Purpose
The Traefik Proxy is a reverse proxy and load balancer that sits in front of the central cluster's two web servers. It provides a single stable URL for the CLI, browser, and external API consumers, automatically routing traffic to the active central node. When the active node fails over, Traefik detects the change via health checks and redirects traffic to the new active node without manual intervention.
## Location
Runs as a Docker container (`scadalink-traefik`) in the cluster compose stack (`docker/docker-compose.yml`). Not part of the application codebase — it is a third-party infrastructure component with static configuration files.
`docker/traefik/`
## Responsibilities
- Route all HTTP traffic (Central UI, Management API, Inbound API, health endpoints) to the active central node.
- Health-check both central nodes via `/health/active` to determine which is the active (cluster leader) node.
- Automatically fail over to the standby node when the active node goes down.
- Provide a dashboard for monitoring routing state and backend health.
## How It Works
### Active Node Detection
Traefik polls `/health/active` on both central nodes every 5 seconds. This endpoint returns:
- **HTTP 200** on the active node (the Akka.NET cluster leader).
- **HTTP 503** on the standby node (or if the node is unreachable).
Only the node returning 200 receives traffic. The health check is implemented by `ActiveNodeHealthCheck` in the Host project, which checks `Cluster.Get(system).State.Leader == SelfMember.Address`.
### Failover Sequence
1. Active node fails (crash, network partition, or graceful shutdown).
2. Akka.NET cluster detects the failure (~10s heartbeat timeout).
3. Split-brain resolver acts after stable-after period (~15s).
4. Surviving node becomes cluster leader.
5. `ActiveNodeHealthCheck` on the surviving node starts returning 200.
6. Traefik's next health poll (within 5s) detects the change.
7. Traffic routes to the new active node.
**Total failover time**: ~25–30s (Akka failover ~25s + Traefik poll interval up to 5s).
### SignalR / Blazor Server Considerations
Blazor Server uses persistent SignalR connections (WebSocket circuits). During failover:
- Active SignalR circuits on the failed node are lost.
- The browser's SignalR reconnection logic attempts to reconnect.
- Traefik routes the reconnection to the new active node.
- The user's session survives because authentication uses cookie-embedded JWT with shared Data Protection keys across both central nodes.
- The user may see a brief "Reconnecting..." overlay before the circuit re-establishes.
## Configuration
### Static Config (`docker/traefik/traefik.yml`)
```yaml
entryPoints:
  web:
    address: ":80"
api:
  dashboard: true
  insecure: true
providers:
  file:
    filename: /etc/traefik/dynamic.yml
```
- **Entrypoint `web`**: Listens on port 80 (mapped to host port 9000).
- **Dashboard**: Enabled in insecure mode (no auth) for development. Accessible at `http://localhost:8180`.
- **File provider**: Loads routing rules from a static YAML file (no Docker socket required).
### Dynamic Config (`docker/traefik/dynamic.yml`)
```yaml
http:
  routers:
    central:
      rule: "PathPrefix(`/`)"
      service: central
      entryPoints:
        - web
  services:
    central:
      loadBalancer:
        healthCheck:
          path: /health/active
          interval: 5s
          timeout: 3s
        servers:
          - url: "http://scadalink-central-a:5000"
          - url: "http://scadalink-central-b:5000"
```
- **Router `central`**: Catches all requests and forwards to the `central` service.
- **Service `central`**: Load balancer with two backends (both central nodes) and a health check on `/health/active`.
- **Health check interval**: 5 seconds. A server failing the health check is removed from the pool within one interval.
## Ports
| Host Port | Container Port | Purpose |
|-----------|---------------|---------|
| 9000 | 80 | Load-balanced entrypoint (Central UI, Management API, Inbound API) |
| 8180 | 8080 | Traefik dashboard |
## Health Endpoints
The central nodes expose the following health endpoints:
| Endpoint | Purpose | Who Uses It |
|----------|---------|-------------|
| `/health/ready` | Readiness gate — 200 when database + Akka cluster are healthy | Kubernetes probes, monitoring |
| `/health/active` | Active node — 200 only on cluster leader | **Traefik** (routing decisions) |
## Dependencies
- **Central cluster nodes**: The two backends (`scadalink-central-a`, `scadalink-central-b`) on the `scadalink-net` Docker network.
- **ActiveNodeHealthCheck**: Health check implementation in `src/ScadaLink.Host/Health/ActiveNodeHealthCheck.cs` that determines cluster leader status.
- **Docker network**: All containers must be on the shared `scadalink-net` bridge network.
## Interactions
- **CLI**: Connects to `http://localhost:9000/management` — routed by Traefik to the active node.
- **Browser (Central UI)**: Connects to `http://localhost:9000` — Blazor Server + SignalR routed to the active node.
- **Inbound API consumers**: Connect to `http://localhost:9000/api/{methodName}` — routed to the active node.
- **Cluster Infrastructure**: The `ActiveNodeHealthCheck` relies on Akka.NET cluster gossip state to determine the leader.
## Production Considerations
The current configuration is for development/testing. In production:
- **TLS termination**: Add HTTPS entrypoint with certificates (Let's Encrypt via Traefik's ACME provider, or static certs).
- **Dashboard auth**: Disable `insecure: true` and configure authentication on the dashboard.
- **WebSocket support**: Traefik supports WebSocket proxying natively — no additional config needed for SignalR.
- **Sticky sessions**: Not required. The Management API is stateless (Basic Auth per request). Blazor Server circuits are bound to a specific node via SignalR, but reconnection handles failover transparently.
@@ -0,0 +1,785 @@
# TreeView Component
## Purpose
A reusable, generic Blazor Server component that renders hierarchical data as an expandable/collapsible tree. The component is data-agnostic — it accepts any tree-shaped data via type parameters and render fragments, following the same pattern as the existing `DataTable<TItem>` shared component.
## Location
`src/ScadaLink.CentralUI/Components/Shared/TreeView.razor`
## Primary Use Case: Instance Hierarchy
The motivating use case is displaying instances organized by site and area:
```
- Site A
  + Area 1
    - Sub Area 1
        Instance 1
        Instance 2
  + Area 2
+ Site B
+ Site C
```
**Hierarchy**: Site → Area → Sub Area (recursive) → Instance (leaf)
Nodes at each level may be expandable (branches) or plain items (leaves). Leaf nodes have no expand/collapse toggle.
## Requirements
### R1 — Generic Type Parameter
The component accepts a single type parameter `TItem` representing any node in the tree. The consumer provides:
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `Items` | `IReadOnlyList<TItem>` | Yes | Root-level items |
| `ChildrenSelector` | `Func<TItem, IReadOnlyList<TItem>>` | Yes | Returns children for a given node |
| `HasChildrenSelector` | `Func<TItem, bool>` | Yes | Whether the node can be expanded (branch vs. leaf) |
| `KeySelector` | `Func<TItem, object>` | Yes | Unique key per node (for state tracking) |
### R2 — Render Fragments
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `NodeContent` | `RenderFragment<TItem>` | Yes | Renders the label/content for each node |
| `EmptyContent` | `RenderFragment?` | No | Shown when `Items` is empty |
The `NodeContent` fragment receives the `TItem` and is responsible for rendering the node's display (text, icons, badges, action buttons, etc.). The tree component only renders the structural chrome (indentation, expand/collapse toggle, vertical guide lines).
### R3 — Expand/Collapse Behavior
- Each branch node displays a toggle indicator: `+` when collapsed, `−` when expanded.
- Clicking the **toggle icon** expands/collapses the node. Clicking the **content area** does **not** toggle expansion (it is reserved for selection — see R5).
- Leaf nodes (where `HasChildrenSelector` returns `false`) display no toggle — they are indented inline with sibling branch nodes.
- Expand/collapse state is tracked internally by the component using `KeySelector` for identity.
- All nodes start collapsed by default unless `InitiallyExpanded` is set.
- **Session persistence**: When the user navigates away and returns, previously expanded nodes are restored (see R11).
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `InitiallyExpanded` | `Func<TItem, bool>?` | No | Predicate — nodes matching this start expanded (first load only, before any persisted state exists) |
### R4 — Indentation and Visual Structure
The component renders the structural chrome: indent gutters per depth, the toggle slot, and ancestor guide lines. Leaf nodes render an empty toggle placeholder so labels align across siblings.
The exact tokens (indent unit, toggle glyph, guide-line treatment) are specified in **V2** of the Visual Design Guide.
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `IndentPx` | `int` | No | Pixels per indent level. Default: 24 |
| `ShowGuideLines` | `bool` | No | Show vertical connector lines. Default: true |
### R5 — Selection
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `Selectable` | `bool` | No | Enable click-to-select. Default: false |
| `SelectedKey` | `object?` | No | Currently selected node key (two-way binding) |
| `SelectedKeyChanged` | `EventCallback<object?>` | No | Fires when selection changes |
| `SelectedCssClass` | `string` | No | CSS class for selected node. Default: `"bg-primary bg-opacity-10"` |
When `Selectable` is true, clicking a node row selects it (highlighted). Clicking the expand/collapse toggle does **not** change selection — only clicking the content area does.
### R6 — Lazy Loading (Deferred)
Future enhancement. For now, all children are provided synchronously via `ChildrenSelector`. A future version may support `Func<TItem, Task<IReadOnlyList<TItem>>>` for on-demand loading with a spinner placeholder.
### R7 — Keyboard Navigation (Deferred)
Future enhancement. Arrow keys for navigation, Enter/Space for expand/collapse, Home/End for first/last.
### R8 — External Filtering
The tree component itself does **not** implement filter UI. Filtering is driven externally by the consuming page — for example, a site dropdown that filters the tree to show only the selected site's hierarchy.
**How it works:**
- The consumer filters `Items` (and/or adjusts `ChildrenSelector` results) and passes the filtered list to the component.
- When `Items` changes (Blazor re-render), the component re-renders the tree with the new data.
- **Expansion state is preserved across filter changes.** Nodes that were expanded before filtering remain expanded if they reappear after the filter changes. The component tracks expanded keys independently of the current `Items` — keys are never purged when items disappear, so re-adding a previously expanded node restores its expanded state.
- Selection is cleared if the selected node is no longer present after filtering.
**Example — site filter on the instances page:**
```razor
<select class="form-select form-select-sm" @bind="_selectedSiteId">
    <option value="">All Sites</option>
    @foreach (var site in _sites)
    {
        <option value="@site.Id">@site.Name</option>
    }
</select>
<TreeView TItem="TreeNode" Items="GetFilteredRoots()" ...>
    ...
</TreeView>
@code {
    private int? _selectedSiteId;
    private List<TreeNode> GetFilteredRoots()
    {
        if (_selectedSiteId == null) return _allRoots;
        return _allRoots.Where(r => r.SiteId == _selectedSiteId).ToList();
    }
}
```
This keeps filter logic in the page (domain-specific) while the component handles rendering whatever it receives.
### R9 — Styling
- Uses Bootstrap 5 utility classes and CSS variables. No third-party Blazor component frameworks.
- Adds one icon-library dependency: **Bootstrap Icons** (static files at `wwwroot/lib/bootstrap-icons/`). Distribution rules in **V4** of the Visual Design Guide.
- Hardcoded colors are forbidden; use Bootstrap utility classes (`bg-primary bg-opacity-10`, `text-muted`) or CSS variables (`var(--bs-tertiary-bg)`, `var(--bs-border-color)`).
- Component-local CSS lives in `TreeView.razor.css` (Blazor CSS isolation).
- All visual tokens (row density, indent, state visuals, glyphs, labels, badges) are specified in the **Visual Design Guide** (V1–V7). This requirement is a non-normative summary; the Guide is authoritative.
### R10 — No Internal Scrolling
The tree renders inline in the page flow. The consuming page is responsible for placing it in a scrollable container if needed (e.g., `overflow-auto` with `max-height`).
### R11 — Session-Persistent Expansion State
When a user expands nodes, navigates away (e.g., clicks an instance link to the configure page), and returns to the page, the tree must restore the same expansion state.
**Mechanism:**
- The component requires a `StorageKey` parameter — a unique string identifying this tree instance (e.g., `"instances-tree"`, `"data-connections-tree"`).
- Expanded node keys are stored in browser `sessionStorage` under the key `treeview:{StorageKey}`.
- On mount (`OnAfterRenderAsync` first render), the component reads `sessionStorage` and expands any nodes whose keys are present. This takes precedence over `InitiallyExpanded`.
- On every expand/collapse toggle, the component writes the updated set of expanded keys to `sessionStorage`.
- `sessionStorage` is scoped to the browser tab — each tab has independent state. State is cleared when the tab is closed.
**Implementation note:** Blazor Server requires `IJSRuntime` to access `sessionStorage`. The component injects `IJSRuntime` and uses a small JS interop helper (inline or in a shared `.js` file) for `getItem`/`setItem`.
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `StorageKey` | `string?` | No | Key for sessionStorage persistence. If null, expansion state is not persisted (in-memory only). |
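One way to shape the stored value is a JSON array of key strings, sketched below. The string conversion of keys, the sorted order, and the helper names are assumptions made for illustration; the text only fixes the `treeview:{StorageKey}` slot name. The JS interop call that actually reads or writes `sessionStorage` is omitted.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Slot name under which the expanded keys are stored.
string StorageSlot(string storageKey) => $"treeview:{storageKey}";

// Keys are written as strings so that object-typed keys survive a JSON round trip (assumption).
string SerializeKeys(IEnumerable<object> expandedKeys) =>
    JsonSerializer.Serialize(
        expandedKeys.Select(k => k.ToString()).OrderBy(k => k, StringComparer.Ordinal).ToArray());

// A missing or corrupt entry falls back to "nothing expanded" instead of failing the render.
HashSet<string> DeserializeKeys(string? json)
{
    if (string.IsNullOrWhiteSpace(json)) return new HashSet<string>();
    try
    {
        return new HashSet<string>(JsonSerializer.Deserialize<string[]>(json) ?? Array.Empty<string>());
    }
    catch (JsonException)
    {
        return new HashSet<string>();
    }
}

string slot = StorageSlot("instances-tree");
string payload = SerializeKeys(new object[] { "site-2", 7 });
HashSet<string> restored = DeserializeKeys(payload);
```

Treating unreadable data as empty keeps a stale or hand-edited `sessionStorage` entry from breaking the page.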
### R12 — Expand All / Collapse All
The component exposes methods that the consumer can call via `@ref`:
```csharp
/// Expands all branch nodes in the tree (recursive).
public void ExpandAll();
/// Collapses all branch nodes in the tree.
public void CollapseAll();
```
**Usage:**
```razor
<button class="btn btn-outline-secondary btn-sm" @onclick="() => _tree.ExpandAll()">Expand All</button>
<button class="btn btn-outline-secondary btn-sm" @onclick="() => _tree.CollapseAll()">Collapse All</button>
<TreeView @ref="_tree" TItem="TreeNode" ... />
@code {
    private TreeView<TreeNode> _tree = default!;
}
```
Both methods update sessionStorage if `StorageKey` is set. `ExpandAll` requires walking the full tree via `ChildrenSelector` to collect all branch node keys.
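The tree walk that `ExpandAll` needs could be sketched as follows. `CollectBranchKeys` is a hypothetical helper whose delegates mirror the R1 parameters; the dictionary-backed tree exists only to make the example self-contained.

```csharp
using System;
using System.Collections.Generic;

// Iterative walk: collects the key of every node that reports children.
HashSet<object> CollectBranchKeys<TItem>(
    IReadOnlyList<TItem> roots,
    Func<TItem, IReadOnlyList<TItem>> childrenSelector,
    Func<TItem, bool> hasChildrenSelector,
    Func<TItem, object> keySelector)
{
    var keys = new HashSet<object>();
    var pending = new Stack<TItem>(roots);
    while (pending.Count > 0)
    {
        var node = pending.Pop();
        if (!hasChildrenSelector(node)) continue; // leaves have nothing to expand
        keys.Add(keySelector(node));
        foreach (var child in childrenSelector(node)) pending.Push(child);
    }
    return keys;
}

// Illustrative data: node name -> child names.
var childrenOf = new Dictionary<string, string[]>
{
    ["Site A"] = new[] { "Area 1", "Area 2" },
    ["Area 1"] = new[] { "Instance 1" },
    ["Area 2"] = Array.Empty<string>(),
};

var allBranches = CollectBranchKeys<string>(
    new[] { "Site A" },
    n => childrenOf.TryGetValue(n, out var c) ? c : Array.Empty<string>(),
    n => childrenOf.TryGetValue(n, out var c) && c.Length > 0,
    n => n);
```

An explicit stack avoids deep recursion on large hierarchies; the resulting set can replace the component's expanded-key set in one step.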
### R13 — Programmatic Expand-to-Node
The component exposes a method to reveal a specific node by expanding all of its ancestors:
```csharp
/// Expands all ancestor nodes so that the node with the given key becomes visible.
/// Optionally selects the node and scrolls it into view.
public void RevealNode(object key, bool select = false);
```
This requires the component to build a parent lookup (key → parent key) from the tree data. When called:
1. Walk from the target node's key up to the root, collecting ancestor keys.
2. Expand all ancestors.
3. If `select` is true, set the node as selected and fire `SelectedKeyChanged`.
4. After rendering, scroll the node element into view via JS interop (`element.scrollIntoView({ block: 'nearest' })`).
**Use case:** Search box on the instances page — user types "Motor-1", results list shows matching instances. Clicking a result calls `_tree.RevealNode(instanceKey, select: true)` to expand the Site → Area path and highlight the instance.
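Steps 1 and 2 of `RevealNode` can be sketched with a key → parent key lookup. The dictionary contents and `AncestorsOf` are hypothetical; selection and scrolling (steps 3 and 4) are left out.

```csharp
using System.Collections.Generic;

// Hypothetical parent lookup built from the tree data (null marks a root).
var parentKey = new Dictionary<string, string?>
{
    ["Site A"] = null,
    ["Area 1"] = "Site A",
    ["Sub Area 1"] = "Area 1",
    ["Motor-1"] = "Sub Area 1",
};

// Walks from the target up to the root, returning ancestors nearest-first.
List<string> AncestorsOf(IReadOnlyDictionary<string, string?> parents, string key)
{
    var ancestors = new List<string>();
    var visited = new HashSet<string> { key }; // guards against a malformed, cyclic lookup
    while (parents.TryGetValue(key, out var parent) && parent is not null && visited.Add(parent))
    {
        ancestors.Add(parent);
        key = parent;
    }
    return ancestors;
}

// Expanding every ancestor makes the target row visible.
var expandedKeys = new HashSet<string>();
expandedKeys.UnionWith(AncestorsOf(parentKey, "Motor-1"));
```

The target itself is not added to the expanded set, since a leaf has nothing to expand.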
### R14 — Accessibility (ARIA)
The component renders semantic ARIA attributes for screen reader support:
- The root `<ul>` has `role="tree"`.
- Each node `<li>` has `role="treeitem"`.
- Branch nodes have `aria-expanded="true"` or `aria-expanded="false"`.
- Child `<ul>` containers have `role="group"`.
- When `Selectable` is true, the selected node has `aria-selected="true"`.
- Each node row has a unique `id` derived from `KeySelector` for anchor targeting.
This is baseline accessibility — no keyboard navigation yet (deferred in R7), but screen readers can understand the tree structure.
### R15 — Context Menu
The component supports an optional right-click context menu on nodes, defined by the consumer via a render fragment.
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `ContextMenu` | `RenderFragment<TItem>?` | No | Menu content rendered when a node is right-clicked. Receives the right-clicked `TItem`. |
**Behavior:**
- Right-clicking a node renders the `ContextMenu` fragment for that node. The component checks whether the fragment produces any content — **if the fragment renders nothing (empty markup), no menu is shown and the browser default context menu is used.** This is how per-node-type menus work: the consumer uses `@if` blocks in the fragment, and nodes that don't match any condition simply produce no output.
- When content is produced, the browser's default context menu is suppressed (`@oncontextmenu:preventDefault`) and a floating menu is shown at the cursor.
- The menu is rendered as a Bootstrap dropdown: `<div class="dropdown-menu show">` containing `<button class="dropdown-item">` elements.
- Clicking a menu item or clicking anywhere outside the menu dismisses it.
- Pressing Escape dismisses the menu.
- Only one context menu is visible at a time — right-clicking another node replaces the current menu.
- If the `ContextMenu` parameter itself is null (not provided), right-click always uses the browser default for all nodes.
**The consumer controls which items appear and what they do:**
```razor
<TreeView TItem="TreeNode" Items="_roots" ... >
    <NodeContent Context="node">
        <span>@node.Label</span>
    </NodeContent>
    <ContextMenu Context="node">
        @if (node.Kind == NodeKind.Instance)
        {
            <button class="dropdown-item" @onclick="() => DeployInstance(node)">
                Deploy
            </button>
            @if (node.State == InstanceState.Enabled)
            {
                <button class="dropdown-item" @onclick="() => DisableInstance(node)">
                    Disable
                </button>
            }
            else if (node.State == InstanceState.Disabled)
            {
                <button class="dropdown-item" @onclick="() => EnableInstance(node)">
                    Enable
                </button>
            }
            <button class="dropdown-item" @onclick="() => NavigateToConfigure(node)">
                Configure
            </button>
            <button class="dropdown-item" @onclick="() => ShowDiff(node)">
                Diff
            </button>
            <div class="dropdown-divider"></div>
            <button class="dropdown-item text-danger" @onclick="() => DeleteInstance(node)">
                Delete
            </button>
        }
        else if (node.Kind == NodeKind.Site)
        {
            <button class="dropdown-item" @onclick="() => DeployAllInSite(node)">
                Deploy All
            </button>
        }
    </ContextMenu>
</TreeView>
```
This keeps the tree clean — no inline action buttons cluttering leaf nodes. Different node types can show different menu items (instances get full CRUD actions, sites might get bulk operations, areas might have no menu at all).
**Positioning:**
- The menu is absolutely positioned relative to the viewport using the mouse event's `clientX`/`clientY`.
- If the menu would overflow the viewport bottom or right edge, it flips direction (opens upward or leftward).
- The component handles positioning internally — no JS interop needed (CSS `position: fixed` with `top`/`left` from the mouse event).
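The flip rule can be expressed as a pure function, sketched below. `PlaceMenu` is a hypothetical helper; the viewport and menu dimensions are assumed inputs, since the text does not say how the component obtains them.

```csharp
using System;

// Returns the `left`/`top` values for a `position: fixed` menu opened at the cursor.
(double Left, double Top) PlaceMenu(
    double clientX, double clientY,
    double menuWidth, double menuHeight,
    double viewportWidth, double viewportHeight)
{
    // Open rightward and downward by default; flip when the menu would cross an edge.
    double left = clientX + menuWidth > viewportWidth ? clientX - menuWidth : clientX;
    double top = clientY + menuHeight > viewportHeight ? clientY - menuHeight : clientY;
    // Clamp so that a flipped menu never starts off-screen on a small viewport.
    return (Math.Max(0, left), Math.Max(0, top));
}

var centered = PlaceMenu(100, 100, 160, 200, 1280, 720);   // fits: opens at the cursor
var nearCorner = PlaceMenu(1200, 700, 160, 200, 1280, 720); // flips both left and up
```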
### R16 — Multi-Selection (Deferred)
Future enhancement. Single selection (R5) covers current needs. A future version may add:
- `MultiSelect` bool parameter
- `SelectedKeys` / `SelectedKeysChanged` for set-based selection
- Shift+click for range select, Ctrl+click for toggle
- Use case: bulk operations (select multiple instances → deploy/disable all)
## Visual Design Guide
This section is the canonical visual specification for the TreeView. It is normative: any change to the chrome (row layout, indentation, glyphs, state visuals, badge styling) must update this section. Consumers' `NodeContent` fragments follow the label and badge recipes in V5–V6; `/design/templates` is the worked example in V7.
R4 and R9 above describe *that* the component renders structural chrome and uses Bootstrap utilities. This section says *exactly how*.
### V1 — Density & Row Anatomy
Each `<li role="treeitem">` renders one row. The row is a flexbox so trailing meta can right-align cleanly and the entire row width is a hover/selected/drop-target surface.
**Row container** (replaces today's `.tv-row` styling):
```html
<div class="tv-row d-flex align-items-center"
     style="gap:.25rem; padding:.25rem .5rem; padding-left: calc(.5rem + var(--tv-indent, 0px));">
  <span class="tv-toggle">…chevron or placeholder…</span>
  <span class="tv-glyph">…Bootstrap Icon or placeholder…</span>
  <span class="tv-label">…primary + secondary…</span>
  <span class="tv-meta ms-auto">…badges…</span>
</div>
```
| Token | Value | Notes |
|---|---|---|
| Row vertical padding | `py-1` (0.25rem top/bottom) | Yields ~32px row height at base font-size + line-height 1.5. |
| Row horizontal padding | `px-2` (0.5rem left/right) | Selected/hover background spans full row including this padding. |
| Inter-slot gap | `gap: .25rem` | Between toggle, glyph, label. The meta slot is offset by `margin-left: auto`. |
| Font size | inherits (1rem base) | Compact pages may opt into `small` per-page, not at the component level. |
| Line height | inherits (1.5) | Aligns the chevron, glyph, and label baselines correctly. |
| Toggle slot width | 20px (`width: 1.25rem`) | Always present, even on leaves (which render an empty placeholder). |
| Glyph slot width | 20px (`width: 1.25rem`) | Always present; consumer may render an empty span to preserve alignment. |
| Label slot | `flex: 1 1 auto; min-width: 0;` | `min-width: 0` is required for ellipsis truncation to work in a flex child. |
| Meta slot | `margin-left: auto;` | Pushes badges to the right edge of the row. |
**Hit semantics**:
- The full row (`tv-row`) is the surface for hover, selected, focus-visible, and drop-target backgrounds.
- Click-to-select fires only on the **label slot** (preserves R5: toggle clicks do not select).
- The toggle slot's invisible tap target is enlarged by negative margins inside the 20px slot so it remains a comfortable 24×24px target.
### V2 — Depth, Indent & Guide Lines
| Token | Value |
|---|---|
| Indent per depth | 24px (`IndentPx` default, unchanged) |
| Toggle glyph (collapsed) | `<i class="bi bi-chevron-right">` |
| Toggle glyph (expanded) | `<i class="bi bi-chevron-down">` (or `bi-chevron-right` rotated 90° via CSS) |
| Guide line color | `var(--bs-border-color)` |
| Guide line width | 1px |
| Guide line style | solid, vertical-only (no horizontal stubs) |
| Guide line position | one line per ancestor depth, drawn down the indent column (left edge of each 24px indent slot) |
| Guide lines enabled | `ShowGuideLines` parameter (default true) |
| Leaf alignment | identical depth gutter as siblings; the toggle slot renders an empty placeholder so glyphs and labels align across leaves and branches |
Implementation note: guide lines are drawn by repeating a `linear-gradient` background or by stacking `border-left` on indent spacers — both are pure CSS, no extra DOM. The current `tv-guides` class is the hook.
### V3 — State Visuals
States compose: focus rings layer on top of hover/selected; drop-target overrides hover and selected. All states paint the full row width (V1).
| State | Visual | Implementation |
|---|---|---|
| Default | none | — |
| Hover | full-row tint | `background: var(--bs-tertiary-bg);` on `:hover` of `.tv-row` |
| Focus-visible | inset 2px primary ring | `box-shadow: inset 0 0 0 2px var(--bs-primary);` on `:focus-visible` |
| Selected | full-row primary tint | `class="bg-primary bg-opacity-10"` (existing `SelectedCssClass` default, unchanged) |
| Selected + hover | selected tint persists; hover does not deepen | hover background applies only when not selected (`:hover:not(.bg-primary)`) |
| Selected + focus | tint + ring both visible | focus ring layers via box-shadow |
| Drop-target (valid) | `bg-info bg-opacity-25` | overrides hover/selected backgrounds; opt-in per consumer |
| Drop-target (invalid) | cursor `not-allowed`, no tint change | absence of valid-tint is the cue |
| Dragging source | `opacity: 0.5` | applied to the row currently being dragged |
| Dimmed (non-droppable while a drag is in progress) | `opacity: 0.5` | applied to nodes the consumer marks as unsuitable drop targets |
Drag-drop is **not** part of the TreeView component's intrinsic behavior — it is opt-in per consuming page. The drag-related state visuals (drop-target, dragging, dimmed) are documented here so consumers that *do* implement DnD share the same visual language. The `/design/templates` page (V7) explicitly does **not** use drag-drop; reorganization happens via the right-click context menu.
### V4 — Glyph & Icon System
**Distribution**: Bootstrap Icons ships as static files under `src/ScadaLink.CentralUI/wwwroot/lib/bootstrap-icons/` (`bootstrap-icons.css` + `fonts/*.woff2`). Referenced once from `MainLayout.razor`:
```html
<link rel="stylesheet" href="~/lib/bootstrap-icons/bootstrap-icons.css" />
```
No CDN dependency — works on air-gapped industrial deployments. Version pinned in the file path or filename.
**Rules**:
- Glyphs are inline `<i class="bi bi-…"></i>` elements inside the 20px glyph slot.
- Branches render an **open/closed pair**: a `closed` glyph when collapsed, an `open` glyph when expanded (consumer chooses both via `NodeContent`). The chevron toggle reinforces the same state.
- Leaves render a single static glyph or no glyph (empty span preserves alignment).
- **Color**: glyphs inherit `color` from their row. Default is body text; consumers may apply `text-muted` for de-emphasis. Kind is communicated by *shape*, not by color, to keep the palette available for status badges.
- **Size**: glyphs render at `1em` (inherits row font-size). No fixed pixel size.
### V5 — Label Recipe & Typography
The label slot contains, in order: **[primary] [secondary modifiers]**. Trailing meta lives in the separate `.tv-meta` slot (V1).
| Element | Style |
|---|---|
| Primary label (branches) | `class="fw-semibold"` |
| Primary label (leaves) | normal weight |
| Secondary modifiers | `class="text-muted small ms-1"` |
| Overflow handling | `.tv-label { white-space: nowrap; overflow: hidden; text-overflow: ellipsis; min-width: 0; }` |
| Tooltip | `title` attribute on the primary label span, set to the full name on every row (cheap, helps when the row is narrower than the name) |
**Rule of thumb**: font-weight tracks *has children*, not *kind*. A folder with no children renders regular weight; a leaf-template promoted to a branch by adding compositions becomes semibold automatically.
### V6 — Badge Taxonomy
Three semantic badge roles. The meta slot holds **at most two** badges per row. All badges live in `.tv-meta`, right-aligned (V1).
| Role | Purpose | Markup | Examples |
|---|---|---|---|
| Count | numeric child aggregation | `<span class="badge rounded-pill bg-secondary-subtle text-secondary-emphasis">@N</span>` | folder child count; area instance count |
| Status | semantic state | `<span class="badge bg-{success\|warning\|danger\|info}">@Label</span>` | Enabled / Disabled / Stale / Error |
| Kind | category / type tag | same filled semantic style, used sparingly | Protocol (OPC UA), Source (Inherited) |
**Rules**:
- Counts represent **direct children only**. Never transitive descendants.
- A count of 0 **renders nothing** — no badge at all.
- Status uses Bootstrap semantic colors; do not introduce custom palettes.
- The component does not enforce the 2-badge cap; it is a documented convention. PR review should catch violations.
### V7 — Worked Example: `/design/templates`
**Page model**: the templates page is a **tree browser only**. Selecting a template in the tree navigates to a dedicated edit page (`/design/templates/{id}`); creating a template navigates to `/design/templates/create`. No split-pane editor. Reorganization (move folder, move template) happens exclusively through the **right-click context menu** with modal dialog pickers — there is no drag-and-drop on this page.
Three node kinds; concrete recipes following V1–V6.
| Kind | Glyph (collapsed) | Glyph (expanded) | Primary | Secondary | Badges |
|---|---|---|---|---|---|
| Folder | `bi-folder` | `bi-folder2-open` | folder name (semibold when has children, regular otherwise) | — | count of direct children (subtle pill), only if ≥ 1 |
| Template | `bi-file-earmark-text` | same (templates with compositions still use the same glyph — chevron carries state) | `$Name` (semibold when has compositions, regular otherwise) | — | none |
| Composition | `bi-arrow-return-right` | n/a (leaf, no expanded state) | composition instance name (regular weight) | — | none |
**`NodeContent` fragment** for the templates page (replaces the current `RenderNodeLabel` in `Templates.razor`):
```razor
@switch (node.Kind)
{
case TmplNodeKind.Folder:
var folderOpen = _tree.IsExpanded(node.Key);
<span class="tv-glyph"><i class="bi @(folderOpen ? "bi-folder2-open" : "bi-folder")"></i></span>
<span class="tv-label @(node.Children.Count > 0 ? "fw-semibold" : "")"
title="@node.Label">@node.Label</span>
@if (node.Children.Count > 0)
{
<span class="tv-meta ms-auto">
<span class="badge rounded-pill bg-secondary-subtle text-secondary-emphasis">@node.Children.Count</span>
</span>
}
break;
case TmplNodeKind.Template:
<span class="tv-glyph"><i class="bi bi-file-earmark-text"></i></span>
<span class="tv-label @(node.Children.Count > 0 ? "fw-semibold" : "")"
title="@node.Label">@node.Label</span>
break;
case TmplNodeKind.Composition:
<span class="tv-glyph"><i class="bi bi-arrow-return-right"></i></span>
<span class="tv-label" title="@node.Label">@node.Label</span>
break;
}
```
**Locked subtractions from the previous design**:
- Template node "inherits $Parent" muted text — **removed**. Inheritance is shown in the right pane only.
- Template node "X attr, Y alm, Z scr" compound badge — **removed**.
- Template node "N comp" accent badge — **removed**.
These subtractions are deliberate: templates are leaves-from-the-tree's-perspective (their inner attributes/alarms/scripts are not tree-navigable), so the tree row should carry only what's needed to identify and pick the template. All counts and inheritance information live in the right detail pane.
```csharp
@typeparam TItem
// Data
[Parameter] public IReadOnlyList<TItem> Items { get; set; }
[Parameter] public Func<TItem, IReadOnlyList<TItem>> ChildrenSelector { get; set; }
[Parameter] public Func<TItem, bool> HasChildrenSelector { get; set; }
[Parameter] public Func<TItem, object> KeySelector { get; set; }
// Rendering
[Parameter] public RenderFragment<TItem> NodeContent { get; set; }
[Parameter] public RenderFragment? EmptyContent { get; set; }
[Parameter] public RenderFragment<TItem>? ContextMenu { get; set; }
// Layout
[Parameter] public int IndentPx { get; set; } = 24;
[Parameter] public bool ShowGuideLines { get; set; } = true;
// Expand/Collapse
[Parameter] public Func<TItem, bool>? InitiallyExpanded { get; set; }
[Parameter] public string? StorageKey { get; set; } // sessionStorage persistence key
// Selection
[Parameter] public bool Selectable { get; set; }
[Parameter] public object? SelectedKey { get; set; }
[Parameter] public EventCallback<object?> SelectedKeyChanged { get; set; }
[Parameter] public string SelectedCssClass { get; set; } = "bg-primary bg-opacity-10";
// Public methods (called via @ref)
public void ExpandAll();
public void CollapseAll();
public void RevealNode(object key, bool select = false);
```
## Usage Example: Instance Hierarchy
```razor
@* Build a unified tree model from sites, areas, and instances *@
<TreeView TItem="TreeNode" Items="_roots"
ChildrenSelector="n => n.Children"
HasChildrenSelector="n => n.Children.Count > 0"
KeySelector="n => n.Key"
Selectable="true"
SelectedKey="_selectedKey"
SelectedKeyChanged="key => { _selectedKey = key; StateHasChanged(); }">
<NodeContent Context="node">
@switch (node.Kind)
{
case NodeKind.Site:
<span class="fw-semibold">@node.Label</span>
break;
case NodeKind.Area:
<span class="text-secondary">@node.Label</span>
break;
case NodeKind.Instance:
<span>@node.Label</span>
<span class="badge bg-success ms-2">Enabled</span>
break;
}
</NodeContent>
<EmptyContent>
<span class="text-muted fst-italic">No items to display.</span>
</EmptyContent>
</TreeView>
@code {
private object? _selectedKey;
private List<TreeNode> _roots = new();
record TreeNode(string Key, string Label, NodeKind Kind, List<TreeNode> Children);
enum NodeKind { Site, Area, Instance }
}
```
## Usage Example: Data Connections by Site
A simpler two-level tree — Site → Data Connections (leaves):
```
- Site A
Data Connection 1
Data Connection 2
+ Site B
+ Site C
```
```razor
<TreeView TItem="TreeNode" Items="_roots"
ChildrenSelector="n => n.Children"
HasChildrenSelector="n => n.Children.Count > 0"
KeySelector="n => n.Key">
<NodeContent Context="node">
@if (node.Kind == NodeKind.Site)
{
<span class="fw-semibold">@node.Label</span>
}
else
{
<span>@node.Label</span>
<span class="badge bg-info ms-2">@node.Protocol</span>
}
</NodeContent>
</TreeView>
@code {
private List<TreeNode> _roots = new();
record TreeNode(string Key, string Label, NodeKind Kind, List<TreeNode> Children, string? Protocol = null);
enum NodeKind { Site, DataConnection }
// Build: group data connections by SiteId, wrap each site as a branch
// with its connections as leaf children
}
```
This demonstrates the component working with a flat two-level grouping — no recursive hierarchy needed. The consumer simply groups data connections by site and builds one level of children per site node.
## Tree Model Construction Pattern
The consuming page is responsible for building the tree model. The component only knows about `TItem`.
**Instance hierarchy** (deep, recursive):
1. Load sites, areas (with `ParentAreaId` hierarchy), and instances.
2. Build `Area` subtree per site using recursive `ParentAreaId` traversal.
3. Attach instances as leaf children of their assigned area (or directly under the site if `AreaId` is null).
4. Wrap each entity in a uniform `TreeNode`.
**Data connections by site** (flat, two-level):
1. Load sites and data connections.
2. Group connections by `SiteId`.
3. Each site becomes a branch node with its connections as leaf children.
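The instance-hierarchy steps above might be sketched as follows. The DTO records (`SiteDto`, `AreaDto`, `InstanceDto`) and the `TopologyTreeBuilder` class are hypothetical stand-ins for the real entities, the key prefixes are borrowed from the Topology page notes, and the area data is assumed to contain no `ParentAreaId` cycles.

```csharp
using System.Collections.Generic;
using System.Linq;

public enum NodeKind { Site, Area, Instance }
public record TreeNode(string Key, string Label, NodeKind Kind, List<TreeNode> Children);

// Hypothetical flat shapes standing in for the loaded entities.
public record SiteDto(int Id, string Name);
public record AreaDto(int Id, int SiteId, int? ParentAreaId, string Name);
public record InstanceDto(int Id, int SiteId, int? AreaId, string UniqueName);

public static class TopologyTreeBuilder
{
    public static List<TreeNode> Build(
        IEnumerable<SiteDto> sites,
        IEnumerable<AreaDto> areas,
        IEnumerable<InstanceDto> instances)
    {
        var areaList = areas.ToList();
        var instanceList = instances.ToList();

        // Step 2: recursive ParentAreaId traversal (assumes acyclic data).
        TreeNode BuildArea(AreaDto area)
        {
            var children = areaList
                .Where(a => a.ParentAreaId == area.Id)
                .Select(BuildArea)
                .ToList();
            // Step 3: instances are leaves of their assigned area.
            children.AddRange(instanceList
                .Where(i => i.AreaId == area.Id)
                .Select(ToLeaf));
            return new TreeNode($"a:{area.Id}", area.Name, NodeKind.Area, children);
        }

        return sites.Select(site =>
        {
            var children = areaList
                .Where(a => a.SiteId == site.Id && a.ParentAreaId is null)
                .Select(BuildArea)
                .ToList();
            // Instances with no area hang directly under the site.
            children.AddRange(instanceList
                .Where(i => i.SiteId == site.Id && i.AreaId is null)
                .Select(ToLeaf));
            return new TreeNode($"s:{site.Id}", site.Name, NodeKind.Site, children);
        }).ToList();
    }

    // Step 4: every entity is wrapped in the same TreeNode shape.
    private static TreeNode ToLeaf(InstanceDto i) =>
        new($"i:{i.Id}", i.UniqueName, NodeKind.Instance, new List<TreeNode>());
}
```

The repeated `Where` scans keep the sketch short; for larger data sets a `ToLookup` on `ParentAreaId` and `AreaId` would avoid the quadratic cost. The two-level data-connections case reduces to the outer `Select` with a single grouping by `SiteId`.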
## Other Potential Uses
The component is generic enough for:
- **Template inheritance tree**: Template → child templates (via `ParentTemplateId`)
- **Area management**: Site → Area hierarchy (replace current flat indentation in Areas.razor)
- **Data connections**: Site → connections (flat grouping, as shown above)
- **Navigation sidebar**: Hierarchical menu structure
- **File/folder browser**: Any nested structure
## Testing
Unit tests use the existing bUnit + xUnit + NSubstitute setup in `tests/ScadaLink.CentralUI.Tests/`. Tests live in a dedicated file: `TreeViewTests.cs`.
All tests use a simple test model:
```csharp
record TestNode(string Key, string Label, List<TestNode> Children);
```
### Test Categories
**Rendering:**
- Renders root-level items with correct labels
- Renders `EmptyContent` when `Items` is empty
- Does not render `EmptyContent` when items exist
- Leaf nodes have no expand/collapse toggle
- Branch nodes show `+` toggle when collapsed
**Expand/Collapse:**
- Clicking toggle expands node and shows children
- Clicking expanded toggle collapses node and hides children
- Children of collapsed nodes are not in the DOM
- Deep nesting: expand parent, then expand child — grandchildren visible
- `InitiallyExpanded` predicate expands matching nodes on first render
**Indentation:**
- Root nodes have zero indentation
- Child nodes are indented by `IndentPx` pixels per depth level
- Custom `IndentPx` value is applied correctly
**Selection:**
- When `Selectable` is false (default), clicking a node does not fire `SelectedKeyChanged`
- When `Selectable` is true, clicking node content fires `SelectedKeyChanged` with correct key
- Clicking expand toggle does **not** change selection
- Selected node has `SelectedCssClass` applied
- Custom `SelectedCssClass` is used when provided
**External Filtering:**
- Re-rendering with a filtered `Items` list removes hidden root nodes
- Expansion state is preserved after filter changes — expanding Site A, filtering to Site A only, then removing filter still shows Site A expanded
- Selection is cleared when the selected node disappears from filtered results
**Session Persistence (R11):**
- When `StorageKey` is null, no JS interop calls are made
- When `StorageKey` is set, expanding a node writes to sessionStorage via JS interop
- On mount with a `StorageKey`, reads sessionStorage and restores expanded nodes
- Persisted state takes precedence over `InitiallyExpanded`
*Note: sessionStorage tests mock `IJSRuntime` (already available via bUnit's `JSInterop`).*
**Expand All / Collapse All (R12):**
- `ExpandAll()` expands all branch nodes — all descendants visible
- `CollapseAll()` collapses all branch nodes — only roots visible
- `ExpandAll()` updates sessionStorage when `StorageKey` is set
- `CollapseAll()` clears sessionStorage expanded set when `StorageKey` is set
**RevealNode (R13):**
- `RevealNode(key)` expands all ancestors of the target node
- Target node's content is present in the DOM after reveal
- `RevealNode(key, select: true)` selects the node and fires `SelectedKeyChanged`
- `RevealNode` with unknown key is a no-op (does not throw)
- Deeply nested node (3+ levels) — all intermediate ancestors expanded
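For orientation, the ancestor lookup these `RevealNode` tests exercise could look roughly like the depth-first search below. This is a sketch, not the component's actual implementation: `TreePaths` and `FindAncestorKeys` are hypothetical names, and the selectors mirror the `ChildrenSelector` and `KeySelector` parameters.

```csharp
using System;
using System.Collections.Generic;

public static class TreePaths
{
    // Returns the keys of every ancestor of the target (root first),
    // or null when no node has the given key (the no-op case).
    public static List<object>? FindAncestorKeys<TItem>(
        IEnumerable<TItem> roots,
        Func<TItem, IReadOnlyList<TItem>> childrenSelector,
        Func<TItem, object> keySelector,
        object targetKey)
    {
        var path = new List<object>();
        return Search(roots) ? path : null;

        bool Search(IEnumerable<TItem> items)
        {
            foreach (var item in items)
            {
                var key = keySelector(item);
                if (object.Equals(key, targetKey)) return true;

                // Tentatively treat this node as an ancestor, then backtrack on a miss.
                path.Add(key);
                if (Search(childrenSelector(item))) return true;
                path.RemoveAt(path.Count - 1);
            }
            return false;
        }
    }
}
```

Adding the returned keys to the expanded set would make the target visible; a `null` result corresponds to the "unknown key is a no-op" test.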
**Accessibility (R14):**
- Root `<ul>` has `role="tree"`
- Node `<li>` elements have `role="treeitem"`
- Expanded branch has `aria-expanded="true"`
- Collapsed branch has `aria-expanded="false"`
- Child container `<ul>` has `role="group"`
- Selected node has `aria-selected="true"` when `Selectable` is true
**Context Menu (R15):**
- Right-clicking a node shows the context menu with consumer-defined content
- Context menu is positioned at cursor coordinates
- When `ContextMenu` parameter is null, right-click does not render a menu
- When `ContextMenu` fragment renders empty content for a node type, no menu appears and browser default is used
- Right-clicking a node type with menu items shows the menu; right-clicking a node type without menu items does not
- Clicking a menu item dismisses the menu
- Clicking outside the menu dismisses it
- Right-clicking a different node replaces the current menu
### Test File Location
`tests/ScadaLink.CentralUI.Tests/TreeViewTests.cs`
## Dependencies
- Bootstrap 5 (already included in CentralUI)
- No additional packages
- bUnit 2.0.33-preview (already in test project)
## Page Integration Notes
### 1. Topology Page (`/deployment/topology` — Topology.razor)
The Topology page is the single home for Site → Area → Instance hierarchy management. It replaces the former `/deployment/instances` page (the legacy URL is retained as a secondary `@page` directive on `Topology.razor` so existing bookmarks resolve) and the former `/admin/areas*` admin pages.
**Scope:**
- Structural management of areas (create, rename inline, move, delete) and instance placement (move to area).
- Instance lifecycle: Deploy/Redeploy, Enable/Disable, Configure, Diff, Delete via per-node context menu.
- Search-only filter row (single text input) — dims non-matching rows, preserves tree shape, no collapse.
**TreeView wiring:**
- `Items` = list of Site root nodes built from `_sites`, `_allAreas`, and `_allInstances`.
- `KeySelector` returns prefixed keys (`s:{id}`, `a:{id}`, `i:{id}`).
- `StorageKey` = `"topology-tree"` for expansion state.
- A separate `topology-tree-selected` sessionStorage key persists the selected node across navigation.
- `Selectable` = true; selection does not navigate (instance configure goes through the context menu).
- Empty containers always rendered (so they can be drop/move targets).
**Glyphs (V1–V7 visual guide):**
- Site: `bi-building`
- Area: `bi-diagram-3`
- Instance: `bi-box` + state badge + Stale/Current badge when deployed.
**Context menus:**
- **Site:** Add Area, Create Instance here.
- **Area:** Add Sub-area, Create Instance here, Move to Area…, Rename… (also F2 / double-click inline), Delete.
- **Instance:** Deploy/Redeploy, Enable/Disable (state-dependent), Configure, Diff, Move to Area…, Delete. Instance rename is intentionally absent (see "Instance rename" below).
**Inline rename:** Area rows only. F2 or double-click swaps the label for an input bound to a local buffer. Enter commits via `AreaService.UpdateAreaAsync`; Escape cancels; server validation errors stay surfaced inline.
**Search behavior:** Single text input above the tree. While text is present, any row whose label does not match (case-insensitive substring) and whose subtree contains no match is rendered at `opacity: 0.4`. The tree shape stays intact.
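The dimming rule can be read as a predicate over a node and its subtree. The sketch below assumes a minimal hypothetical node shape (`SearchNode`) and hypothetical names (`TopologySearch`, `ShouldDim`); the real page model will differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal node shape; the real page model carries more fields.
public record SearchNode(string Label, List<SearchNode> Children);

public static class TopologySearch
{
    // True when the row should render dimmed: search text is present,
    // and neither the row nor anything beneath it matches.
    public static bool ShouldDim(SearchNode node, string? searchText)
    {
        if (string.IsNullOrEmpty(searchText)) return false;
        return !SubtreeMatches(node, searchText);
    }

    // Case-insensitive substring match on the label, or on any descendant's label.
    private static bool SubtreeMatches(SearchNode node, string term) =>
        node.Label.Contains(term, StringComparison.OrdinalIgnoreCase)
        || node.Children.Any(child => SubtreeMatches(child, term));
}
```

Because ancestors of a match are never dimmed, the path from the root to each hit stays at full opacity, which is what keeps the tree shape readable while filtering.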
**Top-of-page buttons:** `+ Area` (opens `CreateAreaDialog` with site picker), `+ Instance` (navigates to `/deployment/instances/create` with no preselection), `Refresh`, `Expand`, `Collapse`.
**Files added:**
- `src/ScadaLink.CentralUI/Components/Pages/Deployment/Topology.razor`
- `src/ScadaLink.CentralUI/Components/Pages/Deployment/MoveAreaDialog.razor`
- `src/ScadaLink.CentralUI/Components/Pages/Deployment/MoveInstanceDialog.razor`
- `src/ScadaLink.CentralUI/Components/Pages/Deployment/CreateAreaDialog.razor`
**Files removed:**
- `src/ScadaLink.CentralUI/Components/Pages/Deployment/Instances.razor`
- `src/ScadaLink.CentralUI/Components/Pages/Admin/Areas.razor` (and AreaAdd / AreaEdit / AreaDelete)
**Backend addition:** `AreaService.MoveAreaAsync(int areaId, int? newParentAreaId, string user)` adds area re-parenting (cycle prevention, same-site, name collision at new parent). Pairs with the existing `InstanceService.AssignToAreaAsync`.
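The three re-parenting checks could be sketched as a pure validation function over the flat area list. This is an illustration under stated assumptions, not the actual `AreaService` code: `AreaRow`, `AreaMoveRules`, and the error strings are hypothetical, and case-insensitive name comparison is a guess.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical flat area shape; field names are assumptions.
public record AreaRow(int Id, int SiteId, int? ParentAreaId, string Name);

public static class AreaMoveRules
{
    // Returns null when the move is allowed, otherwise a human-readable reason.
    public static string? Validate(IReadOnlyCollection<AreaRow> all, int areaId, int? newParentAreaId)
    {
        var byId = all.ToDictionary(a => a.Id);
        if (!byId.TryGetValue(areaId, out var area)) return "Area not found.";

        if (newParentAreaId is int parentId)
        {
            if (!byId.TryGetValue(parentId, out var parent)) return "Target parent not found.";
            if (parent.SiteId != area.SiteId) return "Target parent belongs to a different site.";

            // Cycle prevention: walk up from the new parent. Reaching the moved area
            // means the target is the area itself or one of its descendants.
            // The hop limit guards against pre-existing cycles in the data.
            AreaRow? cursor = parent;
            for (var hops = 0; cursor is not null && hops <= all.Count; hops++)
            {
                if (cursor.Id == areaId) return "Cannot move an area into itself or its own subtree.";
                cursor = cursor.ParentAreaId is int up && byId.TryGetValue(up, out var next) ? next : null;
            }
        }

        // Name collision among the would-be siblings (case-insensitive match is an assumption).
        var collides = all.Any(a =>
            a.Id != areaId &&
            a.SiteId == area.SiteId &&
            a.ParentAreaId == newParentAreaId &&
            string.Equals(a.Name, area.Name, StringComparison.OrdinalIgnoreCase));

        return collides ? "An area with the same name already exists at the target." : null;
    }
}
```

Passing `null` as `newParentAreaId` models a move to the site root, which skips the same-site and cycle checks but still checks for a name collision among root-level areas.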
**Instance rename:** Out of scope for this page. `InstanceService` does not currently support renaming an instance (`UniqueName` is also the site-side `InstanceActor` identity and appears in deployment records). A separate design pass is required if rename is wanted.
---
### 2. Data Connections Page (`/admin/data-connections` — DataConnections.razor)
**Current state:** Flat table listing all data connections across all sites. Columns: ID, Name, Protocol, Site, Primary Config, Backup Config, Actions (Edit, Delete). No filters. ~119 lines.
**Change to:**
- Replace the `<table>` with a `<TreeView>` showing Site → Data Connection hierarchy (two levels, no recursion).
- **No filter bar needed initially** — the tree naturally groups by site. If the number of sites grows, a site filter dropdown can be added later using the external filtering pattern.
- **Move Edit and Delete into the `ContextMenu` fragment**, shown only for data connection nodes:
- Edit → navigates to `/admin/data-connections/{id}/edit`
- Delete → shows confirm dialog, then deletes
- Site nodes get no context menu.
- **Node content per type:**
- Site nodes: `<span class="fw-semibold">SiteName</span>` + child count badge (e.g., `<span class="badge bg-secondary ms-1">3</span>`)
- Data Connection nodes: `<span>Name</span>` + protocol badge (e.g., `<span class="badge bg-info ms-2">OPC UA</span>`)
- **Tree model:** Group data connections by `SiteId`. Each site becomes a branch, its connections become leaves. Sites with no connections still appear as empty branches (expandable but no children).
- **StorageKey:** `"data-connections-tree"`
**Files to modify:**
- `src/ScadaLink.CentralUI/Components/Pages/Admin/DataConnections.razor` — replace table with TreeView, add tree model building, move actions to context menu.
**Removed code:**
- `<table>` / `<thead>` / `<tbody>` structure
- Inline Edit/Delete buttons
---
## Interactions
- **DataTable**: The tree replaces flat tables on the Topology and Data Connections pages. Other pages that don't need hierarchy continue using DataTable.
- **InstanceConfigure.razor**: Right-click → Configure on an instance node navigates to `/deployment/instances/{Id}/configure`. Back-nav returns to `/deployment/topology`.
@@ -45,11 +45,13 @@
- **Machine Data Database**: A separate database for collected machine data (e.g., telemetry, measurements, events).
### 2.2 Communication: Central ↔ Site
- Central-to-site and site-to-central communication uses **Akka.NET ClusterClient/ClusterClientReceptionist** for cross-cluster messaging with automatic failover.
- **Site addressing**: Site Akka base addresses (NodeA and NodeB) are stored in the **Sites database table** and configured via the Central UI. Central creates a ClusterClient per site using both addresses as contact points (cached in memory, refreshed periodically and on admin changes) rather than relying on runtime registration messages from sites.
- Two transport layers are used for central-site communication:
- **Akka.NET ClusterClient/ClusterClientReceptionist**: Handles **command/control** messaging — deployments, instance lifecycle commands, subscribe/unsubscribe handshake, debug snapshots, health reports, remote queries, and integration routing. Provides automatic failover between contact points.
- **gRPC server-streaming (site→central)**: Handles **real-time data streaming** — attribute value updates and alarm state changes. Each site node hosts a **SiteStreamGrpcServer** on a dedicated HTTP/2 port (Kestrel, default port 8083). Central creates per-site **SiteStreamGrpcClient** instances to subscribe to site streams. gRPC provides HTTP/2 flow control and per-stream backpressure that ClusterClient lacks.
- **Site addressing**: Site Akka base addresses (NodeA and NodeB) and gRPC endpoints (GrpcNodeAAddress and GrpcNodeBAddress) are stored in the **Sites database table** and configured via the Central UI or CLI. Central creates a ClusterClient per site using both Akka addresses as contact points, and per-site gRPC clients using the gRPC addresses.
- **Central contact points**: Sites configure **multiple central contact points** (both central node addresses) for redundancy. ClusterClient handles failover between central nodes automatically.
- **Central as integration hub**: Central brokers requests between external systems and sites. For example, a recipe manager sends a recipe to central, which routes it to the appropriate site. MES requests machine values from central, which routes the request to the site and returns the response.
- **Real-time data streaming** is not continuous for all machine data. The only real-time stream is an **on-demand debug view** — an engineer in the central UI can open a live view of a specific instance's tag values and alarm states for troubleshooting purposes. This is session-based and temporary. The debug view subscribes to the site-wide Akka stream filtered by instance (see Section 8.1).
- **Real-time data streaming** is not continuous for all machine data. The only real-time stream is an **on-demand debug view** — an engineer in the central UI can open a live view of a specific instance's tag values and alarm states for troubleshooting purposes. This is session-based and temporary. The debug view subscribes via gRPC to the site's SiteStreamManager filtered by instance (see Section 8.1).
### 2.3 Site-Level Storage & Interface
- Sites have **no user interface** — they are headless collectors, forwarders, and script executors.
@@ -58,11 +60,12 @@
- Store-and-forward buffers are persisted to a **local SQLite database on each node** and replicated between nodes via application-level replication (see 1.3).
### 2.4 Data Connection Protocols
- The system supports **OPC UA** and **LmxProxy** (a gRPC-based custom protocol with an existing client SDK).
- Both protocols implement a **common interface** supporting: connect, subscribe to tag paths, receive value updates, and write values.
- The system supports **OPC UA** as the primary data connection protocol.
- All protocols implement a **common interface** supporting: connect, subscribe to tag paths, receive value updates, and write values.
- Additional protocols can be added by implementing the common interface.
- The Data Connection Layer is a **clean data pipe** — it publishes tag value updates to Instance Actors but performs no evaluation of triggers or alarm conditions.
- **Initial attribute quality**: Attributes bound to a data connection start with **uncertain** quality when the Instance Actor initializes. The quality remains uncertain until the first value update is received from the Data Connection Layer. This distinguishes "never received a value" from "received a known-good value" or "connection lost" (bad quality).
- Data connections support optional **backup endpoints** with automatic failover after a configurable retry count. On failover, all subscriptions are transparently re-created on the new endpoint.
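The retry-count failover described in the last bullet might be tracked with a small state holder like the one below. This is a sketch only: `EndpointFailover` is a hypothetical name, and counting consecutive failures and alternating between the two endpoints are assumptions (the text does not say whether the connection fails back to the primary).

```csharp
using System;

public sealed class EndpointFailover
{
    private readonly string _primary;
    private readonly string? _backup;
    private readonly int _maxRetries;
    private int _consecutiveFailures;

    public EndpointFailover(string primary, string? backup, int maxRetries)
    {
        if (maxRetries < 1) throw new ArgumentOutOfRangeException(nameof(maxRetries));
        _primary = primary;
        _backup = backup;
        _maxRetries = maxRetries;
        Active = primary;
    }

    // Endpoint the connection should currently use.
    public string Active { get; private set; }

    // A successful connect resets the retry budget.
    public void RecordSuccess() => _consecutiveFailures = 0;

    // Returns true when this failure caused a switch to the other endpoint,
    // which is the point at which subscriptions would be re-created.
    public bool RecordFailure()
    {
        _consecutiveFailures++;
        if (_backup is null || _consecutiveFailures < _maxRetries) return false;

        _consecutiveFailures = 0;
        Active = Active == _primary ? _backup : _primary;
        return true;
    }
}
```

With no backup configured, `RecordFailure` never switches and the connection simply keeps retrying the primary.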
### 2.5 Scale
- Approximately **10 sites**.
@@ -103,16 +106,18 @@ Each alarm has:
- **Priority Level**: Numeric value from 0–1000.
- **Lock Flag**: Controls whether the alarm can be overridden downstream.
- **Trigger Definition**: One of the following trigger types:
- **Value Match**: Triggers when a monitored attribute equals a predefined value.
- **Value Match**: Triggers when a monitored attribute equals a predefined value. Supports a `!=X` prefix on the match value for not-equals semantics.
- **Range Violation**: Triggers when a monitored attribute value falls outside an allowed range.
- **Rate of Change**: Triggers when a monitored attribute value changes faster than a defined threshold.
- **Rate of Change**: Triggers when a monitored attribute value changes faster than a defined threshold over a configurable time window. A direction filter (rising / falling / either) restricts which side of the rate triggers.
- **HiLo**: Multi-setpoint level alarm. Any subset of four setpoints (LoLo, Lo, Hi, HiHi) may be configured. The most severe matching band wins (LoLo/HiHi outrank Lo/Hi). Each setpoint may carry its own priority that overrides the alarm-level priority for that band.
- **On-Trigger Script** *(optional)*: A script to execute when the alarm triggers. The alarm on-trigger script executes in the context of the instance and can call instance scripts, but instance scripts **cannot** call alarm on-trigger scripts. The call direction is one-way.
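The HiLo rule that the most severe matching band wins can be expressed as an ordered series of checks. The sketch below is illustrative: `HiLoSetpoints` and `HiLoEvaluator` are hypothetical names, the level names follow the alarm levels used for HiLo triggers, and inclusive comparisons (`>=`, `<=`) are an assumption. Per-setpoint priority overrides are left out.

```csharp
public enum AlarmLevel { None, Low, LowLow, High, HighHigh }

// Any subset of the four setpoints may be configured; null means "not set".
public record HiLoSetpoints(
    double? LoLo = null, double? Lo = null, double? Hi = null, double? HiHi = null);

public static class HiLoEvaluator
{
    public static AlarmLevel Evaluate(double value, HiLoSetpoints sp)
    {
        // Outer bands are checked first so they outrank the inner ones.
        if (sp.HiHi is double hihi && value >= hihi) return AlarmLevel.HighHigh;
        if (sp.LoLo is double lolo && value <= lolo) return AlarmLevel.LowLow;
        if (sp.Hi is double hi && value >= hi) return AlarmLevel.High;
        if (sp.Lo is double lo && value <= lo) return AlarmLevel.Low;
        return AlarmLevel.None;
    }
}
```

For example, with `new HiLoSetpoints(Lo: 10, Hi: 80, HiHi: 95)`, a value of 85 evaluates to `High` and a value of 96 to `HighHigh`, while a value of 5 evaluates to `Low` because no LoLo setpoint is configured.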
### 3.4.1 Alarm State
- Alarm state (active/normal) is **managed at the site level** per instance, held **in memory** by the Alarm Actor.
- Active alarms additionally carry an **alarm level**: `None` for binary trigger types (ValueMatch, RangeViolation, RateOfChange); one of `Low`, `LowLow`, `High`, `HighHigh` for HiLo triggers based on which setpoint the monitored attribute has crossed. Level transitions within an active HiLo alarm (e.g., High → HighHigh) emit fresh state-change events without re-running the on-trigger script — the script only fires on the Normal → non-None edge.
- When the alarm condition clears, the alarm **automatically returns to normal state** — no acknowledgment workflow is required.
- Alarm state is **not persisted** — on restart, alarm states are re-evaluated from incoming values.
- Alarm state changes are published to the site-wide Akka stream as `[InstanceUniqueName].[AlarmName]`, alarm state (active/normal), priority, timestamp.
- Alarm state changes are published to the site-wide Akka stream as `[InstanceUniqueName].[AlarmName]`, alarm state (active/normal), alarm level, priority, timestamp.
### 3.5 Template Relationships
@@ -362,7 +367,7 @@ The central cluster hosts a **configuration and management UI** (no live machine
- **Database Connection Management**: Define named database connections for script use.
- **Inbound API Management**: Manage API keys (create, enable/disable, delete). Define API methods (name, parameters, return values, approved keys, implementation script). *(Admin role for keys, Design role for methods.)*
- **Instance Management**: Create instances from templates, bind data connections (per-attribute, with **bulk assignment** UI for selecting multiple attributes and assigning a data connection at once), set instance-level attribute overrides, assign instances to areas. **Disable** or **delete** instances.
- **Site & Data Connection Management**: Define sites (including optional NodeAAddress and NodeBAddress fields for Akka remoting paths), manage data connections and assign them to sites.
- **Site & Data Connection Management**: Define sites (including optional NodeAAddress and NodeBAddress fields for Akka remoting paths, and optional GrpcNodeAAddress and GrpcNodeBAddress fields for gRPC streaming endpoints), manage data connections and assign them to sites.
- **Area Management**: Define hierarchical area structures per site for organizing instances.
- **Deployment**: View diffs between deployed and current template-derived configurations, deploy updates to individual instances. Filter instances by area. Pre-deployment validation runs automatically before any deployment is sent.
- **System-Wide Artifact Deployment**: Explicitly deploy shared scripts, external system definitions, database connection definitions, data connection definitions, notification lists, and SMTP configuration to all sites or to an individual site (requires Deployment role). Per-site deployment is available via the Sites admin page.
@@ -373,7 +378,7 @@ The central cluster hosts a **configuration and management UI** (no live machine
- **Site Event Log Viewer**: Query and view operational event logs from site clusters (see Section 12).
### 8.1 Debug View
- **Subscribe-on-demand**: When an engineer opens a debug view for an instance, central subscribes to the **site-wide Akka stream** filtered by instance unique name. The site first provides a **snapshot** of all current attribute values and alarm states from the Instance Actor, then streams subsequent changes from the Akka stream.
- **Subscribe-on-demand**: When an engineer opens a debug view for an instance, central opens a **gRPC server-streaming subscription** to the site's `SiteStreamGrpcServer` for the instance, then requests a **snapshot** of all current attribute values and alarm states via ClusterClient. The gRPC stream delivers subsequent attribute value and alarm state changes directly from the site's `SiteStreamManager`.
- Attribute value stream messages are structured as: `[InstanceUniqueName].[AttributePath].[AttributeName]`, attribute value, attribute quality, attribute change timestamp.
- Alarm state stream messages are structured as: `[InstanceUniqueName].[AlarmName]`, alarm state (active/normal), priority, timestamp.
- The stream continues until the engineer **closes the debug view**, at which point central unsubscribes and the site stops streaming.
@@ -471,19 +476,19 @@ Sites log operational events locally, including:
### 13.1 Management Service
- The central cluster exposes a **ManagementActor** that provides programmatic access to all administrative operations — the same operations available through the Central UI.
- The ManagementActor registers with Akka.NET **ClusterClientReceptionist**, allowing external tools to communicate with it via ClusterClient without joining the cluster.
- The ManagementActor registers with Akka.NET **ClusterClientReceptionist** for cross-cluster access, and is also exposed via an HTTP Management API endpoint (`POST /management`) with Basic Auth, LDAP authentication, and role resolution — enabling external tools like the CLI to interact without Akka.NET dependencies.
- The ManagementActor enforces the **same role-based authorization** as the Central UI. Every incoming message carries the authenticated user's identity and roles.
- All mutating operations performed through the Management Service are **audit logged** via IAuditService, identical to operations performed through the Central UI.
- The ManagementActor runs on the **active central node** and fails over with it. ClusterClient handles reconnection transparently.
- The ManagementActor runs on **every central node** (stateless). For HTTP API access, any central node can handle any request without sticky sessions.
### 13.2 CLI
- The system provides a standalone **command-line tool** (`scadalink`) for scripting and automating administrative operations.
- The CLI connects to the ManagementActor via Akka.NET **ClusterClient** — it does not join the cluster as a full member and does not use HTTP/REST.
- The CLI authenticates the user against **LDAP/AD** (direct bind, same mechanism as the Central UI) and includes the authenticated identity in every message sent to the ManagementActor.
- The CLI connects to the Central Host's HTTP Management API (`POST /management`) — it sends commands as JSON with HTTP Basic Auth credentials. The server handles LDAP authentication, role resolution, and ManagementActor dispatch.
- The CLI sends user credentials via HTTP Basic Auth. The server authenticates against **LDAP/AD** and resolves roles before dispatching commands to the ManagementActor.
- CLI commands mirror all Management Service operations: templates, instances, sites, data connections, deployments, external systems, notifications, security (API keys and role mappings), audit log queries, and health status.
- Output is **JSON by default** (machine-readable, suitable for scripting) with an optional `--format table` flag for human-readable tabular output.
- Configuration is resolved from command-line options, **environment variables** (`SCADALINK_CONTACT_POINTS`, `SCADALINK_LDAP_SERVER`, etc.), or a **configuration file** (`~/.scadalink/config.json`).
- The CLI is a separate executable from the Host binary — it is deployed on any Windows machine with network access to the central cluster.
- Configuration is resolved from command-line options, **environment variables** (`SCADALINK_MANAGEMENT_URL`, `SCADALINK_FORMAT`), or a **configuration file** (`~/.scadalink/config.json`).
- The CLI is a separate executable from the Host binary — it is deployed on any machine with HTTP access to a central node.
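Two pieces of the CLI behaviour above lend themselves to a short sketch: picking a setting from the three configuration sources, and building the HTTP Basic Auth header. `CliSettings` and its members are hypothetical names, and the precedence shown (command-line option, then environment variable, then configuration file) is an assumption based on the order in which the sources are listed. The Basic scheme itself is standard.

```csharp
using System;
using System.Text;

public static class CliSettings
{
    // Assumed precedence: command-line option, then environment variable,
    // then configuration file. Empty strings count as "not set".
    public static string? Resolve(string? optionValue, string? envValue, string? fileValue)
    {
        if (!string.IsNullOrEmpty(optionValue)) return optionValue;
        if (!string.IsNullOrEmpty(envValue)) return envValue;
        return string.IsNullOrEmpty(fileValue) ? null : fileValue;
    }

    // Standard HTTP Basic scheme: "Basic " + base64("user:password").
    public static string BasicAuthHeader(string user, string password) =>
        "Basic " + Convert.ToBase64String(Encoding.UTF8.GetBytes($"{user}:{password}"));
}
```

A caller might resolve the management URL with `CliSettings.Resolve(urlOption, Environment.GetEnvironmentVariable("SCADALINK_MANAGEMENT_URL"), fileUrl)`, where `urlOption` and `fileUrl` are whatever the argument parser and config loader produced.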
## 14. General Conventions
@@ -1,16 +1,19 @@
# Test Infrastructure
This document describes the local Docker-based test infrastructure for ScadaLink development. Five services provide the external dependencies needed to run and test the system locally.
This document describes the local Docker-based test infrastructure for ScadaLink development. Eight services provide the external dependencies needed to run and test the system locally. The first seven run in `infra/docker-compose.yml`; Traefik runs alongside the cluster nodes in `docker/docker-compose.yml`.
## Services
| Service | Image | Port(s) | Config |
|---------|-------|---------|--------|
| OPC UA Server | `mcr.microsoft.com/iotedge/opc-plc:latest` | 50000 (OPC UA), 8080 (web) | `infra/opcua/nodes.json` |
| LDAP Server | `glauth/glauth:latest` | 3893 | `infra/glauth/config.toml` |
| MS SQL 2022 | `mcr.microsoft.com/mssql/server:2022-latest` | 1433 | `infra/mssql/setup.sql` |
| SMTP (Mailpit) | `axllent/mailpit:latest` | 1025 (SMTP), 8025 (web) | Environment vars |
| REST API (Flask) | Custom build (`infra/restapi/Dockerfile`) | 5200 | `infra/restapi/app.py` |
| Service | Image | Port(s) | Config | Compose File |
|---------|-------|---------|--------|-------------|
| OPC UA Server | `mcr.microsoft.com/iotedge/opc-plc:latest` | 50000 (OPC UA), 8080 (web) | `infra/opcua/nodes.json` | `infra/` |
| OPC UA Server 2 | `mcr.microsoft.com/iotedge/opc-plc:latest` | 50010 (OPC UA), 8081 (web) | `infra/opcua/nodes.json` | `infra/` |
| LDAP Server | `glauth/glauth:latest` | 3893 | `infra/glauth/config.toml` | `infra/` |
| MS SQL 2022 | `mcr.microsoft.com/mssql/server:2022-latest` | 1433 | `infra/mssql/setup.sql` | `infra/` |
| SMTP (Mailpit) | `axllent/mailpit:latest` | 1025 (SMTP), 8025 (web) | Environment vars | `infra/` |
| REST API (Flask) | Custom build (`infra/restapi/Dockerfile`) | 5200 | `infra/restapi/app.py` | `infra/` |
| Playwright | `mcr.microsoft.com/playwright:v1.58.2-noble` | 3000 (WebSocket) | Command args | `infra/` |
| Traefik LB | `traefik:v3.4` | 9000 (proxy), 8180 (dashboard) | `docker/traefik/` | `docker/` |
## Quick Start
@@ -40,6 +43,14 @@ Each service has a dedicated document with configuration details, verification s
- [test_infra_db.md](test_infra_db.md) — MS SQL 2022 database
- [test_infra_smtp.md](test_infra_smtp.md) — SMTP test server (Mailpit)
- [test_infra_restapi.md](test_infra_restapi.md) — REST API test server (Flask)
- [test_infra_playwright.md](test_infra_playwright.md) — Playwright browser server (Central UI testing)
- Traefik LB — see `docker/README.md` and `docker/traefik/` (runs with the cluster, not in `infra/`)
## Remote Test Infrastructure
In addition to the local Docker services, the following remote services are available for testing against real AVEVA System Platform hardware.
**Primary/backup testing**: The dual OPC UA test servers (ports 50000 and 50010) in local Docker provide primary/backup endpoint pairs for testing Data Connection Layer failover. Use `docker compose stop opcua` to simulate primary failure and verify automatic failover to the backup.
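The failover check described above can be approximated from a script. The following is a minimal sketch, not the Data Connection Layer's actual logic: it walks the two local endpoints in order and returns the first one that accepts a TCP connection. The names `tcp_reachable` and `pick_endpoint` are hypothetical, and a TCP probe only shows that the port is open, not that an OPC UA session can be established.

```python
import socket
from typing import Callable, Optional, Sequence
from urllib.parse import urlparse

# Primary first, backup second (ports taken from the services table).
ENDPOINTS = ("opc.tcp://localhost:50000", "opc.tcp://localhost:50010")


def tcp_reachable(endpoint: str, timeout: float = 2.0) -> bool:
    """Return True if a plain TCP connection to the endpoint's host:port succeeds."""
    parsed = urlparse(endpoint)
    try:
        with socket.create_connection((parsed.hostname, parsed.port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_endpoint(
    endpoints: Sequence[str],
    probe: Callable[[str], bool] = tcp_reachable,
) -> Optional[str]:
    """Return the first endpoint the probe reports as up, or None if all are down."""
    for endpoint in endpoints:
        if probe(endpoint):
            return endpoint
    return None
```

With both containers running, `pick_endpoint(ENDPOINTS)` should return the primary; after `docker compose stop opcua` it should return `opc.tcp://localhost:50010`. The `probe` parameter is injectable so the selection order can be exercised without any containers.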
## Connection Strings
@@ -60,6 +71,9 @@ For use in `appsettings.Development.json`:
"OpcUa": {
"EndpointUrl": "opc.tcp://localhost:50000"
},
"OpcUa2": {
"EndpointUrl": "opc.tcp://localhost:50010"
},
"Smtp": {
"Server": "localhost",
"Port": 1025,
@@ -82,7 +96,7 @@ For use in `appsettings.Development.json`:
```bash
cd infra
docker compose down # stop containers, preserve SQL data volume
docker compose stop opcua # stop a single service (also: ldap, mssql, smtp, restapi)
docker compose stop opcua # stop a single service (also: opcua2, ldap, mssql, smtp, restapi)
```
**Full teardown** (removes volumes, optionally images and venv):
@@ -99,7 +113,7 @@ After a full teardown, the next `docker compose up -d` starts fresh — re-run t
```
infra/
docker-compose.yml # All five services
docker-compose.yml # All seven services
teardown.sh # Teardown script (volumes, images, venv)
glauth/config.toml # LDAP users and groups
mssql/setup.sql # Database and user creation
@@ -109,4 +123,8 @@ infra/
restapi/Dockerfile # REST API container build
tools/ # Python CLI tools (opcua, ldap, mssql, smtp, restapi)
README.md # Quick-start for the infra folder
docker/
traefik/traefik.yml # Traefik static config (entrypoints, file provider)
traefik/dynamic.yml # Traefik dynamic config (load balancer, health check routing)
```
@@ -6,9 +6,14 @@ The test OPC UA server uses [Azure IoT OPC PLC](https://github.com/Azure-Samples
## Image & Ports
Two identical OPC UA server instances run with the same tag configuration, on different ports:
| Instance | OPC UA Endpoint | Web UI | Container |
|----------|----------------|--------|-----------|
| opcua | `opc.tcp://localhost:50000` | `http://localhost:8080` | scadalink-opcua |
| opcua2 | `opc.tcp://localhost:50010` | `http://localhost:8081` | scadalink-opcua2 |
- **Image**: `mcr.microsoft.com/iotedge/opc-plc:latest`
- **OPC UA endpoint**: `opc.tcp://localhost:50000`
- **Web/config UI**: `http://localhost:8080`
## Startup Flags
@@ -33,29 +38,31 @@ The file `infra/opcua/nodes.json` defines a single `ConfigFolder` object (not an
| Pump | FlowRate, Pressure, Running | Double, Boolean |
| Tank | Level, Temperature, HighLevel, LowLevel | Double, Boolean |
| Valve | Position, Command | Double, UInt32 |
| JoeAppEngine | BTCS, AlarmCntsBySeverity, Scheduler/ScanTime | String, Int32[], DateTime |
All custom nodes hold their initial/default values (0 for numerics, false for booleans) until written. OPC PLC's custom node format does not support random value generation for these nodes.
All custom nodes hold their initial/default values (0 for numerics, false for booleans, empty for strings, epoch for DateTime) until written. OPC PLC's custom node format does not support random value generation for these nodes.
Custom nodes live in namespace 3 (`http://microsoft.com/Opc/OpcPlc/`). Node IDs follow the pattern `ns=3;s=<Folder>.<Tag>` (e.g., `ns=3;s=Motor.Speed`).
Custom nodes live in namespace 3 (`http://microsoft.com/Opc/OpcPlc/`). Node IDs follow the pattern `ns=3;s=<Folder>.<Tag>` (e.g., `ns=3;s=Motor.Speed`). Nested folders use dot notation: `ns=3;s=JoeAppEngine.Scheduler.ScanTime`.
The browse path from the Objects root is: `OpcPlc > ScadaLink > Motor|Pump|Tank|Valve`.
The browse path from the Objects root is: `OpcPlc > ScadaLink > Motor|Pump|Tank|Valve|JoeAppEngine`.
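The node ID pattern above is easy to get wrong by hand, especially for nested folders. As an illustration only, a small helper (the name `node_id` is hypothetical and is not part of `opcua_tool.py`) can build the string from its path segments, assuming the custom nodes stay in namespace 3:

```python
NAMESPACE_INDEX = 3  # namespace of the custom nodes, per the text above


def node_id(*path: str, namespace: int = NAMESPACE_INDEX) -> str:
    """Build a string node ID such as ns=3;s=Motor.Speed from folder/tag segments."""
    if not path or any(not part for part in path):
        raise ValueError("at least one non-empty path segment is required")
    # Nested folders are joined with dots, e.g. JoeAppEngine.Scheduler.ScanTime.
    return f"ns={namespace};s={'.'.join(path)}"
```

For example, `node_id("Motor", "Speed")` yields `ns=3;s=Motor.Speed`, and `node_id("JoeAppEngine", "Scheduler", "ScanTime")` yields the nested form shown above.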
## Verification
1. Check the container is running:
1. Check both containers are running:
```bash
docker ps --filter name=scadalink-opcua
```
2. Verify the OPC UA endpoint using any OPC UA client (e.g., UaExpert, opcua-commander):
2. Verify both OPC UA endpoints using any OPC UA client (e.g., UaExpert, opcua-commander):
```bash
# Using opcua-commander (npm install -g opcua-commander)
opcua-commander -e opc.tcp://localhost:50000
opcua-commander -e opc.tcp://localhost:50010
```
3. Check the web UI at `http://localhost:8080` for server status and node listing.
3. Check the web UIs at `http://localhost:8080` (opcua) and `http://localhost:8081` (opcua2) for server status and node listing.
## CLI Tool
@@ -88,7 +95,14 @@ python infra/tools/opcua_tool.py write --node "ns=3;s=Motor.Running" --value tru
python infra/tools/opcua_tool.py monitor --nodes "ns=3;s=Motor.Speed,ns=3;s=Pump.FlowRate" --duration 15
```
Use `--endpoint` to override the default endpoint (`opc.tcp://localhost:50000`). Run with `--help` for full usage.
Use `--endpoint` to override the default endpoint (`opc.tcp://localhost:50000`). For the second instance:
```bash
python infra/tools/opcua_tool.py --endpoint opc.tcp://localhost:50010 check
python infra/tools/opcua_tool.py --endpoint opc.tcp://localhost:50010 browse --path "3:OpcPlc.3:ScadaLink.3:Motor"
```
Run with `--help` for full usage.
## Relevance to ScadaLink Components
+104
@@ -0,0 +1,104 @@
# Test Infrastructure: Playwright Browser Server
## Overview
The Playwright browser server provides a remote headless browser (Chromium, Firefox, WebKit) that test scripts connect to over the network. It runs as a Playwright Server on port 3000, allowing UI tests for the Central UI (Blazor Server) to run from the host machine while the browser executes inside the container with access to the Docker network.
## Image & Ports
- **Image**: `mcr.microsoft.com/playwright:v1.58.2-noble` (Ubuntu 24.04 LTS)
- **Server port**: 3000 (Playwright Server WebSocket endpoint)
## Configuration
| Setting | Purpose | Description |
|---------|-------|-------------|
| `--host 0.0.0.0` | Bind address | Listen on all interfaces |
| `--port 3000` | Server port | Playwright Server WebSocket port |
| `ipc: host` | Docker IPC | Shared IPC namespace (required for Chromium) |
No additional config files are needed. The container runs `npx playwright run-server` on startup.
## Connecting from Test Scripts
Test scripts run on the host and connect to the browser server via WebSocket. The connection URL is:
```
ws://localhost:3000
```
### .NET (Microsoft.Playwright)
```csharp
using var playwright = await Playwright.CreateAsync();
var browser = await playwright.Chromium.ConnectAsync("ws://localhost:3000");
var page = await browser.NewPageAsync();
// Browser runs inside Docker — use the Docker network hostname for Traefik.
await page.GotoAsync("http://scadalink-traefik");
```
### Node.js
```javascript
const { chromium } = require('playwright');
const browser = await chromium.connect('ws://localhost:3000');
const page = await browser.newPage();
await page.goto('http://scadalink-traefik');
```
### Python
```python
from playwright.sync_api import sync_playwright
with sync_playwright() as p:
browser = p.chromium.connect("ws://localhost:3000")
page = browser.new_page()
page.goto("http://scadalink-traefik")
```
## Central UI Access
The Playwright container is on the `scadalink-net` Docker network, so it can reach the Central UI cluster nodes directly:
| Target | URL in Test Scripts |
|--------|---------------------|
| Traefik LB | `http://scadalink-traefik` |
| Central Node A | `http://scadalink-central-a:5000` |
| Central Node B | `http://scadalink-central-b:5000` |
**Important**: The browser runs inside the Docker container, so `page.goto()` URLs must use Docker network hostnames (not `localhost`). The test script itself connects to the Playwright server via `ws://localhost:3000` (host-mapped port), but all URLs navigated by the browser resolve inside the container.
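Because the `localhost` mix-up is easy to make, a test suite may want to reject such URLs before navigating. The sketch below is one possible guard, under the assumption that every navigation target should be a Docker network hostname; `check_navigation_url` is a hypothetical helper, not part of Playwright.

```python
from urllib.parse import urlparse

# Hostnames that resolve to the Playwright container itself, not to the host machine.
LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def check_navigation_url(url: str) -> str:
    """Return the URL unchanged, or raise if it points at a loopback address."""
    host = urlparse(url).hostname
    if host is None:
        raise ValueError(f"no hostname in URL: {url!r}")
    if host in LOOPBACK_HOSTS:
        raise ValueError(
            f"{url!r} would resolve inside the Playwright container; "
            "use a Docker network hostname such as scadalink-traefik"
        )
    return url
```

A test would then call `page.goto(check_navigation_url("http://scadalink-central-a:5000"))`, while the WebSocket connection to `ws://localhost:3000` is left unchecked because it is opened from the host.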
## Verification
1. Check the container is running:
```bash
docker ps --filter name=scadalink-playwright
```
2. Check the server is accepting connections (look for the WebSocket endpoint in logs):
```bash
docker logs scadalink-playwright 2>&1 | head -5
```
3. Quick smoke test with a one-liner (requires `npx` and `playwright` on the host):
```bash
npx playwright@1.58.2 test --browser chromium --connect ws://localhost:3000
```
## Relevance to ScadaLink Components
- **Central UI** — end-to-end UI testing of all Blazor Server pages (login, admin, design, deployment, monitoring workflows).
- **Traefik Proxy** — verify load balancer behavior, failover, and active node routing from a browser perspective.
## Notes
- The container includes Chromium, Firefox, and WebKit. Connect to the desired browser via `playwright.chromium.connect()`, `playwright.firefox.connect()`, or `playwright.webkit.connect()`.
- The `ipc: host` flag is required for Chromium to avoid out-of-memory crashes in the container.
- The Playwright Server version (`1.58.2`) must match the version of the Playwright client package (for example `playwright`, `@playwright/test`, or `Microsoft.Playwright`) used by test scripts on the host.
- The container is stateless — no test data or browser state persists between restarts.
- To stop only the Playwright container: `cd infra && docker compose stop playwright`.
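The version-matching note above can be checked mechanically. The following sketch extracts the version from the image tag and compares it with a client version string; the helper names are hypothetical. The `strict` flag is an assumption: `strict=True` follows the note literally, while `strict=False` compares only major.minor for setups where patch releases are known to interoperate.

```python
import re

IMAGE = "mcr.microsoft.com/playwright:v1.58.2-noble"


def image_version(image: str) -> str:
    """Extract the x.y.z version from a Playwright image tag like ...:v1.58.2-noble."""
    match = re.search(r":v(\d+\.\d+\.\d+)", image)
    if match is None:
        raise ValueError(f"no version tag found in {image!r}")
    return match.group(1)


def versions_match(image: str, client_version: str, strict: bool = True) -> bool:
    """Compare the server image version with the client package version."""
    server = image_version(image)
    if strict:
        return server == client_version
    # Relaxed mode: only major.minor has to agree.
    return server.split(".")[:2] == client_version.split(".")[:2]
```

For a Python client, the installed version could be read with `importlib.metadata.version("playwright")` and passed as `client_version`.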
+10 -9
@@ -8,11 +8,12 @@ Local Docker-based test services for ScadaLink development.
docker compose up -d
```
This starts five services:
This starts the following services:
| Service | Port | Purpose |
|---------|------|---------|
| OPC UA (Azure IoT OPC PLC) | 50000 (OPC UA), 8080 (web) | Simulated OPC UA server with ScadaLink-style tags |
| OPC UA 2 (Azure IoT OPC PLC) | 50010 (OPC UA), 8081 (web) | Second OPC UA server instance (same tags, independent state) |
| LDAP (GLAuth) | 3893 | Lightweight LDAP with test users/groups matching ScadaLink roles |
| MS SQL 2022 | 1433 | Configuration and machine data databases |
| SMTP (Mailpit) | 1025 (SMTP), 8025 (web) | Email capture for notification testing |
@@ -45,7 +46,7 @@ docker compose down
**Stop a single service** (leave the others running):
```bash
docker compose stop opcua # or: ldap, mssql, smtp, restapi
docker compose stop opcua # or: opcua2, ldap, mssql, smtp, restapi
docker compose start opcua # bring it back without recreating
```
@@ -99,11 +100,11 @@ Each tool supports `--help` for full usage. See the per-service docs below for d
## Detailed Documentation
See the project root for per-service setup guides:
See `docs/test_infra/` for per-service setup guides:
- [test_infra.md](../test_infra.md) — Master test infrastructure overview
- [test_infra_opcua.md](../test_infra_opcua.md) — OPC UA server details
- [test_infra_ldap.md](../test_infra_ldap.md) — LDAP server details
- [test_infra_db.md](../test_infra_db.md) — MS SQL database details
- [test_infra_smtp.md](../test_infra_smtp.md) — SMTP server details (Mailpit)
- [test_infra_restapi.md](../test_infra_restapi.md) — REST API server details (Flask)
- [test_infra.md](../docs/test_infra/test_infra.md) — Master test infrastructure overview
- [test_infra_opcua.md](../docs/test_infra/test_infra_opcua.md) — OPC UA server details
- [test_infra_ldap.md](../docs/test_infra/test_infra_ldap.md) — LDAP server details
- [test_infra_db.md](../docs/test_infra/test_infra_db.md) — MS SQL database details
- [test_infra_smtp.md](../docs/test_infra/test_infra_smtp.md) — SMTP server details (Mailpit)
- [test_infra_restapi.md](../docs/test_infra/test_infra_restapi.md) — REST API server details (Flask)
+35
@@ -20,6 +20,27 @@ services:
- scadalink-net
restart: unless-stopped
opcua2:
image: mcr.microsoft.com/iotedge/opc-plc:latest
container_name: scadalink-opcua2
ports:
- "50010:50010"
- "8081:8080"
volumes:
- ./opcua/nodes.json:/app/config/nodes.json:ro
command: >
--autoaccept
--unsecuretransport
--sph
--sn=5 --sr=10 --st=uint
--fn=5 --fr=1 --ft=uint
--gn=5
--nf=/app/config/nodes.json
--pn=50010
networks:
- scadalink-net
restart: unless-stopped
ldap:
image: glauth/glauth:latest
container_name: scadalink-ldap
@@ -74,6 +95,20 @@ services:
- scadalink-net
restart: unless-stopped
playwright:
image: mcr.microsoft.com/playwright:v1.58.2-noble
container_name: scadalink-playwright
ports:
- "3000:3000"
command: >
npx -y playwright@1.58.2 run-server
--host 0.0.0.0
--port 3000
ipc: host
networks:
- scadalink-net
restart: unless-stopped
volumes:
scadalink-mssql-data:
+195
@@ -0,0 +1,195 @@
-- ScadaLink design-data seed.
-- Auto-generated by infra/tools/dump_seed.py against ScadaLinkConfig.
-- Replays the design-time configuration (templates, scripts,
-- data connections, external systems). Idempotent: deletes
-- existing rows in the covered tables before inserting.
--
-- Excluded: Sites (seed via docker/seed-sites.sh), Instances,
-- InstanceConnectionBindings, notifications, SMTP, API keys,
-- areas, LDAP mappings.
SET NOCOUNT ON;
SET XACT_ABORT ON;
SET QUOTED_IDENTIFIER ON;
BEGIN TRAN;
-- Wipe existing design + dependent rows so the seed is idempotent.
-- Order matters: dependents first.
DELETE FROM DeployedConfigSnapshots;
DELETE FROM DeploymentRecords;
DELETE FROM InstanceAlarmOverrides;
DELETE FROM InstanceAttributeOverrides;
DELETE FROM InstanceConnectionBindings;
DELETE FROM Instances;
DELETE FROM ExternalSystemMethods;
DELETE FROM ExternalSystemDefinitions;
DELETE FROM DataConnections;
DELETE FROM SharedScripts;
DELETE FROM TemplateCompositions;
UPDATE TemplateAlarms SET OnTriggerScriptId = NULL;
DELETE FROM TemplateAlarms;
DELETE FROM TemplateScripts;
DELETE FROM TemplateAttributes;
UPDATE Templates SET ParentTemplateId = NULL, OwnerCompositionId = NULL;
DELETE FROM Templates;
UPDATE TemplateFolders SET ParentFolderId = NULL;
DELETE FROM TemplateFolders;
-- TemplateFolders (1 rows)
SET IDENTITY_INSERT [TemplateFolders] ON;
INSERT INTO [TemplateFolders] ([Id], [Name], [ParentFolderId], [SortOrder]) VALUES (1002, N'Test', NULL, 0);
SET IDENTITY_INSERT [TemplateFolders] OFF;
-- Templates (18 rows)
SET IDENTITY_INSERT [Templates] ON;
INSERT INTO [Templates] ([Id], [Name], [Description], [ParentTemplateId], [FolderId], [IsDerived], [OwnerCompositionId]) VALUES (1, N'Base Device', N'Root template for all devices', NULL, NULL, 0, NULL);
INSERT INTO [Templates] ([Id], [Name], [Description], [ParentTemplateId], [FolderId], [IsDerived], [OwnerCompositionId]) VALUES (2, N'Pump', N'Centrifugal pump template', 1, NULL, 0, NULL);
INSERT INTO [Templates] ([Id], [Name], [Description], [ParentTemplateId], [FolderId], [IsDerived], [OwnerCompositionId]) VALUES (3, N'Sensor Module', N'Reusable sensor feature module', NULL, 1002, 0, NULL);
INSERT INTO [Templates] ([Id], [Name], [Description], [ParentTemplateId], [FolderId], [IsDerived], [OwnerCompositionId]) VALUES (4, N'Motor Controller', N'Motor with OPC UA tags from test server', NULL, 1002, 0, NULL);
INSERT INTO [Templates] ([Id], [Name], [Description], [ParentTemplateId], [FolderId], [IsDerived], [OwnerCompositionId]) VALUES (5, N'Variable Speed Motor', N'VFD motor extending Motor Controller with sensor composition', 4, NULL, 0, NULL);
INSERT INTO [Templates] ([Id], [Name], [Description], [ParentTemplateId], [FolderId], [IsDerived], [OwnerCompositionId]) VALUES (1002, N'Tank Monitor', N'Tank level and temperature monitoring module', NULL, NULL, 0, NULL);
INSERT INTO [Templates] ([Id], [Name], [Description], [ParentTemplateId], [FolderId], [IsDerived], [OwnerCompositionId]) VALUES (2003, N'Pump.TempSensor', N'Reusable sensor feature module', 3, NULL, 1, 1);
INSERT INTO [Templates] ([Id], [Name], [Description], [ParentTemplateId], [FolderId], [IsDerived], [OwnerCompositionId]) VALUES (2004, N'Variable Speed Motor.TempSensor', N'Reusable sensor feature module', 3, NULL, 1, 2);
INSERT INTO [Templates] ([Id], [Name], [Description], [ParentTemplateId], [FolderId], [IsDerived], [OwnerCompositionId]) VALUES (2005, N'Motor Controller.CoolingTank', N'Tank level and temperature monitoring module', 1002, NULL, 1, 1002);
INSERT INTO [Templates] ([Id], [Name], [Description], [ParentTemplateId], [FolderId], [IsDerived], [OwnerCompositionId]) VALUES (2006, N'Motor Controller.CoolingTank2', N'Tank level and temperature monitoring module', 1002, NULL, 1, 1003);
INSERT INTO [Templates] ([Id], [Name], [Description], [ParentTemplateId], [FolderId], [IsDerived], [OwnerCompositionId]) VALUES (2007, N'aaa', NULL, 3, NULL, 0, NULL);
INSERT INTO [Templates] ([Id], [Name], [Description], [ParentTemplateId], [FolderId], [IsDerived], [OwnerCompositionId]) VALUES (2008, N'Pump.AlarmSensor', N'Reusable sensor feature module', 3, NULL, 1, 1004);
INSERT INTO [Templates] ([Id], [Name], [Description], [ParentTemplateId], [FolderId], [IsDerived], [OwnerCompositionId]) VALUES (2012, N'Tank Monitor.DrivePump', N'Centrifugal pump template', 2, NULL, 1, 1008);
INSERT INTO [Templates] ([Id], [Name], [Description], [ParentTemplateId], [FolderId], [IsDerived], [OwnerCompositionId]) VALUES (2013, N'Tank Monitor.DrivePump.TempSensor', N'Reusable sensor feature module', 2003, NULL, 1, 1009);
INSERT INTO [Templates] ([Id], [Name], [Description], [ParentTemplateId], [FolderId], [IsDerived], [OwnerCompositionId]) VALUES (2014, N'Tank Monitor.DrivePump.AlarmSensor', N'Reusable sensor feature module', 2008, NULL, 1, 1010);
INSERT INTO [Templates] ([Id], [Name], [Description], [ParentTemplateId], [FolderId], [IsDerived], [OwnerCompositionId]) VALUES (2018, N'Motor Controller.Pump', N'Centrifugal pump template', 2, NULL, 1, 1014);
INSERT INTO [Templates] ([Id], [Name], [Description], [ParentTemplateId], [FolderId], [IsDerived], [OwnerCompositionId]) VALUES (2019, N'Motor Controller.Pump.TempSensor', N'Reusable sensor feature module', 2003, NULL, 1, 1015);
INSERT INTO [Templates] ([Id], [Name], [Description], [ParentTemplateId], [FolderId], [IsDerived], [OwnerCompositionId]) VALUES (2020, N'Motor Controller.Pump.AlarmSensor', N'Reusable sensor feature module', 2008, NULL, 1, 1016);
SET IDENTITY_INSERT [Templates] OFF;
-- TemplateAttributes (48 rows)
SET IDENTITY_INSERT [TemplateAttributes] ON;
INSERT INTO [TemplateAttributes] ([Id], [TemplateId], [Name], [Value], [DataType], [IsLocked], [Description], [DataSourceReference], [IsInherited], [LockedInDerived]) VALUES (1, 1, N'Status', N'Offline', N'String', 0, NULL, NULL, 0, 0);
INSERT INTO [TemplateAttributes] ([Id], [TemplateId], [Name], [Value], [DataType], [IsLocked], [Description], [DataSourceReference], [IsInherited], [LockedInDerived]) VALUES (2, 1, N'Temperature', N'0.0', N'Double', 0, NULL, N'ns=3;s=Temperature', 0, 0);
INSERT INTO [TemplateAttributes] ([Id], [TemplateId], [Name], [Value], [DataType], [IsLocked], [Description], [DataSourceReference], [IsInherited], [LockedInDerived]) VALUES (9, 3, N'SensorReading', N'0', N'Double', 0, NULL, N'ns=3;s=Sensor.Reading', 0, 0);
INSERT INTO [TemplateAttributes] ([Id], [TemplateId], [Name], [Value], [DataType], [IsLocked], [Description], [DataSourceReference], [IsInherited], [LockedInDerived]) VALUES (10, 3, N'SensorUnit', N'Celsius', N'String', 0, NULL, NULL, 0, 0);
INSERT INTO [TemplateAttributes] ([Id], [TemplateId], [Name], [Value], [DataType], [IsLocked], [Description], [DataSourceReference], [IsInherited], [LockedInDerived]) VALUES (11, 5, N'MaxRPM', N'3600', N'Double', 0, NULL, NULL, 0, 0);
INSERT INTO [TemplateAttributes] ([Id], [TemplateId], [Name], [Value], [DataType], [IsLocked], [Description], [DataSourceReference], [IsInherited], [LockedInDerived]) VALUES (12, 5, N'MinRPM', N'0', N'Double', 0, NULL, NULL, 0, 0);
INSERT INTO [TemplateAttributes] ([Id], [TemplateId], [Name], [Value], [DataType], [IsLocked], [Description], [DataSourceReference], [IsInherited], [LockedInDerived]) VALUES (1002, 4, N'Weather', N'Unknown', N'String', 0, NULL, NULL, 0, 0);
INSERT INTO [TemplateAttributes] ([Id], [TemplateId], [Name], [Value], [DataType], [IsLocked], [Description], [DataSourceReference], [IsInherited], [LockedInDerived]) VALUES (1003, 4, N'Greeting', N'', N'String', 0, NULL, NULL, 0, 0);
INSERT INTO [TemplateAttributes] ([Id], [TemplateId], [Name], [Value], [DataType], [IsLocked], [Description], [DataSourceReference], [IsInherited], [LockedInDerived]) VALUES (1004, 4, N'Goodbye', N'', N'String', 0, NULL, NULL, 0, 0);
INSERT INTO [TemplateAttributes] ([Id], [TemplateId], [Name], [Value], [DataType], [IsLocked], [Description], [DataSourceReference], [IsInherited], [LockedInDerived]) VALUES (1005, 1002, N'Level', N'0', N'Float', 0, NULL, N'ns=3;s=Tank.Level', 0, 0);
INSERT INTO [TemplateAttributes] ([Id], [TemplateId], [Name], [Value], [DataType], [IsLocked], [Description], [DataSourceReference], [IsInherited], [LockedInDerived]) VALUES (1006, 1002, N'Temperature', N'0', N'Float', 0, NULL, N'ns=3;s=Tank.Temperature', 0, 0);
INSERT INTO [TemplateAttributes] ([Id], [TemplateId], [Name], [Value], [DataType], [IsLocked], [Description], [DataSourceReference], [IsInherited], [LockedInDerived]) VALUES (1007, 1002, N'HighLevel', N'false', N'Boolean', 0, NULL, N'ns=3;s=Tank.HighLevel', 0, 0);
INSERT INTO [TemplateAttributes] ([Id], [TemplateId], [Name], [Value], [DataType], [IsLocked], [Description], [DataSourceReference], [IsInherited], [LockedInDerived]) VALUES (1008, 1002, N'LowLevel', N'false', N'Boolean', 0, NULL, N'ns=3;s=Tank.LowLevel', 0, 0);
INSERT INTO [TemplateAttributes] ([Id], [TemplateId], [Name], [Value], [DataType], [IsLocked], [Description], [DataSourceReference], [IsInherited], [LockedInDerived]) VALUES (2009, 4, N'TestBool', NULL, N'Boolean', 0, NULL, N'ns=3;s=TestChildObject.TestBool', 0, 0);
INSERT INTO [TemplateAttributes] ([Id], [TemplateId], [Name], [Value], [DataType], [IsLocked], [Description], [DataSourceReference], [IsInherited], [LockedInDerived]) VALUES (2010, 4, N'TestInt', NULL, N'Int32', 0, NULL, N'ns=3;s=TestChildObject.TestInt', 0, 0);
INSERT INTO [TemplateAttributes] ([Id], [TemplateId], [Name], [Value], [DataType], [IsLocked], [Description], [DataSourceReference], [IsInherited], [LockedInDerived]) VALUES (2011, 4, N'TestFloat', NULL, N'Float', 0, NULL, N'ns=3;s=TestChildObject.TestFloat', 0, 0);
INSERT INTO [TemplateAttributes] ([Id], [TemplateId], [Name], [Value], [DataType], [IsLocked], [Description], [DataSourceReference], [IsInherited], [LockedInDerived]) VALUES (2012, 4, N'TestDouble', NULL, N'Double', 0, NULL, N'ns=3;s=TestChildObject.TestDouble', 0, 0);
INSERT INTO [TemplateAttributes] ([Id], [TemplateId], [Name], [Value], [DataType], [IsLocked], [Description], [DataSourceReference], [IsInherited], [LockedInDerived]) VALUES (2013, 4, N'TestString', NULL, N'String', 0, NULL, N'ns=3;s=TestChildObject.TestString', 0, 0);
INSERT INTO [TemplateAttributes] ([Id], [TemplateId], [Name], [Value], [DataType], [IsLocked], [Description], [DataSourceReference], [IsInherited], [LockedInDerived]) VALUES (2014, 4, N'TestDateTime', NULL, N'String', 0, NULL, N'ns=3;s=TestChildObject.TestDateTime', 0, 0);
INSERT INTO [TemplateAttributes] ([Id], [TemplateId], [Name], [Value], [DataType], [IsLocked], [Description], [DataSourceReference], [IsInherited], [LockedInDerived]) VALUES (2015, 4, N'TestBoolArray', NULL, N'String', 0, NULL, N'ns=3;s=TestChildObject.TestBoolArray', 0, 0);
INSERT INTO [TemplateAttributes] ([Id], [TemplateId], [Name], [Value], [DataType], [IsLocked], [Description], [DataSourceReference], [IsInherited], [LockedInDerived]) VALUES (2016, 4, N'TestDateTimeArray', NULL, N'String', 0, NULL, N'ns=3;s=TestChildObject.TestDateTimeArray', 0, 0);
INSERT INTO [TemplateAttributes] ([Id], [TemplateId], [Name], [Value], [DataType], [IsLocked], [Description], [DataSourceReference], [IsInherited], [LockedInDerived]) VALUES (2017, 4, N'TestDoubleArray', NULL, N'String', 0, NULL, N'ns=3;s=TestChildObject.TestDoubleArray', 0, 0);
INSERT INTO [TemplateAttributes] ([Id], [TemplateId], [Name], [Value], [DataType], [IsLocked], [Description], [DataSourceReference], [IsInherited], [LockedInDerived]) VALUES (2018, 4, N'TestFloatArray', NULL, N'String', 0, NULL, N'ns=3;s=TestChildObject.TestFloatArray', 0, 0);
INSERT INTO [TemplateAttributes] ([Id], [TemplateId], [Name], [Value], [DataType], [IsLocked], [Description], [DataSourceReference], [IsInherited], [LockedInDerived]) VALUES (2019, 4, N'TestIntArray', NULL, N'String', 0, NULL, N'ns=3;s=TestChildObject.TestIntArray', 0, 0);
INSERT INTO [TemplateAttributes] ([Id], [TemplateId], [Name], [Value], [DataType], [IsLocked], [Description], [DataSourceReference], [IsInherited], [LockedInDerived]) VALUES (2020, 4, N'TestStringArray', NULL, N'String', 0, NULL, N'ns=3;s=TestChildObject.TestStringArray', 0, 0);
INSERT INTO [TemplateAttributes] ([Id], [TemplateId], [Name], [Value], [DataType], [IsLocked], [Description], [DataSourceReference], [IsInherited], [LockedInDerived]) VALUES (2021, 4, N'ScanTime', NULL, N'String', 0, NULL, N'ns=3;s=DevAppEngine.Scheduler.ScanTime', 0, 0);
INSERT INTO [TemplateAttributes] ([Id], [TemplateId], [Name], [Value], [DataType], [IsLocked], [Description], [DataSourceReference], [IsInherited], [LockedInDerived]) VALUES (3009, 2003, N'SensorReading', N'0', N'Double', 0, NULL, N'ns=3;s=Sensor.Reading', 1, 0);
INSERT INTO [TemplateAttributes] ([Id], [TemplateId], [Name], [Value], [DataType], [IsLocked], [Description], [DataSourceReference], [IsInherited], [LockedInDerived]) VALUES (3010, 2003, N'SensorUnit', N'Celsius', N'String', 0, NULL, NULL, 1, 0);
INSERT INTO [TemplateAttributes] ([Id], [TemplateId], [Name], [Value], [DataType], [IsLocked], [Description], [DataSourceReference], [IsInherited], [LockedInDerived]) VALUES (3011, 2004, N'SensorReading', N'0', N'Double', 0, NULL, N'ns=3;s=Sensor.Reading', 1, 0);
INSERT INTO [TemplateAttributes] ([Id], [TemplateId], [Name], [Value], [DataType], [IsLocked], [Description], [DataSourceReference], [IsInherited], [LockedInDerived]) VALUES (3012, 2004, N'SensorUnit', N'Celsius', N'String', 0, NULL, NULL, 1, 0);
INSERT INTO [TemplateAttributes] ([Id], [TemplateId], [Name], [Value], [DataType], [IsLocked], [Description], [DataSourceReference], [IsInherited], [LockedInDerived]) VALUES (3013, 2005, N'Level', N'0', N'Float', 0, NULL, N'ns=3;s=Tank.Level', 1, 0);
INSERT INTO [TemplateAttributes] ([Id], [TemplateId], [Name], [Value], [DataType], [IsLocked], [Description], [DataSourceReference], [IsInherited], [LockedInDerived]) VALUES (3014, 2005, N'Temperature', N'0', N'Float', 0, NULL, N'ns=3;s=Tank.Temperature', 1, 0);
INSERT INTO [TemplateAttributes] ([Id], [TemplateId], [Name], [Value], [DataType], [IsLocked], [Description], [DataSourceReference], [IsInherited], [LockedInDerived]) VALUES (3015, 2005, N'HighLevel', N'false', N'Boolean', 0, NULL, N'ns=3;s=Tank.HighLevel', 1, 0);
INSERT INTO [TemplateAttributes] ([Id], [TemplateId], [Name], [Value], [DataType], [IsLocked], [Description], [DataSourceReference], [IsInherited], [LockedInDerived]) VALUES (3016, 2005, N'LowLevel', N'false', N'Boolean', 0, NULL, N'ns=3;s=Tank.LowLevel', 1, 0);
INSERT INTO [TemplateAttributes] ([Id], [TemplateId], [Name], [Value], [DataType], [IsLocked], [Description], [DataSourceReference], [IsInherited], [LockedInDerived]) VALUES (3017, 2006, N'Level', N'0', N'Float', 0, NULL, N'ns=3;s=Tank.Level', 1, 0);
INSERT INTO [TemplateAttributes] ([Id], [TemplateId], [Name], [Value], [DataType], [IsLocked], [Description], [DataSourceReference], [IsInherited], [LockedInDerived]) VALUES (3018, 2006, N'Temperature', N'0', N'Float', 0, NULL, N'ns=3;s=Tank.Temperature', 1, 0);
INSERT INTO [TemplateAttributes] ([Id], [TemplateId], [Name], [Value], [DataType], [IsLocked], [Description], [DataSourceReference], [IsInherited], [LockedInDerived]) VALUES (3019, 2006, N'HighLevel', N'false', N'Boolean', 0, NULL, N'ns=3;s=Tank.HighLevel', 1, 0);
INSERT INTO [TemplateAttributes] ([Id], [TemplateId], [Name], [Value], [DataType], [IsLocked], [Description], [DataSourceReference], [IsInherited], [LockedInDerived]) VALUES (3020, 2006, N'LowLevel', N'false', N'Boolean', 0, NULL, N'ns=3;s=Tank.LowLevel', 1, 0);
INSERT INTO [TemplateAttributes] ([Id], [TemplateId], [Name], [Value], [DataType], [IsLocked], [Description], [DataSourceReference], [IsInherited], [LockedInDerived]) VALUES (3021, 2008, N'SensorReading', N'0', N'Double', 0, NULL, N'ns=3;s=Sensor.Reading', 1, 0);
INSERT INTO [TemplateAttributes] ([Id], [TemplateId], [Name], [Value], [DataType], [IsLocked], [Description], [DataSourceReference], [IsInherited], [LockedInDerived]) VALUES (3022, 2008, N'SensorUnit', N'Celsius', N'String', 0, NULL, NULL, 1, 0);
INSERT INTO [TemplateAttributes] ([Id], [TemplateId], [Name], [Value], [DataType], [IsLocked], [Description], [DataSourceReference], [IsInherited], [LockedInDerived]) VALUES (3025, 2013, N'SensorReading', N'0', N'Double', 0, NULL, N'ns=3;s=Sensor.Reading', 1, 0);
INSERT INTO [TemplateAttributes] ([Id], [TemplateId], [Name], [Value], [DataType], [IsLocked], [Description], [DataSourceReference], [IsInherited], [LockedInDerived]) VALUES (3026, 2013, N'SensorUnit', N'Celsius', N'String', 0, NULL, NULL, 1, 0);
INSERT INTO [TemplateAttributes] ([Id], [TemplateId], [Name], [Value], [DataType], [IsLocked], [Description], [DataSourceReference], [IsInherited], [LockedInDerived]) VALUES (3027, 2014, N'SensorReading', N'0', N'Double', 0, NULL, N'ns=3;s=Sensor.Reading', 1, 0);
INSERT INTO [TemplateAttributes] ([Id], [TemplateId], [Name], [Value], [DataType], [IsLocked], [Description], [DataSourceReference], [IsInherited], [LockedInDerived]) VALUES (3028, 2014, N'SensorUnit', N'Celsius', N'String', 0, NULL, NULL, 1, 0);
INSERT INTO [TemplateAttributes] ([Id], [TemplateId], [Name], [Value], [DataType], [IsLocked], [Description], [DataSourceReference], [IsInherited], [LockedInDerived]) VALUES (3033, 2019, N'SensorReading', N'0', N'Double', 0, NULL, N'ns=3;s=Sensor.Reading', 1, 0);
INSERT INTO [TemplateAttributes] ([Id], [TemplateId], [Name], [Value], [DataType], [IsLocked], [Description], [DataSourceReference], [IsInherited], [LockedInDerived]) VALUES (3034, 2019, N'SensorUnit', N'Celsius', N'String', 0, NULL, NULL, 1, 0);
INSERT INTO [TemplateAttributes] ([Id], [TemplateId], [Name], [Value], [DataType], [IsLocked], [Description], [DataSourceReference], [IsInherited], [LockedInDerived]) VALUES (3035, 2020, N'SensorReading', N'0', N'Double', 0, NULL, N'ns=3;s=Sensor.Reading', 1, 0);
INSERT INTO [TemplateAttributes] ([Id], [TemplateId], [Name], [Value], [DataType], [IsLocked], [Description], [DataSourceReference], [IsInherited], [LockedInDerived]) VALUES (3036, 2020, N'SensorUnit', N'Celsius', N'String', 0, NULL, NULL, 1, 0);
SET IDENTITY_INSERT [TemplateAttributes] OFF;
-- TemplateScripts (12 rows)
SET IDENTITY_INSERT [TemplateScripts] ON;
INSERT INTO [TemplateScripts] ([Id], [TemplateId], [Name], [IsLocked], [Code], [TriggerType], [TriggerConfiguration], [ParameterDefinitions], [ReturnDefinition], [MinTimeBetweenRuns], [IsInherited], [LockedInDerived]) VALUES (1, 1, N'CheckTemp', 0, N'var temp = Instance.GetAttribute("Temperature");
if (temp.Value > 90.0) {
Instance.SetAttribute("Status", "HighTemp");
}', N'ValueChange', NULL, NULL, NULL, NULL, 0, 0);
INSERT INTO [TemplateScripts] ([Id], [TemplateId], [Name], [IsLocked], [Code], [TriggerType], [TriggerConfiguration], [ParameterDefinitions], [ReturnDefinition], [MinTimeBetweenRuns], [IsInherited], [LockedInDerived]) VALUES (1002, 4, N'TestExternalSystem', 0, N'var parms = new Dictionary<string, object?> { ["a"] = 2, ["b"] = 3 }; var result = await ExternalSystem.Call("Test REST API", "Add", parms); Instance.SetAttribute("Status", "API result: " + result.Response.result);', N'Interval', N'{"intervalMs":10000}', NULL, NULL, NULL, 0, 0);
INSERT INTO [TemplateScripts] ([Id], [TemplateId], [Name], [IsLocked], [Code], [TriggerType], [TriggerConfiguration], [ParameterDefinitions], [ReturnDefinition], [MinTimeBetweenRuns], [IsInherited], [LockedInDerived]) VALUES (1003, 4, N'TestDatabaseQuery', 0, N'var conn = await Database.Connection("Machine Data DB"); var cmd = conn.CreateCommand(); cmd.CommandText = "SELECT COUNT(*) FROM TagHistory"; var count = await cmd.ExecuteScalarAsync(); conn.Dispose(); Instance.SetAttribute("Status", "DB: " + count + " rows");', N'Interval', N'{"intervalMs":60000}', NULL, NULL, NULL, 0, 0);
INSERT INTO [TemplateScripts] ([Id], [TemplateId], [Name], [IsLocked], [Code], [TriggerType], [TriggerConfiguration], [ParameterDefinitions], [ReturnDefinition], [MinTimeBetweenRuns], [IsInherited], [LockedInDerived]) VALUES (1004, 4, N'UpdateWeather', 0, N'var weather = await Scripts.CallShared("GetWeather"); Instance.SetAttribute("Weather", weather?.ToString() ?? "Unknown");', N'Interval', N'{"intervalMs":10000}', NULL, NULL, NULL, 0, 0);
INSERT INTO [TemplateScripts] ([Id], [TemplateId], [Name], [IsLocked], [Code], [TriggerType], [TriggerConfiguration], [ParameterDefinitions], [ReturnDefinition], [MinTimeBetweenRuns], [IsInherited], [LockedInDerived]) VALUES (1005, 4, N'UpdateGreeting', 0, N'var parms = new Dictionary<string, object?> { ["name"] = "BOB" }; var greeting = await Scripts.CallShared("Greet", parms); Instance.SetAttribute("Greeting", greeting?.ToString() ?? "");', N'Interval', N'{"intervalMs":10000}', NULL, NULL, NULL, 0, 0);
INSERT INTO [TemplateScripts] ([Id], [TemplateId], [Name], [IsLocked], [Code], [TriggerType], [TriggerConfiguration], [ParameterDefinitions], [ReturnDefinition], [MinTimeBetweenRuns], [IsInherited], [LockedInDerived]) VALUES (1007, 4, N'SayGoodbye', 0, N'var name = (string)(Parameters?["Name"] ?? "World"); return $"Goodbye {name}! It is {DateTimeOffset.UtcNow:HH:mm:ss} UTC";', N'Call', N'{}', N'{"type":"object","properties":{"Name":{"type":"string"}},"required":["Name"]}', N'{"type":"string"}', NULL, 0, 0);
INSERT INTO [TemplateScripts] ([Id], [TemplateId], [Name], [IsLocked], [Code], [TriggerType], [TriggerConfiguration], [ParameterDefinitions], [ReturnDefinition], [MinTimeBetweenRuns], [IsInherited], [LockedInDerived]) VALUES (1008, 4, N'UpdateGoodbye', 0, N'var parms = new Dictionary<string, object?> { ["Name"] = "Bob" }; var result = await Instance.CallScript("SayGoodbye", parms); Instance.SetAttribute("Goodbye", result?.ToString() ?? "");', N'Interval', N'{"intervalMs":10000}', NULL, NULL, NULL, 0, 0);
INSERT INTO [TemplateScripts] ([Id], [TemplateId], [Name], [IsLocked], [Code], [TriggerType], [TriggerConfiguration], [ParameterDefinitions], [ReturnDefinition], [MinTimeBetweenRuns], [IsInherited], [LockedInDerived]) VALUES (1009, 4, N'Hello', 0, N'var name = (string)(Parameters?["Name"] ?? "World"); return $"Hello {name}! It is {DateTimeOffset.UtcNow:HH:mm:ss} UTC";', N'Call', N'{}', N'{"type":"object","properties":{"Name":{"type":"string"}},"required":["Name"]}', N'{"type":"string"}', NULL, 0, 0);
INSERT INTO [TemplateScripts] ([Id], [TemplateId], [Name], [IsLocked], [Code], [TriggerType], [TriggerConfiguration], [ParameterDefinitions], [ReturnDefinition], [MinTimeBetweenRuns], [IsInherited], [LockedInDerived]) VALUES (1010, 4, N'SendEmailAlert', 0, N'await Notify.To("Engineering Alerts").Send("Motor Status Update", "Motor check-in at " + DateTimeOffset.UtcNow.ToString("HH:mm:ss") + " UTC");', N'Interval', N'{"intervalMs":10000}', NULL, NULL, NULL, 0, 0);
INSERT INTO [TemplateScripts] ([Id], [TemplateId], [Name], [IsLocked], [Code], [TriggerType], [TriggerConfiguration], [ParameterDefinitions], [ReturnDefinition], [MinTimeBetweenRuns], [IsInherited], [LockedInDerived]) VALUES (1011, 1002, N'AddNumbers', 0, N'var a = Convert.ToDouble(Parameters?["a"] ?? 0); var b = Convert.ToDouble(Parameters?["b"] ?? 0); return a + b;', N'Call', N'{}', N'{"type":"object","properties":{"a":{"type":"number"},"b":{"type":"number"}},"required":["a","b"]}', N'{"type":"number"}', NULL, 0, 0);
INSERT INTO [TemplateScripts] ([Id], [TemplateId], [Name], [IsLocked], [Code], [TriggerType], [TriggerConfiguration], [ParameterDefinitions], [ReturnDefinition], [MinTimeBetweenRuns], [IsInherited], [LockedInDerived]) VALUES (1012, 2005, N'AddNumbers', 0, N'var a = Convert.ToDouble(Parameters?["a"] ?? 0); var b = Convert.ToDouble(Parameters?["b"] ?? 0); return a + b;', N'Call', N'{}', N'{"type":"object","properties":{"a":{"type":"number"},"b":{"type":"number"}},"required":["a","b"]}', N'{"type":"number"}', NULL, 1, 0);
INSERT INTO [TemplateScripts] ([Id], [TemplateId], [Name], [IsLocked], [Code], [TriggerType], [TriggerConfiguration], [ParameterDefinitions], [ReturnDefinition], [MinTimeBetweenRuns], [IsInherited], [LockedInDerived]) VALUES (1013, 2006, N'AddNumbers', 0, N'var a = Convert.ToDouble(Parameters?["a"] ?? 0); var b = Convert.ToDouble(Parameters?["b"] ?? 0); return a + b;', N'Call', N'{}', N'{"type":"object","properties":{"a":{"type":"number"},"b":{"type":"number"}},"required":["a","b"]}', N'{"type":"number"}', NULL, 1, 0);
SET IDENTITY_INSERT [TemplateScripts] OFF;
-- TemplateAlarms (4 rows)
SET IDENTITY_INSERT [TemplateAlarms] ON;
INSERT INTO [TemplateAlarms] ([Id], [TemplateId], [Name], [Description], [PriorityLevel], [IsLocked], [TriggerType], [TriggerConfiguration], [OnTriggerScriptId]) VALUES (1, 1, N'HighTemp', NULL, 800, 0, N'RangeViolation', N'{"attribute":"Temperature","high":95.0}', NULL);
INSERT INTO [TemplateAlarms] ([Id], [TemplateId], [Name], [Description], [PriorityLevel], [IsLocked], [TriggerType], [TriggerConfiguration], [OnTriggerScriptId]) VALUES (1002, 1002, N'HighLevel', NULL, 800, 0, N'RangeViolation', N'{"attribute":"Level","high":80}', NULL);
INSERT INTO [TemplateAlarms] ([Id], [TemplateId], [Name], [Description], [PriorityLevel], [IsLocked], [TriggerType], [TriggerConfiguration], [OnTriggerScriptId]) VALUES (1003, 2, N'RatePump', NULL, 750, 0, N'RateOfChange', N'{"attributeName":"AlarmSensor.SensorReading","thresholdPerSecond":25,"windowSeconds":2,"direction":"falling"}', NULL);
INSERT INTO [TemplateAlarms] ([Id], [TemplateId], [Name], [Description], [PriorityLevel], [IsLocked], [TriggerType], [TriggerConfiguration], [OnTriggerScriptId]) VALUES (1004, 2, N'TempLevels', NULL, 500, 0, N'HiLo', N'{"attributeName":"AlarmSensor.SensorReading","loLo":-10,"lo":5,"hi":80,"hiHi":100,"loLoPriority":900,"loPriority":600,"hiPriority":600,"hiHiPriority":900,"hiDeadband":3,"hiHiDeadband":5,"hiMessage":"Temperature high — investigate","hiHiMessage":"CRITICAL: shut down immediately"}', NULL);
SET IDENTITY_INSERT [TemplateAlarms] OFF;
-- TemplateCompositions (11 rows)
SET IDENTITY_INSERT [TemplateCompositions] ON;
INSERT INTO [TemplateCompositions] ([Id], [TemplateId], [ComposedTemplateId], [InstanceName]) VALUES (1, 2, 2003, N'TempSensor');
INSERT INTO [TemplateCompositions] ([Id], [TemplateId], [ComposedTemplateId], [InstanceName]) VALUES (2, 5, 2004, N'TempSensor');
INSERT INTO [TemplateCompositions] ([Id], [TemplateId], [ComposedTemplateId], [InstanceName]) VALUES (1002, 4, 2005, N'CoolingTank');
INSERT INTO [TemplateCompositions] ([Id], [TemplateId], [ComposedTemplateId], [InstanceName]) VALUES (1003, 4, 2006, N'CoolingTank2');
INSERT INTO [TemplateCompositions] ([Id], [TemplateId], [ComposedTemplateId], [InstanceName]) VALUES (1004, 2, 2008, N'AlarmSensor');
INSERT INTO [TemplateCompositions] ([Id], [TemplateId], [ComposedTemplateId], [InstanceName]) VALUES (1008, 1002, 2012, N'DrivePump');
INSERT INTO [TemplateCompositions] ([Id], [TemplateId], [ComposedTemplateId], [InstanceName]) VALUES (1009, 2012, 2013, N'TempSensor');
INSERT INTO [TemplateCompositions] ([Id], [TemplateId], [ComposedTemplateId], [InstanceName]) VALUES (1010, 2012, 2014, N'AlarmSensor');
INSERT INTO [TemplateCompositions] ([Id], [TemplateId], [ComposedTemplateId], [InstanceName]) VALUES (1014, 4, 2018, N'Pump');
INSERT INTO [TemplateCompositions] ([Id], [TemplateId], [ComposedTemplateId], [InstanceName]) VALUES (1015, 2018, 2019, N'TempSensor');
INSERT INTO [TemplateCompositions] ([Id], [TemplateId], [ComposedTemplateId], [InstanceName]) VALUES (1016, 2018, 2020, N'AlarmSensor');
SET IDENTITY_INSERT [TemplateCompositions] OFF;
-- SharedScripts (2 rows)
SET IDENTITY_INSERT [SharedScripts] ON;
INSERT INTO [SharedScripts] ([Id], [Name], [Code], [ParameterDefinitions], [ReturnDefinition]) VALUES (1, N'GetWeather', N'var conditions = new[]
{
"Sunny",
"Cloudy",
"Rainy",
"Stormy",
"Windy",
"Foggy",
"Snowy",
"Clear"
};
var temps = new Random().Next(-10, 40);
var condition = conditions[new Random().Next(conditions.Length)];
return $"{condition}, {temps}°C";', NULL, N'{"type":"string"}');
INSERT INTO [SharedScripts] ([Id], [Name], [Code], [ParameterDefinitions], [ReturnDefinition]) VALUES (2, N'Greet', N'var name = (string)(Parameters?["name"] ?? "World"); return $"Hello, {name}! It is {DateTimeOffset.UtcNow:HH:mm:ss} UTC";', N'{"type":"object","properties":{"name":{"type":"string"}},"required":["name"]}', N'{"type":"string"}');
SET IDENTITY_INSERT [SharedScripts] OFF;
-- DataConnections (3 rows)
SET IDENTITY_INSERT [DataConnections] ON;
INSERT INTO [DataConnections] ([Id], [Name], [Protocol], [PrimaryConfiguration], [SiteId], [BackupConfiguration], [FailoverRetryCount]) VALUES (1, N'OPC PLC Simulator', N'OpcUa', N'{"endpointUrl":"opc.tcp://scadalink-opcua:50000","securityMode":"none","autoAcceptUntrustedCerts":true,"sessionTimeoutMs":60000,"operationTimeoutMs":15000,"publishingIntervalMs":1000,"samplingIntervalMs":1000,"queueSize":10,"keepAliveCount":10,"lifetimeCount":30,"maxNotificationsPerPublish":100,"discardOldest":true,"subscriptionPriority":0,"subscriptionDisplayName":"ScadaLink","timestampsToReturn":"source","deadband":null,"userIdentity":null,"heartbeat":null}', 1, NULL, 3);
INSERT INTO [DataConnections] ([Id], [Name], [Protocol], [PrimaryConfiguration], [SiteId], [BackupConfiguration], [FailoverRetryCount]) VALUES (3014, N'OPC PLC Simulator', N'OpcUa', N'{"endpoint":"opc.tcp://scadalink-opcua:50000","securityMode":"None","publishInterval":1000}', 2, NULL, 3);
INSERT INTO [DataConnections] ([Id], [Name], [Protocol], [PrimaryConfiguration], [SiteId], [BackupConfiguration], [FailoverRetryCount]) VALUES (3015, N'OPC PLC Simulator', N'OpcUa', N'{"endpoint":"opc.tcp://scadalink-opcua:50000","securityMode":"None","publishInterval":1000}', 3, NULL, 3);
SET IDENTITY_INSERT [DataConnections] OFF;
-- ExternalSystemDefinitions (1 rows)
SET IDENTITY_INSERT [ExternalSystemDefinitions] ON;
INSERT INTO [ExternalSystemDefinitions] ([Id], [Name], [EndpointUrl], [AuthType], [AuthConfiguration], [MaxRetries], [RetryDelay]) VALUES (1, N'Test REST API', N'http://scadalink-restapi:5200', N'ApiKey', N'scadalink-test-key-1', 0, '00:00:00.000000');
SET IDENTITY_INSERT [ExternalSystemDefinitions] OFF;
-- ExternalSystemMethods (1 rows)
SET IDENTITY_INSERT [ExternalSystemMethods] ON;
INSERT INTO [ExternalSystemMethods] ([Id], [ExternalSystemDefinitionId], [Name], [HttpMethod], [Path], [ParameterDefinitions], [ReturnDefinition]) VALUES (1, 1, N'Add', N'POST', N'/api/Add', N'{"a":"number","b":"number"}', N'{"result":"number"}');
SET IDENTITY_INSERT [ExternalSystemMethods] OFF;
COMMIT;
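The Call-triggered seed scripts above (SayGoodbye, Hello, AddNumbers) store their ParameterDefinitions as JSON-Schema-style objects with a "required" list. A minimal sketch of how a caller might check arguments against such a definition before invoking a script follows. The helper name missing_required is hypothetical, and nothing here is confirmed to match what the ScadaLink runtime does.

```python
import json

# ParameterDefinitions value copied from the SayGoodbye seed row.
SAY_GOODBYE_PARAMS = (
    '{"type":"object","properties":{"Name":{"type":"string"}},"required":["Name"]}'
)


def missing_required(parameter_definitions, arguments):
    """Return the required parameter names that are absent from arguments.

    Hypothetical helper for illustration only. A NULL (None) or empty
    definition is treated as "no parameters required".
    """
    if not parameter_definitions:
        return []
    schema = json.loads(parameter_definitions)
    return [name for name in schema.get("required", []) if name not in arguments]
```

Key comparison is case-sensitive in this sketch, so passing {"name": "Bob"} to SayGoodbye would report "Name" as missing. The seed data mixes both spellings (Greet expects "name", SayGoodbye expects "Name").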
+189
@@ -133,6 +133,195 @@
"Description": "Valve command (0=Close, 1=Open, 2=Stop)"
}
]
},
{
"Folder": "JoeAppEngine",
"NodeList": [
{
"NodeId": "JoeAppEngine.BTCS",
"Name": "BTCS",
"DataType": "String",
"ValueRank": -1,
"AccessLevel": "CurrentReadOrWrite",
"Description": "BTCS string value"
},
{
"NodeId": "JoeAppEngine.AlarmCntsBySeverity",
"Name": "AlarmCntsBySeverity",
"DataType": "Int32",
"ValueRank": 1,
"ArrayDimensions": [13],
"AccessLevel": "CurrentReadOrWrite",
"Description": "13-element alarm counts by severity level"
}
],
"FolderList": [
{
"Folder": "Scheduler",
"NodeList": [
{
"NodeId": "JoeAppEngine.Scheduler.ScanTime",
"Name": "ScanTime",
"DataType": "DateTime",
"ValueRank": -1,
"AccessLevel": "CurrentReadOrWrite",
"Description": "Current scan time (updates every second)"
}
]
}
]
},
{
"Folder": "DevAppEngine",
"NodeList": [],
"FolderList": [
{
"Folder": "Scheduler",
"NodeList": [
{
"NodeId": "DevAppEngine.Scheduler.ScanTime",
"Name": "ScanTime",
"DataType": "DateTime",
"ValueRank": -1,
"AccessLevel": "CurrentReadOrWrite",
"Description": "Current scan time for DevAppEngine"
}
]
}
]
},
{
"Folder": "Sensor",
"NodeList": [
{
"NodeId": "Sensor.Reading",
"Name": "Reading",
"DataType": "Double",
"ValueRank": -1,
"AccessLevel": "CurrentReadOrWrite",
"Description": "Generic sensor reading"
}
]
},
{
"Folder": "Misc",
"NodeList": [
{
"NodeId": "Temperature",
"Name": "Temperature",
"DataType": "Double",
"ValueRank": -1,
"AccessLevel": "CurrentReadOrWrite",
"Description": "Standalone Temperature tag (Base Device default)"
}
]
},
{
"Folder": "TestChildObject",
"NodeList": [
{
"NodeId": "TestChildObject.TestBool",
"Name": "TestBool",
"DataType": "Boolean",
"ValueRank": -1,
"AccessLevel": "CurrentReadOrWrite",
"Description": "Test scalar Boolean"
},
{
"NodeId": "TestChildObject.TestBoolArray",
"Name": "TestBoolArray",
"DataType": "Boolean",
"ValueRank": 1,
"ArrayDimensions": [4],
"AccessLevel": "CurrentReadOrWrite",
"Description": "Test Boolean array"
},
{
"NodeId": "TestChildObject.TestDateTime",
"Name": "TestDateTime",
"DataType": "DateTime",
"ValueRank": -1,
"AccessLevel": "CurrentReadOrWrite",
"Description": "Test scalar DateTime"
},
{
"NodeId": "TestChildObject.TestDateTimeArray",
"Name": "TestDateTimeArray",
"DataType": "DateTime",
"ValueRank": 1,
"ArrayDimensions": [4],
"AccessLevel": "CurrentReadOrWrite",
"Description": "Test DateTime array"
},
{
"NodeId": "TestChildObject.TestDouble",
"Name": "TestDouble",
"DataType": "Double",
"ValueRank": -1,
"AccessLevel": "CurrentReadOrWrite",
"Description": "Test scalar Double"
},
{
"NodeId": "TestChildObject.TestDoubleArray",
"Name": "TestDoubleArray",
"DataType": "Double",
"ValueRank": 1,
"ArrayDimensions": [4],
"AccessLevel": "CurrentReadOrWrite",
"Description": "Test Double array"
},
{
"NodeId": "TestChildObject.TestFloat",
"Name": "TestFloat",
"DataType": "Float",
"ValueRank": -1,
"AccessLevel": "CurrentReadOrWrite",
"Description": "Test scalar Float"
},
{
"NodeId": "TestChildObject.TestFloatArray",
"Name": "TestFloatArray",
"DataType": "Float",
"ValueRank": 1,
"ArrayDimensions": [4],
"AccessLevel": "CurrentReadOrWrite",
"Description": "Test Float array"
},
{
"NodeId": "TestChildObject.TestInt",
"Name": "TestInt",
"DataType": "Int32",
"ValueRank": -1,
"AccessLevel": "CurrentReadOrWrite",
"Description": "Test scalar Int32"
},
{
"NodeId": "TestChildObject.TestIntArray",
"Name": "TestIntArray",
"DataType": "Int32",
"ValueRank": 1,
"ArrayDimensions": [4],
"AccessLevel": "CurrentReadOrWrite",
"Description": "Test Int32 array"
},
{
"NodeId": "TestChildObject.TestString",
"Name": "TestString",
"DataType": "String",
"ValueRank": -1,
"AccessLevel": "CurrentReadOrWrite",
"Description": "Test scalar String"
},
{
"NodeId": "TestChildObject.TestStringArray",
"Name": "TestStringArray",
"DataType": "String",
"ValueRank": 1,
"ArrayDimensions": [4],
"AccessLevel": "CurrentReadOrWrite",
"Description": "Test String array"
}
]
}
]
}
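The node file above nests folders through "FolderList", with tags listed under "NodeList". As a rough sketch, assuming only the keys visible in this hunk, the tree can be flattened into NodeIds like this (iter_node_ids is a hypothetical helper, not part of the repository):

```python
def iter_node_ids(folder):
    """Yield (node_id, is_array) for every node under a folder entry.

    Assumes the shape seen in the JSON above: "NodeList" holds nodes and an
    optional "FolderList" holds nested folders. A ValueRank of 1 or more is
    treated as an array; the file itself only uses -1 (scalar) and 1.
    """
    for node in folder.get("NodeList", []):
        yield node["NodeId"], node.get("ValueRank", -1) >= 1
    for child in folder.get("FolderList", []):
        yield from iter_node_ids(child)


# Trimmed copy of the JoeAppEngine folder from the hunk above.
sample = {
    "Folder": "JoeAppEngine",
    "NodeList": [
        {"NodeId": "JoeAppEngine.BTCS", "ValueRank": -1},
        {
            "NodeId": "JoeAppEngine.AlarmCntsBySeverity",
            "ValueRank": 1,
            "ArrayDimensions": [13],
        },
    ],
    "FolderList": [
        {
            "Folder": "Scheduler",
            "NodeList": [
                {"NodeId": "JoeAppEngine.Scheduler.ScanTime", "ValueRank": -1}
            ],
        }
    ],
}
nodes = list(iter_node_ids(sample))
```

Nodes in a folder are yielded before nodes in its subfolders, so the order follows the file layout.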
+124
@@ -0,0 +1,124 @@
#!/usr/bin/env bash
# Full reseed of the ScadaLink test cluster.
#
# Tears down infra + app containers, drops the MSSQL volume, brings
# everything back, lets EF Core migrations create the schema, replays
# infra/mssql/seed-config.sql for templates/scripts/data-connections, and
# re-seeds sites via docker/seed-sites.sh.
#
# Usage:
# infra/reseed.sh Full reseed (default seed file)
# infra/reseed.sh --seed PATH Replay a different seed SQL
# infra/reseed.sh --skip-teardown Replay seed against running stack
#
# Prerequisites:
# - Docker / OrbStack running
# - Python 3 with pymssql (used by infra/tools/mssql_tool.py + dump_seed.py)
# - Built scadalink:latest image (docker/build.sh — deploy.sh runs it)
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
PROJECT_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)"
SEED_FILE="$SCRIPT_DIR/mssql/seed-config.sql"
SKIP_TEARDOWN=false
MGMT_URL="http://localhost:9000"
while [ $# -gt 0 ]; do
case "$1" in
--seed)
SEED_FILE="$2"
shift 2
;;
--skip-teardown)
SKIP_TEARDOWN=true
shift
;;
-h|--help)
sed -n '2,16p' "$0" | sed 's/^# \{0,1\}//'
exit 0
;;
*)
echo "Unknown option: $1" >&2
exit 1
;;
esac
done
if [ ! -f "$SEED_FILE" ]; then
echo "Seed file not found: $SEED_FILE" >&2
exit 1
fi
echo "=== ScadaLink Reseed ==="
echo "Seed file: $SEED_FILE"
echo ""
if ! $SKIP_TEARDOWN; then
echo "--- Stage 1/6: tear down application containers ---"
"$PROJECT_ROOT/docker/teardown.sh"
echo ""
echo "--- Stage 2/6: wipe site SQLite state ---"
shopt -s nullglob
for d in "$PROJECT_ROOT"/docker/site-*/data; do
rm -rf "$d"/*
echo " cleared $d"
done
shopt -u nullglob
echo ""
echo "--- Stage 3/6: tear down infra (drops MSSQL volume) ---"
(cd "$SCRIPT_DIR" && docker compose down -v)
echo ""
echo "--- Stage 4/6: bring infra back up ---"
(cd "$SCRIPT_DIR" && docker compose up -d)
echo " Waiting for MSSQL to accept connections..."
until docker exec scadalink-mssql /opt/mssql-tools18/bin/sqlcmd \
-S localhost -U sa -P 'ScadaLink_Dev1#' -C -Q "SELECT 1" >/dev/null 2>&1; do
sleep 2
done
echo " MSSQL ready."
echo " Waiting for setup.sql to create ScadaLinkConfig..."
until docker exec scadalink-mssql /opt/mssql-tools18/bin/sqlcmd \
-S localhost -U sa -P 'ScadaLink_Dev1#' -C \
-Q "IF DB_ID('ScadaLinkConfig') IS NULL THROW 50000, 'not ready', 1;" \
>/dev/null 2>&1; do
sleep 2
done
echo " ScadaLinkConfig present."
echo ""
echo "--- Stage 5/6: deploy central + site nodes ---"
"$PROJECT_ROOT/docker/deploy.sh"
fi
echo ""
echo "--- Stage 6a/6: wait for central cluster /health/ready ---"
until curl -fs "$MGMT_URL/health/ready" >/dev/null 2>&1; do
sleep 2
done
echo " Central cluster ready (EF Core migrations applied)."
echo ""
echo "--- Stage 6b/6: seed sites (CLI) ---"
# Sites must exist before the design seed: DataConnections.SiteId FKs to Sites.
"$PROJECT_ROOT/docker/seed-sites.sh"
echo ""
echo "--- Stage 6c/6: replay seed SQL ---"
docker exec -i scadalink-mssql /opt/mssql-tools18/bin/sqlcmd \
-S localhost -U sa -P 'ScadaLink_Dev1#' -C -d ScadaLinkConfig -b < "$SEED_FILE"
echo " Seed replayed."
echo ""
echo "=== Reseed complete ==="
echo ""
echo "Verify:"
echo " $PROJECT_ROOT/src/ScadaLink.CLI/bin/Debug/net*/ScadaLink.CLI --url $MGMT_URL --username multi-role --password password template list"
echo ""
echo "To refresh the seed file from the current DB state:"
echo " python3 $SCRIPT_DIR/tools/dump_seed.py --output $SEED_FILE"
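The wait loops in reseed.sh (MSSQL, ScadaLinkConfig, /health/ready) poll every two seconds with no upper bound, so a container that never becomes healthy hangs the script. A bounded variant could look roughly like the sketch below. It is written in Python for illustration only; wait_until and the 120-second default are assumptions, not something the script does.

```python
import time


def wait_until(check, timeout_s=120.0, interval_s=2.0,
               sleep=time.sleep, clock=time.monotonic):
    """Poll check() until it returns True or timeout_s elapses.

    Returns True if check() succeeded, False if the deadline passed.
    sleep and clock are injectable so the loop can be tested quickly.
    """
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A caller would pass a function that runs the probe (for example an HTTP GET against the health endpoint) and abort the reseed when wait_until returns False.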
+11 -1
@@ -1,6 +1,11 @@
#!/usr/bin/env bash
# Tear down ScadaLink test infrastructure.
#
# Drops the MSSQL data volume by default, so the ScadaLinkConfig DB
# (templates, scripts, data connections, etc.) is wiped. Use
# infra/reseed.sh afterwards to restore the design state from
# infra/mssql/seed-config.sql.
#
# Usage:
# ./teardown.sh Stop containers and delete the SQL data volume
# ./teardown.sh --images Also remove downloaded Docker images
@@ -44,4 +49,9 @@ fi
echo ""
echo "Teardown complete."
echo "To start fresh: docker compose up -d && python tools/mssql_tool.py setup --script mssql/setup.sql"
echo ""
echo "To restore the full test cluster (infra + app + design seed + sites):"
echo " infra/reseed.sh"
echo ""
echo "To start only infra (no app, no seed):"
echo " cd infra && docker compose up -d"
+220
@@ -0,0 +1,220 @@
#!/usr/bin/env python3
"""Dump design tables from ScadaLinkConfig to a replayable SQL seed file.
Usage:
python3 infra/tools/dump_seed.py --output infra/mssql/seed-config.sql
Tables covered (insert order; reverse for delete):
TemplateFolders, Templates, TemplateAttributes, TemplateScripts,
TemplateAlarms, TemplateCompositions, SharedScripts, DataConnections,
ExternalSystemDefinitions, ExternalSystemMethods
Excluded by design (per-environment, not design-time): Sites (seeded via
seed-sites.sh), Instances + InstanceConnectionBindings + InstanceOverrides,
NotificationLists/Recipients, SmtpConfigurations, ApiKeys, Areas,
SiteScopeRules, LdapGroupMappings, DataProtectionKeys, audit, deployment.
"""
import argparse
import datetime
import sys
import pymssql
DEFAULT_HOST = "localhost"
DEFAULT_PORT = 1433
DEFAULT_USER = "sa"
DEFAULT_PASSWORD = "ScadaLink_Dev1#"
DEFAULT_DATABASE = "ScadaLinkConfig"
INSERT_ORDER = [
"TemplateFolders",
"Templates",
"TemplateAttributes",
"TemplateScripts",
"TemplateAlarms",
"TemplateCompositions",
"SharedScripts",
"DataConnections",
"ExternalSystemDefinitions",
"ExternalSystemMethods",
]
# Identity columns get IDENTITY_INSERT wrapped around inserts and are kept in
# the column list. All listed tables happen to use Id as their identity.
IDENTITY_TABLES = set(INSERT_ORDER)
# Templates has self-FK Templates.ParentTemplateId; emit a single batch that
# inserts shallow rows first then deeper ones. pymssql returns rows in Id order
# from our ORDER BY, which matches insertion order for this schema (parent Id
# is always less than child Id in the live data).
def quote(value):
    if value is None:
        return "NULL"
    if isinstance(value, bool):
        return "1" if value else "0"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, (bytes, bytearray)):
        return "0x" + value.hex()
    if isinstance(value, datetime.datetime):
        return "'" + value.isoformat(sep=" ", timespec="microseconds") + "'"
    if isinstance(value, datetime.date):
        return "'" + value.isoformat() + "'"
    if isinstance(value, datetime.time):
        return "'" + value.isoformat(timespec="microseconds") + "'"
    if isinstance(value, datetime.timedelta):
        total = value.total_seconds()
        hours, rem = divmod(int(total), 3600)
        minutes, seconds = divmod(rem, 60)
        micros = value.microseconds
        return "'{:02d}:{:02d}:{:02d}.{:06d}'".format(hours, minutes, seconds, micros)
    text = str(value).replace("'", "''")
    return "N'" + text + "'"


def get_columns(cursor, table):
    cursor.execute(
        """
        SELECT COLUMN_NAME
        FROM INFORMATION_SCHEMA.COLUMNS
        WHERE TABLE_NAME = %s
        ORDER BY ORDINAL_POSITION
        """,
        (table,),
    )
    return [row[0] for row in cursor.fetchall()]
def dump(args):
    conn = pymssql.connect(
        server=args.host,
        port=args.port,
        user=args.user,
        password=args.password,
        database=args.database,
    )
    cursor = conn.cursor()
    out = []
    out.append("-- ScadaLink design-data seed.")
    out.append("-- Auto-generated by infra/tools/dump_seed.py against " + args.database + ".")
    out.append("-- Replays the design-time configuration (templates, scripts,")
    out.append("-- data connections, external systems). Idempotent: deletes")
    out.append("-- existing rows in the covered tables before inserting.")
    out.append("--")
    out.append("-- Excluded: Sites (seed via docker/seed-sites.sh), Instances,")
    out.append("-- InstanceConnectionBindings, notifications, SMTP, API keys,")
    out.append("-- areas, LDAP mappings.")
    out.append("")
    out.append("SET NOCOUNT ON;")
    out.append("SET XACT_ABORT ON;")
    # sqlcmd defaults QUOTED_IDENTIFIER OFF; EF Core's filtered indexes
    # and computed columns require ON, so force it here.
    out.append("SET QUOTED_IDENTIFIER ON;")
    out.append("BEGIN TRAN;")
    out.append("")
    # Wipe in reverse FK order. Beyond the design tables themselves, we also
    # clear instance + deployment rows because they FK to Templates and
    # DataConnections; without this, an idempotent replay against a populated
    # DB fails on the FK to DataConnections. On a fresh reseed (after
    # teardown.sh) these tables are already empty so the DELETEs are no-ops.
    out.append("-- Wipe existing design + dependent rows so the seed is idempotent.")
    out.append("-- Order matters: dependents first.")
    delete_order = [
        # Dependents on Instances / DataConnections / Sites.
        "DeployedConfigSnapshots",
        "DeploymentRecords",
        "InstanceAlarmOverrides",
        "InstanceAttributeOverrides",
        "InstanceConnectionBindings",
        "Instances",
        # Design tables themselves.
        "ExternalSystemMethods",
        "ExternalSystemDefinitions",
        "DataConnections",
        "SharedScripts",
        "TemplateCompositions",
        # Alarms reference scripts via OnTriggerScriptId; null it first so we
        # can delete scripts without FK violations.
        "UPDATE TemplateAlarms SET OnTriggerScriptId = NULL",
        "TemplateAlarms",
        "TemplateScripts",
        "TemplateAttributes",
        # Templates is self-referential and references TemplateCompositions
        # (OwnerCompositionId); null parent links first.
        "UPDATE Templates SET ParentTemplateId = NULL, OwnerCompositionId = NULL",
        "Templates",
        # Folders is self-referential too.
        "UPDATE TemplateFolders SET ParentFolderId = NULL",
        "TemplateFolders",
    ]
    for step in delete_order:
        if step.startswith("UPDATE "):
            out.append(step + ";")
        else:
            out.append("DELETE FROM " + step + ";")
    out.append("")
    for table in INSERT_ORDER:
        columns = get_columns(cursor, table)
        if not columns:
            print("Skipping {} (no columns found)".format(table), file=sys.stderr)
            continue
        # Order by Id so self-referential rows insert in dependency order
        # (in the live data, parent Id < child Id by construction).
        order_clause = "ORDER BY Id" if "Id" in columns else ""
        cursor.execute(
            "SELECT [{}] FROM [{}] {}".format("], [".join(columns), table, order_clause)
        )
        rows = cursor.fetchall()
        out.append("-- " + table + " (" + str(len(rows)) + " rows)")
        if not rows:
            continue
        col_list = ", ".join("[" + c + "]" for c in columns)
        identity = table in IDENTITY_TABLES
        if identity:
            out.append("SET IDENTITY_INSERT [{}] ON;".format(table))
        for row in rows:
            values = ", ".join(quote(v) for v in row)
            out.append(
                "INSERT INTO [{}] ({}) VALUES ({});".format(table, col_list, values)
            )
        if identity:
            out.append("SET IDENTITY_INSERT [{}] OFF;".format(table))
        out.append("")
    out.append("COMMIT;")
    out.append("")
    sql = "\n".join(out)
    with open(args.output, "w") as f:
        f.write(sql)
    print("Wrote " + args.output + " (" + str(sum(1 for line in out if line.startswith('INSERT'))) + " inserts).")
    cursor.close()
    conn.close()
def main():
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument("--host", default=DEFAULT_HOST)
    parser.add_argument("--port", type=int, default=DEFAULT_PORT)
    parser.add_argument("--user", default=DEFAULT_USER)
    parser.add_argument("--password", default=DEFAULT_PASSWORD)
    parser.add_argument("--database", default=DEFAULT_DATABASE)
    parser.add_argument("--output", required=True, help="Path to write seed SQL")
    args = parser.parse_args()
    dump(args)


if __name__ == "__main__":
    main()
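dump_seed.py relies on ORDER BY Id producing a valid insert order for the self-referential Templates and TemplateFolders tables, on the stated assumption that a parent's Id is always lower than its child's. A small pre-flight check for that assumption might look like the sketch below. find_order_violations is a hypothetical helper, not part of the tool, and it expects the caller to supply (Id, parent Id) pairs.

```python
def find_order_violations(rows):
    """Return (child_id, parent_id) pairs that break "parent Id < child Id".

    rows is an iterable of (id, parent_id) tuples; parent_id may be None
    for root rows. An empty result means inserting in Id order is safe
    with respect to the self-referencing foreign key.
    """
    return [
        (child_id, parent_id)
        for child_id, parent_id in rows
        if parent_id is not None and parent_id >= child_id
    ]
```

If the list is non-empty, a replay in Id order would insert a child before its parent and fail on the foreign key, so the rows would need a topological sort instead.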
+46
@@ -0,0 +1,46 @@
#!/usr/bin/env python3
"""Quick smoke test: verify Playwright can reach the Central UI through Traefik."""
import sys
from playwright.sync_api import sync_playwright
# The browser runs inside Docker, so use the Docker network hostname for Traefik.
# The Playwright server WebSocket is exposed to the host on port 3000.
TRAEFIK_URL = "http://scadalink-traefik"
PLAYWRIGHT_WS = "ws://localhost:3000"
def main():
    with sync_playwright() as p:
        print(f"Connecting to Playwright server at {PLAYWRIGHT_WS} ...")
        browser = p.chromium.connect(PLAYWRIGHT_WS)
        page = browser.new_page()
        print(f"Navigating to {TRAEFIK_URL} ...")
        response = page.goto(TRAEFIK_URL, wait_until="networkidle", timeout=15000)
        status = response.status if response else None
        title = page.title()
        url = page.url
        print(f" Status: {status}")
        print(f" Title: {title}")
        print(f" URL: {url}")
        # Check for the login page (unauthenticated users get redirected)
        has_login = page.locator("input[type='password'], form[action*='login'], button:has-text('Login'), button:has-text('Sign in')").count() > 0
        if has_login:
            print(" Login form detected: YES")
        browser.close()
        if status and 200 <= status < 400:
            print("\nSMOKE TEST PASSED: Central UI is reachable through Traefik.")
            return 0
        else:
            print(f"\nSMOKE TEST FAILED: unexpected status {status}")
            return 1


if __name__ == "__main__":
    sys.exit(main())
+11 -28
@@ -4,10 +4,7 @@ namespace ScadaLink.CLI;
public class CliConfig
{
public List<string> ContactPoints { get; set; } = new();
public string? LdapServer { get; set; }
public int LdapPort { get; set; } = 636;
public bool LdapUseTls { get; set; } = true;
public string? ManagementUrl { get; set; }
public string DefaultFormat { get; set; } = "json";
public static CliConfig Load()
@@ -25,42 +22,28 @@ public class CliConfig
new JsonSerializerOptions { PropertyNameCaseInsensitive = true });
if (fileConfig != null)
{
if (fileConfig.ContactPoints?.Count > 0) config.ContactPoints = fileConfig.ContactPoints;
if (fileConfig.Ldap != null)
{
config.LdapServer = fileConfig.Ldap.Server;
config.LdapPort = fileConfig.Ldap.Port;
config.LdapUseTls = fileConfig.Ldap.UseTls;
}
if (!string.IsNullOrEmpty(fileConfig.DefaultFormat)) config.DefaultFormat = fileConfig.DefaultFormat;
if (!string.IsNullOrEmpty(fileConfig.ManagementUrl))
config.ManagementUrl = fileConfig.ManagementUrl;
if (!string.IsNullOrEmpty(fileConfig.DefaultFormat))
config.DefaultFormat = fileConfig.DefaultFormat;
}
}
// Override from environment variables
var envContacts = Environment.GetEnvironmentVariable("SCADALINK_CONTACT_POINTS");
if (!string.IsNullOrEmpty(envContacts))
config.ContactPoints = envContacts.Split(',', StringSplitOptions.RemoveEmptyEntries).ToList();
var envLdap = Environment.GetEnvironmentVariable("SCADALINK_LDAP_SERVER");
if (!string.IsNullOrEmpty(envLdap)) config.LdapServer = envLdap;
var envUrl = Environment.GetEnvironmentVariable("SCADALINK_MANAGEMENT_URL");
if (!string.IsNullOrEmpty(envUrl))
config.ManagementUrl = envUrl;
var envFormat = Environment.GetEnvironmentVariable("SCADALINK_FORMAT");
if (!string.IsNullOrEmpty(envFormat)) config.DefaultFormat = envFormat;
if (!string.IsNullOrEmpty(envFormat))
config.DefaultFormat = envFormat;
return config;
}
private class CliConfigFile
{
public List<string>? ContactPoints { get; set; }
public LdapConfig? Ldap { get; set; }
public string? ManagementUrl { get; set; }
public string? DefaultFormat { get; set; }
}
private class LdapConfig
{
public string? Server { get; set; }
public int Port { get; set; } = 636;
public bool UseTls { get; set; } = true;
}
}
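After this change CliConfig.Load resolves two settings in layers: built-in defaults, then the config file, then environment variables. The same precedence can be sketched in Python as below. The environment variable names come from the diff; the function name and the camelCase dictionary keys are assumptions made for illustration (the C# loader deserializes property names case-insensitively).

```python
def resolve_cli_config(file_config, environ):
    """Layer defaults, then file values, then environment overrides.

    file_config: dict parsed from the CLI's JSON config file (may be empty).
    environ: mapping of environment variables, e.g. os.environ.
    Empty strings are ignored at every layer, mirroring the
    string.IsNullOrEmpty checks in the C# code.
    """
    config = {"managementUrl": None, "defaultFormat": "json"}
    for key in ("managementUrl", "defaultFormat"):
        if file_config.get(key):
            config[key] = file_config[key]
    if environ.get("SCADALINK_MANAGEMENT_URL"):
        config["managementUrl"] = environ["SCADALINK_MANAGEMENT_URL"]
    if environ.get("SCADALINK_FORMAT"):
        config["defaultFormat"] = environ["SCADALINK_FORMAT"]
    return config
```

With a file that sets managementUrl to http://localhost:9000 and an environment that sets only SCADALINK_FORMAT=table, the result keeps the file's URL and takes the format from the environment.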
-63
@@ -1,63 +0,0 @@
using System.Collections.Immutable;
using Akka.Actor;
using Akka.Cluster.Tools.Client;
using Akka.Configuration;
using ScadaLink.Commons.Messages.Management;
namespace ScadaLink.CLI;
public class ClusterConnection : IAsyncDisposable
{
private ActorSystem? _system;
private IActorRef? _clusterClient;
public async Task ConnectAsync(IReadOnlyList<string> contactPoints, TimeSpan timeout)
{
var seedNodes = string.Join(",", contactPoints.Select(cp => $"\"{cp}\""));
var config = ConfigurationFactory.ParseString($@"
akka {{
actor.provider = remote
remote.dot-netty.tcp {{
hostname = ""127.0.0.1""
port = 0
}}
}}
");
_system = ActorSystem.Create("scadalink-cli", config);
var initialContacts = contactPoints
.Select(cp => $"{cp}/system/receptionist")
.Select(path => ActorPath.Parse(path))
.ToImmutableHashSet();
var clientSettings = ClusterClientSettings.Create(_system)
.WithInitialContacts(initialContacts);
_clusterClient = _system.ActorOf(ClusterClient.Props(clientSettings), "cluster-client");
// Wait for connection by sending a ping
// ClusterClient doesn't have a direct "connected" signal, so we rely on the first Ask succeeding
await Task.CompletedTask;
}
public async Task<object> AskManagementAsync(ManagementEnvelope envelope, TimeSpan timeout)
{
if (_clusterClient == null) throw new InvalidOperationException("Not connected");
var response = await _clusterClient.Ask(
new ClusterClient.Send("/user/management", envelope),
timeout);
return response;
}
public async ValueTask DisposeAsync()
{
if (_system != null)
{
await CoordinatedShutdown.Get(_system).Run(CoordinatedShutdown.ClrExitReason.Instance);
_system = null;
}
}
}
+16 -16
@@ -6,31 +6,31 @@ namespace ScadaLink.CLI.Commands;
public static class ApiMethodCommands
{
public static Command Build(Option<string> contactPointsOption, Option<string> formatOption)
public static Command Build(Option<string> urlOption, Option<string> formatOption, Option<string> usernameOption, Option<string> passwordOption)
{
var command = new Command("api-method") { Description = "Manage inbound API methods" };
-command.Add(BuildList(contactPointsOption, formatOption));
-command.Add(BuildGet(contactPointsOption, formatOption));
-command.Add(BuildCreate(contactPointsOption, formatOption));
-command.Add(BuildUpdate(contactPointsOption, formatOption));
-command.Add(BuildDelete(contactPointsOption, formatOption));
+command.Add(BuildList(urlOption, formatOption, usernameOption, passwordOption));
+command.Add(BuildGet(urlOption, formatOption, usernameOption, passwordOption));
+command.Add(BuildCreate(urlOption, formatOption, usernameOption, passwordOption));
+command.Add(BuildUpdate(urlOption, formatOption, usernameOption, passwordOption));
+command.Add(BuildDelete(urlOption, formatOption, usernameOption, passwordOption));
return command;
}
-private static Command BuildList(Option<string> contactPointsOption, Option<string> formatOption)
+private static Command BuildList(Option<string> urlOption, Option<string> formatOption, Option<string> usernameOption, Option<string> passwordOption)
{
var cmd = new Command("list") { Description = "List all API methods" };
cmd.SetAction(async (ParseResult result) =>
{
return await CommandHelpers.ExecuteCommandAsync(
-result, contactPointsOption, formatOption, new ListApiMethodsCommand());
+result, urlOption, formatOption, usernameOption, passwordOption, new ListApiMethodsCommand());
});
return cmd;
}
-private static Command BuildGet(Option<string> contactPointsOption, Option<string> formatOption)
+private static Command BuildGet(Option<string> urlOption, Option<string> formatOption, Option<string> usernameOption, Option<string> passwordOption)
{
var idOption = new Option<int>("--id") { Description = "API method ID", Required = true };
var cmd = new Command("get") { Description = "Get an API method by ID" };
@@ -39,12 +39,12 @@ public static class ApiMethodCommands
{
var id = result.GetValue(idOption);
return await CommandHelpers.ExecuteCommandAsync(
-result, contactPointsOption, formatOption, new GetApiMethodCommand(id));
+result, urlOption, formatOption, usernameOption, passwordOption, new GetApiMethodCommand(id));
});
return cmd;
}
-private static Command BuildCreate(Option<string> contactPointsOption, Option<string> formatOption)
+private static Command BuildCreate(Option<string> urlOption, Option<string> formatOption, Option<string> usernameOption, Option<string> passwordOption)
{
var nameOption = new Option<string>("--name") { Description = "Method name", Required = true };
var scriptOption = new Option<string>("--script") { Description = "Script code", Required = true };
@@ -67,13 +67,13 @@ public static class ApiMethodCommands
var parameters = result.GetValue(parametersOption);
var returnDef = result.GetValue(returnDefOption);
return await CommandHelpers.ExecuteCommandAsync(
-result, contactPointsOption, formatOption,
+result, urlOption, formatOption, usernameOption, passwordOption,
new CreateApiMethodCommand(name, script, timeout, parameters, returnDef));
});
return cmd;
}
-private static Command BuildUpdate(Option<string> contactPointsOption, Option<string> formatOption)
+private static Command BuildUpdate(Option<string> urlOption, Option<string> formatOption, Option<string> usernameOption, Option<string> passwordOption)
{
var idOption = new Option<int>("--id") { Description = "API method ID", Required = true };
var scriptOption = new Option<string>("--script") { Description = "Script code", Required = true };
@@ -96,13 +96,13 @@ public static class ApiMethodCommands
var parameters = result.GetValue(parametersOption);
var returnDef = result.GetValue(returnDefOption);
return await CommandHelpers.ExecuteCommandAsync(
-result, contactPointsOption, formatOption,
+result, urlOption, formatOption, usernameOption, passwordOption,
new UpdateApiMethodCommand(id, script, timeout, parameters, returnDef));
});
return cmd;
}
-private static Command BuildDelete(Option<string> contactPointsOption, Option<string> formatOption)
+private static Command BuildDelete(Option<string> urlOption, Option<string> formatOption, Option<string> usernameOption, Option<string> passwordOption)
{
var idOption = new Option<int>("--id") { Description = "API method ID", Required = true };
var cmd = new Command("delete") { Description = "Delete an API method" };
@@ -111,7 +111,7 @@ public static class ApiMethodCommands
{
var id = result.GetValue(idOption);
return await CommandHelpers.ExecuteCommandAsync(
-result, contactPointsOption, formatOption, new DeleteApiMethodCommand(id));
+result, urlOption, formatOption, usernameOption, passwordOption, new DeleteApiMethodCommand(id));
});
return cmd;
}
@@ -6,16 +6,16 @@ namespace ScadaLink.CLI.Commands;
public static class AuditLogCommands
{
-public static Command Build(Option<string> contactPointsOption, Option<string> formatOption)
+public static Command Build(Option<string> urlOption, Option<string> formatOption, Option<string> usernameOption, Option<string> passwordOption)
{
var command = new Command("audit-log") { Description = "Query audit logs" };
-command.Add(BuildQuery(contactPointsOption, formatOption));
+command.Add(BuildQuery(urlOption, formatOption, usernameOption, passwordOption));
return command;
}
-private static Command BuildQuery(Option<string> contactPointsOption, Option<string> formatOption)
+private static Command BuildQuery(Option<string> urlOption, Option<string> formatOption, Option<string> usernameOption, Option<string> passwordOption)
{
var userOption = new Option<string?>("--user") { Description = "Filter by username" };
var entityTypeOption = new Option<string?>("--entity-type") { Description = "Filter by entity type" };
@@ -45,7 +45,7 @@ public static class AuditLogCommands
var page = result.GetValue(pageOption);
var pageSize = result.GetValue(pageSizeOption);
return await CommandHelpers.ExecuteCommandAsync(
-result, contactPointsOption, formatOption,
+result, urlOption, formatOption, usernameOption, passwordOption,
new QueryAuditLogCommand(user, entityType, action, from, to, page, pageSize));
});
return cmd;
+90 -35
@@ -1,67 +1,122 @@
using System.CommandLine;
using System.CommandLine.Parsing;
using System.Text.Json;
using ScadaLink.Commons.Messages.Management;
namespace ScadaLink.CLI.Commands;
internal static class CommandHelpers
{
internal static AuthenticatedUser PlaceholderUser { get; } =
new("cli-user", "CLI User", ["Admin", "Design", "Deployment"], Array.Empty<string>());
internal static string NewCorrelationId() => Guid.NewGuid().ToString("N");
internal static async Task<int> ExecuteCommandAsync(
ParseResult result,
Option<string> contactPointsOption,
Option<string> urlOption,
Option<string> formatOption,
Option<string> usernameOption,
Option<string> passwordOption,
object command)
{
var contactPointsRaw = result.GetValue(contactPointsOption);
var format = result.GetValue(formatOption) ?? "json";
var config = CliConfig.Load();
if (string.IsNullOrWhiteSpace(contactPointsRaw))
{
var config = CliConfig.Load();
if (config.ContactPoints.Count > 0)
contactPointsRaw = string.Join(",", config.ContactPoints);
}
// Resolve management URL
var url = result.GetValue(urlOption);
if (string.IsNullOrWhiteSpace(url))
url = config.ManagementUrl;
if (string.IsNullOrWhiteSpace(contactPointsRaw))
if (string.IsNullOrWhiteSpace(url))
{
OutputFormatter.WriteError("No contact points specified. Use --contact-points or set SCADALINK_CONTACT_POINTS.", "NO_CONTACT_POINTS");
OutputFormatter.WriteError(
"No management URL specified. Use --url, set SCADALINK_MANAGEMENT_URL, or add 'managementUrl' to ~/.scadalink/config.json.",
"NO_URL");
return 1;
}
var contactPoints = contactPointsRaw.Split(',', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries);
// Validate credentials
var username = result.GetValue(usernameOption);
var password = result.GetValue(passwordOption);
await using var connection = new ClusterConnection();
await connection.ConnectAsync(contactPoints, TimeSpan.FromSeconds(10));
if (string.IsNullOrWhiteSpace(username) || string.IsNullOrWhiteSpace(password))
{
OutputFormatter.WriteError(
"Credentials required. Use --username and --password options.",
"NO_CREDENTIALS");
return 1;
}
var envelope = new ManagementEnvelope(PlaceholderUser, command, NewCorrelationId());
var response = await connection.AskManagementAsync(envelope, TimeSpan.FromSeconds(30));
// Derive command name from type
var commandName = ManagementCommandRegistry.GetCommandName(command.GetType());
return HandleResponse(response);
// Send via HTTP
using var client = new ManagementHttpClient(url, username, password);
var response = await client.SendCommandAsync(commandName, command, TimeSpan.FromSeconds(30));
return HandleResponse(response, format);
}
internal static int HandleResponse(object response)
internal static int HandleResponse(ManagementResponse response, string format)
{
switch (response)
if (response.JsonData != null)
{
case ManagementSuccess success:
Console.WriteLine(success.JsonData);
return 0;
if (string.Equals(format, "table", StringComparison.OrdinalIgnoreCase))
{
WriteAsTable(response.JsonData);
}
else
{
Console.WriteLine(response.JsonData);
}
return 0;
}
case ManagementError error:
OutputFormatter.WriteError(error.Error, error.ErrorCode);
return 1;
var errorCode = response.ErrorCode ?? "ERROR";
var error = response.Error ?? "Unknown error";
case ManagementUnauthorized unauth:
OutputFormatter.WriteError(unauth.Message, "UNAUTHORIZED");
return 2;
OutputFormatter.WriteError(error, errorCode);
return response.StatusCode == 403 ? 2 : 1;
}
default:
OutputFormatter.WriteError($"Unexpected response type: {response.GetType().Name}", "UNEXPECTED_RESPONSE");
return 1;
private static void WriteAsTable(string json)
{
using var doc = JsonDocument.Parse(json);
var root = doc.RootElement;
if (root.ValueKind == JsonValueKind.Array)
{
var items = root.EnumerateArray().ToList();
if (items.Count == 0)
{
Console.WriteLine("(no results)");
return;
}
var headers = items[0].ValueKind == JsonValueKind.Object
? items[0].EnumerateObject().Select(p => p.Name).ToArray()
: new[] { "Value" };
var rows = items.Select(item =>
{
if (item.ValueKind == JsonValueKind.Object)
{
return headers.Select(h =>
item.TryGetProperty(h, out var val)
? val.ValueKind == JsonValueKind.Null ? "" : val.ToString()
: "").ToArray();
}
return new[] { item.ToString() };
});
OutputFormatter.WriteTable(rows, headers);
}
else if (root.ValueKind == JsonValueKind.Object)
{
var headers = new[] { "Property", "Value" };
var rows = root.EnumerateObject().Select(p =>
new[] { p.Name, p.Value.ValueKind == JsonValueKind.Null ? "" : p.Value.ToString() });
OutputFormatter.WriteTable(rows, headers);
}
else
{
Console.WriteLine(root.ToString());
}
}
}

Some files were not shown because too many files have changed in this diff.