HandleSubscribe spawned a Task.Run that mutated DataConnectionActor private
state (_subscriptionIds, _subscriptionsByInstance, _totalSubscribed,
_resolvedTags, _unresolvedTags) from a thread-pool thread, racing the actor's
own message loop — a data race on non-thread-safe Dictionary/HashSet and
non-atomic counters.
Restructured HandleSubscribe to follow the actor's existing PipeTo(Self)
pattern: the background task now performs only adapter I/O and pipes a
SubscribeCompleted message to Self; all subscription-state mutation happens
in the new HandleSubscribeCompleted handler on the actor thread (wired into
the Connected, Connecting and Reconnecting states).
Adds DCL001_ConcurrentSubscribes_DoNotCorruptSubscriptionCounters (30x30
concurrent subscribes) which fails against the pre-fix code and passes after.
Establishes a per-module code review workflow under code-reviews/ and
records the 2026-05-16 baseline review (commit 9c60592): 241 findings
across all src/ modules (6 Critical, 46 High, 100 Medium, 89 Low).
This is the clean starting point for remediation work.
Move all package versions into Directory.Packages.props so every project
resolves a single consistent version. Consolidates the Roslyn packages
(Microsoft.CodeAnalysis.CSharp.Scripting/Workspaces) onto 5.0.0, which
resolves the pre-existing NU1608 version-skew error in the test projects.
Six tests asserted DoesNotContain(SCADA004/SCADA005) or an empty InlayHints
result — all pass for the wrong reason now that those diagnostics and the
positional InlayHints were removed in the analyzer realignment. They also
used the obsolete top-level CallScript syntax. Removed.
Two stale references blocked compilation: the DataConnection page tests
still pointed at Components.Pages.Admin (the pages moved to .Design), and
ScriptAnalysisServiceTests constructed ScriptAnalysisService without the
IServiceProvider parameter. The project now compiles.
HandleSetStaticAttribute was made fire-and-forget (commit 2951507) — it no
longer replies with SetStaticAttributeResponse — but three InstanceActor
tests still ExpectMsg<SetStaticAttributeResponse> and timed out. Verify the
mutation via the GetAttributeRequest round-trip instead, which the FIFO
mailbox makes a sound sync point. Test intent (in-memory update, SQLite
persistence, serialized ordering) is unchanged.
Captures the brainstormed design for a new Expression trigger: a read-only
boolean C# expression evaluated on attribute updates, edge-triggered for
scripts and level-based for alarms, compiled against a restricted read-only
globals type.
The script add/edit modal exposed a script's trigger as two raw free-text
inputs — a type string and hand-written config JSON — with no validation
and no parity with the alarm trigger UI.
Replace them with a ScriptTriggerEditor component (mirroring
AlarmTriggerEditor): a trigger-type dropdown plus type-specific panels for
Interval, ValueChange, Conditional, and Call, a grouped attribute picker,
and an auto-generated hint. A ScriptTriggerConfigCodec round-trips the
TriggerConfiguration JSON the site runtime's ScriptActor consumes, tolerant
of legacy keys; an unrecognized stored type is preserved untouched in a
read-only panel.
The Test Run sandbox and Monaco analysis modelled a script API that had
drifted from the site runtime's ScriptGlobals, so real scripts failed to
compile in Test Run. Realign both to the runtime surface
(Instance/Scripts/ExternalSystem/Attributes/Children/Parent) and drop the
duplicate ScriptHost stub so the two cannot diverge again.
- Script calls (Scripts.CallShared, Instance.CallScript, Route.To().Call)
accept an anonymous object instead of a hand-built dictionary, via a
shared ScriptArgs normalizer; existing dictionary calls still compile.
- Test Run can optionally bind to a deployed instance, so Instance/
Attributes/CallScript route to it cross-site; adds site-side
RouteToGetAttributes/RouteToSetAttributes handlers.
- Adds Test Run panels to the API method and template script editors.
- Fixes the TestDatabaseQuery seed script, which queried a table that
never existed.
Also commits unrelated in-progress work already in the tree: the health
monitoring report loop, site streaming changes, and the Admin/Design
data-connection and SMTP page reorganization.
UseStaticFiles middleware ran before the MapStaticAssets endpoints and
served static assets (monaco-init.js, site.css, etc.) with no
Cache-Control header. Browsers then heuristically cached them and kept
serving stale copies across deploys — e.g. the Monaco editor ran an old
monaco-init.js that did not send the script kind, so inbound API method
scripts were analysed against the wrong globals and 'Route' was flagged
as undefined.
MapStaticAssets alone now serves every static asset, tagging
non-fingerprinted files with Cache-Control: no-cache so the browser
always revalidates via ETag.
The login page previously rendered inside MainLayout, showing the full
nav sidebar and the authenticated-user footer. It now uses a bare
LoginLayout (no nav, no session-expiry watchdog, no dialog host) and
just renders its own centred card.
SessionExpiry renders inside MainLayout, which also wraps the login
page. For a user with a still-present auth cookie but an expired
expires_at claim, it redirected /login back to /login indefinitely.
It now skips the redirect when already on the login page.
Deleting an instance only undeployed it from the site and set the state
to NotDeployed, leaving an orphan record that could never be removed —
the state-transition matrix rejected delete from NotDeployed.
Delete now removes the instance record entirely (deployment history,
snapshot, attribute/alarm overrides, and connection bindings go with
it), and is permitted from any state.
Script execution failures were only written to Serilog, never to the
site event log — SiteRuntime did not reference the SiteEventLogging
project. ScriptExecutionActor now resolves ISiteEventLogger and emits a
'script'/'Error' event on timeout and exception.
The event-log query handler was a per-node actor bound to that node's
local SQLite. A ClusterClient query could land on the standby (which
records no events) and return nothing. The handler is now a cluster
singleton with a proxy, so queries always reach the active node.
Previously a user idling past the 30-minute cookie expiry stayed parked
on a stale page until they tried to navigate. The auth cookie's UTC
expiry is now also stamped onto an expires_at claim at sign-in, and a
SessionExpiry component mounted in MainLayout schedules a delay until
expiry + 2s grace, then force-loads /login — at which point the standard
cookie middleware confirms the session is gone and serves the login page.
Adds a Debug View item to the instance context menu on /deployment/topology
that navigates to /deployment/debug-view with siteId and instanceId query
parameters; the page now auto-connects when those are present (falling
back to the existing localStorage auto-reconnect otherwise). Disabled for
non-Enabled instances since debug streaming only targets enabled ones.
Also fixes a latent NRE in DebugView.OnInitializedAsync: the toast ref
isn't bound yet during init, so transient load failures are now stashed
and surfaced from OnAfterRenderAsync where the toast is ready.
Admins can now check/uncheck which API methods this key is approved to
invoke directly on /admin/api-keys/{id}/edit, instead of having to bounce
through the Design role's API method editor. Membership is diffed against
the initial state and applied by mutating ApprovedApiKeyIds on each
affected ApiMethod in the same SaveChangesAsync.
Remove the LmxProxy work package (WP-8) from phase-3b, the CD-DCL-1..6
protocol details, Q9/Q-P3B-2 from the questions log, the LmxProxy
component-design rows in requirements-traceability, and the inline
mentions across phase-0, phase-4, the gRPC streaming plans, and the
primary/backup data-connection plans.
Delete the standalone ZB.MOM.WW.LmxProxy solution and loose adapter
stubs under deprecated/, plus the lone LmxProxy mention in
deprecated/windev.md. The protocol was never wired into the active
codebase and the runtime artifact has been removed from the cluster.
CentralCommunicationActor.HandleHeartbeat was forwarding each incoming
HeartbeatMessage to Context.Parent, which resolves to the /user
guardian — a non-actor. Every site heartbeat went straight to dead
letters (~1026 per central node per 30 minutes at the default ~2s
interval across three sites).
The aggregator now exposes MarkHeartbeat(siteId, receivedAt) which
bumps LastReportReceivedAt on already-known sites (and clears IsOnline
if it had flipped) without touching LatestReport. Heartbeats from
unregistered sites are dropped — first registration still happens on
the first full report. CentralCommunicationActor calls this in place
of the no-op Tell.
The result: heartbeats now serve their stated health-monitoring
purpose (per CLAUDE.md) by keeping a site marked online between the
30s full reports if a single report is briefly delayed, and the dead
letter noise disappears entirely.
Triage was painful on the old layout: a lone Site dropdown sat on a sparse
row, errors were truncated mid-sentence with a per-row View/Hide toggle
that on expand pushed an unwrapped <pre> through the table and shoved the
Actions column off-screen, all rows looked the same regardless of age or
attempt count, and OriginInstance — which tells you which instance
produced the failure — wasn't displayed at all even though the data was
on the entity.
This pass:
- Adds a real filter bar: Site, Category, Target system, Origin instance,
Age window, free-text search. Category/Target/Origin/Age/Search filter
the loaded page client-side; Site still drives the server query (and
changing site now auto-queries — one fewer click).
- Replaces the in-table expansion with an Offcanvas detail drawer.
Clicking a row slides in a side panel with full message ID + copy,
category label, origin, attempts, both timestamps in relative + absolute
form, the complete error (pre-wrap, scrollable), and big Retry / Discard
buttons. The table never overflows.
- Stacks Target + Method into one column (target in semibold, method
small/muted below) and surfaces Origin as a code-styled chip in a new
column ("—" muted when null).
- Severity left-border on each row, derived client-side from
AttemptCount/MaxAttempts and age of the last attempt: red when retries
are exhausted and last attempt was in the past hour, amber when
exhausted but stale, muted grey otherwise.
- Mini attempt progress bar under the n/max count, red when fully
exhausted and amber while partial.
- Relative timestamps ("5m ago", "1h ago", "2d ago") with absolute UTC on
hover via the title attribute — applies in both the table and the drawer.
- Bulk select: header checkbox selects the filtered set, per-row
checkboxes. When ≥1 selected, a sticky action strip slides in below the
filter bar offering Retry selected / Discard selected with the usual
confirm dialog. Toast reports per-item success/failure counts.
- Summary line next to the title: "N parked · K target systems · oldest
Xh ago" (and "(showing M of N)" when filters are active).
- ParkedMessageEntry contract extended additively with MaxAttempts,
Category, and OriginInstance so the UI has the data it needs for
severity, the category filter, and the new column.
- Bumped page size from 25 to 50 to better match the dense layout.
The ApiMethod entity had an ApprovedApiKeyIds column and ApiKeyValidator
read it, but no UI/CLI/seed code ever wrote to it. Result: any inbound
POST /api/{method} was rejected with 403 "API key not approved for this
method" regardless of which key was sent.
Add an "Approved API Keys" subsection to the method form, between
Timeout and Parameters: vertical list of checkboxes, one per ApiKey
row (with a "Disabled" badge for disabled keys, and a link to
/admin/api-keys when none exist). OnInitializedAsync loads all keys and
parses the existing comma-separated IDs; Save() serializes the selected
set back to the entity on both create and edit paths.
Re-uses IInboundApiRepository.GetAllApiKeysAsync — no repo or migration
changes needed.
The Parked Messages page returned "Parked message handler not available"
because no actor was ever registered for ParkedMessages, and Retry/Discard
requests had no Receive at all (would have hit deadletters). On top of
that, StoreAndForwardService.StartAsync() was never called anywhere, so
the sf_messages SQLite table was never created and the retry timer never
ran — silently breaking all of S&F.
- New ParkedMessageHandlerActor bridges StoreAndForwardService.{Get,Retry,Discard}
using the Sender→Task→PipeTo pattern already used in DeploymentManagerActor.
- SiteCommunicationActor now routes ParkedMessageRetryRequest and
ParkedMessageDiscardRequest the same way as the existing Query handler.
- AkkaHostedService.RegisterSiteActors() resolves StoreAndForwardService,
calls StartAsync() to create the schema and start the timer, then
creates and registers the handler actor.
CentralHealthAggregator is a per-node hosted singleton, but site health
reports flow through ClusterClient which round-robins each report to one
central node only. The other node's aggregator never saw those reports
and marked sites offline at the 60s threshold — sites constantly flapped
between online and offline on the monitoring page.
On receive, the active CentralCommunicationActor now republishes a
SiteHealthReportReplica wrapper on a DistributedPubSub topic. Both
central nodes subscribe to the topic and process replicas through a
dedicated path that updates the local aggregator without re-broadcasting
(avoids fan-out loops). The aggregator's existing sequence-number
idempotency makes self-delivery a cheap no-op.
DistributedPubSubExtensionProvider is now listed in the HOCON
`akka.extensions` block so the mediator is initialised at cluster
start, eliminating a race where the first Subscribe arrived before the
extension was loaded.
The card badges were stuck on the pre-migration data shape: the param
counter only handled flat arrays (now JSON Schema objects), and the
return badge said "returns" regardless of the actual type. Count
`properties` for object schemas with array fallback, and label the
return badge with the schema's `type` (or `T[]` for arrays).
Three layers were each blind to nested composition in different ways:
- FlatteningPipeline only loaded compositions for templates in the parent's
inheritance chain, so depth-2 composed attributes (e.g.
Pump.AlarmSensor.SensorReading) never materialized. Walk composed chains
breadth-first so the flattener's nested step has the data it needs.
- InstanceConfigure's alarm trigger picker was fed only direct, non-locked
attributes, hiding inherited and composed-member paths. Feed it the full
flattened attribute list via FlatteningPipeline.
- ValidationService.ExtractAttributeNameFromTriggerConfig only recognized
"attributeName", silently passing alarms still using the legacy
"attribute" key. Accept both keys, matching FlatteningService,
AlarmActor, and AlarmTriggerConfigCodec.
Replaces the per-row JSON textbox with an Edit button that opens a modal
hosting the full AlarmTriggerEditor. The editor pre-populates with the
merged inherited + override config so the operator sees the effective
state, not the override delta.
On Save:
- HiLo: diff against inherited, store only changed keys
- Binary trigger types: whole-replace if the edited config differs
Value comparison in the diff is type-aware (decoded strings, numeric
GetDouble) so JSON-escape differences (e.g., literal em-dash vs —)
don't produce false-positive diffs that pollute the override JSON.
FlatteningService.MergeHiLoConfig is now public so the UI can pre-merge
the editor seed; new public DiffHiLoConfig handles the symmetric
direction. +2 encoding tests cover the new equivalence behavior.
The override row's summary column shows the diff'd keys + priority chip
so operators see what's overridden at a glance.
Adds an Alarm Overrides card to the per-instance Configure page (next to
the existing Attribute Overrides and Connection Bindings cards). Each
non-locked template alarm gets a row showing its trigger type, inherited
config, and inputs for an override JSON + priority override. A Clear
button removes the override; the Save Alarm Overrides button upserts
all dirty rows.
The HiLo merge / binary whole-replace semantics are surfaced via the JSON
placeholder hint per trigger type. Wired to the existing
InstanceService.SetAlarmOverrideAsync / DeleteAlarmOverrideAsync flow.
Adds a new HiLo alarm trigger type with four configurable setpoints
(LoLo / Lo / Hi / HiHi). Each setpoint carries an optional priority,
deadband (for hysteresis), and operator message. The site runtime emits
AlarmStateChanged with an AlarmLevel field so consumers can differentiate
warning vs critical bands.
Plumbing:
- new AlarmLevel enum + AlarmStateChanged.Level/Message init properties
- AlarmTriggerEditor (Blazor) gets a HiLo render with severity tinting
- AlarmTriggerConfigCodec extracted from the editor for testability
- sitestream.proto carries level + message over gRPC
- SemanticValidator enforces numeric attribute, setpoint ordering,
non-negative deadband
- on-trigger scripts get an Alarm global (Name/Level/Priority/Message)
so notification routing can branch by severity
- per-instance InstanceAlarmOverride entity + EF migration + flattening
step + CLI commands; HiLo overrides merge setpoint-by-setpoint, binary
types whole-replace
- DebugView shows a Level badge + per-band message tooltip
- App.razor auto-reloads on permanent Blazor circuit failure
- docker/regen-proto.sh automates the proto regen workflow (the linux/arm64
protoc segfault means generated files are checked in for now)
Replace raw-JSON text inputs with rich UI: script parameter/return types use
a JSON Schema builder (SchemaBuilder + JsonSchemaShapeParser, with a migration
to convert existing definitions); alarm trigger config uses a type-aware
editor with a flattened attribute picker (AlarmTriggerEditor). AlarmActor
gains optional direction (rising/falling/either) on RateOfChange triggers.
DeleteCompositionAsync only dropped the top-level derived template — the
cascaded inner derived rows (created when composing a composite source)
were left orphaned with dangling OwnerCompositionId references. Any
subsequent attempt to recompose the same source hit the name-collision
guard ('Motor Controller.Pump.TempSensor' already exists).
New CascadeDeleteDerivedAsync walks each composition on the derived
template, recursively removes the slot-owned child derived first, then
the composition row, then the derived itself. Mirrors the recursive
shape of CreateCascadedCompositionAsync.
Composition leaves were rendered flat — the cascaded inner derived
templates existed in the DB but the tree only showed the outer slot
name (e.g. "Tank Monitor > DrivePump") with no way to see DrivePump's
own TempSensor + AlarmSensor slots.
BuildCompositionLeaves now recurses: for each composition under a
template, look up the composed template (which after derive-on-compose
is a derived row carrying its own Compositions) and build its slot
leaves as children. HasChildrenSelector loses the
"not a composition" guard so nested leaves render with the expand
chevron.
When the user composes a template that already has compositions of its
own (e.g. \$Sensor → Probe1 slot), only the outer derived was created
— the source's children weren't replicated. AddCompositionAsync now
walks the source's composition graph and creates a parallel derived for
every slot it encounters, each linked back through ParentTemplateId so
override chains stay intact (\$Probe → \$Sensor.Probe1 → \$Pump.TempSensor.Probe1).
The cascade pre-flights every name it would create — a deep collision
aborts before any rows mutate. Internal helper
CreateCascadedCompositionAsync skips the "base templates only" check
since it operates on the source side which may legitimately reference
derived rows.
The right-click context menu is now the single entry point for every
per-row action — folders, templates, and composition leaves. Drop the
⋮ kebab buttons that duplicated the menu and the click-to-open
behavior that was easy to trigger by accident while navigating the
tree. Templates and composition slots open on double-click instead.
- RenderNodeKebab removed entirely.
- Selectable / SelectedKeyChanged / OnTreeNodeSelected dropped from
the TreeView wiring — single-click no longer navigates.
- New OpenTemplate(id) helper bound to @ondblclick on Template and
Composition labels.
Derived templates are slot-owned and reached only via their owning
parent's composition leaf in the tree — there's no scenario where
listing them as standalone root nodes is useful, so the toggle was
dead UI. Remove the form-switch, the _showDerived state, and the
OnToggleShowDerived handler; BuildTemplateTree filters derived
templates out unconditionally.
The custom right-click context menu didn't close after a menu item
opened a modal dialog (e.g. "Compose into…"), leaving the menu
floating behind the modal until the user clicked elsewhere or hit
Escape. Add @onclick="DismissContextMenu" on the menu container so
any click inside it (button, divider, padding) closes the menu after
the button's own handler bubbles up.
Move composition CRUD off the TemplateEdit page and onto the tree
context menu, matching Aveva's Template Toolbox flow.
- New ComposeIntoDialog: pick a parent template, slot name (defaults
to the source template's name).
- "Compose into…" on every base template's context menu (kebab + right
click) opens the dialog and calls AddCompositionAsync.
- "Rename…" on composition leaves opens a prompt and calls
TemplateService.RenameCompositionAsync. The owning composition row
AND its owned derived template are renamed atomically; duplicate
slot names or derived-name collisions abort with a clear error.
- "Delete" on composition leaves confirms + cascade-deletes the
composition (and its derived template via DeleteCompositionAsync).
- "New Derived Template" menu item renamed to "New Inheriting Template"
to disambiguate from the new derive-on-compose meaning.
TemplateEdit's Compositions tab, Add Composition form, and
Add/DeleteComposition handlers + state fields are deleted — the tree
is now the single source of truth.
All nine derive-on-compose phases are now implemented. The status doc
captures what shipped per phase, what was deferred (LockedInDerived
override warning toast, SCADA008 base-Parent hint), and the live-DB /
UI smoke checks worth running before merge.