Bundle G of Audit Log #23 M2. Bridges the FallbackAuditWriter primary-
failure counter into the Site Health Monitoring report payload so a
sustained audit-write outage surfaces on /monitoring/health instead of
disappearing into a NoOp sink.
- SiteHealthReport: add SiteAuditWriteFailures (defaulted, additive).
- ISiteHealthCollector + SiteHealthCollector: new
IncrementSiteAuditWriteFailures() counter, per-interval reset
semantics matching ScriptErrorCount / DeadLetterCount.
- HealthMetricsAuditWriteFailureCounter: adapter forwarding
IAuditWriteFailureCounter.Increment() to the collector.
- AddAuditLogHealthMetricsBridge(): swaps the NoOp default
registration for the real bridge; called from
SiteServiceRegistration after AddSiteHealthMonitoring + AddAuditLog.
- Existing host-wiring test updated: site composition now resolves
HealthMetricsAuditWriteFailureCounter (not NoOp).
Tests: HealthMonitoring 60 -> 63 (3 new), AuditLog 56 -> 59 (3 new),
full solution green.
Append-only data-access surface for the central AuditLog table — three
methods: InsertIfNotExistsAsync (first-write-wins on EventId), QueryAsync
(filter + keyset paging on (OccurredAtUtc desc, EventId desc)), and
SwitchOutPartitionAsync (M1 honest contract — throws NotSupported until
M6 lands the non-aligned-index drop/rebuild dance for the partition
switch). No Update, no row-delete; bulk purge is partition-only.
Bundle D of the Audit Log #23 M1 Foundation plan.
The script-analysis sandbox Notify surface was stale after the Notification
Outbox change: SandboxNotifyTarget.Send returned Task<NotificationResult> and
there was no Status method, while production NotifyTarget.Send returns
Task<string> (a NotificationId) plus NotifyHelper.Status. A script that
test-ran cleanly in the sandbox would not compile against the real site
runtime.
- Move the NotificationDeliveryStatus record from ScadaLink.SiteRuntime.Scripts
into ScadaLink.Commons.Messages.Notification so both production and the
CentralUI sandbox reference the exact same type (CentralUI does not, and
should not, reference SiteRuntime). Production NotifyHelper.Status is
otherwise untouched.
- Rewrite SandboxNotifyHelper/SandboxNotifyTarget to be a signature-faithful
no-op fake: Send returns Task<string> (a fake NotificationId), Status returns
Task<NotificationDeliveryStatus>. Production now enqueues into the site S&F
engine, which has no central-side equivalent in the sandbox, so the fake no
longer carries an INotificationDeliveryService.
- Add script-analysis tests proving a script using the new Notify shape both
diagnoses clean and runs in the sandbox.
Inbound-API bearer credentials are no longer persisted in plaintext. ApiKey now
holds a KeyHash (peppered HMAC-SHA256); the key is shown once at creation and
only its hash is stored. Lookup and validation hash the presented candidate.
Cross-module: Commons (ApiKey, ApiKeyHasher), ConfigurationDatabase (mapping +
HashApiKeyValue migration), InboundAPI (ApiKeyValidator), ManagementService
(key creation), CentralUI (ApiKeys.razor). Existing keys must be re-issued.
Adds IsInherited/LockedInDerived to the TemplateAlarm entity (mirroring the
attribute/script override model), an EF migration, base-alarm copy-on-derive,
inherited-alarm flattening skip, and LockedInDerived override-rejection validation.
Adds DeploymentStateQuery request/response contracts (Commons), a site-side
handler (SiteRuntime), a CommunicationService query method (Communication), and
reconciliation in DeploymentService: when a prior record is InProgress or
Failed-on-timeout, query the site; if it already holds the target revision hash
mark the record Success without re-sending; on query failure fall through to a
normal deploy (site-side stale-rejection is the safety net).
The Test Run sandbox and Monaco analysis modelled a script API that had
drifted from the site runtime's ScriptGlobals, so real scripts failed to
compile in Test Run. Realign both to the runtime surface
(Instance/Scripts/ExternalSystem/Attributes/Children/Parent) and drop the
duplicate ScriptHost stub so the two cannot diverge again.
- Script calls (Scripts.CallShared, Instance.CallScript, Route.To().Call)
accept an anonymous object instead of a hand-built dictionary, via a
shared ScriptArgs normalizer; existing dictionary calls still compile.
- Test Run can optionally bind to a deployed instance, so Instance/
Attributes/CallScript route to it cross-site; adds site-side
RouteToGetAttributes/RouteToSetAttributes handlers.
- Adds Test Run panels to the API method and template script editors.
- Fixes the TestDatabaseQuery seed script, which queried a table that
never existed.
Also commits unrelated in-progress work already in the tree: the health
monitoring report loop, site streaming changes, and the Admin/Design
data-connection and SMTP page reorganization.
Deleting an instance only undeployed it from the site and set the state
to NotDeployed, leaving an orphan record that could never be removed —
the state-transition matrix rejected delete from NotDeployed.
Delete now removes the instance record entirely (deployment history,
snapshot, attribute/alarm overrides, and connection bindings go with
it), and is permitted from any state.
Triage was painful on the old layout: a lone Site dropdown sat on a sparse
row, errors were truncated mid-sentence with a per-row View/Hide toggle
that on expand pushed an unwrapped <pre> through the table and shoved the
Actions column off-screen, all rows looked the same regardless of age or
attempt count, and OriginInstance — which tells you which instance
produced the failure — wasn't displayed at all even though the data was
on the entity.
This pass:
- Adds a real filter bar: Site, Category, Target system, Origin instance,
Age window, free-text search. Category/Target/Origin/Age/Search filter
the loaded page client-side; Site still drives the server query (and
changing site now auto-queries — one fewer click).
- Replaces the in-table expansion with an Offcanvas detail drawer.
Clicking a row slides in a side panel with full message ID + copy,
category label, origin, attempts, both timestamps in relative + absolute
form, the complete error (pre-wrap, scrollable), and big Retry / Discard
buttons. The table never overflows.
- Stacks Target + Method into one column (target in semibold, method
small/muted below) and surfaces Origin as a code-styled chip in a new
column ("—" muted when null).
- Severity left-border on each row, derived client-side from
AttemptCount/MaxAttempts and age of the last attempt: red when retries
are exhausted and last attempt was in the past hour, amber when
exhausted but stale, muted grey otherwise.
- Mini attempt progress bar under the n/max count, red when fully
exhausted and amber while partial.
- Relative timestamps ("5m ago", "1h ago", "2d ago") with absolute UTC on
hover via the title attribute — applies in both the table and the drawer.
- Bulk select: header checkbox selects the filtered set, per-row
checkboxes. When ≥1 selected, a sticky action strip slides in below the
filter bar offering Retry selected / Discard selected with the usual
confirm dialog. Toast reports per-item success/failure counts.
- Summary line next to the title: "N parked · K target systems · oldest
Xh ago" (and "(showing M of N)" when filters are active).
- ParkedMessageEntry contract extended additively with MaxAttempts,
Category, and OriginInstance so the UI has the data it needs for
severity, the category filter, and the new column.
- Bumped page size from 25 to 50 to better match the dense layout.
CentralHealthAggregator is a per-node hosted singleton, but site health
reports flow through ClusterClient which round-robins each report to one
central node only. The other node's aggregator never saw those reports
and marked sites offline at the 60s threshold — sites constantly flapped
between online and offline on the monitoring page.
On receive, the active CentralCommunicationActor now republishes a
SiteHealthReportReplica wrapper on a DistributedPubSub topic. Both
central nodes subscribe to the topic and process replicas through a
dedicated path that updates the local aggregator without re-broadcasting
(avoids fan-out loops). The aggregator's existing sequence-number
idempotency makes self-delivery a cheap no-op.
DistributedPubSubExtensionProvider is now listed in the HOCON
`akka.extensions` block so the mediator is initialised at cluster
start, eliminating a race where the first Subscribe arrived before the
extension was loaded.
Adds a new HiLo alarm trigger type with four configurable setpoints
(LoLo / Lo / Hi / HiHi). Each setpoint carries an optional priority,
deadband (for hysteresis), and operator message. The site runtime emits
AlarmStateChanged with an AlarmLevel field so consumers can differentiate
warning vs critical bands.
Plumbing:
- new AlarmLevel enum + AlarmStateChanged.Level/Message init properties
- AlarmTriggerEditor (Blazor) gets a HiLo render with severity tinting
- AlarmTriggerConfigCodec extracted from the editor for testability
- sitestream.proto carries level + message over gRPC
- SemanticValidator enforces numeric attribute, setpoint ordering,
non-negative deadband
- on-trigger scripts get an Alarm global (Name/Level/Priority/Message)
so notification routing can branch by severity
- per-instance InstanceAlarmOverride entity + EF migration + flattening
step + CLI commands; HiLo overrides merge setpoint-by-setpoint, binary
types whole-replace
- DebugView shows a Level badge + per-band message tooltip
- App.razor auto-reloads on permanent Blazor circuit failure
- docker/regen-proto.sh automates the proto regen workflow (the linux/arm64
protoc segfault means generated files are checked in for now)
Phase 1 of the design at
docs/plans/2026-05-12-derive-on-compose-design.md.
Additive schema only — no behavior changes. Existing data and code
paths continue to work; subsequent phases will start writing the
new fields.
Template gains:
IsDerived true when this row was auto-created to back
a composition slot
OwnerCompositionId back-ref to the owning TemplateComposition
(plain int, not an EF nav property — managed
by TemplateService for cascade-delete)
TemplateAttribute / TemplateScript each gain:
IsInherited row copied from base and not yet overridden;
changes to the base flow downward
LockedInDerived on a base, blocks derived from overriding;
enforced at the service layer in later phases
EF Core migration AddDerivedTemplateFields adds four columns:
Templates.IsDerived bit NOT NULL DEFAULT 0
Templates.OwnerCompositionId int NULL
TemplateAttributes.IsInherited bit NOT NULL DEFAULT 0
TemplateAttributes.LockedInDerived bit NOT NULL DEFAULT 0
TemplateScripts.IsInherited bit NOT NULL DEFAULT 0
TemplateScripts.LockedInDerived bit NOT NULL DEFAULT 0
Existing rows get the defaults. Tests across SiteRuntime / TemplateEngine
/ CentralUI suites stay green (129 / 199 / 159).
Next: phase 2 — wire AddCompositionAsync to derive on compose for
new compositions. Old data still flows the direct-reference path
until phase 3's migration script.
Two caveats from the script-scope rollout addressed:
1. ITemplateEngineRepository.GetTemplatesComposingAsync — a scoped
query that returns only the templates referencing a given template
via Compositions, eager-loaded with their Attributes / Scripts /
Compositions. Replaces the GetAllTemplatesAsync + filter pattern
in TemplateEdit so the Monaco metadata fetch doesn't pull the
entire template catalog to find one parent.
2. Multi-parent picker. The previous implementation suppressed Parent
assistance entirely when more than one template composes the open
one. Now TemplateEdit collects every parent into _editorParents
and renders a small `select` above the script editor when there
are >1, letting the user choose which parent's metadata drives
Parent.Attributes / Parent.CallScript completion + diagnostics.
Single-parent templates skip the picker (no UI change). Zero
parents (root template) hide the picker and surface no Parent
assistance.
Browser-verified on the Sensor Module template (composed by both Pump
and Variable Speed Motor): picker shows both options, switching
updates the editor's parent metadata immediately via the existing
GetContext callback.
Test counts unchanged (159 / 199); the new repo method is exercised
end-to-end by the parent-picker browser path.
Phases 1+2 of the design at
docs/plans/2026-05-12-script-scope-access-design.md.
Adds ergonomic scope-aware accessors to compiled scripts. A script
on a composed TempSensor reads its own attribute via
Attributes["Temperature"]; reaches up to the parent via
Parent.Attributes["SpeedRPM"]; invokes a child script via
Children["TempSensor"].CallScript("Sample"). All resolve to the
existing flat Instance.GetAttribute / SetAttribute / CallScript
delegates by prepending the script's canonical path prefix.
Runtime types (SiteRuntime.Scripts.ScopeAccessors):
AttributeAccessor sync indexer + GetAsync / SetAsync
CompositionAccessor Attributes + CallScript
ChildrenAccessor Children["name"] => CompositionAccessor
ScriptGlobals gains Scope, Attributes, Children, Parent properties.
Sync indexer blocks on the Instance Actor Ask; explicit GetAsync /
SetAsync are also available for callers that want to await.
Plumbing:
- Commons.Types.Scripts.ScriptScope record (SelfPath / ParentPath).
- ResolvedScript.Scope (defaults to ScriptScope.Root for back-compat).
- FlatteningService emits new ScriptScope(prefix, "") for each
composed script so a script defined on TempSensor composed under
a parent gets SelfPath = "TempSensor".
- ScriptActor reads the Scope from its ResolvedScript and forwards
it through ScriptExecutionActor into ScriptGlobals on each call.
RevisionHashService not touched: the per-script canonical name
already encodes the composition path, so any structural change
already flips the hash.
10 new unit tests on the path arithmetic. Site/Template engine
suites stay green (129 + 199).
Editor surface (Phase 3: metadata fetch, Phase 4: completion +
SCADA006 / SCADA007 diagnostics) follows in the next commits.
Composable StaleTagMonitor class in Commons fires a Stale event when no
value is received within a configurable max silence period. Integrated
into both LmxProxyDataConnection and OpcUaDataConnection adapters via
optional HeartbeatTagPath/HeartbeatMaxSilence connection config keys.
When stale, the adapter fires Disconnected triggering the standard
reconnect cycle. 10 unit tests cover timer behavior.
Add NodeStatus record, IClusterNodeProvider interface, and AkkaClusterNodeProvider
that queries Akka cluster membership for all site-role nodes. HealthReportSender
populates ClusterNodes before each report. UI shows a row per node with
hostname, Online/Offline badge, and Primary/Standby badge. Falls back to
single-node display if ClusterNodes is not populated.
New fields in SiteHealthReport: NodeHostname, DataConnectionEndpoints
(primary/secondary), DataConnectionTagQuality (good/bad/uncertain),
ParkedMessageCount. New collector methods to populate them.
Health dashboard redesigned to match mockup: Nodes | Data Connections
(with per-connection tag quality) | Instances + S&F Buffers | Error
Counts + Parked Messages. Site names resolved from repository.
Replace raw dictionary casting with ScriptParameters wrapper that provides
Get<T>, Get<T?>, Get<T[]>, and Get<List<T>> with clear error messages,
numeric conversion, and JsonElement support for Inbound API parameters.
The LmxProxy client's ExtractArrayValue now returns proper .NET arrays
(bool[], int[], DateTime[], etc.) instead of ArrayValue objects. Removed
the reflection-based FormatArrayContainer logic — IEnumerable handling
is sufficient for all array types.