Notify-and-fetch follow-ups:
- PendingDeploymentPurgeActor: a central cluster singleton (not
readiness-gated, best-effort) that sweeps expired PendingDeployment
staging rows on CommunicationOptions.PendingDeploymentPurgeInterval
(default 1h). Modeled on the kpi-history-recorder pattern: self-scheduling
timer, per-tick DI scope -> IDeploymentManagerRepository, continue-on-error.
Wired in AkkaHostedService.RegisterCentralActors (manager + proxy + drain);
resolves the deferred TODO in DeploymentService. Correctness never depends
on it (supersession bounds rows to <=1/instance; the fetch endpoint enforces
the TTL), so it is deliberately absent from RequiredSingletonsHealthCheck.
- SQL Server integration test for StagePendingIfAbsentAsync re-staging an
instance's OWN DeploymentId over an expired row against the real UNIQUE
index on DeploymentId — confirms EF orders DELETE before INSERT in one
SaveChanges (SQLite's constraint timing differs from SQL Server's). Plus
a same-instance supersession variant on real SQL Server.
Tests: 2 TestKit actor tests + 2 SQL Server integration tests (both ran
green against the infra MSSQL container); 235 Communication + 15
PendingDeployment tests pass; Host builds 0 warnings.
The 128 KB Akka frame-size deploy trap is fixed and merged. Record the
resolution at the top of the writeup (notify-and-fetch on both the
central->site deploy hop and the site active->standby replication hop;
AskTimeout classification fix; startup reconciliation + topology perf fix),
link the design + plan docs, and note the live-smoke validation. The
diagnosis is retained as historical record.
Populates ScadaBridge:Communication:CentralFetchBaseUrl in every central
appsettings for the notify-and-fetch deploy flow. An empty value now
causes a fail-fast on deploy; this config prevents that regression.
- docker/: http://scadabridge-traefik (in-network Traefik LB, port 80)
- docker-env2/: http://scadabridge-env2-traefik (env2 Traefik LB, port 80)
- src/Host base: http://localhost:5000 (ASP.NET Core default for single-host dev)
- deploy/wonder-app-vd03: http://localhost:8085 (gitignored; edited in main repo)
RUNBOOK Upgrading section updated with note on this setting.
Add StoreDeployedConfigIfNewerAsync to SiteStorageService — an atomic
conditional upsert using SQLite's ON CONFLICT … WHERE clause so the
standby node only overwrites a deployed_configurations row when the
incoming deployed_at is strictly newer. The active-node path
(StoreDeployedConfigAsync) stays unguarded. Four deterministic tests
cover: no-row insert, older-overwrites, newer-is-noop, equal-is-noop;
all seed rows with explicit DateTimeOffset values via direct SQL to
avoid wall-clock timing dependencies.
Akka.Actor.AskTimeoutException does not derive from System.TimeoutException,
so the isTimeout check in DeployInstanceAsync's catch block missed it and
routed it to the generic "Deployment error:" branch. This broke the
DeploymentManager-006 reconciliation query (query-before-redeploy), which
keys off the "Communication failure:" prefix to detect a prior timeout-induced
failure. Add AskTimeoutException to the pattern; add a covering regression test.
Add CommunicationService.RefreshDeploymentAsync — the typed send method
for the small notify-and-fetch wire message (RefreshDeploymentCommand).
Mirrors DeployInstanceAsync exactly: SiteEnvelope + Ask<DeploymentStatusResponse>
bounded by DeploymentTimeout. CentralCommunicationActor needs no change
(HandleSiteEnvelope is fully generic — all SiteEnvelope messages forward
to /user/site-communication without a per-type switch). Adds a parallel
routing test asserting the envelope reaches the site ClusterClient.
Compact Native-AOT win-x64 console app DELMIA invokes to notify ScadaBridge of a
recipe download via POST /api/DelmiaRecipeDownload (X-API-Key). Drop-in CLI/output
parity with legacy WWNotifier; appsettings.json + SCADABRIDGE_API_KEY env var;
comma-list base URLs with connect-failure-only failover.
Match the shipped InboundDatabaseHelper throughout the implementation plan:
QuerySingle->QuerySingleAsync (and Query->QueryAsync) — async signatures
(async Task<T?>), awaited async ADO.NET (ExecuteScalarAsync/ExecuteReaderAsync/
ReadAsync with the deadline token), async Task test methods with await, and the
architecture/step/acceptance prose + pseudocode now call
await Database.QuerySingleAsync<T>(...). Sibling fix to the design-doc correction.
The IpsenMES MoveIn design-doc pseudocode and helper-surface sketch used the
synchronous, read-only `Database.QuerySingle<T>`/`Query`. The shipped
InboundDatabaseHelper is async and write-capable: `await QuerySingleAsync<T>`,
`QueryAsync`, `ExecuteAsync` (InboundAPI-026/027).
Three inbound methods authored from this draft (IpsenMESMoveIn, MesMoveIn,
MesMoveOut) failed Roslyn compilation in production until corrected to
`await Database.QuerySingleAsync<...>(...)` (2026-06-25). Fix the pseudocode,
the helper-surface bullet, and the inline reference, and add a dated correction
note pointing at the authoritative Component-InboundAPI.md surface.
The templates tree rendered a derived/composed member (e.g. LeftReactorSide,
derived from ReactorSide) as a flat leaf, omitting compositions it inherits
from its base (e.g. LeakTest composed onto ReactorSide). BuildCompositionLeavesFor
recursed only over a template's OWN composition rows; an inherited composition
row lives on the ancestor, and TemplateComposition has no IsInherited placeholder
(unlike attributes/alarms/scripts/native-sources), so the child's own Compositions
was empty. Same 'derived templates don't surface inherited members' family as
followups #1/#2, but for compositions. Deploy/flatten was always correct
(TemplateResolver.ResolveAllMembers walks the chain) — display-only.
Fix:
- BuildCompositionLeavesFor now renders the EFFECTIVE composition set (own +
inherited) via EffectiveCompositionsFor, which walks the inheritance chain
(leaf->root, child wins on InstanceName), mirroring the resolver.
- Inherited slots are flagged (TemplateTreeNode.IsInherited), badged 'inherited'
in the label, and their context menu offers only 'Open composed template'
(Rename/Delete edit the ancestor's slot, so suppressed on inherited nodes).
- The same inherited row can appear under several derived members (LeakTest under
both LeftReactorSide and RightReactorSide), so composition nodes use a
path-qualified KeyOverride to keep TreeView selection/expansion keys unique;
recursion is cycle-guarded.
Tests: +1 bUnit (TemplatesPageTests.Renders_InheritedComposition_UnderDerivedComposedMember);
CentralUI suite 867 green; full solution builds 0/0.
Docs: Component-CentralUI.md (effective composition set in tree); known-issues
tracker #9 recorded + resolved.
Note: CentralUI change — shows on wonder-app-vd03 only after that host is redeployed.
The HOST-021 fix (AkkaHostedService.GetOrCreateActorSystem + AddSingleton<ActorSystem>
in Program.cs/SiteServiceRegistration.cs) is committed and live on main; the tracker
still read 'pending commit' / 'Fix (pending, task #48)'. Status corrected.
Closes the four remaining items in the 2026-06-24 template-inheritance/CLI
follow-up tracker.
#4 — CLI `instance set-bindings` can now set DataSourceReferenceOverride.
`--bindings` accepts an optional 3rd element per entry:
[attributeName, dataConnectionId, dataSourceReferenceOverride]. A string
sets the override; a JSON null or an omitted 3rd element leaves it unset
(template default). TryParseBindings accepts 2- or 3-element entries and
rejects a non-string/non-null 3rd element or 4+ elements with a clean
error. Previously the CLI sent the override as null and silently wiped any
existing one (only a raw POST /management could set it).
#5 — `template update` is partial, not full-replace (fixed server-side so all
clients benefit). UpdateTemplateAsync now uses leave-unchanged semantics:
a null description keeps the stored value (pass "" to clear); a null
parentTemplateId keeps the existing parent. Parent stays immutable — a
non-null differing value is still rejected — but omitting --parent-id is
now a no-op instead of failing every derived-template update.
#6 — compact `template list`/`get` table output + `--detail`. Table output is
now id/name/description/parent/derived + member counts (#attrs/#alarms/
#scripts/#comps/#nativeAlarms) via TemplateTableProjection, fed through a
new optional tableProjector seam on CommandHelpers. `--detail` restores the
full dump. JSON output is left untouched (always full) so machine consumers
are unaffected — the projector only runs on the table path.
#8 — structured deploy-time validation error. New ValidationResult.SummarizeErrors()
(Commons) returns a grouped, capped summary: leading total count, one line
per ValidationCategory, and a per-module rollup (canonical name up to its
last dot) with counts + "... and N more module(s)" caps. DeploymentService
uses it for the "Pre-deployment validation failed" message and logs the full
per-entry list via LogWarning. Replaces the flat semicolon-joined dump that
became a wall of text for instances with 50-194 unbound attributes.
Tests: +8 Commons (SummarizeErrors), +8 CLI (4 binding 3-element / 4 table
projection), +2 net TemplateEngine (partial-update). Affected suites green:
Commons 587, CLI 341, TemplateEngine 447, DeploymentManager 101,
ManagementService 230, CentralUI 866; full solution builds 0/0.
Docs: Component-DeploymentManager.md "Validation Error Reporting"; CLI README
(set-bindings 3-element form, template update leave-unchanged, list/get
--detail); UpdateTemplateCommand doc; known-issues tracker #4/#5/#6/#8 resolved
(all 8 items now closed).
Derived templates store IsInherited placeholder rows mirroring inherited
members, but a base member added/changed/removed AFTER a child was derived
never reached the child — leaving the editor's editable tabs incomplete (#1)
and stored rows drifted from the resolved set (#2).
Fix (one order-independent reconcile, two entry points):
- Auto-propagation: every attribute/alarm/script add/update/delete now
reconciles the template's derived subtree (TemplateService.ReconcileDescendantsAsync),
hooked into all member-mutating paths incl. native-alarm-source CRUD in the
ManagementActor.
- Resync: ResyncInheritedMembersAsync repairs a template + its subtree on
demand — materialize missing placeholders, re-sync drifted ones, remove
orphans, across attributes/alarms/scripts/native sources. Exposed as
management ResyncInheritedMembersCommand (Designer-gated, audited) → CLI
`template resync-members` → a Resync button on the editor's staleness banner.
Reconcile drives off TemplateInheritanceResolver (same precedence + HiLo merge
as deploy), only ever touches IsInherited placeholders (never an authored
override), and matches the staleness comparison keys so the banner clears.
BuildDerivedTemplate now also materializes native-source placeholders at
compose time (previously omitted → any inherited native source was perpetually
stale).
Tests: +8 TemplateServiceTests (materialize / drift-update / orphan-remove /
override-untouched / base-cascade / multi-type / direct-propagate / end-to-end
add) + 1 ManagementService test fix (native-source add resolves TemplateService).
Affected suites green: TemplateEngine 446, ManagementService 230, CentralUI 866,
CLI 333, Transport 127, ConfigurationDatabase 307; full solution builds 0/0.
Docs: Component-TemplateEngine.md "Inherited-Member Propagation & Resync";
CLI README `template resync-members`; known-issues tracker #1/#2 resolved.
#3 — CollisionDetector counted a derived template's IsInherited placeholder
rows as a distinct origin from the parent members the inheritance walk
re-adds, reporting a spurious "Naming collision" for every inherited row and
blocking any attribute/composition add to a derived template. CollectDirectMembers
now skips IsInherited rows on the direct-template and inherited-parent walks;
it keeps them for the composed-module walk, where placeholders are the sole
representation of a derived module's inherited members (that walk does not
climb the composed template's parent chain).
#7 — SandboxAttributeAccessor (Central UI Test-Run host) omitted
WriteBatchAndWaitAsync / WaitAsync / WaitForAsync, so the editor false-flagged
valid instance scripts with CS1061 even though `template validate` and the
deploy gate accept them. Added the five overloads mirroring the runtime
AttributeAccessor; they throw a labelled ScriptSandboxException if run in
Test Run (the central sandbox has no device-batch / event-waiter transport).
Tests: +3 CollisionDetector unit + 1 end-to-end TemplateService (derived add
now succeeds); +2 ScriptAnalysisService diagnose-clean. Each new test verified
to fail without its fix with the exact user-facing symptom. Full suites green
(TemplateEngine.Tests 438, CentralUI.Tests 866).
Docs: Component-TemplateEngine.md (inherited-placeholder collision rule),
Component-ScriptAnalysis.md (third sandbox surface + its compile-clean guard),
known-issues tracker #3/#7 marked resolved and the minor note promoted to #8.
Flip DataConnectionLayer-029, InboundAPI-031, SiteRuntime-032/033,
StoreAndForward-028, AuditLog-017, CentralUI-037, ScriptAnalysis-009 from Open to
Resolved with the fixing commit + a one-line resolution each; regen README
(0 pending / 576 total across 25 modules).
Fixes the 8 findings from the 2026-06-24 re-review (commit c42bb485), with a
regression test per Medium finding:
- DataConnectionLayer-029 (Med): HandleAlarmSubscribeCompleted now mirrors the
tag-path re-check — if a feed is already stored for the source, release the
redundant just-created subscription instead of overwriting + leaking the first
one (the double-subscribe window DCL-023 reopened). +regression test.
- InboundAPI-031 (Med): remove WaitForAttribute's local 5s grace backstop (tighter
than the CommunicationService Ask's timeout+IntegrationTimeout round-trip budget,
so a slow-but-valid timed-out 'false' got cancelled into a 500). Link only the
client-abort + explicit caller tokens; the lower layer owns the backstop. +test.
- SiteRuntime-032 (Med): derive the deployed count from an authoritative set of
deployed config names (HashSet) instead of a map-presence-gated int, so deleting
a DISABLED instance decrements correctly (SiteRuntime-029's gate leaked it).
+deploy->disable->delete regression test.
- StoreAndForward-028 (Med): reset _bufferedCount in StopAsync alongside the
register-guard so a same-instance Stop->Start re-seeds from a clean base (no ~2N
gauge double-count). +restart regression test.
- AuditLog-017 (Low): test the OnIngestAsync scope-resolution guard (actor survives,
replies empty, counts the failure) — no longer unpinned.
- CentralUI-037 / ScriptAnalysis-009 / SiteRuntime-033 (Low): doc-comment + spec
fixes (Database-throws in the inbound sandbox; baseReferences param wording;
native-alarm cap return-to-normal + per-condition NativeAlarmDropped eviction).
Targeted suites green: SiteRuntime 5, StoreAndForward 6, InboundAPI 31,
DataConnectionLayer 10, AuditLog 5, ScriptAnalysis 40, CentralUI ScriptAnalysis 52.
Re-reviewed the modules whose source changed since the last review baseline
(full-review remediation fd618cf1 + InboundAPI Database-helper fixes b3c90143),
focused on whether the fixes are sound and regression-free. 9 of 17 modules
clean; 8 new findings (0 Critical, 0 High, 4 Medium, 4 Low), all code-verified
by the orchestrator before recording:
- DataConnectionLayer-029 (Med): DCL-023's unsubscribe-clears-in-flight reopens a
double-subscribe window that leaks an orphaned alarm feed; the alarm completion
handler overwrites the subscription id without the tag-path guard at line 908.
- InboundAPI-031 (Med): WaitForAttribute's 5s grace backstop is tighter than the
CommunicationService Ask's timeout+IntegrationTimeout (30s) round-trip slack, so
a slow-but-valid timed-out 'false' arriving in the 5-30s window is cancelled into
an unhandled OperationCanceledException/500 (contradicts spec 6 + its own comment).
- SiteRuntime-032 (Med): SiteRuntime-029's wasPresent guard skips the deployed-count
decrement when deleting a DISABLED instance (absent from both maps), drifting the
health-dashboard tally; self-heals on singleton restart (observational, hence Med).
- StoreAndForward-028 (Med): StoreAndForward-025 resets the register-guard but not
_bufferedCount, so a same-instance Stop->Start re-seeds the depth gauge to ~2N.
- AuditLog-017, CentralUI-037, ScriptAnalysis-009, SiteRuntime-033 (Low): a
test-coverage gap plus stale doc-comments/spec following the remediation.
Header commit/date bumped to 1f9de8a2 / 2026-06-24 on all 17 modules; README
regenerated (8 pending / 576 total).
Records the remediation in commit b3c90143: Database helper authorized + secured
(parameterized, writes allowed, async/deadline-bound), WaitForAttribute bound by the
wait timeout, and the newly-surfaced compile-surface-mirror gap (030) fixed. InboundAPI
drops to 0 open findings; aggregate README regenerated (0 pending / 568 total).