Well-localised perf fixes across 8 modules.
Lock decoupling / SQL streaming:
- AuditLog-005: SqliteAuditWriter gains dedicated read-only _readConnection
(+ _readLock) backed by WAL journal mode. GetBacklogStatsAsync,
ReadPendingAsync, ReadPendingSinceAsync, ReadForwardedAsync no longer
contend with the hot-path INSERT lock — backlog probes on a 30s timer
can't stall the writer under multi-hundred-K Pending backlog.
- SEL-022: dropped Cache=Shared from SiteEventLogger's default connection
string (single-connection logger; mode was dormant config).
Memory / streaming:
- CLI-019: bundle export streams base64 in 1 MB-aligned chunks via
Convert.TryFromBase64Chars straight into the FileStream — no more
full-bundle byte[] allocation.
- CentralUI-031: TransportImport now stages the upload to a per-session
temp file under Path.GetTempPath() (replaces in-memory byte[] field);
page implements IDisposable to delete the temp file on reset / new
upload / dispose. Per-circuit working set drops from ~100 MB to ~80 KB.
N+1 hoisting:
- Transport-008: added ITemplateEngineRepository.GetTemplatesWithChildrenAsync
bulk method; BundleImporter.PreviewAsync calls it once instead of per-
template-name. Single query with .Include(...).AsSplitQuery().
- DM-023: BuildDeployArtifactsCommandAsync's per-site loop now references
a pre-fetched GlobalArtifactSnapshot (shared scripts, external systems,
DB connections, notification lists, SMTP) instead of re-querying per site.
- MgmtSvc-023: HandleQueryDeployments unfiltered branch uses one
GetAllInstancesAsync bulk load + Dictionary<int,int?> lookup (was a
GetInstanceByIdAsync per record).
Small allocations / per-tick rebuilds:
- InboundAPI-019: AuditWriteMiddleware gates EnableBuffering() on
RequestHasBody() so GET/HEAD/DELETE/TRACE/OPTIONS and Content-Length:0
requests skip the FileBufferingReadStream allocation.
- NotifOutbox-006: ResolveAdapters dictionary now cached on
_adaptersCache (built lazily on first sweep) + actor-lifetime
_adaptersScope; ResolveAdapters no longer rebuilds per dispatch tick.
Verify-only:
- Comm-017: Confirmed _inProgressDeployments was deleted by Comm-016 in
commit ac96b83 — marked Resolved with that attribution. No code change.
Doc-correction:
- NS-022: Updated MailKitSmtpClientWrapper XML doc to spell out single-
connection / per-delivery-factory contract (option (b) — transient
client per Send — rejected because it re-handshakes TLS per email).
10+ new regression tests across 8 test projects. Build clean; affected
suites all green. README regenerated: 54 open (was 65).
Async cancellation hygiene, fire-and-forget observability, retry/shutdown
semantics, and audit-row coverage across 9 modules. Highlights:
Cancellation & lifecycle:
- AuditLog-006: SqliteAuditWriter.Dispose hops to thread pool, escaping the
captured SyncContext that risked sync-over-async deadlock.
- AuditLog-010: SiteAuditTelemetryActor owns a private lifecycle CTS,
threaded through drain paths instead of CancellationToken.None.
- Comm-019: CentralCommunicationActor adds lifecycle CTS for repo calls.
- Host-019: Migration StartupRetry forwards ApplicationStopping so SIGTERM
during the bounded-retry window aborts cleanly.
Cursor / retry / counter correctness:
- AuditLog-004: SiteAuditReconciliationActor's cursor now holds at `since`
when any row's idempotent insert is still being retried (per-EventId
retry counter, MaxPermanentInsertAttempts=5 escape valve with LogCritical
abandon). No more silent abandonment of permanently-failing rows.
- ConfigDB-019: Dropped the catch-and-continue on EnsureLookaheadAsync's
SPLIT loop — by class-doc construction the catch could only mask real
failures and let the next iteration create permanent partition holes.
- HM-017/018: HealthReportSender + CentralHealthReportLoop snapshot
per-interval counters before sending, restore via new
ISiteHealthCollector.AddIntervalCounters on transport failure so counts
aren't silently lost.
Fire-and-forget / shutdown waits:
- InboundAPI-018: AuditWriteMiddleware observes faulted audit-write tasks
via OnlyOnFaulted continuation (Warning log; response unchanged).
- SnF-024: StoreAndForwardService.StopAsync awaits in-flight retry sweep
with a bounded SweepShutdownWaitTimeout (10s).
Leak / refactor:
- Comm-021: SiteStreamGrpcServer.SubscribeInstance wraps Subscribe in its
own try/catch so a throw doesn't leak the relay actor or _activeStreams
entry.
- Comm-022: VERIFIED already-closed by Comm-016's dead-code purge.
- CLI-017: BundleCommands' three subcommands delegate to ExecuteCommandAsync
(auth-failure exit-code contract unified).
Defensive / validation:
- CLI-021: CliConfig.Load wraps file-read/JSON parse so malformed config
prints a warning and returns defaults instead of crashing the CLI.
- Host-022: ParseLevel emits stderr one-shot warning for unrecognised
MinimumLevel instead of silently coercing to Information.
- ESG-019: ExternalSystemClient sets HttpClient.Timeout=Infinite so the
per-call CTS is the sole timeout source (was clipped to 100s by .NET).
- Security-020: New SecurityOptionsValidator (IValidateOptions) rejects
empty LdapServer/LdapSearchBase with ValidateOnStart.
- DM-019: Lifecycle command timeouts now emit DisableTimedOut/EnableTimedOut/
DeleteTimedOut audit entries (mirrors DeployFailed pattern).
Plus reconciled stale per-module Open-findings counters that had drifted
from prior sessions.
20+ new regression tests across 11 test projects; build clean; affected
suites all green. README regenerated: 75 open (was 93).
Each finding is a focused validation guard or upper bound at a trust boundary.
Highlights:
- Commons-015: EncryptionMetadata ctor now validates Algorithm (AES-256-GCM
only), Kdf (PBKDF2-SHA256 only), Iterations ([100k, 10M]), non-null Salt/IV.
- Transport-004: new BundleUnlockRateLimiter (sliding-window, per-key,
singleton) wired into BundleImporter.LoadAsync; over-budget callers see
BundleUnlockRateLimitedException. Per-bundle 3-strike + per-window cap.
- ESG-022: ExternalSystemClient.InvokeHttpAsync allow-lists the documented
GET/POST/PUT/PATCH/DELETE set (case-insensitive); unknown verbs throw.
- SEL-015: SiteEventLogger queue now bounded (10k cap, DropOldest); dropped
events fault their Task and increment FailedWriteCount so the drop is
observable instead of an unbounded memory growth.
- SEL-017: EventLogQueryService clamps caller-supplied PageSize to a new
MaxQueryPageSize cap (default 500) so int.MaxValue can't OOM the host.
- SEL-020: LogEventAsync rejects severities outside {Info, Warning, Error}
(matches SQLite BINARY-collation query filter).
- InboundAPI-020: ContentType "json" check now case-insensitive
(application/JSON no longer slips through as not-json).
- InboundAPI-024: _knownBadMethods capped at 1000 entries (drops new entries
once full); per-request DB lookup remains the correctness path.
- SR-025: HandleSetStaticAttribute validates the attribute name against the
deployed config; unknown names now return Success=false instead of
leaking orphan override rows into the SQLite store.
- TE-021: MoveTemplateAsync runs the sibling-name-collision check at the
destination, mirroring TemplateFolderService.MoveFolderAsync.
- TE-022: LockEnforcer's once-locked-stays-locked rule now also covers
LockedInDerived (was previously only IsLocked).
New regression tests across 8 test projects (EncryptionMetadata, rate
limiter, ESG client allow-list, SEL bounded channel / PageSize clamp /
severity validation, InboundAPI ContentType + bad-methods cap, SiteRT
unknown-attribute, TemplateEngine MoveTemplate + LockedInDerived).
Build clean; affected suites all green. README regenerated: 93 open (was 104).
Note: a separate manual re-run was needed for the SiteEventLogging hunk
because its initial subagent's source edits never landed on disk despite
reporting success (file-collision-style failure mode).
Bulk CommentChecker pass: fills in <param>/<inheritdoc> tags on public
APIs across all 23 src/ projects so the doc-coverage gate is green. Also
adds a Sister Projects section to CLAUDE.md pointing at the MxAccess
Gateway and OtOpcUa sibling repos, and gitignores local credential
captures (*login*.txt) and the wonder-app-vd03 deploy/ artifacts.
AuditWriteMiddleware previously buffered the FULL request and response
bodies into memory and only let DefaultAuditPayloadFilter trim them
after persistence. A 500 MiB upload allocated 500 MiB of MemoryStream
plus 1 GiB of UTF-16 string transiently before the filter pulled it
back to the 1 MiB inbound ceiling — the cap was real on the persisted
row but not at the capture site.
Inject IOptionsMonitor<AuditLogOptions> and read InboundMaxBytes
per-request (same convention as DefaultAuditPayloadFilter so a live
config change picks up the next request). The request reader now pulls
at most cap + 1 bytes into a UTF-8 byte-safe-truncated string and
rewinds the stream so the endpoint handler still sees the full body.
The response wrap is a new CapturedResponseStream that forwards every
Write / WriteAsync to the real sink (the client still receives all
bytes) while capturing at most cap + 1 bytes for the audit copy. The
middleware now sets PayloadTruncated itself when either body hit the
cap; the filter still OR's its own determination on top.
Adds a project reference from ScadaLink.InboundAPI to
ScadaLink.AuditLog so AuditLogOptions resolves. AuditLog does NOT
reference InboundAPI back, so no cycle is introduced.
Tests:
- All 21 existing AuditWriteMiddlewareTests still pass (the helper
gains an optional AuditLogOptions argument; default is the standard
1 MiB ceiling so existing small-body tests are unaffected).
- MiddlewareOrderTests' construction site updated for the new ctor
arg; a StaticAuditLogOptionsMonitor file-local double mirrors the
InboundChannelCapTests pattern.
- New RequestBody_AboveInboundMaxBytes_TruncatedToCap_PayloadTruncatedTrue
pins a 4 KiB cap against a 20 KB body: audit copy <= 4 KiB,
PayloadTruncated = true, downstream handler reads the full 20 KB.
- New ResponseBody_AboveInboundMaxBytes_TruncatedToCap_ClientStillReceivesAllBytes_PayloadTruncatedTrue
pins the same shape on the response side: client sink receives
20 KB, audit copy <= 4 KiB, PayloadTruncated = true.
InboundAPI test count: 133 -> 135.
Inbound-API bearer credentials are no longer persisted in plaintext. ApiKey now
holds a KeyHash (peppered HMAC-SHA256); the key is shown once at creation and
only its hash is stored. Lookup and validation hash the presented candidate.
Cross-module: Commons (ApiKey, ApiKeyHasher), ConfigurationDatabase (mapping +
HashApiKeyValue migration), InboundAPI (ApiKeyValidator), ManagementService
(key creation), CentralUI (ApiKeys.razor). Existing keys must be re-issued.
Move all package versions into Directory.Packages.props so every project
resolves a single consistent version. Consolidates the Roslyn packages
(Microsoft.CodeAnalysis.CSharp.Scripting/Workspaces) onto 5.0.0, which
resolves the pre-existing NU1608 version-skew error in the test projects.
The Test Run sandbox and Monaco analysis modelled a script API that had
drifted from the site runtime's ScriptGlobals, so real scripts failed to
compile in Test Run. Realign both to the runtime surface
(Instance/Scripts/ExternalSystem/Attributes/Children/Parent) and drop the
duplicate ScriptHost stub so the two cannot diverge again.
- Script calls (Scripts.CallShared, Instance.CallScript, Route.To().Call)
accept an anonymous object instead of a hand-built dictionary, via a
shared ScriptArgs normalizer; existing dictionary calls still compile.
- Test Run can optionally bind to a deployed instance, so Instance/
Attributes/CallScript route to it cross-site; adds site-side
RouteToGetAttributes/RouteToSetAttributes handlers.
- Adds Test Run panels to the API method and template script editors.
- Fixes the TestDatabaseQuery seed script, which queried a table that
never existed.
Also commits unrelated in-progress work already in the tree: the health
monitoring report loop, site streaming changes, and the Admin/Design
data-connection and SMTP page reorganization.
Replace raw dictionary casting with ScriptParameters wrapper that provides
Get<T>, Get<T?>, Get<T[]>, and Get<List<T>> with clear error messages,
numeric conversion, and JsonElement support for Inbound API parameters.
- InboundScriptExecutor lazy-compiles scripts on first request, solving
the multi-node problem where methods created via CLI/UI were only compiled
on the ManagementActor's node, not the node handling the HTTP request.
- ManagementActor hot-registers API method scripts on create/update/delete
for the local node.
- FlatteningService prefixes the "attribute" field in composed alarm trigger
configs with the composition instance name so alarms evaluate against the
correct path-qualified attribute (e.g. CoolingTank.Level not Level).
Completes the Inbound API → site script call chain by adding RouteToCallRequest
handlers in SiteCommunicationActor and DeploymentManagerActor. Also replaces the
placeholder dispatch table in InboundScriptExecutor with Roslyn compilation of
API method scripts at startup, enabling user-defined inbound API methods to call
instance scripts across the cluster.
- WP-0.10: Role-based Host startup (Central=WebApplication, Site=generic Host),
15 component AddXxx() extension methods, MapCentralUI/MapInboundAPI stubs
- WP-0.11: 12 per-component options classes with config binding
- WP-0.12: Sample appsettings for central and site topologies
- Add execution procedure and checklist template to generate_plans.md
- Add phase-0-checklist.md for execution tracking
- Resolve all 21 open questions from plan generation
- Update IDataConnection with batch ops and IAsyncDisposable
57 tests pass, zero warnings.
17 source projects (Commons + Host + 15 components) and 17 xUnit test projects.
SLNX format, net10.0, nullable enabled, warnings as errors. All components
reference Commons; Host references all components. Builds and tests clean.