Commit Graph

9 Commits

Author SHA1 Message Date
Joseph Doherty 55f46e7c92 perf: close Theme 6 — 11 allocation / N+1 / lock-contention findings
Well-localised perf fixes across 8 modules.

Lock decoupling / SQL streaming:
- AuditLog-005: SqliteAuditWriter gains dedicated read-only _readConnection
  (+ _readLock) backed by WAL journal mode. GetBacklogStatsAsync,
  ReadPendingAsync, ReadPendingSinceAsync, ReadForwardedAsync no longer
  contend with the hot-path INSERT lock — backlog probes on a 30s timer
  can't stall the writer under multi-hundred-K Pending backlog.
- SEL-022: dropped Cache=Shared from SiteEventLogger's default connection
  string (single-connection logger; mode was dormant config).

Memory / streaming:
- CLI-019: bundle export streams base64 in 1 MB-aligned chunks via
  Convert.TryFromBase64Chars straight into the FileStream — no more
  full-bundle byte[] allocation.
- CentralUI-031: TransportImport now stages the upload to a per-session
  temp file under Path.GetTempPath() (replaces in-memory byte[] field);
  page implements IDisposable to delete the temp file on reset / new
  upload / dispose. Per-circuit working set drops from ~100 MB to ~80 KB.

N+1 hoisting:
- Transport-008: added ITemplateEngineRepository.GetTemplatesWithChildrenAsync
  bulk method; BundleImporter.PreviewAsync calls it once instead of per-
  template-name. Single query with .Include(...).AsSplitQuery().
- DM-023: BuildDeployArtifactsCommandAsync's per-site loop now references
  a pre-fetched GlobalArtifactSnapshot (shared scripts, external systems,
  DB connections, notification lists, SMTP) instead of re-querying per site.
- MgmtSvc-023: HandleQueryDeployments unfiltered branch uses one
  GetAllInstancesAsync bulk load + Dictionary<int,int?> lookup (was a
  GetInstanceByIdAsync per record).

Small allocations / per-tick rebuilds:
- InboundAPI-019: AuditWriteMiddleware gates EnableBuffering() on
  RequestHasBody() so GET/HEAD/DELETE/TRACE/OPTIONS and Content-Length:0
  requests skip the FileBufferingReadStream allocation.
- NotifOutbox-006: ResolveAdapters dictionary now cached on
  _adaptersCache (built lazily on first sweep) + actor-lifetime
  _adaptersScope; ResolveAdapters no longer rebuilds per dispatch tick.

Verify-only:
- Comm-017: Confirmed _inProgressDeployments was deleted by Comm-016 in
  commit ac96b83 — marked Resolved with that attribution. No code change.

Doc-correction:
- NS-022: Updated MailKitSmtpClientWrapper XML doc to spell out single-
  connection / per-delivery-factory contract (option (b) — transient
  client per Send — rejected because it re-handshakes TLS per email).

10+ new regression tests across 8 test projects. Build clean; affected
suites all green. README regenerated: 54 open (was 65).
2026-05-28 07:47:24 -04:00
Joseph Doherty 6ae0fea558 fix(error-handling): close Theme 4 — 18 cancellation / fire-and-forget findings
Async cancellation hygiene, fire-and-forget observability, retry/shutdown
semantics, and audit-row coverage across 9 modules. Highlights:

Cancellation & lifecycle:
- AuditLog-006: SqliteAuditWriter.Dispose hops to thread pool, escaping the
  captured SyncContext that risked sync-over-async deadlock.
- AuditLog-010: SiteAuditTelemetryActor owns a private lifecycle CTS,
  threaded through drain paths instead of CancellationToken.None.
- Comm-019: CentralCommunicationActor adds lifecycle CTS for repo calls.
- Host-019: Migration StartupRetry forwards ApplicationStopping so SIGTERM
  during the bounded-retry window aborts cleanly.

Cursor / retry / counter correctness:
- AuditLog-004: SiteAuditReconciliationActor's cursor now holds at `since`
  when any row's idempotent insert is still being retried (per-EventId
  retry counter, MaxPermanentInsertAttempts=5 escape valve with LogCritical
  abandon). No more silent abandonment of permanently-failing rows.
- ConfigDB-019: Dropped the catch-and-continue on EnsureLookaheadAsync's
  SPLIT loop — by class-doc construction the catch could only mask real
  failures and let the next iteration create permanent partition holes.
- HM-017/018: HealthReportSender + CentralHealthReportLoop snapshot
  per-interval counters before sending, restore via new
  ISiteHealthCollector.AddIntervalCounters on transport failure so counts
  aren't silently lost.

Fire-and-forget / shutdown waits:
- InboundAPI-018: AuditWriteMiddleware observes faulted audit-write tasks
  via OnlyOnFaulted continuation (Warning log; response unchanged).
- SnF-024: StoreAndForwardService.StopAsync awaits in-flight retry sweep
  with a bounded SweepShutdownWaitTimeout (10s).

Leak / refactor:
- Comm-021: SiteStreamGrpcServer.SubscribeInstance wraps Subscribe in its
  own try/catch so a throw doesn't leak the relay actor or _activeStreams
  entry.
- Comm-022: VERIFIED already-closed by Comm-016's dead-code purge.
- CLI-017: BundleCommands' three subcommands delegate to ExecuteCommandAsync
  (auth-failure exit-code contract unified).

Defensive / validation:
- CLI-021: CliConfig.Load wraps file-read/JSON parse so malformed config
  prints a warning and returns defaults instead of crashing the CLI.
- Host-022: ParseLevel emits stderr one-shot warning for unrecognised
  MinimumLevel instead of silently coercing to Information.
- ESG-019: ExternalSystemClient sets HttpClient.Timeout=Infinite so the
  per-call CTS is the sole timeout source (was clipped to 100s by .NET).
- Security-020: New SecurityOptionsValidator (IValidateOptions) rejects
  empty LdapServer/LdapSearchBase with ValidateOnStart.
- DM-019: Lifecycle command timeouts now emit DisableTimedOut/EnableTimedOut/
  DeleteTimedOut audit entries (mirrors DeployFailed pattern).

Plus reconciled stale per-module Open-findings counters that had drifted
from prior sessions.

20+ new regression tests across 11 test projects; build clean; affected
suites all green. README regenerated: 75 open (was 93).
2026-05-28 07:13:28 -04:00
Joseph Doherty 7d87994ac0 feat(inboundapi): bound audit capture at InboundMaxBytes (memory safety)
AuditWriteMiddleware previously buffered the FULL request and response
bodies into memory and only let DefaultAuditPayloadFilter trim them
after persistence. A 500 MiB upload allocated 500 MiB of MemoryStream
plus 1 GiB of UTF-16 string transiently before the filter pulled it
back to the 1 MiB inbound ceiling — the cap was real on the persisted
row but not at the capture site.

Inject IOptionsMonitor<AuditLogOptions> and read InboundMaxBytes
per-request (same convention as DefaultAuditPayloadFilter so a live
config change picks up the next request). The request reader now pulls
at most cap + 1 bytes into a UTF-8 byte-safe-truncated string and
rewinds the stream so the endpoint handler still sees the full body.
The response wrap is a new CapturedResponseStream that forwards every
Write / WriteAsync to the real sink (the client still receives all
bytes) while capturing at most cap + 1 bytes for the audit copy. The
middleware now sets PayloadTruncated itself when either body hit the
cap; the filter still OR's its own determination on top.

Adds a project reference from ScadaLink.InboundAPI to
ScadaLink.AuditLog so AuditLogOptions resolves. AuditLog does NOT
reference InboundAPI back, so no cycle is introduced.

Tests:
 - All 21 existing AuditWriteMiddlewareTests still pass (the helper
   gains an optional AuditLogOptions argument; default is the standard
   1 MiB ceiling so existing small-body tests are unaffected).
 - MiddlewareOrderTests' construction site updated for the new ctor
   arg; a StaticAuditLogOptionsMonitor file-local double mirrors the
   InboundChannelCapTests pattern.
 - New RequestBody_AboveInboundMaxBytes_TruncatedToCap_PayloadTruncatedTrue
   pins a 4 KiB cap against a 20 KB body: audit copy <= 4 KiB,
   PayloadTruncated = true, downstream handler reads the full 20 KB.
 - New ResponseBody_AboveInboundMaxBytes_TruncatedToCap_ClientStillReceivesAllBytes_PayloadTruncatedTrue
   pins the same shape on the response side: client sink receives
   20 KB, audit copy <= 4 KiB, PayloadTruncated = true.

InboundAPI test count: 133 -> 135.
2026-05-23 09:25:00 -04:00
Joseph Doherty a8d2e13d4e feat(inboundapi): AuditWriteMiddleware captures response body on ApiInbound audit rows 2026-05-23 06:00:24 -04:00
Joseph Doherty d8453bfba2 feat(inboundapi): mint inbound ExecutionId early, carry it as RouteToCallRequest.ParentExecutionId 2026-05-21 17:22:13 -04:00
Joseph Doherty cfd8f1ecf4 feat(auditlog): inbound audit rows carry ExecutionId 2026-05-21 15:44:17 -04:00
Joseph Doherty 8243f61e96 feat(auditlog): per-script-execution correlation id on sync audit rows 2026-05-21 13:46:34 -04:00
Joseph Doherty 1c862989b4 feat(inbound): register AuditWriteMiddleware in pipeline (#23 M4) 2026-05-20 16:35:13 -04:00
Joseph Doherty 3c3f7770c1 feat(inbound): AuditWriteMiddleware emitting InboundRequest/InboundAuthFailure (#23 M4) 2026-05-20 16:35:03 -04:00