8bd7656110f4e225d598a59e29ba5252b2105736
8 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
7b0b9c7365 |
refactor: rename ScadaLink → ZB.MOM.WW.ScadaBridge (code + projects + namespaces)
Solution + 23 src projects + 26 test projects renamed; folders, csproj, namespaces, and ScadaLinkDbContext/ScadaBridgeDbContext class updated. ActorSystem "scadalink" → "scadabridge", Akka seed-node URLs migrated. SQL roles/logins, LDAP domains, CLI command name, and CLI config dir (~/.scadalink → ~/.scadabridge) also renamed. Build green; 5 Host.Tests fail awaiting SQL login rename in next commit. Pre-existing StaleTagMonitor timing flakes unchanged. Rename script committed at tools/rename-to-scadabridge.sh. |
||
|
|
c1fe1c4f83 |
feat(audit): close AuditLog-001 — wire combined-telemetry dual-write transport
Closes the last open code-review finding. The unreachable
IngestCachedTelemetryAsync path now carries production cached-call
lifecycle traffic, delivering the design's "AuditLog + SiteCalls in one
MS SQL transaction" guarantee. Before this commit, the SiteCalls
operational half had NO production transport at all — central's
SiteCallAuditActor.OnUpsertAsync had zero producers, so cached-call
operational state never reached the central mirror.
Site-side partition (so neither path double-emits):
- ISiteAuditQueue.ReadPendingCachedTelemetryAsync — new method returning
rows where Kind ∈ {CachedSubmit, ApiCallCached, DbWriteCached,
CachedResolve} AND ForwardState = Pending.
- ISiteAuditQueue.ReadPendingAsync — XML doc updated, SQLite impl now
filters Kind NOT IN the cached set so cached rows no longer ride the
audit-only drain.
New cached-drain in SiteAuditTelemetryActor:
- Optional IOperationTrackingStore? ctor param (null on central
composition roots — the cached scheduler is never armed there).
- Independent CachedDrain message + scheduler tick parallel to the
existing Drain — a stall on one path can't block the other; shared
lifecycle CTS gates both.
- OnCachedDrainAsync: reads cached audit rows, joins each with its
matching SiteCallOperational snapshot via CorrelationId →
TrackedOperationId from the tracking store, builds CachedTelemetryBatch,
pushes via IngestCachedTelemetryAsync, marks ack'd rows Forwarded.
- Orphan rows (no tracking snapshot, thrown tracking-store call,
missing CorrelationId) logged at Warning + skipped — they stay
Pending so reconciliation/retry picks them up later. Best-effort
contract preserved.
Central side: AuditLogIngestActor.OnCachedTelemetryAsync was already
implemented (M3 Bundle G dead code today, alive after this commit) —
performs InsertIfNotExists for AuditLog + UpsertAsync for SiteCalls
inside a BeginTransactionAsync. The handler is idempotent on EventId,
so any duplicate arrivals from concurrent push + reconciliation are
silent no-ops.
Composition root: AkkaHostedService now resolves IOperationTrackingStore
via GetService<>() (site-only) and threads it through the actor's
Props.Create.
Tests added (+3 in SiteAuditTelemetryActorTests):
- Cached rows route through the new transport, not the audit-only drain.
- Orphan cached row (no tracking match) is logged + skipped, drain
doesn't crash.
- Ordinary audit rows still flow through the audit-only drain unchanged.
- ParentExecutionIdCorrelationTests now unions both queues to assert
all expected Kinds remain covered after the partition.
Build clean; AuditLog.Tests 250/251 (the 1 fail is the pre-existing
date-sensitive PartitionPurgeTests integration flake explicitly accepted
across the session); SiteRuntime.Tests 302/302.
README regenerated: 0 pending of 481 total.
Session-final totals: 136 of 136 originally-open Theme findings closed
across 11 commits (10 themed batches + this architectural close).
|
||
|
|
46cb6965ac |
fix(security): close Theme 7 — 8 secrets / redaction / append-only findings
Security-sensitive batch, handled main-thread for careful judgment on secret-leak and pepper-bypass paths. Secret leak / pepper bypass: - CD-016 (pepper bypass): InboundApiRepository's GetApiKeyByValueAsync no longer hashes the candidate with the unpeppered ApiKeyHasher.Default — ctor takes a lazy Func<IApiKeyHasher> accessor (lazy so test composition roots without a pepper still bring up the repository), and the DI registration wires sp.GetService<IApiKeyHasher>() so the production peppered hasher matches the stored KeyHash. Regression test asserts positive (peppered roundtrip) AND negative (Default hasher misses the same key — proving the lookup uses the injected hasher). - MgmtSvc-020 (SMTP credential leak): UpdateSmtpConfig/ListSmtpConfigs now project through SmtpConfigPublicShape so the response payload and audit-row afterState never carry the Credentials field — only a HasCredentials bool. The SMTP password / OAuth2 client secret no longer leaves the Admin-only UpdateSmtpConfig boundary the caller already supplied it to. Redaction: - AuditLog-008 (test-fixture under-redact): new SafeDefaultAuditPayloadFilter (stateless singleton) does HTTP header redaction for the always-sensitive defaults (Authorization, X-Api-Key, Cookie, Set-Cookie). FallbackAuditWriter, CentralAuditWriter, and AuditLogIngestActor (both ingest paths) default to it instead of null — composition roots that bypass AddAuditLog can no longer write unredacted auth headers to the audit store. - NotifService-025 (over-mask): CredentialRedactor.Scrub now only masks the last colon-separated component (password / clientSecret) AND only if it's >= 12 chars (typical password heuristic). Short user names like "root" no longer become global redaction tokens that eat unrelated diagnostic text. The full packed string is always masked regardless of length. 3 new negative tests pin the no-over-mask contract. Audit-row correctness / fail-loud: - InboundAPI-025: Program.cs UseWhen predicate now excludes /api/audit, /api/management, /api/centralui, /api/script-analysis AND requires POST — the AuditWriteMiddleware no longer emits spurious ApiInbound rows for audit-log query/export endpoints (write-on-read recursion broken). - ESG-021: ApplyAuth now logs Warning (not silent) on empty AuthConfiguration for apikey/basic, unknown AuthType, and malformed Basic config. AuthConfiguration value NEVER logged. AuthType=none remains silent (documented unauthenticated sentinel). - Security-021: AddSecurity now logs a startup Warning when RequireHttpsCookie=false — an HTTP-only deployment that previously transmitted the cookie-embedded JWT silently in cleartext is now audible in the log. Defensive: - CD-021: SwitchOutPartitionAsync's monthBoundary format string now yyyy-MM-dd HH:mm:ss.fffffff (datetime2(7) precision) so a future sub-second / non-midnight boundary doesn't silently round to the wrong partition. Plus reconciled stale per-module Open-findings counters that had drifted from earlier sessions (AuditLog, CD, ESG, IAPI, MgmtSvc, NotifService, Security). Build clean; all affected test projects green (Host 208, ConfigDB 242, ESG 69, IAPI 151, MgmtSvc 100, NotifService 55, Security 85, AuditLog 247/248 — 1 pre-existing date-sensitive integration test flake on PartitionPurgeTests, unrelated). README regenerated: 46 open (was 54). |
||
|
|
55f46e7c92 |
perf: close Theme 6 — 11 allocation / N+1 / lock-contention findings
Well-localised perf fixes across 8 modules.
Lock decoupling / SQL streaming:
- AuditLog-005: SqliteAuditWriter gains dedicated read-only _readConnection
(+ _readLock) backed by WAL journal mode. GetBacklogStatsAsync,
ReadPendingAsync, ReadPendingSinceAsync, ReadForwardedAsync no longer
contend with the hot-path INSERT lock — backlog probes on a 30s timer
can't stall the writer under multi-hundred-K Pending backlog.
- SEL-022: dropped Cache=Shared from SiteEventLogger's default connection
string (single-connection logger; mode was dormant config).
Memory / streaming:
- CLI-019: bundle export streams base64 in 1 MB-aligned chunks via
Convert.TryFromBase64Chars straight into the FileStream — no more
full-bundle byte[] allocation.
- CentralUI-031: TransportImport now stages the upload to a per-session
temp file under Path.GetTempPath() (replaces in-memory byte[] field);
page implements IDisposable to delete the temp file on reset / new
upload / dispose. Per-circuit working set drops from ~100 MB to ~80 KB.
N+1 hoisting:
- Transport-008: added ITemplateEngineRepository.GetTemplatesWithChildrenAsync
bulk method; BundleImporter.PreviewAsync calls it once instead of per-
template-name. Single query with .Include(...).AsSplitQuery().
- DM-023: BuildDeployArtifactsCommandAsync's per-site loop now references
a pre-fetched GlobalArtifactSnapshot (shared scripts, external systems,
DB connections, notification lists, SMTP) instead of re-querying per site.
- MgmtSvc-023: HandleQueryDeployments unfiltered branch uses one
GetAllInstancesAsync bulk load + Dictionary<int,int?> lookup (was a
GetInstanceByIdAsync per record).
Small allocations / per-tick rebuilds:
- InboundAPI-019: AuditWriteMiddleware gates EnableBuffering() on
RequestHasBody() so GET/HEAD/DELETE/TRACE/OPTIONS and Content-Length:0
requests skip the FileBufferingReadStream allocation.
- NotifOutbox-006: ResolveAdapters dictionary now cached on
_adaptersCache (built lazily on first sweep) + actor-lifetime
_adaptersScope; ResolveAdapters no longer rebuilds per dispatch tick.
Verify-only:
- Comm-017: Confirmed _inProgressDeployments was deleted by Comm-016 in
commit
|
||
|
|
2ed5c6c379 |
fix(concurrency/lifetime): close Theme 5 — 10 concurrency / DI / scope findings
Concurrency hazards, DI lifetime hygiene, and one verify-only confirmation
across 8 modules. Highlights:
Concurrency:
- CentralUI-030: SandboxConsoleCapture writes routed through WriteSynchronized
locking on the captured StringWriter — intra-script Task fan-out can no
longer corrupt the per-call buffer.
- Commons-021: ExternalCallResult.Response now backed by Lazy<dynamic?>
(ExecutionAndPublication) — no more benign double-parse race.
- CD-017: DeploymentManagerRepository.DeleteDeploymentRecordAsync now takes
an expected RowVersion and seeds entry.OriginalValues so EF emits
DELETE ... WHERE Id=@id AND RowVersion=@prior; stale RowVersion now
throws DbUpdateConcurrencyException instead of silent overwrite.
- Transport-009: AuditCorrelationContext.BundleImportId backed by
AsyncLocal<Guid?> so concurrent imports get per-logical-call isolation
(was a scoped instance shared via AuditService across runs).
DI / lifetime:
- AuditLog-003: All 3 AuditLog actor handlers switched to CreateAsyncScope
+ await using — async EF disposal no longer swallowed.
- AuditLog-007: INodeIdentityProvider resolution standardised on
GetRequiredService<>() (was mixed with GetService<>()).
- AuditLog-011: AddAuditLogHealthMetricsBridge guarded by sentinel
descriptor check — calling twice no longer double-registers the hosted
service.
Shutdown / supervision:
- SiteCallAudit-002: AkkaHostedService adds a CoordinatedShutdown
cluster-leave task (drain-site-call-audit-singleton) that issues a
bounded GracefulStop(10s) so failover waits for in-flight upserts.
Registration safety:
- NS-020: AkkaHostedService now guards NotificationForwarder S&F
registration with _notificationDeliveryHandlerRegistered + throws
InvalidOperationException on double-register to make the regression loud.
VERIFY-only closures:
- NotifOutbox-005: Confirmed already closed by CD-015 fix (
|
||
|
|
6ae0fea558 |
fix(error-handling): close Theme 4 — 18 cancellation / fire-and-forget findings
Async cancellation hygiene, fire-and-forget observability, retry/shutdown semantics, and audit-row coverage across 9 modules. Highlights: Cancellation & lifecycle: - AuditLog-006: SqliteAuditWriter.Dispose hops to thread pool, escaping the captured SyncContext that risked sync-over-async deadlock. - AuditLog-010: SiteAuditTelemetryActor owns a private lifecycle CTS, threaded through drain paths instead of CancellationToken.None. - Comm-019: CentralCommunicationActor adds lifecycle CTS for repo calls. - Host-019: Migration StartupRetry forwards ApplicationStopping so SIGTERM during the bounded-retry window aborts cleanly. Cursor / retry / counter correctness: - AuditLog-004: SiteAuditReconciliationActor's cursor now holds at `since` when any row's idempotent insert is still being retried (per-EventId retry counter, MaxPermanentInsertAttempts=5 escape valve with LogCritical abandon). No more silent abandonment of permanently-failing rows. - ConfigDB-019: Dropped the catch-and-continue on EnsureLookaheadAsync's SPLIT loop — by class-doc construction the catch could only mask real failures and let the next iteration create permanent partition holes. - HM-017/018: HealthReportSender + CentralHealthReportLoop snapshot per-interval counters before sending, restore via new ISiteHealthCollector.AddIntervalCounters on transport failure so counts aren't silently lost. Fire-and-forget / shutdown waits: - InboundAPI-018: AuditWriteMiddleware observes faulted audit-write tasks via OnlyOnFaulted continuation (Warning log; response unchanged). - SnF-024: StoreAndForwardService.StopAsync awaits in-flight retry sweep with a bounded SweepShutdownWaitTimeout (10s). Leak / refactor: - Comm-021: SiteStreamGrpcServer.SubscribeInstance wraps Subscribe in its own try/catch so a throw doesn't leak the relay actor or _activeStreams entry. - Comm-022: VERIFIED already-closed by Comm-016's dead-code purge. - CLI-017: BundleCommands' three subcommands delegate to ExecuteCommandAsync (auth-failure exit-code contract unified). Defensive / validation: - CLI-021: CliConfig.Load wraps file-read/JSON parse so malformed config prints a warning and returns defaults instead of crashing the CLI. - Host-022: ParseLevel emits stderr one-shot warning for unrecognised MinimumLevel instead of silently coercing to Information. - ESG-019: ExternalSystemClient sets HttpClient.Timeout=Infinite so the per-call CTS is the sole timeout source (was clipped to 100s by .NET). - Security-020: New SecurityOptionsValidator (IValidateOptions) rejects empty LdapServer/LdapSearchBase with ValidateOnStart. - DM-019: Lifecycle command timeouts now emit DisableTimedOut/EnableTimedOut/ DeleteTimedOut audit entries (mirrors DeployFailed pattern). Plus reconciled stale per-module Open-findings counters that had drifted from prior sessions. 20+ new regression tests across 11 test projects; build clean; affected suites all green. README regenerated: 75 open (was 93). |
||
|
|
487859bff0 |
docs+code: close Theme 1 — 24 design-doc / XML-doc drift findings
Doc/XML-comment drift + small adherence fixes across 17 modules. Highlights: - Host-017: site CoordinatedShutdown ordering — SiteStreamGrpcServer gains CancelAllStreams() (refuse new streams, cancel active), wired into Program.cs site branch via ApplicationStopping. - InboundAPI-021: ParentExecutionId now travels on RouteToGet/SetAttributes symmetric with RouteToCallRequest; RouteHelper stamps from _parentExecutionId. - ClusterInfra-012: ClusterOptionsValidator now requires both seed nodes. - Comm-018: SiteCommunicationActor.HeartbeatMessage.IsActive derived from cluster leader check (was hardcoded true). - DM-020: reconciliation audit row attributes the current user, not prior deployer. - SEL-019: EventLogPurgeService early-exits on standby via active-node check. - Plus comment/XML-doc accuracy fixes across AuditLog, ConfigurationDatabase, NotificationOutbox, SiteRuntime, SiteCallAudit; doc refreshes for Component- Commons / -ManagementService / -CLI / -ExternalSystemGateway / -HealthMonitoring / -Transport / -ConfigurationDatabase; CD-023 index-name doc alignment. 11 new regression tests (RouteHelper x4, SiteStreamGrpcServer x2, ClusterOptionsValidator x1, SiteCommunicationActor x1, DeploymentService x1, EventLogPurgeService x3). Build clean (0 warnings); InboundAPI/Communication/ Host suites all green. README regenerated: 112 open (was 136). |
||
|
|
f93b7b99bb |
code-review: 2026-05-28 baseline re-review of all 23 modules at 1eb6e97
Re-applies the full 10-category checklist to every src/ project — including
first-time reviews of the four newer components (AuditLog, NotificationOutbox,
SiteCallAudit, Transport) — so the code-reviews/ index reflects today's
codebase rather than the 2026-05-16 baseline. 172 new Open findings (0
Critical, 18 High, 62 Medium, 92 Low); 481 findings total across 23 modules.
regen-readme.py now derives each module's Last reviewed + Commit from its
findings.md header instead of hard-coding 2026-05-16 /
|