scadalink-design

Author	SHA1	Message	Date
Joseph Doherty	fdd1a4b886	refactor(auditlog): consolidate AuditEvent DTO mappers into Communication	2026-05-21 03:51:51 -04:00
Joseph Doherty	de5280d1c7	feat(auditlog): real ClusterClient-based site audit push client	2026-05-21 03:39:17 -04:00
Joseph Doherty	e93f655ce4	feat(health): SiteAuditBacklog metric (count + age + bytes) (#23 M6)	2026-05-20 19:02:01 -04:00
Joseph Doherty	640fd07454	feat(comms): site-side PullAuditEvents handler (#23 M6)	2026-05-20 17:58:43 -04:00
Joseph Doherty	23c0fd417e	feat(health): AuditRedactionFailure counter + bridge (#23 M5) Bundle C task M5-T7 — surface DefaultAuditPayloadFilter redactor over-redactions as a Site Health metric so a misconfigured / catastrophic regex shows up on /monitoring/health rather than disappearing into a NoOp sink. - SiteHealthReport: new 'AuditRedactionFailure' int field (defaulted to 0 for back-compat with existing producers/tests). - ISiteHealthCollector / SiteHealthCollector: new IncrementAuditRedactionFailure() — per-interval atomic counter with Interlocked, reset on CollectReport, mirroring the M2 Bundle G SiteAuditWriteFailures pattern. - HealthMetricsAuditRedactionFailureCounter: new bridge in ScadaLink.AuditLog.Site that forwards IAuditRedactionFailureCounter increments to ISiteHealthCollector — mirrors HealthMetricsAuditWriteFailureCounter one-for-one. - AddAuditLogHealthMetricsBridge: now ALSO Replaces the NoOpAuditRedactionFailureCounter binding with the health-metrics bridge, so a single AddAuditLogHealthMetricsBridge() call wires both the M2 Bundle G write-failure counter and the M5 Bundle C redaction-failure counter into the health report. Site-side only for M5 — the filter also runs on CentralAuditWriter and AuditLogIngestActor (where it just keeps the NoOp default), but a central-side health-metric surface for AuditRedactionFailure is deferred to M6 alongside the rest of the central health collector work. Tests: - AuditRedactionFailureMetricTests (HealthMonitoring) covers the SiteHealthCollector increment/report/reset shape (3 tests). - HealthMetricsAuditRedactionFailureCounterTests (AuditLog) covers the AuditLog → HealthMonitoring bridge (3 tests). - Existing CountCapturingHealthCollector stub in DeploymentManagerRedeployTests extended with the new no-op interface method. 
Verified: dotnet build clean, all 24 test projects green (the only failure on the first ScadaLink.SiteRuntime.Tests run was the known-flaky InstanceActorChildAttributeRaceTests, which passes on re-run both in isolation and in the full suite and is unrelated to these changes).	2026-05-20 17:28:33 -04:00
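The wiring point of the M5 commit is that a single AddAuditLogHealthMetricsBridge() call replaces both NoOp counter bindings on the site, while hosts that never call it (central, per the message) keep the NoOp defaults. A minimal Python sketch of that replace-the-default pattern follows. It assumes a plain dict in place of the .NET service collection, and every class and key name in it is hypothetical.

```python
# Hypothetical sketch: a plain dict stands in for the .NET service collection.
# NoOpCounter, Collector, and the two bridge classes are illustrative names,
# not the project's real types.

class NoOpCounter:
    """Default binding: increments are silently discarded."""
    def increment(self) -> None:
        pass


class Collector:
    """Stand-in for ISiteHealthCollector, reduced to two plain counters."""
    def __init__(self) -> None:
        self.site_audit_write_failures = 0
        self.audit_redaction_failures = 0


class WriteFailureBridge:
    def __init__(self, collector: Collector) -> None:
        self._collector = collector

    def increment(self) -> None:
        self._collector.site_audit_write_failures += 1


class RedactionFailureBridge:
    def __init__(self, collector: Collector) -> None:
        self._collector = collector

    def increment(self) -> None:
        self._collector.audit_redaction_failures += 1


def add_audit_log(services: dict) -> None:
    # Default registrations: both failure counters are NoOp sinks.
    services.setdefault("write_failure_counter", NoOpCounter())
    services.setdefault("redaction_failure_counter", NoOpCounter())


def add_audit_log_health_metrics_bridge(services: dict) -> None:
    # Replace both existing bindings in one call; hosts that never call this
    # keep the NoOp defaults.
    collector = services["health_collector"]
    services["write_failure_counter"] = WriteFailureBridge(collector)
    services["redaction_failure_counter"] = RedactionFailureBridge(collector)
```

Replacing rather than adding a second registration means every consumer that resolves the counter interface gets the bridge, with no ordering ambiguity between two bindings.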
Joseph Doherty	047988e4c8	feat(siteruntime): Database.CachedWrite emits combined telemetry + S&F audit bridge (#23 M3) Wire the M3 cached-call audit pipeline end-to-end for the database channel and close the loop between the S&F lifecycle observer and the site-side dual emitter. * DatabaseCachedWriteEmissionTests covers Database.CachedWrite (set up in Bundle E3): mints a TrackedOperationId, emits one CachedSubmit packet on DbOutbound, threads the id into IDatabaseGateway, and is best-effort on a thrown forwarder. Mirrors ExternalSystem.CachedCall coverage from E3. * CachedCallLifecycleBridge (new) implements ICachedCallLifecycleObserver and lives alongside CachedCallTelemetryForwarder. The bridge ingests per-attempt notifications from the S&F retry loop and fans them out to the forwarder: - TransientFailure -> 1 Attempted row - Delivered -> Attempted + CachedResolve(Delivered) - PermanentFailure -> Attempted + CachedResolve(Parked) - ParkedMaxRetries -> Attempted + CachedResolve(Parked) Channel string -> AuditKind mapping (ApiOutbound->ApiCallCached, DbOutbound->DbWriteCached). Best-effort top-level catch swallows any unexpected throw so the S&F retry bookkeeping is never disturbed. * Bridge tests (7) cover all four outcomes, channel mapping, provenance propagation, and the no-throw-on-forwarder-failure contract. Bundle F (Host registration) will instantiate the bridge and inject it into StoreAndForwardService.cachedCallObserver, closing the wiring path end-to-end. Bundle E task E6.	2026-05-20 14:55:17 -04:00
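The CachedCallLifecycleBridge fan-out in 047988e4c8 is a small lookup table: every outcome produces one attempt row, and the three terminal outcomes add a CachedResolve row. A minimal Python sketch of that mapping, assuming a plain callable in place of CachedCallTelemetryForwarder; Row, Outcome, and the method name on_attempt are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Outcome(Enum):
    TRANSIENT_FAILURE = "TransientFailure"
    DELIVERED = "Delivered"
    PERMANENT_FAILURE = "PermanentFailure"
    PARKED_MAX_RETRIES = "ParkedMaxRetries"


# S&F channel string -> per-attempt audit kind (the mapping listed in the commit).
CHANNEL_TO_KIND = {"ApiOutbound": "ApiCallCached", "DbOutbound": "DbWriteCached"}

# Outcome -> CachedResolve status; None means "attempt row only".
RESOLVE_STATUS = {
    Outcome.TRANSIENT_FAILURE: None,
    Outcome.DELIVERED: "Delivered",
    Outcome.PERMANENT_FAILURE: "Parked",
    Outcome.PARKED_MAX_RETRIES: "Parked",
}


@dataclass(frozen=True)
class Row:
    kind: str
    operation_id: str
    status: Optional[str] = None


class CachedCallLifecycleBridgeSketch:
    def __init__(self, forward: Callable[[Row], None]) -> None:
        self._forward = forward  # stands in for CachedCallTelemetryForwarder

    def on_attempt(self, channel: str, operation_id: str, outcome: Outcome) -> None:
        try:
            self._forward(Row(CHANNEL_TO_KIND[channel], operation_id))
            status = RESOLVE_STATUS[outcome]
            if status is not None:
                self._forward(Row("CachedResolve", operation_id, status))
        except Exception:
            # Top-level best-effort catch: a failing forwarder (or, in this
            # sketch, an unknown channel) never propagates into the S&F retry loop.
            pass
```

Keeping the mapping in data rather than in branches makes the "four outcomes, two channels" coverage described in the commit easy to enumerate in tests.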
Joseph Doherty	2145b29d4d	feat(auditlog): CachedCallTelemetryForwarder for site-side dual emission (#23 M3) Sister to SiteAuditTelemetryActor: takes a combined CachedCallTelemetry packet and fans it out to the two site-local stores. * AuditEvent half writes through IAuditWriter (the M2 FallbackAuditWriter + SqliteAuditWriter chain — same site SQLite hot-path as sync calls). * SiteCallOperational half maps Audit.Kind to the matching IOperationTrackingStore method: - CachedSubmit -> RecordEnqueueAsync (insert-if-not-exists) - ApiCallCached / DbWriteCached -> RecordAttemptAsync (monotonic) - CachedResolve -> RecordTerminalAsync (first-write-wins) Best-effort contract (alog.md §7): independent try/catch per half so a thrown writer cannot starve the tracking row (and vice-versa); both failures are logged at warning level and swallowed — the calling script never sees them. Wire push deferred to M6 — the NoOp ISiteStreamAuditClient binding stays in effect; the forwarder writes only to the local stores in M3. The existing SiteAuditTelemetryActor drain loop will sweep the audit rows once a real gRPC client lands. Bundle E task E2.	2026-05-20 14:41:15 -04:00
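The forwarder in 2145b29d4d combines two ideas: independent try/except blocks per half, and three different write semantics on the tracking store (insert-if-not-exists, monotonic, first-write-wins). A minimal Python sketch, assuming an in-memory dict in place of IOperationTrackingStore and an integer attempt number for the monotonic update; all names here are hypothetical.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Dict, Optional

log = logging.getLogger("cached-call-forwarder-sketch")


@dataclass
class TrackingRow:
    attempts: int = 0
    terminal: Optional[str] = None


class InMemoryTrackingStore:
    """Hypothetical stand-in for IOperationTrackingStore."""

    def __init__(self) -> None:
        self.rows: Dict[str, TrackingRow] = {}

    def record_enqueue(self, op_id: str) -> None:
        self.rows.setdefault(op_id, TrackingRow())  # insert-if-not-exists

    def record_attempt(self, op_id: str, attempt: int) -> None:
        row = self.rows.setdefault(op_id, TrackingRow())
        row.attempts = max(row.attempts, attempt)  # monotonic: a late, older attempt is ignored

    def record_terminal(self, op_id: str, status: str) -> None:
        row = self.rows.setdefault(op_id, TrackingRow())
        if row.terminal is None:  # first-write-wins
            row.terminal = status


def forward(packet: dict, write_audit: Callable[[dict], None], store: InMemoryTrackingStore) -> None:
    # Half 1: the audit event. A failure here is logged and swallowed...
    try:
        write_audit(packet)
    except Exception as exc:
        log.warning("audit half failed: %s", exc)
    # ...and cannot starve half 2, the operational tracking row.
    try:
        kind, op_id = packet["kind"], packet["op_id"]
        if kind == "CachedSubmit":
            store.record_enqueue(op_id)
        elif kind in ("ApiCallCached", "DbWriteCached"):
            store.record_attempt(op_id, packet["attempt"])
        elif kind == "CachedResolve":
            store.record_terminal(op_id, packet["status"])
    except Exception as exc:
        log.warning("tracking half failed: %s", exc)
```

The per-kind semantics make the tracking row tolerant of replays and reordering: a duplicate submit, a stale attempt, or a second resolve cannot move the row backwards.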
Joseph Doherty	73719ee066	feat(auditlog): extend ISiteStreamAuditClient with IngestCachedTelemetryAsync (#23 M3) Add the second site→central RPC seam alongside the existing M2 IngestAuditEventsAsync. The Bundle D proto already lit up IngestCachedTelemetry (CachedTelemetryBatch / IngestAck) so this commit just plumbs the client-side abstraction: * ISiteStreamAuditClient gains IngestCachedTelemetryAsync(batch, ct). * NoOpSiteStreamAuditClient implements it returning an empty IngestAck (same shape as M2 — production gRPC client lands in M6). * SyncCallEmissionEndToEndTests' DirectActorSiteStreamAuditClient stub throws NotSupportedException from the new method so a regression that accidentally routes a cached packet through the sync stub fails loudly. * New NoOpSiteStreamAuditClientTests cover the null-guard + empty-ack contract for both batch shapes. Bundle E task E1.	2026-05-20 14:39:24 -04:00
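The seam added in 73719ee066 has two methods, a NoOp implementation that null-guards and returns an empty ack for both, and a sync-only test stub that refuses the cached path. A minimal Python sketch of that shape, assuming a hypothetical IngestAck with an accepted_ids list (the real proto fields are not shown in the log); a TypeError stands in for .NET's ArgumentNullException.

```python
from dataclasses import dataclass, field
from typing import List, Protocol


@dataclass
class IngestAck:
    accepted_ids: List[str] = field(default_factory=list)  # hypothetical field


class SiteStreamAuditClient(Protocol):
    def ingest_audit_events(self, batch) -> IngestAck: ...
    def ingest_cached_telemetry(self, batch) -> IngestAck: ...


class NoOpSiteStreamAuditClient:
    def ingest_audit_events(self, batch) -> IngestAck:
        if batch is None:
            raise TypeError("batch must not be None")  # null guard
        return IngestAck()  # empty ack: nothing was sent anywhere

    def ingest_cached_telemetry(self, batch) -> IngestAck:
        if batch is None:
            raise TypeError("batch must not be None")
        return IngestAck()


class SyncOnlyTestClient:
    """Test stub that only understands the sync path."""

    def ingest_audit_events(self, batch) -> IngestAck:
        return IngestAck([e["id"] for e in batch])

    def ingest_cached_telemetry(self, batch) -> IngestAck:
        # Fail loudly if a cached packet is ever routed through the sync stub.
        raise NotImplementedError("cached telemetry must not use the sync stub")
```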
Joseph Doherty	dd3351da93	feat(health): SiteAuditWriteFailures counter + AuditLog bridge (#23 ) Bundle G of Audit Log #23 M2. Bridges the FallbackAuditWriter primary-failure counter into the Site Health Monitoring report payload so a sustained audit-write outage surfaces on /monitoring/health instead of disappearing into a NoOp sink. - SiteHealthReport: add SiteAuditWriteFailures (defaulted, additive). - ISiteHealthCollector + SiteHealthCollector: new IncrementSiteAuditWriteFailures() counter, per-interval reset semantics matching ScriptErrorCount / DeadLetterCount. - HealthMetricsAuditWriteFailureCounter: adapter forwarding IAuditWriteFailureCounter.Increment() to the collector. - AddAuditLogHealthMetricsBridge(): swaps the NoOp default registration for the real bridge; called from SiteServiceRegistration after AddSiteHealthMonitoring + AddAuditLog. - Existing host-wiring test updated: site composition now resolves HealthMetricsAuditWriteFailureCounter (not NoOp). Tests: HealthMonitoring 60 -> 63 (3 new), AuditLog 56 -> 59 (3 new), full solution green.	2026-05-20 13:22:25 -04:00
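The per-interval reset semantics in dd3351da93 mean that increments accumulate between reports and each report reads and zeroes the counter in one atomic step; in C# that pair is typically Interlocked.Increment(ref _count) and Interlocked.Exchange(ref _count, 0). A minimal Python sketch of the same shape, using a lock instead of Interlocked, plus the thin adapter the commit describes; class and method names are hypothetical.

```python
import threading


class SiteHealthCollectorSketch:
    """Per-interval counter: increments accumulate, collect_report reads and resets."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._site_audit_write_failures = 0

    def increment_site_audit_write_failures(self) -> None:
        with self._lock:
            self._site_audit_write_failures += 1

    def collect_report(self) -> dict:
        with self._lock:
            # Read and reset under one lock so no increment is lost between the steps.
            value, self._site_audit_write_failures = self._site_audit_write_failures, 0
        return {"SiteAuditWriteFailures": value}


class HealthMetricsAuditWriteFailureCounterSketch:
    """Adapter: the audit writer only knows increment(); the collector owns the metric."""

    def __init__(self, collector: SiteHealthCollectorSketch) -> None:
        self._collector = collector

    def increment(self) -> None:
        self._collector.increment_site_audit_write_failures()
```

The adapter keeps the AuditLog project free of a dependency on health-report types: it depends only on a one-method counter seam.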
Joseph Doherty	b679430d13	feat(auditlog): SiteAuditTelemetryActor + ISiteStreamAuditClient seam (#23 )	2026-05-20 12:40:49 -04:00
Joseph Doherty	ff8766ec8b	feat(auditlog): FallbackAuditWriter compose SQLite + ring + failure counter (#23 ) Adds the IAuditWriter composer that sits between the script-side ScriptRuntimeContext audit emission (Bundle F) and the primary SqliteAuditWriter. Honours the alog.md §7 guarantee that audit-write failures NEVER abort the user-facing action: - Primary throw -> log Warning, increment IAuditWriteFailureCounter (Bundle G's health-metric sink), stash the event in the drop-oldest RingBufferFallback, return success to the caller. - Primary success -> opportunistically drain the ring back through the primary in FIFO order, behind the triggering event. Drain is serialised via a SemaphoreSlim gate so concurrent recoveries don't double-replay; a drain-side re-throw re-enqueues at the tail and breaks out (the next successful write retries). Adds IAuditWriteFailureCounter as the lightweight DI seam (one void Increment()), and a TryDequeue helper on RingBufferFallback that the recovery path uses to pop one item without blocking. Tests (4 new, total 26 -> 30): - WriteAsync_PrimaryThrows_EventLandsInRing_CallReturnsSuccess - WriteAsync_PrimaryRecovers_RingDrains_InFIFOOrder_OnNextWrite (order: trigger first, then ring backlog in submission FIFO) - WriteAsync_PrimaryAlwaysSucceeds_Ring_StaysEmpty - WriteAsync_FailureCounter_Incremented_Per_PrimaryFailure	2026-05-20 12:23:50 -04:00
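The FallbackAuditWriter contract in ff8766ec8b can be sketched synchronously: a primary failure is counted and stashed, and the next primary success replays the stash in FIFO order behind the triggering event. A minimal Python sketch, assuming callables for the primary writer and failure counter and a deque(maxlen) as the drop-oldest ring; the non-blocking lock is this sketch's stand-in for the SemaphoreSlim gate, and whether the real gate skips or waits is not stated in the log.

```python
import logging
import threading
from collections import deque

log = logging.getLogger("fallback-writer-sketch")


class FallbackAuditWriterSketch:
    def __init__(self, primary, failure_counter, capacity: int = 1024) -> None:
        self._primary = primary              # callable(event); may raise
        self._failures = failure_counter     # callable(); health-metric hook
        self._ring = deque(maxlen=capacity)  # drop-oldest when full
        self._drain_gate = threading.Lock()

    def write(self, event) -> None:
        try:
            self._primary(event)
        except Exception as exc:
            log.warning("primary audit write failed: %s", exc)
            self._failures()
            self._ring.append(event)
            return  # the caller still sees success
        self._drain()  # primary is healthy again: replay the backlog

    def _drain(self) -> None:
        # Only one caller replays at a time; a concurrent recovery skips the drain.
        if not self._drain_gate.acquire(blocking=False):
            return
        try:
            while self._ring:
                event = self._ring.popleft()
                try:
                    self._primary(event)
                except Exception:
                    self._ring.append(event)  # back to the tail; next success retries
                    break
        finally:
            self._drain_gate.release()
```

Because write() never raises, the user-facing action that emitted the audit event is never aborted by a storage problem, which is the guarantee the commit attributes to alog.md §7.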
Joseph Doherty	55fbcce7a8	feat(auditlog): RingBufferFallback with drop-oldest overflow (#23 ) Adds RingBufferFallback — an in-memory drop-oldest ring buffer used by the upcoming FallbackAuditWriter (Bundle B-T4) when the primary SQLite writer is throwing. Backed by Channel<AuditEvent> with BoundedChannelFullMode.DropOldest, fixed capacity (default 1024). Channel.CreateBounded(DropOldest) does NOT natively signal a drop on TryWrite, so overflow is detected by comparing Reader.Count before and after the enqueue: when the buffer is already at capacity and a new TryWrite succeeds while keeping the count at capacity, exactly one event was displaced and RingBufferOverflowed is raised (one event per drop). Public surface: - bool TryEnqueue(AuditEvent) — always succeeds unless completed. - IAsyncEnumerable<AuditEvent> DrainAsync(CancellationToken) — FIFO. - void Complete() — closes the channel so DrainAsync can finish. - event Action? RingBufferOverflowed — health counter hook. Tests (3 new, total 23 -> 26): - Enqueue_1025_Into_1024Cap_Ring_DropsOldest_AndRaisesOverflowOnce - DrainAsync_Yields_FIFO_Then_Completes_When_Empty - TryEnqueue_AllSucceeds_ReturnsTrue	2026-05-20 12:20:55 -04:00
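The overflow-detection trick in 55fbcce7a8 works because a drop-oldest buffer that is already full stays at capacity after a successful write, so "full before, still full after" implies exactly one eviction. A minimal Python sketch of the same idea on collections.deque(maxlen=...), which also evicts silently; the class, handler list, and synchronous drain() are hypothetical simplifications of the Channel-based version.

```python
from collections import deque
from typing import Any, Callable, Iterator, List


class RingBufferFallbackSketch:
    def __init__(self, capacity: int = 1024) -> None:
        self._capacity = capacity
        self._items: deque = deque(maxlen=capacity)  # evicts the oldest item when full
        self._completed = False
        self.overflow_handlers: List[Callable[[], None]] = []  # health-counter hook

    def try_enqueue(self, event: Any) -> bool:
        if self._completed:
            return False
        # Same idea as comparing Reader.Count before and after TryWrite: if the
        # buffer was already full, the append must have displaced one item.
        was_full = len(self._items) == self._capacity
        self._items.append(event)
        if was_full:
            for handler in self.overflow_handlers:
                handler()  # one notification per dropped event
        return True

    def drain(self) -> Iterator[Any]:
        # FIFO; this synchronous version simply stops once the buffer is empty.
        while self._items:
            yield self._items.popleft()

    def complete(self) -> None:
        self._completed = True
```

With the default capacity, enqueueing 1025 events raises one overflow notification and leaves events 1 through 1024 in the buffer.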
Joseph Doherty	01480c6ea2	feat(auditlog): SqliteAuditWriter Channel-based hot-path + ReadPendingAsync/MarkForwardedAsync (#23 ) Replaces the B-T1 stub WriteAsync with the production hot-path: - Bounded Channel<PendingAuditEvent> (BoundedChannelFullMode.Wait, capacity from options) feeds a background ProcessWriteQueueAsync loop that drains up to BatchSize events per transaction. - The loop INSERTs each event with explicit parameter binding (enums and DateTime stored as text); duplicate EventIds (SqliteException with ErrorCode 19 SQLITE_CONSTRAINT) are swallowed as first-write-wins per alog.md §11, and the pending TCS is still completed successfully so callers see idempotent semantics. - Site rows force ForwardState = Pending on enqueue when the inbound event leaves it null — site-side default per the M2 design. - ReadPendingAsync(limit) returns oldest-first pending rows for the Bundle D telemetry actor; EventId is the deterministic tiebreaker on identical OccurredAtUtc timestamps. MarkForwardedAsync(ids) flips a batch to Forwarded in one UPDATE with a parameterised IN list. - IAsyncDisposable graceful shutdown: TryComplete the writer, await the drain (5s budget), then dispose the connection. Tests (7 new, total 16 -> 23): - WriteAsync_FreshEvent_PersistsWithForwardStatePending - WriteAsync_Concurrent_1000Calls_All_Persist_NoExceptions - WriteAsync_DuplicateEventId_FirstWriteWins_NoException - WriteAsync_ForcesForwardStatePending_IfNull - ReadPendingAsync_Returns_OldestFirst_LimitedToN - MarkForwardedAsync_FlipsRowsToForwarded - MarkForwardedAsync_NonExistentId_NoThrow	2026-05-20 12:20:02 -04:00
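The storage semantics in 01480c6ea2 (first-write-wins on EventId, ForwardState defaulted to Pending, oldest-first reads with EventId as tiebreaker, and a parameterised IN list for the forwarded flip) can be sketched with Python's built-in sqlite3 module. This assumes a hypothetical 4-column subset of the 20-column table and reduces the Channel-based background loop to a write_batch call that commits one transaction per batch.

```python
import sqlite3
from typing import Iterable, List


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Hypothetical 4-column subset of the real 20-column AuditLog table.
    conn.execute(
        "CREATE TABLE AuditLog (EventId TEXT PRIMARY KEY, OccurredAtUtc TEXT NOT NULL,"
        " Kind TEXT NOT NULL, ForwardState TEXT)"
    )
    return conn


def write_batch(conn: sqlite3.Connection, events: Iterable[dict]) -> None:
    with conn:  # one transaction per drained batch
        for e in events:
            state = e.get("state")
            if state is None:
                state = "Pending"  # site-side default when the event leaves it null
            try:
                conn.execute(
                    "INSERT INTO AuditLog VALUES (?, ?, ?, ?)",
                    (e["id"], e["at"], e["kind"], state),
                )
            except sqlite3.IntegrityError:
                # SQLITE_CONSTRAINT (19) on the EventId primary key: the first
                # write wins and the duplicate is treated as a success.
                pass


def read_pending(conn: sqlite3.Connection, limit: int) -> List[str]:
    rows = conn.execute(
        "SELECT EventId FROM AuditLog WHERE ForwardState = 'Pending'"
        " ORDER BY OccurredAtUtc, EventId LIMIT ?",  # EventId breaks timestamp ties
        (limit,),
    ).fetchall()
    return [r[0] for r in rows]


def mark_forwarded(conn: sqlite3.Connection, ids: List[str]) -> None:
    if not ids:
        return
    placeholders = ",".join("?" for _ in ids)  # one bound parameter per id
    with conn:
        conn.execute(
            f"UPDATE AuditLog SET ForwardState = 'Forwarded' WHERE EventId IN ({placeholders})",
            ids,
        )
```

Storing timestamps as ISO-8601 text keeps ORDER BY OccurredAtUtc correct as long as every row uses the same format and precision; an id that does not exist simply matches no row in the UPDATE.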
Joseph Doherty	7173a79ad7	feat(auditlog): SqliteAuditWriter schema bootstrap (#23 ) Adds the site-side SqliteAuditWriter skeleton with schema bootstrap — 20-column AuditLog table + IX_SiteAuditLog_ForwardState_Occurred index + PRAGMA auto_vacuum = INCREMENTAL — and the SqliteAuditWriterOptions companion type. Mirrors the SiteEventLogger pattern: single owned SqliteConnection serialised behind a write lock; the Channel-based hot-path lands in Bundle B-T2. Adds Microsoft.Data.Sqlite + Microsoft.Extensions.Logging.Abstractions project refs to ScadaLink.AuditLog; adds Microsoft.Data.Sqlite + Microsoft.Extensions.Logging.Abstractions + NSubstitute test refs. Tests (3 new, total 13 -> 16): - Opens_Creates_AuditLog_Table_With_20Columns_And_PK_On_EventId - Opens_Creates_IX_ForwardState_Occurred_Index - PRAGMA_auto_vacuum_Is_INCREMENTAL	2026-05-20 12:17:02 -04:00
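The schema bootstrap in 7173a79ad7 has one ordering subtlety: SQLite only honours a change of auto_vacuum from NONE if it is set before the first table is created (or is followed by a full VACUUM). A minimal Python sqlite3 sketch, assuming a hypothetical 3-column subset of the 20-column table and using IF NOT EXISTS so the bootstrap is safe to rerun.

```python
import sqlite3


def bootstrap(conn: sqlite3.Connection) -> None:
    # auto_vacuum can only be switched on before the first table exists (or by
    # a full VACUUM afterwards), so the PRAGMA runs first.
    conn.execute("PRAGMA auto_vacuum = INCREMENTAL")
    # Hypothetical 3-column subset of the real 20-column table.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS AuditLog ("
        " EventId TEXT NOT NULL PRIMARY KEY,"
        " OccurredAtUtc TEXT NOT NULL,"
        " ForwardState TEXT)"
    )
    # Serves the forwarder's "pending rows, oldest first" query.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS IX_SiteAuditLog_ForwardState_Occurred"
        " ON AuditLog (ForwardState, OccurredAtUtc)"
    )
```

With INCREMENTAL mode, freed pages are only returned to the filesystem when the application runs PRAGMA incremental_vacuum, so space reclamation after forwarded rows are purged stays under the writer's control.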

14 Commits