Bundle C of Audit Log #23 M3.
Adds the ScadaLink.SiteCallAudit project + matching tests project, mirroring
the ScadaLink.AuditLog scaffolding pattern (net10.0, central package
management, InternalsVisibleTo to the tests assembly).
SiteCallAuditActor is the central singleton entry point for Site Call Audit
(#22): it receives UpsertSiteCallCommand and persists the SiteCall via
ISiteCallAuditRepository.UpsertAsync (monotonic, idempotent — out-of-order
or duplicate updates are silent no-ops at the repo). Audit-write failures
NEVER abort the user-facing action (CLAUDE.md): repository throws are
caught + logged, the actor replies Accepted=false, and the singleton stays
alive (Resume supervisor strategy as defence in depth).
Two constructors mirror AuditLogIngestActor:
- IServiceProvider production constructor resolves the scoped EF repository
from a fresh DI scope per message.
- ISiteCallAuditRepository test constructor injects a concrete repository so
the TestKit tests exercise the real monotonic-upsert SQL end to end.
UpsertSiteCallCommand + UpsertSiteCallReply live in ScadaLink.Commons (same
home as IngestAuditEventsCommand) so Bundle D's gRPC server can construct
them without taking a project reference on the actor's host project.
AddSiteCallAudit() is a placeholder for symmetry with AddAuditLog /
AddNotificationOutbox; Bundle F will populate it with the actor's Props
factory + options bindings.
Tests (Akka.TestKit.Xunit2 + MsSqlMigrationFixture via project ref to
ScadaLink.ConfigurationDatabase.Tests, mirroring Bundle D2):
- Receive_UpsertSiteCallCommand_Persists_Replies_Accepted
- Receive_DuplicateUpsert_OlderStatus_NoOp_StillRepliesAccepted (idempotency)
- Receive_RepoThrowsTransient_RepliesAccepted_False_ActorStaysAlive
Reconciliation, KPIs, and the central->site Retry/Discard relay are
deferred per CLAUDE.md scope discipline.
ScadaLink.slnx updated to include both new projects.
All 3 new tests pass against the running infra/mssql container; full suite
(2683 tests across 27 projects) passes with no regressions.
Bundle B3 of Audit Log #23 M3: data-access layer for the central SiteCalls
table introduced in B1+B2. UpsertAsync is insert-if-not-exists then
monotonic-status update so out-of-order telemetry, duplicate gRPC packets,
and reconciliation pulls all converge on the same row without rolling
state backward.
- src/ScadaLink.Commons/Interfaces/Repositories/ISiteCallAuditRepository.cs:
UpsertAsync (monotonic), GetAsync, QueryAsync, PurgeTerminalAsync.
- src/ScadaLink.Commons/Types/Audit/SiteCallQueryFilter.cs +
SiteCallPaging.cs: filter (Channel/SourceSite/Status/Target/time range)
and keyset paging cursor on (CreatedAtUtc DESC, TrackedOperationId DESC),
mirrored on M1's AuditLog* equivalents.
- src/ScadaLink.ConfigurationDatabase/Repositories/SiteCallAuditRepository.cs:
raw-SQL InsertIfNotExists + conditional UPDATE with inline CASE rank
compare (Submitted=0, Forwarded=1, Attempted/Skipped=2, terminal=3 —
terminal statuses are mutually exclusive so e.g. Delivered cannot
overwrite Parked). Duplicate-key violations (SQL 2601/2627) are
swallowed at Debug, identical to AuditLogRepository's race-fix.
QueryAsync uses FromSqlInterpolated because EF Core 10 cannot translate
string.Compare against the value-converted TrackedOperationId column
inside an expression tree.
- ServiceCollectionExtensions wires the repository (scoped, after
IAuditLogRepository).
- 12 integration tests in tests/ScadaLink.ConfigurationDatabase.Tests/
Repositories/ (MsSqlMigrationFixture + [SkippableFact]): fresh insert,
monotonic advance, older-status no-op, same-status no-op,
terminal-over-terminal no-op, 50-way concurrent-insert race produces
exactly one row, Get known/unknown, filter by site, keyset paging no
overlap, purge terminal-and-old, purge keeps non-terminal-and-recent.
Bundle B2 of Audit Log #23 M3: EF-generated migration that creates the
SiteCalls operational-state table on [PRIMARY], with the simple clustered
PK on TrackedOperationId and the two named indexes the entity config
declares.
No partition function / scheme / DB-role restriction — SiteCalls holds
mutable operational state (insert-once + monotonic-status update at the
repo layer), unlike the partitioned append-only AuditLog table from M1.
- Migration: 20260520180431_AddSiteCallsTable.cs (auto-generated;
EF emitted CREATE TABLE + 2 indexes without customisation needed).
- Model snapshot updated alongside.
- Integration test: tests/ScadaLink.ConfigurationDatabase.Tests/Migrations/
AddSiteCallsTableMigrationTests.cs. Uses the existing MsSqlMigrationFixture
with [SkippableFact] + Skip.IfNot(fixture.Available). Asserts table +
twelve columns + PK on TrackedOperationId + both named indexes.
Bundle B1 of Audit Log #23 M3: introduces the SiteCall entity + EF mapping
for the central SiteCalls operational-state table. One row per
TrackedOperationId, mirrored from sites via best-effort telemetry then
periodic reconciliation; eventually-consistent mirror, not a dispatcher.
- src/ScadaLink.Commons/Entities/Audit/SiteCall.cs: append-once record
with required TrackedOperationId/Channel/Target/SourceSite/Status,
monotonic status update at the repo layer.
- src/ScadaLink.ConfigurationDatabase/Configurations/SiteCallEntityTypeConfiguration.cs:
table SiteCalls, PK on TrackedOperationId (stored as varchar(36) via
value conversion through the canonical 'D'-format GUID string —
matches the wire shape used by gRPC + SQLite columns), two named
indexes (IX_SiteCalls_Source_Created, IX_SiteCalls_Status_Updated).
- ScadaLinkDbContext: DbSet<SiteCall> SiteCalls in the existing Audit
section, after AuditLogs.
- Tests in tests/ScadaLink.ConfigurationDatabase.Tests/Configurations/:
table name, PK, value-conversion shape, index presence + ordering.
Bundle G of Audit Log #23 M2. Bridges the FallbackAuditWriter primary-
failure counter into the Site Health Monitoring report payload so a
sustained audit-write outage surfaces on /monitoring/health instead of
disappearing into a NoOp sink.
- SiteHealthReport: add SiteAuditWriteFailures (defaulted, additive).
- ISiteHealthCollector + SiteHealthCollector: new
IncrementSiteAuditWriteFailures() counter, per-interval reset
semantics matching ScriptErrorCount / DeadLetterCount.
- HealthMetricsAuditWriteFailureCounter: adapter forwarding
IAuditWriteFailureCounter.Increment() to the collector.
- AddAuditLogHealthMetricsBridge(): swaps the NoOp default
registration for the real bridge; called from
SiteServiceRegistration after AddSiteHealthMonitoring + AddAuditLog.
- Existing host-wiring test updated: site composition now resolves
HealthMetricsAuditWriteFailureCounter (not NoOp).
Tests: HealthMonitoring 60 -> 63 (3 new), AuditLog 56 -> 59 (3 new),
full solution green.
Wraps IExternalSystemClient.CallAsync inside ScriptRuntimeContext's
ExternalSystemHelper so every script-initiated ExternalSystem.Call
produces exactly one ApiOutbound/ApiCall AuditEvent via IAuditWriter.
- Captures duration with Stopwatch.GetTimestamp() around the call.
- Builds the audit event with full provenance (SiteId, InstanceId,
SourceScript) and a fresh EventId; ForwardState=Pending.
- Maps Success → AuditStatus.Delivered, Failure (or thrown) → Failed;
parses HTTP {code} out of the ExternalSystemClient's error message
to populate HttpStatus.
- Audit emission is fully best-effort: event-build failures, sync
WriteAsync throws, AND async WriteAsync faults are all logged at
Warning and swallowed so the script's call path is never aborted
by an audit-write failure (alog.md §7).
- Original ExternalCallResult or original exception flows back to the
caller unchanged.
ScriptExecutionActor resolves IAuditWriter from DI and threads it
into ScriptRuntimeContext alongside the existing site identity.
Adds ExternalSystemCallAuditEmissionTests covering: success →
Delivered, HTTP 500 → Failed+httpStatus, HTTP 400 → Failed+httpStatus,
client-thrown network exception → Failed with original exception
re-thrown, audit-writer throw → original result returned, provenance
populated from context, DurationMs recorded.
Refs Audit Log #23 M2 Bundle F.
Wires Bundle E of the M2 site-sync pipeline:
- AddAuditLog extended to register the site writer chain (SqliteAuditWriter
singleton + ISiteAuditQueue forward + RingBufferFallback + FallbackAuditWriter
composing them) and the telemetry collaborators (SiteAuditTelemetryOptions,
SqliteAuditWriterOptions, IAuditWriteFailureCounter NoOp default,
ISiteStreamAuditClient NoOp default).
- AkkaHostedService central role: AuditLogIngestActor as ClusterSingletonManager
(singleton name 'audit-log-ingest') + ClusterSingletonProxy, mirroring the
Notification Outbox pattern. Proxy is offered to SiteStreamGrpcServer if it
resolves (Site path only today; M6 reconciliation will host gRPC on central).
- AkkaHostedService site role: SiteAuditTelemetryActor (per-site, NOT a
singleton because each site is its own cluster), bound to a dedicated
audit-telemetry-dispatcher (ForkJoinDispatcher, 2 dedicated threads).
- Program.cs + SiteServiceRegistration.Configure call AddAuditLog on both roles.
- AuditLogIngestActor gains a second constructor that takes IServiceProvider so
the cluster singleton can create a fresh scope per message — IAuditLogRepository
is a scoped EF Core service and cannot be pre-resolved from the root. The
IAuditLogRepository constructor remains for Bundle D's MSSQL-fixture tests.
NoOp ISiteStreamAuditClient is deliberate: no site→central gRPC channel exists
in M2 (sites talk to central via Akka ClusterClient; gRPC SiteStreamService is
hosted on sites for central→site streaming). M6 reconciliation introduces the
real gRPC site→central client + central-hosted gRPC server. Bundle H's
integration test substitutes a stub client directly via the actor's Props.
Tests:
- tests/ScadaLink.AuditLog.Tests/AddAuditLogTests.cs — 11 tests (was 3): writer
singleton, IAuditWriter as FallbackAuditWriter, ISiteAuditQueue same-instance
as SqliteAuditWriter, options bind round-trip, NoOp default assertions.
- tests/ScadaLink.Host.Tests/AkkaHostedServiceAuditWiringTests.cs (new) — 13
tests: BuildHocon emits audit-telemetry-dispatcher block with the expected
type/throughput/thread-count; Central composition root resolves the writer
chain + options; Site composition root resolves the writer chain + options +
NoOp client.
Verified: dotnet build clean, 23 test suites green (Host 194 + AuditLog 54).
Adds the IAuditWriter composer that sits between the script-side
ScriptRuntimeContext audit emission (Bundle F) and the primary
SqliteAuditWriter. Honours the alog.md §7 guarantee that audit-write
failures NEVER abort the user-facing action:
- Primary throw -> log Warning, increment IAuditWriteFailureCounter
(Bundle G's health-metric sink), stash the event in the drop-oldest
RingBufferFallback, return success to the caller.
- Primary success -> opportunistically drain the ring back through the
primary in FIFO order, behind the triggering event. Drain is
serialised via a SemaphoreSlim gate so concurrent recoveries don't
double-replay; a drain-side re-throw re-enqueues at the tail and
breaks out (the next successful write retries).
Adds IAuditWriteFailureCounter as the lightweight DI seam (one void
Increment()), and a TryDequeue helper on RingBufferFallback that the
recovery path uses to pop one item without blocking.
Tests (4 new, total 26 -> 30):
- WriteAsync_PrimaryThrows_EventLandsInRing_CallReturnsSuccess
- WriteAsync_PrimaryRecovers_RingDrains_InFIFOOrder_OnNextWrite
(order: trigger first, then ring backlog in submission FIFO)
- WriteAsync_PrimaryAlwaysSucceeds_Ring_StaysEmpty
- WriteAsync_FailureCounter_Incremented_Per_PrimaryFailure
Adds RingBufferFallback — an in-memory drop-oldest ring buffer used by
the upcoming FallbackAuditWriter (Bundle B-T4) when the primary SQLite
writer is throwing. Backed by Channel<AuditEvent> with
BoundedChannelFullMode.DropOldest, fixed capacity (default 1024).
Channel.CreateBounded(DropOldest) does NOT natively signal a drop on
TryWrite, so overflow is detected by comparing Reader.Count before and
after the enqueue: when the buffer is already at capacity and a new
TryWrite succeeds while keeping the count at capacity, exactly one
event was displaced and RingBufferOverflowed is raised (one event per
drop).
Public surface:
- bool TryEnqueue(AuditEvent) — always succeeds unless completed.
- IAsyncEnumerable<AuditEvent> DrainAsync(CancellationToken) — FIFO.
- void Complete() — closes the channel so DrainAsync can finish.
- event Action? RingBufferOverflowed — health counter hook.
Tests (3 new, total 23 -> 26):
- Enqueue_1025_Into_1024Cap_Ring_DropsOldest_AndRaisesOverflowOnce
- DrainAsync_Yields_FIFO_Then_Completes_When_Empty
- TryEnqueue_AllSucceeds_ReturnsTrue
Replaces the B-T1 stub WriteAsync with the production hot-path:
- Bounded Channel<PendingAuditEvent> (BoundedChannelFullMode.Wait, capacity
from options) feeds a background ProcessWriteQueueAsync loop that drains
up to BatchSize events per transaction.
- The loop INSERTs each event with explicit parameter binding (enums and
DateTime stored as text); duplicate EventIds (SqliteException with
ErrorCode 19 SQLITE_CONSTRAINT) are swallowed as first-write-wins per
alog.md §11, and the pending TCS is still completed successfully so
callers see idempotent semantics.
- Site rows force ForwardState = Pending on enqueue when the inbound
event leaves it null — site-side default per the M2 design.
- ReadPendingAsync(limit) returns oldest-first pending rows for the
Bundle D telemetry actor; EventId is the deterministic tiebreaker on
identical OccurredAtUtc timestamps. MarkForwardedAsync(ids) flips a
batch to Forwarded in one UPDATE with a parameterised IN list.
- IAsyncDisposable graceful shutdown: TryComplete the writer, await the
drain (5s budget), then dispose the connection.
Tests (7 new, total 16 -> 23):
- WriteAsync_FreshEvent_PersistsWithForwardStatePending
- WriteAsync_Concurrent_1000Calls_All_Persist_NoExceptions
- WriteAsync_DuplicateEventId_FirstWriteWins_NoException
- WriteAsync_ForcesForwardStatePending_IfNull
- ReadPendingAsync_Returns_OldestFirst_LimitedToN
- MarkForwardedAsync_FlipsRowsToForwarded
- MarkForwardedAsync_NonExistentId_NoThrow
Two concurrent sessions can both pass the IF NOT EXISTS check and then both attempt the INSERT against UX_AuditLog_EventId; the loser surfaced as SqlException 2601 (or 2627 for PK violations) and aborted the audit write. First-write-wins idempotency is the documented contract, so the race outcome is semantically a no-op — catch the two duplicate-key error numbers and log at Debug, let any other SqlException bubble.
Tests:
- InsertIfNotExistsAsync_ConcurrentDuplicateInserts_ProduceExactlyOneRow: 50 parallel inserters with the same EventId end with exactly one row and no escaped exceptions.
- QueryAsync_Keyset_SameOccurredAtUtc_TiebreaksOnEventId: four rows sharing the same OccurredAtUtc page deterministically through the (OccurredAtUtc, EventId) keyset cursor — exercises the e.OccurredAtUtc == after && e.EventId.CompareTo(afterId) < 0 branch and verifies EF Core 10's Guid.CompareTo translation against SQL Server uniqueidentifier order (deferred Bundle D reviewer recommendation).
AuditLogRepository now takes an optional ILogger<AuditLogRepository> (NullLogger default, mirrors InboundApiRepository); DI registration unchanged.
EF Core implementation of IAuditLogRepository:
- InsertIfNotExistsAsync: single IF NOT EXISTS ... INSERT via
ExecuteSqlInterpolatedAsync, bypasses the change tracker. Enum
values converted to string in C# (columns are varchar(32) via
HasConversion<string>).
- QueryAsync: AsNoTracking, predicate-per-non-null-filter, keyset
paging on (OccurredAtUtc DESC, EventId DESC) — EF Core 10
translates Guid.CompareTo to a uniqueidentifier < comparison
natively (verified against MSSQL 2022).
- SwitchOutPartitionAsync: throws NotSupportedException naming M6;
the non-aligned UX_AuditLog_EventId unique index blocks
ALTER TABLE SWITCH PARTITION until the drop-and-rebuild dance
ships with the purge actor.
DI: AddScoped<IAuditLogRepository, AuditLogRepository>() added after
the NotificationOutboxRepository registration; existing DI smoke test
extended with an IAuditLogRepository assertion.
Integration tests (8 new) use the Bundle C MsSqlMigrationFixture and
scope by a per-test SourceSiteId guid so they neither collide nor
require cleanup.
Bundle D of the Audit Log #23 M1 Foundation plan.
Reviewer of Bundle C (#23 M1) flagged two blockers in the
AddAuditLogTableMigration integration tests:
1. Tests used 'if (!await EnsureMigrationApplied()) return;' which made
the xunit runner report them as Passed when the dev MSSQL container
was absent — a CI false-positive risk. xunit 2.9.x does NOT ship the
v3 Assert.Skip/SkipUnless/SkipWhen API surface (verified empirically
against xunit.assert 2.9.3 — only v3.x exposes those static methods),
so the canonical xunit-v2 equivalent is the Xunit.SkippableFact
package. Replaced [Fact] with [SkippableFact] and the early-return
pattern with 'Skip.IfNot(_fixture.Available, _fixture.SkipReason)' as
the first statement of each of the 8 audit-log test methods. The
runner now reports them as Skipped (not Passed) when MSSQL is down.
2. MsSqlMigrationFixture relied on SqlClient's 30s default connect
timeout, so a no-container fixture construction hung ~30s. Added
'Connect Timeout=3' to DefaultAdminConnectionString. Verified
fail-fast under ~4s end-to-end with a bad host via env-var override.
Additional fixture cleanups:
- Migration is now applied once in the fixture constructor (was per-test
via EnsureMigrationApplied for idempotency). Tests reach a fully-
migrated database with no extra setup. Removed the now-unused
EnsureMigrationApplied helper from the test class.
- Constructor narrowed its catch to SqlException + InvalidOperationException
for the OpenAsync step (the only legitimate connect-failure surfaces);
everything else (CREATE DATABASE, MigrateAsync) is treated as a hard
fixture failure and bubbles up. Added a best-effort
TryDropOrphanDatabase() pre-throw cleanup so partial construction
cannot leak guid-suffixed databases.
- Stale doc comments referencing the (non-existent) xunit 2.9.x Skip
shim removed; replaced with accurate notes about Xunit.SkippableFact.
Verified:
- dotnet build ScadaLink.slnx: clean (0 warnings, 0 errors).
- dotnet test ScadaLink.ConfigurationDatabase.Tests with MSSQL up:
Passed 150 / Skipped 0 / Failed 0.
- Same suite with SCADALINK_MSSQL_TEST_CONN pointed at a closed port:
the 8 AddAuditLogTableMigration tests report as Skipped (visible
'[SKIP]' lines in runner output), total elapsed ~3s.
Files touched:
- Directory.Packages.props: added Xunit.SkippableFact 1.5.61.
- tests/ScadaLink.ConfigurationDatabase.Tests/ScadaLink.ConfigurationDatabase.Tests.csproj:
added the SkippableFact PackageReference.
- tests/ScadaLink.ConfigurationDatabase.Tests/Migrations/MsSqlMigrationFixture.cs:
Connect Timeout=3, constructor refactor, doc-comment fixes.
- tests/ScadaLink.ConfigurationDatabase.Tests/Migrations/AddAuditLogTableMigrationTests.cs:
[SkippableFact] + Skip.IfNot pattern across all 8 tests.
Untouched (per reviewer guidance):
- Migration file (Bundle C main artifact unchanged).
- Bundle B reconciliation (composite PK + UX_AuditLog_EventId).
- SqlClient VersionOverride 6.1.1 in the test csproj.
- infra/* (separate uncommitted local edits remain in working tree).
Bundle C of the #23 M1 foundation. Creates the centralized AuditLog table
with the partition function, partition scheme, partition-aligned
non-clustered indexes, and the two access-control roles documented in
alog.md §4.
Schema:
- pf_AuditLog_Month: RANGE RIGHT, 24 monthly boundaries (Jan 2026 – Dec 2027).
- ps_AuditLog_Month: ALL TO ([PRIMARY]) — dev/test parity.
- dbo.AuditLog: created via raw SQL ON ps_AuditLog_Month(OccurredAtUtc).
Composite clustered PK {EventId, OccurredAtUtc} (partition column must be
part of the clustered key). 22 columns matching the EF AuditEvent model.
- 5 reconciliation/query non-clustered indexes from alog.md §4
(Channel_Status_Occurred, CorrelationId filtered, OccurredAtUtc,
Site_Occurred, Target_Occurred filtered) — all partition-aligned.
- UX_AuditLog_EventId: non-aligned UNIQUE on EventId alone (preserves
InsertIfNotExistsAsync idempotency from M1-T8). Non-aligned because
partition-aligned unique indexes require the partition column in the key,
which would weaken to composite uniqueness; the purge story (M2/M3)
rebuilds this index around partition switches.
Access control:
- scadalink_audit_writer: GRANT INSERT + GRANT SELECT, DENY UPDATE + DENY DELETE
on AuditLog. The explicit DENY guarantees later db_datawriter membership
cannot quietly re-enable mutation.
- scadalink_audit_purger: GRANT SELECT on AuditLog, GRANT ALTER on SCHEMA::dbo
(enables ALTER PARTITION FUNCTION SWITCH and SWITCH PARTITION).
Both role definitions are idempotent (IF DATABASE_PRINCIPAL_ID IS NULL).
Down() drops in reverse dependency order with IF EXISTS guards.
Integration tests (tests/ScadaLink.ConfigurationDatabase.Tests/Migrations/):
- MsSqlMigrationFixture: connects to the running infra/mssql container (or
the SCADALINK_MSSQL_TEST_CONN override), creates a unique per-fixture
database, applies the migrations, drops the DB on dispose. Marks itself
Available=false when MSSQL is unreachable so tests early-return cleanly
on CI without the dev container.
- AddAuditLogTableMigrationTests: 8 tests covering table existence,
partition function/scheme, partition-aligned PK, the 5 named indexes,
both roles' grants, and a smoke test that a writer-role user receives
SqlException with "permission" on UPDATE AuditLog.
ConfigurationDatabase tests: 142 passing -> 150 passing (8 new integration
tests). Full solution builds clean.
Package: tests project locally overrides Microsoft.Data.SqlClient to 6.1.1
(EF SqlServer 10.0.7 needs >= 6.1.1; central package version is pinned at
6.0.2 for the production ExternalSystemGateway).
Bundle C migration aligns AuditLog to ps_AuditLog_Month(OccurredAtUtc).
A partitioned table's clustered key must include the partition column, so
the PK becomes the composite {EventId, OccurredAtUtc} — a divergence from
Bundle B's single-column PK that needs reconciling in the EF mapping
before the migration is generated.
EventId remains the idempotency key for AuditLogRepository.InsertIfNotExistsAsync
(M1-T8), so a dedicated unique index UX_AuditLog_EventId preserves the
single-column uniqueness constraint.
Tests updated:
- Configure_MapsToAuditLogTable_WithCompositePrimaryKey (replaces the
WithEventIdAsPrimaryKey assertion) verifies {EventId, OccurredAtUtc}.
- Configure_DeclaresUniqueIndex_OnEventIdAlone_ForIdempotencyLookups
asserts the new UX_AuditLog_EventId is unique and on EventId alone.
- Configure_ExpectedIndexes_WithCorrectNames now expects six index names
(the original five plus UX_AuditLog_EventId).
FU3: thread the executing script identifier from the script-execution
context down to the Notify outbox API so NotifyTarget.Send stamps
NotificationSubmit.SourceScript instead of leaving it null.
- ScriptRuntimeContext / NotifyHelper / NotifyTarget take an optional
sourceScript value, carried through to NotificationSubmit.SourceScript.
- ScriptExecutionActor supplies "ScriptActor:<scriptName>", matching the
Site Event Logging "Source" convention used for script error events.
- AlarmExecutionActor builds the context without the S&F engine, so its
Notify API is inert; sourceScript defaults to null there.