M3 Bundle F (Task F1) wires the cached-call audit pipeline through the
composition roots:
- Central: register SiteCallAuditActor as a cluster singleton + proxy
(mirrors AuditLogIngestActor and NotificationOutboxActor). Program.cs
calls .AddSiteCallAudit() on the central role.
- Site: register ICachedCallTelemetryForwarder + CachedCallLifecycleBridge
in AddAuditLog (lazy factory — Central nodes degrade to audit-only
emission because IOperationTrackingStore is site-only).
- Site: bind CachedCallLifecycleBridge to ICachedCallLifecycleObserver so
StoreAndForwardService picks it up via DI.
- Site: introduce IStoreAndForwardSiteContext + Host adapter to surface the
site id to StoreAndForwardService without creating a
StoreAndForward -> HealthMonitoring project-reference cycle.
- ScriptExecutionActor resolves ICachedCallTelemetryForwarder per script
scope and threads it into ScriptRuntimeContext.
CachedCallTelemetryForwarder's IOperationTrackingStore dependency is now
nullable so Central DI validation succeeds with the lazy registration; the
forwarder's tracking-half emission is a no-op when the store is absent.
Tests:
- AkkaHostedServiceAuditWiringTests: Central host builds with
AddSiteCallAudit and resolves ICachedCallTelemetryForwarder; Site
resolves the forwarder + bridge + observer + IStoreAndForwardSiteContext.
- Full solution: 194 Host tests green, 241 SiteRuntime tests green, every
other suite unchanged.
Wire the M3 cached-call audit pipeline end-to-end for the database
channel and close the loop between the S&F lifecycle observer and the
site-side dual emitter.
* DatabaseCachedWriteEmissionTests covers Database.CachedWrite (set up
in Bundle E3): mints a TrackedOperationId, emits one CachedSubmit
packet on DbOutbound, threads the id into IDatabaseGateway, and is
best-effort on a thrown forwarder. Mirrors ExternalSystem.CachedCall
coverage from E3.
* CachedCallLifecycleBridge (new) implements ICachedCallLifecycleObserver
and lives alongside CachedCallTelemetryForwarder. The bridge ingests
per-attempt notifications from the S&F retry loop and fans them out
to the forwarder:
- TransientFailure -> 1 Attempted row
- Delivered -> Attempted + CachedResolve(Delivered)
- PermanentFailure -> Attempted + CachedResolve(Parked)
- ParkedMaxRetries -> Attempted + CachedResolve(Parked)
Channel string -> AuditKind mapping (ApiOutbound->ApiCallCached,
DbOutbound->DbWriteCached). Best-effort top-level catch swallows any
unexpected throw so the S&F retry bookkeeping is never disturbed.
* Bridge tests (7) cover all four outcomes, channel mapping, provenance
propagation, and the no-throw-on-forwarder-failure contract.
Bundle F (Host registration) will instantiate the bridge and inject it
into StoreAndForwardService.cachedCallObserver, closing the wiring path
end-to-end.
Bundle E task E6.
Hook the store-and-forward retry loop so the audit pipeline can emit
per-attempt + terminal telemetry under the original TrackedOperationId
(Bundle E Tasks E4 + E5).
New seam:
* ICachedCallLifecycleObserver + CachedCallAttemptContext in
Commons.Interfaces.Services. Outcome enum
(Delivered / TransientFailure / PermanentFailure / ParkedMaxRetries)
is S&F-vocabulary; the bridge living in ScadaLink.AuditLog (Bundle F)
will map it to the AuditKind/AuditStatus pair when building the
CachedCallTelemetry packet.
* StoreAndForwardService gains an optional cachedCallObserver
constructor parameter + siteId. RetryMessageAsync fires the observer
exactly once per attempt with the appropriate outcome:
- handler returns true -> Delivered
- handler returns false -> PermanentFailure (and parks)
- handler throws + retries remaining -> TransientFailure
- handler throws + max retries hit -> ParkedMaxRetries (and parks)
Hook is best-effort: a thrown observer is logged + swallowed so a
failing audit pipeline can never be misclassified as a transient
delivery failure or corrupt the retry-count bookkeeping (alog.md §7).
Only cached-call categories (ExternalSystem, CachedDbWrite) generate
notifications — Notification category has its own central-side
audit pipeline (Notification Outbox / #21).
Pre-M3 callers that didn't thread a TrackedOperationId into the S&F
message id are silently skipped — the observer requires a parseable id
by contract. New S&F callers stamp the id as messageId (Bundle E3).
Bundle E tasks E4 + E5.
Rework ScriptRuntimeContext.ExternalSystem.CachedCall to fit the M3
combined-telemetry model:
* Mints a fresh TrackedOperationId and emits one CachedSubmit packet
via ICachedCallTelemetryForwarder BEFORE handing the call off — the
SiteCalls row is materialised before the first delivery attempt so
Tracking.Status(id) can observe a Submitted row even if immediate
delivery resolves before the helper returns.
* Threads the TrackedOperationId into IExternalSystemClient.CachedCallAsync
as a new optional parameter (and into IDatabaseGateway.CachedWriteAsync
for the Database mirror set up here for E6). The gateway uses the id
as the StoreAndForward messageId so the retry loop (Tasks E4/E5) can
recover it from StoreAndForwardMessage.Id.
* Returns the TrackedOperationId rather than ExternalCallResult — the
script's contract is now "get a tracking handle, observe outcome via
Tracking.Status". Best-effort emission: a thrown forwarder is logged
+ swallowed; the original call still runs and the id is still returned.
DatabaseHelper gets the matching siteId / sourceScript / forwarder
fields and a parallel CachedSubmit emitter (Channel=DbOutbound) so Task
E6's Database.CachedWrite mirror plugs in without further runtime
wiring.
New ICachedCallTelemetryForwarder seam in Commons.Interfaces.Services
so SiteRuntime depends on Commons (existing arrow) rather than
ScadaLink.AuditLog (would have introduced a new dependency).
Bundle E task E3 (and helper-shape work for E6).
Sister to SiteAuditTelemetryActor: takes a combined CachedCallTelemetry
packet and fans it out to the two site-local stores.
* AuditEvent half writes through IAuditWriter (the M2 FallbackAuditWriter
+ SqliteAuditWriter chain — same site SQLite hot-path as sync calls).
* SiteCallOperational half maps Audit.Kind to the matching
IOperationTrackingStore method:
- CachedSubmit -> RecordEnqueueAsync (insert-if-not-exists)
- ApiCallCached / DbWriteCached -> RecordAttemptAsync (monotonic)
- CachedResolve -> RecordTerminalAsync (first-write-wins)
Best-effort contract (alog.md §7): independent try/catch per half so a
thrown writer cannot starve the tracking row (and vice-versa); both
failures are logged at warning level and swallowed — the calling script
never sees them.
Wire push deferred to M6 — the NoOp ISiteStreamAuditClient binding stays
in effect; the forwarder writes only to the local stores in M3. The
existing SiteAuditTelemetryActor drain loop will sweep the audit rows
once a real gRPC client lands.
Bundle E task E2.
Add the second site→central RPC seam alongside the existing M2
IngestAuditEventsAsync. The Bundle D proto already lit up
IngestCachedTelemetry (CachedTelemetryBatch / IngestAck) so this commit
just plumbs the client-side abstraction:
* ISiteStreamAuditClient gains IngestCachedTelemetryAsync(batch, ct).
* NoOpSiteStreamAuditClient implements it returning an empty IngestAck
(same shape as M2 — production gRPC client lands in M6).
* SyncCallEmissionEndToEndTests' DirectActorSiteStreamAuditClient stub
throws NotSupportedException from the new method so a regression that
accidentally routes a cached packet through the sync stub fails loudly.
* New NoOpSiteStreamAuditClientTests cover the null-guard + empty-ack
contract for both batch shapes.
Bundle E task E1.
Bundle C of Audit Log #23 M3.
Adds the ScadaLink.SiteCallAudit project + matching tests project, mirroring
the ScadaLink.AuditLog scaffolding pattern (net10.0, central package
management, InternalsVisibleTo to the tests assembly).
SiteCallAuditActor is the central singleton entry point for Site Call Audit
(#22): it receives UpsertSiteCallCommand and persists the SiteCall via
ISiteCallAuditRepository.UpsertAsync (monotonic, idempotent — out-of-order
or duplicate updates are silent no-ops at the repo). Audit-write failures
NEVER abort the user-facing action (CLAUDE.md): repository throws are
caught + logged, the actor replies Accepted=false, and the singleton stays
alive (Resume supervisor strategy as defence in depth).
Two constructors mirror AuditLogIngestActor:
- IServiceProvider production constructor resolves the scoped EF repository
from a fresh DI scope per message.
- ISiteCallAuditRepository test constructor injects a concrete repository so
the TestKit tests exercise the real monotonic-upsert SQL end to end.
UpsertSiteCallCommand + UpsertSiteCallReply live in ScadaLink.Commons (same
home as IngestAuditEventsCommand) so Bundle D's gRPC server can construct
them without taking a project reference on the actor's host project.
AddSiteCallAudit() is a placeholder for symmetry with AddAuditLog /
AddNotificationOutbox; Bundle F will populate it with the actor's Props
factory + options bindings.
Tests (Akka.TestKit.Xunit2 + MsSqlMigrationFixture via project ref to
ScadaLink.ConfigurationDatabase.Tests, mirroring Bundle D2):
- Receive_UpsertSiteCallCommand_Persists_Replies_Accepted
- Receive_DuplicateUpsert_OlderStatus_NoOp_StillRepliesAccepted (idempotency)
- Receive_RepoThrowsTransient_RepliesAccepted_False_ActorStaysAlive
Reconciliation, KPIs, and the central->site Retry/Discard relay are
deferred per CLAUDE.md scope discipline.
ScadaLink.slnx updated to include both new projects.
All 3 new tests pass against the running infra/mssql container; full suite
(2683 tests across 27 projects) passes with no regressions.
Bundle B3 of Audit Log #23 M3: data-access layer for the central SiteCalls
table introduced in B1+B2. UpsertAsync is insert-if-not-exists then
monotonic-status update so out-of-order telemetry, duplicate gRPC packets,
and reconciliation pulls all converge on the same row without rolling
state backward.
- src/ScadaLink.Commons/Interfaces/Repositories/ISiteCallAuditRepository.cs:
UpsertAsync (monotonic), GetAsync, QueryAsync, PurgeTerminalAsync.
- src/ScadaLink.Commons/Types/Audit/SiteCallQueryFilter.cs +
SiteCallPaging.cs: filter (Channel/SourceSite/Status/Target/time range)
and keyset paging cursor on (CreatedAtUtc DESC, TrackedOperationId DESC),
mirrored on M1's AuditLog* equivalents.
- src/ScadaLink.ConfigurationDatabase/Repositories/SiteCallAuditRepository.cs:
raw-SQL InsertIfNotExists + conditional UPDATE with inline CASE rank
compare (Submitted=0, Forwarded=1, Attempted/Skipped=2, terminal=3 —
terminal statuses are mutually exclusive so e.g. Delivered cannot
overwrite Parked). Duplicate-key violations (SQL 2601/2627) are
swallowed at Debug, identical to AuditLogRepository's race-fix.
QueryAsync uses FromSqlInterpolated because EF Core 10 cannot translate
string.Compare against the value-converted TrackedOperationId column
inside an expression tree.
- ServiceCollectionExtensions wires the repository (scoped, after
IAuditLogRepository).
- 12 integration tests in tests/ScadaLink.ConfigurationDatabase.Tests/
Repositories/ (MsSqlMigrationFixture + [SkippableFact]): fresh insert,
monotonic advance, older-status no-op, same-status no-op,
terminal-over-terminal no-op, 50-way concurrent-insert race produces
exactly one row, Get known/unknown, filter by site, keyset paging no
overlap, purge terminal-and-old, purge keeps non-terminal-and-recent.
Bundle B2 of Audit Log #23 M3: EF-generated migration that creates the
SiteCalls operational-state table on [PRIMARY], with the simple clustered
PK on TrackedOperationId and the two named indexes the entity config
declares.
No partition function / scheme / DB-role restriction — SiteCalls holds
mutable operational state (insert-once + monotonic-status update at the
repo layer), unlike the partitioned append-only AuditLog table from M1.
- Migration: 20260520180431_AddSiteCallsTable.cs (auto-generated;
EF emitted CREATE TABLE + 2 indexes without customisation needed).
- Model snapshot updated alongside.
- Integration test: tests/ScadaLink.ConfigurationDatabase.Tests/Migrations/
AddSiteCallsTableMigrationTests.cs. Uses the existing MsSqlMigrationFixture
with [SkippableFact] + Skip.IfNot(fixture.Available). Asserts table +
twelve columns + PK on TrackedOperationId + both named indexes.
Bundle B1 of Audit Log #23 M3: introduces the SiteCall entity + EF mapping
for the central SiteCalls operational-state table. One row per
TrackedOperationId, mirrored from sites via best-effort telemetry then
periodic reconciliation; eventually-consistent mirror, not a dispatcher.
- src/ScadaLink.Commons/Entities/Audit/SiteCall.cs: append-once record
with required TrackedOperationId/Channel/Target/SourceSite/Status,
monotonic status update at the repo layer.
- src/ScadaLink.ConfigurationDatabase/Configurations/SiteCallEntityTypeConfiguration.cs:
table SiteCalls, PK on TrackedOperationId (stored as varchar(36) via
value conversion through the canonical 'D'-format GUID string —
matches the wire shape used by gRPC + SQLite columns), two named
indexes (IX_SiteCalls_Source_Created, IX_SiteCalls_Status_Updated).
- ScadaLinkDbContext: DbSet<SiteCall> SiteCalls in the existing Audit
section, after AuditLogs.
- Tests in tests/ScadaLink.ConfigurationDatabase.Tests/Configurations/:
table name, PK, value-conversion shape, index presence + ordering.
M3 head now records M2 realities:
- enum vocabulary (M1-aligned) drives CachedSubmit/ApiCallCached/etc.
- NoOpSiteStreamAuditClient stays until M6; M3 e2e tests reuse Bundle H's
DirectActorSiteStreamAuditClient (extract to Integration/Infrastructure/).
- Mapper duplication note (gRPC handler inlines DTO->entity decoding;
consider moving AuditEventMapper to Commons in M3).
- AuditIngestAskTimeout=30s hardcoded; M3 may expose via options.
- CachedCallTelemetry message MUST be created from scratch (additive
per Commons REQ-COM-5a; never renamed CachedOperationTelemetry).
- Central dual-write AuditLog + SiteCalls in one tx; reuse Bundle A
duplicate-key swallow pattern for CachedCallId.
M2 ships the first end-to-end audit emission. A script-initiated
ExternalSystem.Call() produces one ApiOutbound/ApiCall row in the central
AuditLog table via site SQLite hot-path + gRPC telemetry push + central
ingest actor. Audit-write failures NEVER abort the script.
Shipped (13 commits):
- Race-fix + tiebreaker: InsertIfNotExistsAsync swallows duplicate-key
races (SqlException 2601/2627); same-OccurredAt keyset test added.
- Site SQLite writer: SqliteAuditWriter (Channel<T> + background batch
inserter, sub-ms enqueue) + RingBufferFallback (1024-cap drop-oldest)
+ FallbackAuditWriter composing primary+ring+failure counter.
- gRPC layer: IngestAuditEvents unary RPC + AuditEventDto on
sitestream.proto; AuditEventMapper for AuditEvent <-> Dto round-trip
(ForwardState site-only, IngestedAtUtc central-only).
- Actors: SiteAuditTelemetryActor (per-site, dedicated dispatcher,
drain loop with 5s busy / 30s idle cadence); AuditLogIngestActor
(central singleton, scope-per-message via IServiceProvider ctor for
scoped repository, idempotent acks).
- Host wiring: cluster singleton + proxy on central, per-site actor
bound to audit-telemetry-dispatcher (ForkJoin, 2 threads). NoOp
ISiteStreamAuditClient registered as production default; real
site->central gRPC client deferred to M6 (orthogonal to M3).
- ESG emission: ScriptRuntimeContext.ExternalSystem.Call wraps
ExternalSystemClient.CallAsync; emits one ApiOutbound/ApiCall row per
call with provenance from context (SourceSiteId/Instance/Script).
Three nested fail-safe layers ensure audit failure never aborts script.
- Health metric: SiteAuditWriteFailures counter + Interlocked.Increment,
exposed in SiteHealthReport; HealthMetricsAuditWriteFailureCounter
bridge swaps the NoOp default when both AddHealthMonitoring + AddAuditLog
are registered.
- E2E: component-level test using TestKit + MsSqlMigrationFixture +
DirectActorSiteStreamAuditClient stub. Verifies push, retry, and
duplicate-collapse in <15s.
Tests: full solution dotnet test ScadaLink.slnx green (one isolated
SiteRuntime sandbox-timeout flake is pre-existing and not M2-related).
~80 net new tests across Commons.Tests / ConfigDb.Tests / Communication.Tests /
HealthMonitoring.Tests / AuditLog.Tests / SiteRuntime.Tests / Host.Tests.
Strict invariants honored: infra/* never touched on any branch commit;
no push to origin; explicit git add throughout; alog.md unchanged
(vocabulary correction from M1 stands).
Bundle G of Audit Log #23 M2. Bridges the FallbackAuditWriter primary-
failure counter into the Site Health Monitoring report payload so a
sustained audit-write outage surfaces on /monitoring/health instead of
disappearing into a NoOp sink.
- SiteHealthReport: add SiteAuditWriteFailures (defaulted, additive).
- ISiteHealthCollector + SiteHealthCollector: new
IncrementSiteAuditWriteFailures() counter, per-interval reset
semantics matching ScriptErrorCount / DeadLetterCount.
- HealthMetricsAuditWriteFailureCounter: adapter forwarding
IAuditWriteFailureCounter.Increment() to the collector.
- AddAuditLogHealthMetricsBridge(): swaps the NoOp default
registration for the real bridge; called from
SiteServiceRegistration after AddSiteHealthMonitoring + AddAuditLog.
- Existing host-wiring test updated: site composition now resolves
HealthMetricsAuditWriteFailureCounter (not NoOp).
Tests: HealthMonitoring 60 -> 63 (3 new), AuditLog 56 -> 59 (3 new),
full solution green.
Wraps IExternalSystemClient.CallAsync inside ScriptRuntimeContext's
ExternalSystemHelper so every script-initiated ExternalSystem.Call
produces exactly one ApiOutbound/ApiCall AuditEvent via IAuditWriter.
- Captures duration with Stopwatch.GetTimestamp() around the call.
- Builds the audit event with full provenance (SiteId, InstanceId,
SourceScript) and a fresh EventId; ForwardState=Pending.
- Maps Success → AuditStatus.Delivered, Failure (or thrown) → Failed;
parses HTTP {code} out of the ExternalSystemClient's error message
to populate HttpStatus.
- Audit emission is fully best-effort: event-build failures, sync
WriteAsync throws, AND async WriteAsync faults are all logged at
Warning and swallowed so the script's call path is never aborted
by an audit-write failure (alog.md §7).
- Original ExternalCallResult or original exception flows back to the
caller unchanged.
ScriptExecutionActor resolves IAuditWriter from DI and threads it
into ScriptRuntimeContext alongside the existing site identity.
Adds ExternalSystemCallAuditEmissionTests covering: success →
Delivered, HTTP 500 → Failed+httpStatus, HTTP 400 → Failed+httpStatus,
client-thrown network exception → Failed with original exception
re-thrown, audit-writer throw → original result returned, provenance
populated from context, DurationMs recorded.
Refs Audit Log #23 M2 Bundle F.
Wires Bundle E of the M2 site-sync pipeline:
- AddAuditLog extended to register the site writer chain (SqliteAuditWriter
singleton + ISiteAuditQueue forward + RingBufferFallback + FallbackAuditWriter
composing them) and the telemetry collaborators (SiteAuditTelemetryOptions,
SqliteAuditWriterOptions, IAuditWriteFailureCounter NoOp default,
ISiteStreamAuditClient NoOp default).
- AkkaHostedService central role: AuditLogIngestActor as ClusterSingletonManager
(singleton name 'audit-log-ingest') + ClusterSingletonProxy, mirroring the
Notification Outbox pattern. Proxy is offered to SiteStreamGrpcServer if it
resolves (Site path only today; M6 reconciliation will host gRPC on central).
- AkkaHostedService site role: SiteAuditTelemetryActor (per-site, NOT a
singleton because each site is its own cluster), bound to a dedicated
audit-telemetry-dispatcher (ForkJoinDispatcher, 2 dedicated threads).
- Program.cs + SiteServiceRegistration.Configure call AddAuditLog on both roles.
- AuditLogIngestActor gains a second constructor that takes IServiceProvider so
the cluster singleton can create a fresh scope per message — IAuditLogRepository
is a scoped EF Core service and cannot be pre-resolved from the root. The
IAuditLogRepository constructor remains for Bundle D's MSSQL-fixture tests.
NoOp ISiteStreamAuditClient is deliberate: no site→central gRPC channel exists
in M2 (sites talk to central via Akka ClusterClient; gRPC SiteStreamService is
hosted on sites for central→site streaming). M6 reconciliation introduces the
real gRPC site→central client + central-hosted gRPC server. Bundle H's
integration test substitutes a stub client directly via the actor's Props.
Tests:
- tests/ScadaLink.AuditLog.Tests/AddAuditLogTests.cs — 11 tests (was 3): writer
singleton, IAuditWriter as FallbackAuditWriter, ISiteAuditQueue same-instance
as SqliteAuditWriter, options bind round-trip, NoOp default assertions.
- tests/ScadaLink.Host.Tests/AkkaHostedServiceAuditWiringTests.cs (new) — 13
tests: BuildHocon emits audit-telemetry-dispatcher block with the expected
type/throughput/thread-count; Central composition root resolves the writer
chain + options; Site composition root resolves the writer chain + options +
NoOp client.
Verified: dotnet build clean, 23 test suites green (Host 194 + AuditLog 54).
Adds the IAuditWriter composer that sits between the script-side
ScriptRuntimeContext audit emission (Bundle F) and the primary
SqliteAuditWriter. Honours the alog.md §7 guarantee that audit-write
failures NEVER abort the user-facing action:
- Primary throw -> log Warning, increment IAuditWriteFailureCounter
(Bundle G's health-metric sink), stash the event in the drop-oldest
RingBufferFallback, return success to the caller.
- Primary success -> opportunistically drain the ring back through the
primary in FIFO order, behind the triggering event. Drain is
serialised via a SemaphoreSlim gate so concurrent recoveries don't
double-replay; a drain-side re-throw re-enqueues at the tail and
breaks out (the next successful write retries).
Adds IAuditWriteFailureCounter as the lightweight DI seam (one void
Increment()), and a TryDequeue helper on RingBufferFallback that the
recovery path uses to pop one item without blocking.
Tests (4 new, total 26 -> 30):
- WriteAsync_PrimaryThrows_EventLandsInRing_CallReturnsSuccess
- WriteAsync_PrimaryRecovers_RingDrains_InFIFOOrder_OnNextWrite
(order: trigger first, then ring backlog in submission FIFO)
- WriteAsync_PrimaryAlwaysSucceeds_Ring_StaysEmpty
- WriteAsync_FailureCounter_Incremented_Per_PrimaryFailure
Adds RingBufferFallback — an in-memory drop-oldest ring buffer used by
the upcoming FallbackAuditWriter (Bundle B-T4) when the primary SQLite
writer is throwing. Backed by Channel<AuditEvent> with
BoundedChannelFullMode.DropOldest, fixed capacity (default 1024).
Channel.CreateBounded(DropOldest) does NOT natively signal a drop on
TryWrite, so overflow is detected by comparing Reader.Count before and
after the enqueue: when the buffer is already at capacity and a new
TryWrite succeeds while keeping the count at capacity, exactly one
event was displaced and RingBufferOverflowed is raised (one event per
drop).
Public surface:
- bool TryEnqueue(AuditEvent) — always succeeds unless completed.
- IAsyncEnumerable<AuditEvent> DrainAsync(CancellationToken) — FIFO.
- void Complete() — closes the channel so DrainAsync can finish.
- event Action? RingBufferOverflowed — health counter hook.
Tests (3 new, total 23 -> 26):
- Enqueue_1025_Into_1024Cap_Ring_DropsOldest_AndRaisesOverflowOnce
- DrainAsync_Yields_FIFO_Then_Completes_When_Empty
- TryEnqueue_AllSucceeds_ReturnsTrue
Replaces the B-T1 stub WriteAsync with the production hot-path:
- Bounded Channel<PendingAuditEvent> (BoundedChannelFullMode.Wait, capacity
from options) feeds a background ProcessWriteQueueAsync loop that drains
up to BatchSize events per transaction.
- The loop INSERTs each event with explicit parameter binding (enums and
DateTime stored as text); duplicate EventIds (SqliteException with
ErrorCode 19 SQLITE_CONSTRAINT) are swallowed as first-write-wins per
alog.md §11, and the pending TCS is still completed successfully so
callers see idempotent semantics.
- Site rows force ForwardState = Pending on enqueue when the inbound
event leaves it null — site-side default per the M2 design.
- ReadPendingAsync(limit) returns oldest-first pending rows for the
Bundle D telemetry actor; EventId is the deterministic tiebreaker on
identical OccurredAtUtc timestamps. MarkForwardedAsync(ids) flips a
batch to Forwarded in one UPDATE with a parameterised IN list.
- IAsyncDisposable graceful shutdown: TryComplete the writer, await the
drain (5s budget), then dispose the connection.
Tests (7 new, total 16 -> 23):
- WriteAsync_FreshEvent_PersistsWithForwardStatePending
- WriteAsync_Concurrent_1000Calls_All_Persist_NoExceptions
- WriteAsync_DuplicateEventId_FirstWriteWins_NoException
- WriteAsync_ForcesForwardStatePending_IfNull
- ReadPendingAsync_Returns_OldestFirst_LimitedToN
- MarkForwardedAsync_FlipsRowsToForwarded
- MarkForwardedAsync_NonExistentId_NoThrow
Two concurrent sessions can both pass the IF NOT EXISTS check and then both attempt the INSERT against UX_AuditLog_EventId; the loser surfaced as SqlException 2601 (or 2627 for PK violations) and aborted the audit write. First-write-wins idempotency is the documented contract, so the race outcome is semantically a no-op — catch the two duplicate-key error numbers and log at Debug, let any other SqlException bubble.
Tests:
- InsertIfNotExistsAsync_ConcurrentDuplicateInserts_ProduceExactlyOneRow: 50 parallel inserters with the same EventId end with exactly one row and no escaped exceptions.
- QueryAsync_Keyset_SameOccurredAtUtc_TiebreaksOnEventId: four rows sharing the same OccurredAtUtc page deterministically through the (OccurredAtUtc, EventId) keyset cursor — exercises the e.OccurredAtUtc == after && e.EventId.CompareTo(afterId) < 0 branch and verifies EF Core 10's Guid.CompareTo translation against SQL Server uniqueidentifier order (deferred Bundle D reviewer recommendation).
AuditLogRepository now takes an optional ILogger<AuditLogRepository> (NullLogger default, mirrors InboundApiRepository); DI registration unchanged.
- M2 head: honor M1 vocabulary (ApiCall/Delivered), harden InsertIfNotExistsAsync
(race window — first concurrent writer arrives in M2), add keyset-tiebreaker
test (Bundle D reviewer's deferred recommendation), reuse MsSqlMigrationFixture
+ Xunit.SkippableFact pattern.
- M6-T4 (AuditLogPurgeActor): replace M1's NotSupportedException stub with the
drop-and-rebuild dance for the non-aligned UX_AuditLog_EventId unique index;
acknowledge the small outage window during partition SWITCH.
- M6-T5 (partition maintenance): note M1 ships 24 monthly boundaries (Jan 2026 -
Dec 2027); service rolls the function forward via SPLIT RANGE.
M1 lands the schema + types every later milestone depends on. After M1 the
database is ready, the ScadaLink.AuditLog project is wired into the solution,
and dotnet test ScadaLink.slnx is green (2503 tests passed, 0 failed).
Shipped (15 commits):
- Commons audit types: AuditEvent record + Audit{Channel,Kind,Status,ForwardState}
enums + IAuditWriter/ICentralAuditWriter interfaces + AuditTelemetryEnvelope +
PullAuditEventsRequest/Response message DTOs.
- EF mapping: AuditEvent -> AuditLog table with composite PK (EventId, OccurredAtUtc),
five named indexes (IX_AuditLog_*) + UX_AuditLog_EventId unique index for
idempotency lookups. AuditLogEntry (config-audit) coexists, untouched.
- Migration AddAuditLogTable: monthly partition function pf_AuditLog_Month
(24 boundaries Jan 2026-Dec 2027) + partition scheme ps_AuditLog_Month on PRIMARY,
table aligned ON ps_AuditLog_Month(OccurredAtUtc); scadalink_audit_writer (INSERT/SELECT
with explicit DENY UPDATE/DELETE) and scadalink_audit_purger (SELECT + ALTER on
SCHEMA::dbo) DB roles created idempotently.
- IAuditLogRepository + EF impl: append-only surface (InsertIfNotExistsAsync,
QueryAsync with keyset paging, SwitchOutPartitionAsync). M1 honest contract:
SwitchOutPartitionAsync throws NotSupportedException pointing to M6 because
UX_AuditLog_EventId is non-partition-aligned (SQL Server requires partition
column in unique-index key); M6's purge actor will drop-and-rebuild around switches.
- New src/ScadaLink.AuditLog/ project: AuditLogOptions + validator (DefaultCapBytes
8KB, ErrorCapBytes 64KB, RetentionDays 365 [30..3650], header redact defaults
Authorization/X-Api-Key/Cookie/Set-Cookie).
- Spec corrections (#23): alog.md + Component-AuditLog.md vocabulary reconciled
to match M1 enums per user-authorized resolution of the cross-bundle review
finding (CLAUDE.md cached-call lifecycle vocabulary supersedes alog.md's earlier
Success/TransientFailure naming).
MSSQL integration tests gated by Xunit.SkippableFact + Connect Timeout=3 fast-fail;
when the infra/mssql container is up, all 8 migration tests + 8 repository tests
pass; when down, they Skip cleanly in ~3s.
Append-only invariant enforced at three layers:
1. DB writer role: DENY UPDATE, DENY DELETE on dbo.AuditLog.
2. Repo interface: no UpdateAsync, no row-DeleteAsync.
3. Repo impl: raw IF NOT EXISTS INSERT only.
infra/* working-tree mods are pre-existing and untouched throughout M1.
The M1 implementation (Bundle A) committed concrete AuditChannel /
AuditKind / AuditStatus enums that reflect CLAUDE.md's locked
cached-call lifecycle decisions. The older alog.md and
Component-AuditLog.md narratives still used pre-M1 vocabulary
(Success / TransientFailure / PermanentFailure / Enqueued / Retrying /
SyncCall / CachedEnqueued / Attempt / Terminal / Completed). This
commit reconciles both docs to the M1 vocabulary:
AuditChannel : ApiOutbound, DbOutbound, Notification, ApiInbound
AuditKind (10): ApiCall, ApiCallCached, DbWrite, DbWriteCached,
NotifySend, NotifyDeliver, InboundRequest,
InboundAuthFailure, CachedSubmit, CachedResolve
AuditStatus(8): Submitted, Forwarded, Attempted, Delivered, Failed,
Parked, Discarded, Skipped
Updates:
- Status column description + worked examples use the new 8 values.
- Kind table flattened from per-channel groupings to a single flat
list of the 10 discriminators (no more SyncCall / Cached* /
Attempt / Terminal / Completed).
- Cached-call lifecycle examples rewritten to the
CachedSubmit -> Forwarded -> Attempted... -> CachedResolve shape.
- Notification lifecycle examples rewritten to
NotifySend(Submitted) -> NotifyDeliver(Attempted) ->
NotifyDeliver(Delivered/Parked/Discarded).
- Inbound API examples split into InboundRequest (success path) and
InboundAuthFailure (401 path).
- 'Errors only' UI toggle, audit-error-rate KPI, and payload-cap
decision (#6 in §16) all switched from 'non-Success' to
Status IN ('Failed', 'Parked', 'Discarded').
- Per-site event-rate table in §13.1 renamed to the new kinds.
Pure design correction; no operational behavior change. Per the
goal-prompt invariant #6, alog.md may change when a design correction
is committed before the affected code change — this commit is that
correction, landed ahead of the M1 merge so the merge order reads
design-first, code-second.
No code, test, or infra file changes.
EF Core implementation of IAuditLogRepository:
- InsertIfNotExistsAsync: single IF NOT EXISTS ... INSERT via
ExecuteSqlInterpolatedAsync, bypasses the change tracker. Enum
values converted to string in C# (columns are varchar(32) via
HasConversion<string>).
- QueryAsync: AsNoTracking, predicate-per-non-null-filter, keyset
paging on (OccurredAtUtc DESC, EventId DESC) — EF Core 10
translates Guid.CompareTo to a uniqueidentifier < comparison
natively (verified against MSSQL 2022).
- SwitchOutPartitionAsync: throws NotSupportedException naming M6;
the non-aligned UX_AuditLog_EventId unique index blocks
ALTER TABLE SWITCH PARTITION until the drop-and-rebuild dance
ships with the purge actor.
DI: AddScoped<IAuditLogRepository, AuditLogRepository>() added after
the NotificationOutboxRepository registration; existing DI smoke test
extended with an IAuditLogRepository assertion.
Integration tests (8 new) use the Bundle C MsSqlMigrationFixture and
scope by a per-test SourceSiteId guid so they neither collide nor
require cleanup.
Bundle D of the Audit Log #23 M1 Foundation plan.
Append-only data-access surface for the central AuditLog table — three
methods: InsertIfNotExistsAsync (first-write-wins on EventId), QueryAsync
(filter + keyset paging on (OccurredAtUtc desc, EventId desc)), and
SwitchOutPartitionAsync (M1 honest contract — throws NotSupported until
M6 lands the non-aligned-index drop/rebuild dance for the partition
switch). No Update, no row-delete; bulk purge is partition-only.
Bundle D of the Audit Log #23 M1 Foundation plan.
Reviewer of Bundle C (#23 M1) flagged two blockers in the
AddAuditLogTableMigration integration tests:
1. Tests used 'if (!await EnsureMigrationApplied()) return;' which made
the xunit runner report them as Passed when the dev MSSQL container
was absent — a CI false-positive risk. xunit 2.9.x does NOT ship the
v3 Assert.Skip/SkipUnless/SkipWhen API surface (verified empirically
against xunit.assert 2.9.3 — only v3.x exposes those static methods),
so the canonical xunit-v2 equivalent is the Xunit.SkippableFact
package. Replaced [Fact] with [SkippableFact] and the early-return
pattern with 'Skip.IfNot(_fixture.Available, _fixture.SkipReason)' as
the first statement of each of the 8 audit-log test methods. The
runner now reports them as Skipped (not Passed) when MSSQL is down.
2. MsSqlMigrationFixture relied on SqlClient's 30s default connect
timeout, so a no-container fixture construction hung ~30s. Added
'Connect Timeout=3' to DefaultAdminConnectionString. Verified
fail-fast under ~4s end-to-end with a bad host via env-var override.
Additional fixture cleanups:
- Migration is now applied once in the fixture constructor (was per-test
via EnsureMigrationApplied for idempotency). Tests reach a fully-
migrated database with no extra setup. Removed the now-unused
EnsureMigrationApplied helper from the test class.
- Constructor narrowed its catch to SqlException + InvalidOperationException
for the OpenAsync step (the only legitimate connect-failure surfaces);
everything else (CREATE DATABASE, MigrateAsync) is treated as a hard
fixture failure and bubbles up. Added a best-effort
TryDropOrphanDatabase() pre-throw cleanup so partial construction
cannot leak guid-suffixed databases.
- Stale doc comments referencing the (non-existent) xunit 2.9.x Skip
shim removed; replaced with accurate notes about Xunit.SkippableFact.
Verified:
- dotnet build ScadaLink.slnx: clean (0 warnings, 0 errors).
- dotnet test ScadaLink.ConfigurationDatabase.Tests with MSSQL up:
Passed 150 / Skipped 0 / Failed 0.
- Same suite with SCADALINK_MSSQL_TEST_CONN pointed at a closed port:
the 8 AddAuditLogTableMigration tests report as Skipped (visible
'[SKIP]' lines in runner output), total elapsed ~3s.
Files touched:
- Directory.Packages.props: added Xunit.SkippableFact 1.5.61.
- tests/ScadaLink.ConfigurationDatabase.Tests/ScadaLink.ConfigurationDatabase.Tests.csproj:
added the SkippableFact PackageReference.
- tests/ScadaLink.ConfigurationDatabase.Tests/Migrations/MsSqlMigrationFixture.cs:
Connect Timeout=3, constructor refactor, doc-comment fixes.
- tests/ScadaLink.ConfigurationDatabase.Tests/Migrations/AddAuditLogTableMigrationTests.cs:
[SkippableFact] + Skip.IfNot pattern across all 8 tests.
Untouched (per reviewer guidance):
- Migration file (Bundle C main artifact unchanged).
- Bundle B reconciliation (composite PK + UX_AuditLog_EventId).
- SqlClient VersionOverride 6.1.1 in the test csproj.
- infra/* (separate uncommitted local edits remain in working tree).
Bundle C of the #23 M1 foundation. Creates the centralized AuditLog table
with the partition function, partition scheme, partition-aligned
non-clustered indexes, and the two access-control roles documented in
alog.md §4.
Schema:
- pf_AuditLog_Month: RANGE RIGHT, 24 monthly boundaries (Jan 2026 – Dec 2027).
- ps_AuditLog_Month: ALL TO ([PRIMARY]) — dev/test parity.
- dbo.AuditLog: created via raw SQL ON ps_AuditLog_Month(OccurredAtUtc).
Composite clustered PK {EventId, OccurredAtUtc} (partition column must be
part of the clustered key). 22 columns matching the EF AuditEvent model.
- 5 reconciliation/query non-clustered indexes from alog.md §4
(Channel_Status_Occurred, CorrelationId filtered, OccurredAtUtc,
Site_Occurred, Target_Occurred filtered) — all partition-aligned.
- UX_AuditLog_EventId: non-aligned UNIQUE on EventId alone (preserves
InsertIfNotExistsAsync idempotency from M1-T8). Non-aligned because
partition-aligned unique indexes require the partition column in the key,
which would weaken to composite uniqueness; the purge story (M2/M3)
rebuilds this index around partition switches.
Access control:
- scadalink_audit_writer: GRANT INSERT + GRANT SELECT, DENY UPDATE + DENY DELETE
on AuditLog. The explicit DENY guarantees later db_datawriter membership
cannot quietly re-enable mutation.
- scadalink_audit_purger: GRANT SELECT on AuditLog, GRANT ALTER on SCHEMA::dbo
(enables ALTER PARTITION FUNCTION SWITCH and SWITCH PARTITION).
Both role definitions are idempotent (IF DATABASE_PRINCIPAL_ID IS NULL).
Down() drops in reverse dependency order with IF EXISTS guards.
Integration tests (tests/ScadaLink.ConfigurationDatabase.Tests/Migrations/):
- MsSqlMigrationFixture: connects to the running infra/mssql container (or
the SCADALINK_MSSQL_TEST_CONN override), creates a unique per-fixture
database, applies the migrations, drops the DB on dispose. Marks itself
Available=false when MSSQL is unreachable so tests early-return cleanly
on CI without the dev container.
- AddAuditLogTableMigrationTests: 8 tests covering table existence,
partition function/scheme, partition-aligned PK, the 5 named indexes,
both roles' grants, and a smoke test that a writer-role user receives
SqlException with "permission" on UPDATE AuditLog.
ConfigurationDatabase tests: 142 passing -> 150 passing (8 new integration
tests). Full solution builds clean.
Package: tests project locally overrides Microsoft.Data.SqlClient to 6.1.1
(EF SqlServer 10.0.7 needs >= 6.1.1; central package version is pinned at
6.0.2 for the production ExternalSystemGateway).
Bundle C migration aligns AuditLog to ps_AuditLog_Month(OccurredAtUtc).
A partitioned table's clustered key must include the partition column, so
the PK becomes the composite {EventId, OccurredAtUtc} — a divergence from
Bundle B's single-column PK that needs reconciling in the EF mapping
before the migration is generated.
EventId remains the idempotency key for AuditLogRepository.InsertIfNotExistsAsync
(M1-T8), so a dedicated unique index UX_AuditLog_EventId preserves the
single-column uniqueness constraint.
Tests updated:
- Configure_MapsToAuditLogTable_WithCompositePrimaryKey (replaces the
WithEventIdAsPrimaryKey assertion) verifies {EventId, OccurredAtUtc}.
- Configure_DeclaresUniqueIndex_OnEventIdAlone_ForIdempotencyLookups
asserts the new UX_AuditLog_EventId is unique and on EventId alone.
- Configure_ExpectedIndexes_WithCorrectNames now expects six index names
(the original five plus UX_AuditLog_EventId).
Bundles A-F per cadence memory. Brainstorm decisions locked:
infra/mssql test harness, single AuditEvent record (nullable IngestedAtUtc
+ ForwardState), PRIMARY filegroup, explicit index names.