- M2 head: honor M1 vocabulary (ApiCall/Delivered), harden InsertIfNotExistsAsync
(race window — first concurrent writer arrives in M2), add keyset-tiebreaker
test (Bundle D reviewer's deferred recommendation), reuse MsSqlMigrationFixture
+ Xunit.SkippableFact pattern.
- M6-T4 (AuditLogPurgeActor): replace M1's NotSupportedException stub with the
drop-and-rebuild dance for the non-aligned UX_AuditLog_EventId unique index;
acknowledge the small outage window during partition SWITCH.
- M6-T5 (partition maintenance): note M1 ships 24 monthly boundaries (Jan 2026 -
Dec 2027); service rolls the function forward via SPLIT RANGE.
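As a rough illustration of the M6-T5 roll-forward, the sketch below models the 24 monthly boundaries and derives the next SPLIT RANGE statement. It is Python rather than the service's C#; the helper names (month_boundaries, next_split_statement) are hypothetical, the date literal format is an assumption, and the NEXT USED [PRIMARY] step assumes the all-on-PRIMARY scheme that M1 ships.

```python
from datetime import date


def month_boundaries(start: date, count: int) -> list[date]:
    """Return `count` first-of-month boundary dates beginning at `start`."""
    boundaries = []
    year, month = start.year, start.month
    for _ in range(count):
        boundaries.append(date(year, month, 1))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return boundaries


def next_split_statement(boundaries: list[date]) -> str:
    """Build the T-SQL that appends one more monthly boundary (hypothetical helper)."""
    last = boundaries[-1]
    if last.month == 12:
        nxt = date(last.year + 1, 1, 1)
    else:
        nxt = date(last.year, last.month + 1, 1)
    return (
        "ALTER PARTITION SCHEME ps_AuditLog_Month NEXT USED [PRIMARY];\n"
        "ALTER PARTITION FUNCTION pf_AuditLog_Month() "
        f"SPLIT RANGE ('{nxt.isoformat()}');"
    )


m1_boundaries = month_boundaries(date(2026, 1, 1), 24)
print(next_split_statement(m1_boundaries))
```

With RANGE RIGHT, 24 boundaries yield 25 partitions; each SPLIT RANGE adds one more at the leading edge.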
M1 lands the schema + types every later milestone depends on. After M1 the
database is ready, the ScadaLink.AuditLog project is wired into the solution,
and dotnet test ScadaLink.slnx is green (2503 tests passed, 0 failed).
Shipped (15 commits):
- Commons audit types: AuditEvent record + Audit{Channel,Kind,Status,ForwardState}
enums + IAuditWriter/ICentralAuditWriter interfaces + AuditTelemetryEnvelope +
PullAuditEventsRequest/Response message DTOs.
- EF mapping: AuditEvent -> AuditLog table with composite PK (EventId, OccurredAtUtc),
five named indexes (IX_AuditLog_*) + UX_AuditLog_EventId unique index for
idempotency lookups. AuditLogEntry (config-audit) coexists, untouched.
- Migration AddAuditLogTable: monthly partition function pf_AuditLog_Month
(24 boundaries Jan 2026-Dec 2027) + partition scheme ps_AuditLog_Month on PRIMARY,
table aligned ON ps_AuditLog_Month(OccurredAtUtc); scadalink_audit_writer (INSERT/SELECT
with explicit DENY UPDATE/DELETE) and scadalink_audit_purger (SELECT + ALTER on
SCHEMA::dbo) DB roles created idempotently.
- IAuditLogRepository + EF impl: append-only surface (InsertIfNotExistsAsync,
QueryAsync with keyset paging, SwitchOutPartitionAsync). M1 honest contract:
SwitchOutPartitionAsync throws NotSupportedException pointing to M6 because
UX_AuditLog_EventId is non-partition-aligned (SQL Server requires the partitioning
column in the key of an aligned unique index); M6's purge actor will drop and
rebuild the index around switches.
- New src/ScadaLink.AuditLog/ project: AuditLogOptions + validator (DefaultCapBytes
8KB, ErrorCapBytes 64KB, RetentionDays 365 [30..3650], header redact defaults
Authorization/X-Api-Key/Cookie/Set-Cookie).
- Spec corrections (#23): alog.md + Component-AuditLog.md vocabulary reconciled
to match M1 enums per user-authorized resolution of the cross-bundle review
finding (CLAUDE.md cached-call lifecycle vocabulary supersedes alog.md's earlier
Success/TransientFailure naming).
MSSQL integration tests gated by Xunit.SkippableFact + Connect Timeout=3 fast-fail;
when the infra/mssql container is up, all 8 migration tests + 8 repository tests
pass; when down, they Skip cleanly in ~3s.
Append-only invariant enforced at three layers:
1. DB writer role: DENY UPDATE, DENY DELETE on dbo.AuditLog.
2. Repo interface: no UpdateAsync, no row-DeleteAsync.
3. Repo impl: raw IF NOT EXISTS INSERT only.
infra/* working-tree mods are pre-existing and untouched throughout M1.
The M1 implementation (Bundle A) committed concrete AuditChannel /
AuditKind / AuditStatus enums that reflect CLAUDE.md's locked
cached-call lifecycle decisions. The older alog.md and
Component-AuditLog.md narratives still used pre-M1 vocabulary
(Success / TransientFailure / PermanentFailure / Enqueued / Retrying /
SyncCall / CachedEnqueued / Attempt / Terminal / Completed). This
commit reconciles both docs to the M1 vocabulary:
AuditChannel (4): ApiOutbound, DbOutbound, Notification, ApiInbound
AuditKind (10):   ApiCall, ApiCallCached, DbWrite, DbWriteCached,
                  NotifySend, NotifyDeliver, InboundRequest,
                  InboundAuthFailure, CachedSubmit, CachedResolve
AuditStatus (8):  Submitted, Forwarded, Attempted, Delivered, Failed,
                  Parked, Discarded, Skipped
Updates:
- Status column description + worked examples use the new 8 values.
- Kind table flattened from per-channel groupings to a single flat
list of the 10 discriminators (no more SyncCall / Cached* /
Attempt / Terminal / Completed).
- Cached-call lifecycle examples rewritten to the
CachedSubmit -> Forwarded -> Attempted... -> CachedResolve shape.
- Notification lifecycle examples rewritten to
NotifySend(Submitted) -> NotifyDeliver(Attempted) ->
NotifyDeliver(Delivered/Parked/Discarded).
- Inbound API examples split into InboundRequest (success path) and
InboundAuthFailure (401 path).
- 'Errors only' UI toggle, audit-error-rate KPI, and payload-cap
decision (#6 in §16) all switched from 'non-Success' to
Status IN ('Failed', 'Parked', 'Discarded').
- Per-site event-rate table in §13.1 renamed to the new kinds.
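To make the reconciled status vocabulary and the 'Errors only' predicate concrete, here is a minimal Python sketch. The enum members mirror the eight AuditStatus values listed above; the error_rate helper is hypothetical and computes a plain fraction, ignoring the rolling window the real KPI would apply.

```python
from enum import Enum


class AuditStatus(str, Enum):
    SUBMITTED = "Submitted"
    FORWARDED = "Forwarded"
    ATTEMPTED = "Attempted"
    DELIVERED = "Delivered"
    FAILED = "Failed"
    PARKED = "Parked"
    DISCARDED = "Discarded"
    SKIPPED = "Skipped"


# Mirrors Status IN ('Failed', 'Parked', 'Discarded').
ERROR_STATUSES = frozenset(
    {AuditStatus.FAILED, AuditStatus.PARKED, AuditStatus.DISCARDED}
)


def is_error(status: AuditStatus) -> bool:
    """True when a row would appear under the 'Errors only' toggle."""
    return status in ERROR_STATUSES


def error_rate(statuses) -> float:
    """Fraction of rows in an error status (hypothetical; no time window)."""
    statuses = list(statuses)
    if not statuses:
        return 0.0
    return sum(is_error(s) for s in statuses) / len(statuses)
```

An allow-list of error statuses, rather than 'anything that is not Success', keeps intermediate states such as Submitted or Attempted from being counted as failures.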
Pure design correction; no operational behavior change. Per the
goal-prompt invariant #6, alog.md may change when a design correction
is committed before the affected code change — this commit is that
correction, landed ahead of the M1 merge so the merge order reads
design-first, code-second.
No code, test, or infra file changes.
EF Core implementation of IAuditLogRepository:
- InsertIfNotExistsAsync: single IF NOT EXISTS ... INSERT via
ExecuteSqlInterpolatedAsync, bypasses the change tracker. Enum
values converted to string in C# (columns are varchar(32) via
HasConversion<string>).
- QueryAsync: AsNoTracking, predicate-per-non-null-filter, keyset
paging on (OccurredAtUtc DESC, EventId DESC) — EF Core 10
translates Guid.CompareTo to a uniqueidentifier < comparison
natively (verified against MSSQL 2022).
- SwitchOutPartitionAsync: throws NotSupportedException naming M6;
the non-aligned UX_AuditLog_EventId unique index blocks
ALTER TABLE SWITCH PARTITION until the drop-and-rebuild dance
ships with the purge actor.
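The keyset paging described for QueryAsync can be modelled with a small Python + SQLite sketch. This is not the EF Core implementation: the table and column names are simplified stand-ins, event ids are plain strings (SQL Server orders uniqueidentifier values differently from string comparison), and query_page / walk_all are hypothetical helpers.

```python
import sqlite3


def query_page(conn, page_size, cursor=None):
    """Return up to page_size rows that sort strictly after `cursor`
    in (occurred_at_utc DESC, event_id DESC) order."""
    sql = "SELECT occurred_at_utc, event_id FROM audit_log"
    params = []
    if cursor is not None:
        ts, event_id = cursor
        # The event_id tiebreaker keeps rows sharing a timestamp from being
        # skipped or repeated across page boundaries.
        sql += (
            " WHERE occurred_at_utc < ?"
            " OR (occurred_at_utc = ? AND event_id < ?)"
        )
        params += [ts, ts, event_id]
    sql += " ORDER BY occurred_at_utc DESC, event_id DESC LIMIT ?"
    params.append(page_size)
    return conn.execute(sql, params).fetchall()


def walk_all(conn, page_size):
    """Follow the cursor page by page until the result set is exhausted."""
    cursor, seen = None, []
    while True:
        page = query_page(conn, page_size, cursor)
        if not page:
            return seen
        seen.extend(page)
        cursor = page[-1]  # (occurred_at_utc, event_id) of the last row
```

Paging on the timestamp alone would lose or duplicate rows whenever a page boundary falls inside a group of events with the same OccurredAtUtc, which is the case the keyset-tiebreaker test is meant to cover.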
DI: AddScoped<IAuditLogRepository, AuditLogRepository>() added after
the NotificationOutboxRepository registration; existing DI smoke test
extended with an IAuditLogRepository assertion.
Integration tests (8 new) use the Bundle C MsSqlMigrationFixture and
scope by a per-test SourceSiteId guid so they neither collide nor
require cleanup.
Bundle D of the Audit Log #23 M1 Foundation plan.
Append-only data-access surface for the central AuditLog table — three
methods: InsertIfNotExistsAsync (first-write-wins on EventId), QueryAsync
(filter + keyset paging on (OccurredAtUtc desc, EventId desc)), and
SwitchOutPartitionAsync (M1 honest contract — throws NotSupported until
M6 lands the non-aligned-index drop/rebuild dance for the partition
switch). No Update, no row-delete; bulk purge is partition-only.
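First-write-wins on EventId can be illustrated with a minimal Python + SQLite model. It is a sketch only: the repository issues a T-SQL IF NOT EXISTS ... INSERT, whereas SQLite needs the INSERT ... SELECT ... WHERE NOT EXISTS form used here, and the table shape and function name are hypothetical.

```python
import sqlite3


def insert_if_not_exists(conn, event_id, occurred_at_utc, status):
    """Insert the row unless event_id is already present.
    Returns True when this call wrote the row, False when an earlier write won."""
    before = conn.total_changes
    conn.execute(
        "INSERT INTO audit_log (event_id, occurred_at_utc, status) "
        "SELECT ?, ?, ? WHERE NOT EXISTS "
        "(SELECT 1 FROM audit_log WHERE event_id = ?)",
        (event_id, occurred_at_utc, status, event_id),
    )
    return conn.total_changes > before


def make_db():
    """In-memory stand-in for the central table (simplified columns)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE audit_log ("
        " event_id TEXT NOT NULL,"
        " occurred_at_utc TEXT NOT NULL,"
        " status TEXT NOT NULL,"
        " PRIMARY KEY (event_id, occurred_at_utc))"
    )
    # Single-column unique index, analogous to UX_AuditLog_EventId.
    conn.execute("CREATE UNIQUE INDEX ux_event_id ON audit_log (event_id)")
    return conn
```

A check-then-insert is not atomic under concurrent writers on every engine; in this model the unique index on event_id acts as the backstop that keeps a second writer from landing a duplicate, and a caller would have to handle the resulting constraint error.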
Bundle D of the Audit Log #23 M1 Foundation plan.
Reviewer of Bundle C (#23 M1) flagged two blockers in the
AddAuditLogTableMigration integration tests:
1. Tests used 'if (!await EnsureMigrationApplied()) return;' which made
the xunit runner report them as Passed when the dev MSSQL container
was absent — a CI false-positive risk. xunit 2.9.x does NOT ship the
v3 Assert.Skip/SkipUnless/SkipWhen API surface (verified empirically
against xunit.assert 2.9.3 — only v3.x exposes those static methods),
so the canonical xunit-v2 equivalent is the Xunit.SkippableFact
package. Replaced [Fact] with [SkippableFact] and the early-return
pattern with 'Skip.IfNot(_fixture.Available, _fixture.SkipReason)' as
the first statement of each of the 8 audit-log test methods. The
runner now reports them as Skipped (not Passed) when MSSQL is down.
2. MsSqlMigrationFixture relied on SqlClient's 30s default connect
timeout, so a no-container fixture construction hung ~30s. Added
'Connect Timeout=3' to DefaultAdminConnectionString. Verified
fail-fast under ~4s end-to-end with a bad host via env-var override.
Additional fixture cleanups:
- Migration is now applied once in the fixture constructor (was per-test
via EnsureMigrationApplied for idempotency). Tests reach a fully-
migrated database with no extra setup. Removed the now-unused
EnsureMigrationApplied helper from the test class.
- Constructor narrowed its catch to SqlException + InvalidOperationException
for the OpenAsync step (the only legitimate connect-failure surfaces);
everything else (CREATE DATABASE, MigrateAsync) is treated as a hard
fixture failure and bubbles up. Added a best-effort
TryDropOrphanDatabase() pre-throw cleanup so partial construction
cannot leak guid-suffixed databases.
- Stale doc comments referencing the (non-existent) xunit 2.9.x Skip
shim removed; replaced with accurate notes about Xunit.SkippableFact.
Verified:
- dotnet build ScadaLink.slnx: clean (0 warnings, 0 errors).
- dotnet test ScadaLink.ConfigurationDatabase.Tests with MSSQL up:
Passed 150 / Skipped 0 / Failed 0.
- Same suite with SCADALINK_MSSQL_TEST_CONN pointed at a closed port:
the 8 AddAuditLogTableMigration tests report as Skipped (visible
'[SKIP]' lines in runner output), total elapsed ~3s.
Files touched:
- Directory.Packages.props: added Xunit.SkippableFact 1.5.61.
- tests/ScadaLink.ConfigurationDatabase.Tests/ScadaLink.ConfigurationDatabase.Tests.csproj:
added the SkippableFact PackageReference.
- tests/ScadaLink.ConfigurationDatabase.Tests/Migrations/MsSqlMigrationFixture.cs:
Connect Timeout=3, constructor refactor, doc-comment fixes.
- tests/ScadaLink.ConfigurationDatabase.Tests/Migrations/AddAuditLogTableMigrationTests.cs:
[SkippableFact] + Skip.IfNot pattern across all 8 tests.
Untouched (per reviewer guidance):
- Migration file (Bundle C main artifact unchanged).
- Bundle B reconciliation (composite PK + UX_AuditLog_EventId).
- SqlClient VersionOverride 6.1.1 in the test csproj.
- infra/* (separate uncommitted local edits remain in working tree).
Bundle C of the #23 M1 foundation. Creates the centralized AuditLog table
with the partition function, partition scheme, partition-aligned
non-clustered indexes, and the two access-control roles documented in
alog.md §4.
Schema:
- pf_AuditLog_Month: RANGE RIGHT, 24 monthly boundaries (Jan 2026 – Dec 2027).
- ps_AuditLog_Month: ALL TO ([PRIMARY]) — dev/test parity.
- dbo.AuditLog: created via raw SQL ON ps_AuditLog_Month(OccurredAtUtc).
Composite clustered PK {EventId, OccurredAtUtc} (partition column must be
part of the clustered key). 22 columns matching the EF AuditEvent model.
- 5 reconciliation/query non-clustered indexes from alog.md §4
(Channel_Status_Occurred, CorrelationId filtered, OccurredAtUtc,
Site_Occurred, Target_Occurred filtered) — all partition-aligned.
- UX_AuditLog_EventId: non-aligned UNIQUE on EventId alone (preserves
InsertIfNotExistsAsync idempotency from M1-T8). Non-aligned because
partition-aligned unique indexes require the partition column in the key,
which would weaken the guarantee to composite uniqueness; the purge story (M6)
drops and rebuilds this index around partition switches.
Access control:
- scadalink_audit_writer: GRANT INSERT + GRANT SELECT, DENY UPDATE + DENY DELETE
on AuditLog. The explicit DENY guarantees later db_datawriter membership
cannot quietly re-enable mutation.
- scadalink_audit_purger: GRANT SELECT on AuditLog, GRANT ALTER on SCHEMA::dbo
(enables ALTER PARTITION FUNCTION ... SPLIT RANGE and ALTER TABLE ... SWITCH PARTITION).
Both role definitions are idempotent (IF DATABASE_PRINCIPAL_ID IS NULL).
Down() drops in reverse dependency order with IF EXISTS guards.
Integration tests (tests/ScadaLink.ConfigurationDatabase.Tests/Migrations/):
- MsSqlMigrationFixture: connects to the running infra/mssql container (or
the SCADALINK_MSSQL_TEST_CONN override), creates a unique per-fixture
database, applies the migrations, drops the DB on dispose. Marks itself
Available=false when MSSQL is unreachable so tests early-return cleanly
on CI without the dev container.
- AddAuditLogTableMigrationTests: 8 tests covering table existence,
partition function/scheme, partition-aligned PK, the 5 named indexes,
both roles' grants, and a smoke test that a writer-role user receives
SqlException with "permission" on UPDATE AuditLog.
ConfigurationDatabase tests: 142 passing -> 150 passing (8 new integration
tests). Full solution builds clean.
Package: tests project locally overrides Microsoft.Data.SqlClient to 6.1.1
(EF SqlServer 10.0.7 needs >= 6.1.1; central package version is pinned at
6.0.2 for the production ExternalSystemGateway).
Bundle C migration aligns AuditLog to ps_AuditLog_Month(OccurredAtUtc).
A partitioned table's unique clustered key must include the partition column, so
the PK becomes the composite {EventId, OccurredAtUtc} — a divergence from
Bundle B's single-column PK that needs reconciling in the EF mapping
before the migration is generated.
EventId remains the idempotency key for AuditLogRepository.InsertIfNotExistsAsync
(M1-T8), so a dedicated unique index UX_AuditLog_EventId preserves the
single-column uniqueness constraint.
Tests updated:
- Configure_MapsToAuditLogTable_WithCompositePrimaryKey (replaces the
WithEventIdAsPrimaryKey assertion) verifies {EventId, OccurredAtUtc}.
- Configure_DeclaresUniqueIndex_OnEventIdAlone_ForIdempotencyLookups
asserts the new UX_AuditLog_EventId is unique and on EventId alone.
- Configure_ExpectedIndexes_WithCorrectNames now expects six index names
(the original five plus UX_AuditLog_EventId).
Bundles A-F per cadence memory. Brainstorm decisions locked:
infra/mssql test harness, single AuditEvent record (nullable IngestedAtUtc
+ ForwardState), PRIMARY filegroup, explicit index names.
Per user request: every milestone now carries bite-sized TDD tasks
(write failing test -> run failing -> implement -> run passing -> commit),
matching M1's density. Each task lists exact file paths, numbered steps,
and a commit message.
Task counts per milestone:
- M1 Foundation: 11
- M2 Site pipeline (sync-only): 12
- M3 Cached operations + dual-write (inlines #22 + cached-call tracking): 18
- M4 Remaining boundary emission: 12
- M5 Payload + redaction policy: 10
- M6 Reconciliation, purge, partition maintenance, metrics: 12
- M7 Central UI: 16
- M8 CLI: 9
Total: 100 bite-sized tasks.
The roadmap remains the contract; per-milestone execution still goes
through brainstorm -> writing-plans -> subagent-driven-development to
produce a milestone-specific .tasks.json. Tasks in this roadmap will
shift slightly as M1 reveals codebase realities; treat them as the
intended shape rather than immutable IDs.
Roadmap covering Audit Log (#23) code implementation across 8 milestones
(M1 Foundation → M8 CLI). Reflects the actual state of the codebase —
all 22 prior components have source + tests, but Site Call Audit (#22)
and cached-call tracking are design-only despite being on main; their
minimum surface is inlined into M3.
M1 is laid out at full TDD-level task detail (11 bite-sized tasks).
M2–M8 are at milestone-shape detail (goals, files, task headlines,
acceptance criteria, risk callouts). Per-milestone bite-sized plans
will be generated by brainstorm + writing-plans when each milestone is
about to execute; 80 task cards locked now would mostly be stale by
M5 as M1 reveals codebase realities.
Critical path: M1 → M2 → (M3 ∥ M4 ∥ M5) → M6 → (M7 ∥ M8).
Spec: docs/requirements/Component-AuditLog.md + alog.md (commit
fec0bb1).
Adds new component #23 Audit Log: a central, append-only forensic +
operational record of every script-trust-boundary action — outbound API
calls (sync + cached), outbound DB operations (sync + cached, incl.
script-initiated reads), notifications, and inbound API requests.
Sits alongside the existing operational stores (Notifications #21 and
SiteCalls #22) without replacing them. Site-local SQLite hot-path append
+ best-effort gRPC telemetry + central reconciliation pull; cached calls
emit one combined telemetry packet that drives both the immutable
AuditLog insert and the operational SiteCalls upsert in a single
transaction. Central direct-write for Inbound API middleware and
Notification Outbox dispatcher events.
Key invariants:
- Strictly append-only at central (enforced via DB roles + CI grep
guard); monthly partitioning, 365-day default retention via partition
switch (no row-level deletes).
- Site SQLite purge requires ForwardState in {Forwarded, Reconciled};
central outage cannot cause audit loss at sites.
- Audit-write failure never aborts the user-facing action.
- Payload: metadata + truncated bodies (8 KB default, 64 KB on errors);
headers redacted by default, SQL parameter values captured by default
with per-connection opt-out.
- New top-level Audit nav group in Central UI with drill-ins from
Notifications, Site Calls, External Systems, Inbound API Keys, Sites,
Instances.
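The site-purge invariant above can be sketched as a guarded delete. This Python + SQLite model is illustrative only: the site_audit table shape and the 'Pending' state are assumptions (only Forwarded and Reconciled are named in the invariants), and purge_site_rows is a hypothetical helper.

```python
import sqlite3

# States that prove central already holds the row (per the invariant above).
PURGEABLE_STATES = ("Forwarded", "Reconciled")


def purge_site_rows(conn, cutoff_utc):
    """Delete rows older than cutoff_utc, but only those central has confirmed.
    Returns the number of rows removed."""
    cur = conn.execute(
        "DELETE FROM site_audit "
        "WHERE occurred_at_utc < ? AND forward_state IN (?, ?)",
        (cutoff_utc, *PURGEABLE_STATES),
    )
    return cur.rowcount


def make_site_db():
    """In-memory stand-in for the site-local store (simplified columns)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE site_audit ("
        " event_id TEXT PRIMARY KEY,"
        " occurred_at_utc TEXT NOT NULL,"
        " forward_state TEXT NOT NULL)"
    )
    return conn
```

Because age alone never qualifies a row for deletion, a prolonged central outage leaves unforwarded rows in place at the site until they are forwarded or reconciled.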
Deferred to v1.x: hash-chain tamper evidence, Parquet archival,
per-channel retention overrides.
23 commits, 17 files changed (+1,419/-21). Component-AuditLog.md (new)
plus cross-references in 11 existing component docs, README,
HighLevelReqs (AL-1..AL-12), and CLAUDE.md.
Final cross-bundle reviewer identified 7 inconsistencies that the per-bundle
reviewers couldn't see; all fixed in one logical commit.
Critical:
- HighLevelReqs AL-3: drop 'then upsert-on-newer-status' — AuditLog is
strictly append-only (correct for SiteCalls/Notifications, wrong for
the immutable AuditLog shadow).
- Component-AuditLog Error rate KPI: align with HealthMonitoring's
exclusion list (Success/Delivered/Enqueued) rather than just non-Success;
otherwise every Delivered notification or Enqueued cached call would be
counted as an error.
Important:
- Component-AuditLog line 154: ISiteAuditWriter -> IAuditWriter (canonical
name per Commons and the rest of this doc).
- Component-AuditLog Central direct-write paragraph: convert remaining
slash notation (ApiInbound/Completed, Notification/Attempt,
Notification/Terminal) to dot notation used everywhere else.
- Component-ClusterInfrastructure: scope SiteCallAuditActor to
reconciliation + KPIs + Retry/Discard relay; cached-telemetry ingest is
AuditLogIngestActor's role per Combined Telemetry contract.
- Component-CentralUI Audit Log page: state the OperationalAudit read
permission and the read-vs-export split (matching CLI doc).
- Component-NotificationOutbox: add never-fail-the-action invariant for
dispatcher audit writes.
Minor:
- Component-InboundAPI: 'Non-blocking semantics' was ambiguous (could be
read as async); reword to 'Fail-soft' — the write is still synchronous
before flush, but failures are caught and don't change the response.
- Component-CLI: realign audit-query/audit-export flags to actually match
the Central UI Audit Log filter set (channel, kind, status, site,
instance, target, actor, correlation-id, errors-only); drop --user and
--entity-id which are IAuditService concepts, not Audit Log columns.
- Component-AuditLog KPI tile names: 'Volume/Error rate/Backlog' ->
'Audit volume/Audit error rate/Audit backlog' (matches Central UI and
Health Monitoring); drop the two orphan KPIs (Top inbound callers, Top
outbound 5xx) that were never surfaced anywhere.
- Component-AuditLog Interactions: re-attribute DbOutbound emissions to
ESG (where Database.* lives) with a note that Site Runtime is the API
surface for scripts.
- HighLevelReqs AL-12: drop 'and reconciliation operations' (CLI has no
reconcile command; reconciliation is an internal self-healing pull).
Add note that verify-chain becomes operational once AL-11's hash chain
ships.
Task 10's reviewer noted that Component-CentralUI.md renamed the
IAuditService page from 'Audit Log Viewer' to 'Configuration Audit Log
Viewer' to avoid collision with the new operational Audit Log page (#23).
Two stale lowercased refs in Component-ConfigurationDatabase.md needed
the same disambiguation.
Bundle D code-review feedback on 0ae1a25 and e6f7a7f:
- Audit error rate (HealthMonitoring tile) was described as a combined
view of CentralAuditWriteFailures + AuditRedactionFailure (writer
health). Per alog.md §10.3 / §14.1 it is the operational error rate
of audited operations: % of central AuditLog rows with Status not
in (Success/Delivered/Enqueued) over a rolling 5-min window. Audit
writer issues surface separately via the dedicated metrics.
- Audit volume description gains the spec-mandated 'events/min, global
+ per-site sparkline' shape.
- CLI: scadalink audit was claiming all three subcommands need both
OperationalAudit and AuditExport. Per alog.md §11.2 / §15.1, read
(query, verify-chain) needs OperationalAudit; bulk export
additionally requires AuditExport. Restored the spec's split.
Reviewer flag on 1bbfad3: "per Component-AuditLog.md, §6.2" pointed at
alog.md numbering, not at any anchor in Component-AuditLog.md (which uses
prose subsection titles). Switch to the prose anchor (Ingestion Paths →
Telemetry forward) so the link resolves.
Task 2's spec reviewer flagged that the plan used a non-existent name
'CachedOperationTelemetry' when describing the additively-evolved cached
telemetry message. The existing message is 'CachedCallTelemetry'; renaming
would violate Commons REQ-COM-5a (additive-only). Plan now reflects the
in-place additive evolution and warns against rename.
Code-review feedback on c334de0:
- Ingestion Paths intro said 'Three write paths' but the section has four
subsections (site hot-path append + 3 central writers). Reword to 'Four
paths feed the central AuditLog -- one site originator and three central
writers'.
- Purpose: 'dashboards plus drilldowns plus filter queries' read awkwardly;
switch to standard comma list.
See docs/plans/2026-05-20-centralized-audit-log.md and peer .tasks.json.
17 tasks covering Component-AuditLog.md plus cross-references across
11 affected component docs, README, HighLevelReqs, and CLAUDE.md.
Spec is alog.md at commit fec0bb1.
Validated design for a new append-only AuditLog covering the script
trust boundary: outbound API calls (sync + cached), outbound DB
operations (sync + cached, incl. script-initiated reads), notifications,
and inbound API requests. Layered alongside existing Notifications (#21)
and SiteCalls (#22) operational tables.
Key decisions:
- One row per lifecycle event; strictly append-only.
- Site SQLite hot-path append + best-effort gRPC telemetry + central
reconciliation pull. Site purge requires ForwardState=Forwarded.
- Cached calls: site emits; one telemetry packet feeds both the
immutable AuditLog row and the operational SiteCalls upsert.
- Payload: metadata + truncated bodies (8 KB default, 64 KB on errors).
Headers redacted; SQL parameter values captured by default.
- Audit-write failures never abort the user-facing action.
- Monthly partitioning at central; 365-day global retention.
- New Audit nav group + drill-in links from existing pages.
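The payload decision above (8 KB default cap, 64 KB on errors, headers redacted) can be sketched as follows. This is a Python model, not the project's code: the header names come from the AuditLogOptions defaults noted in the M1 summary, while the "[REDACTED]" placeholder and the function names are assumptions.

```python
DEFAULT_CAP_BYTES = 8 * 1024
ERROR_CAP_BYTES = 64 * 1024

# Compared case-insensitively, since HTTP header names are case-insensitive.
REDACTED_HEADERS = frozenset(
    {"authorization", "x-api-key", "cookie", "set-cookie"}
)


def cap_body(body: bytes, is_error: bool) -> tuple[bytes, bool]:
    """Truncate the body to the applicable cap.
    Returns (possibly truncated body, whether truncation happened)."""
    cap = ERROR_CAP_BYTES if is_error else DEFAULT_CAP_BYTES
    return body[:cap], len(body) > cap


def redact_headers(headers: dict[str, str]) -> dict[str, str]:
    """Replace the values of sensitive headers with a placeholder (assumed text)."""
    return {
        name: "[REDACTED]" if name.lower() in REDACTED_HEADERS else value
        for name, value in headers.items()
    }
```

Truncating at a byte boundary can split a multi-byte UTF-8 sequence, so a real implementation would need to decide whether to trim back to a character boundary before storing the text.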
Deferred to v1.x: hash-chain tamper evidence, Parquet archival,
per-channel retention overrides. Provisional component #23.