Audit Log (#23) Code Implementation Roadmap
For Claude: REQUIRED SUB-SKILL FLOW per milestone:
brainstorming → writing-plans → subagent-driven-development. Use docs/requirements/Component-AuditLog.md + alog.md as the spec; this document is the roadmap that sequences milestones and locks acceptance criteria for each. M1 carries full TDD-level task detail; M2–M8 are milestone-shape detail and will be expanded into bite-sized plans by their own writing-plans pass when their turn comes.
Goal: Implement central component #23 Audit Log — append-only forensic + operational record across every script-trust-boundary action — into the existing ScadaLink codebase.
Architecture: Layered alongside (not replacing) the future Notifications/SiteCalls operational stores. Site-local SQLite hot-path append + gRPC telemetry batches + reconciliation pulls; central direct-write for Inbound API and Notification Outbox dispatch; monthly-partitioned MS SQL with single global retention; strict append-only enforced via DB roles. See alog.md for the locked design decisions and Component-AuditLog.md for the component spec.
Tech Stack: Akka.NET (clustering, singletons, ClusterClient), EF Core (MS SQL provider, code-first migrations), Microsoft.Data.SqlClient, Microsoft.Data.Sqlite, gRPC (HTTP/2 server-streaming on the existing SiteStream channel), ASP.NET Core (Inbound API middleware), Blazor Server + Bootstrap (Central UI), System.CommandLine (CLI), xUnit + Akka.TestKit.Xunit2 + NSubstitute (tests).
Spec: /Users/dohertj2/Desktop/scadalink-design/alog.md (validated, immutable; commit fec0bb1). Component design at /Users/dohertj2/Desktop/scadalink-design/docs/requirements/Component-AuditLog.md.
Codebase Reality Check (what already exists)
- All 22 prior components have source + tests. Audit Log slots in as a new src/ScadaLink.AuditLog/ project plus changes to: Commons, ConfigurationDatabase, Communication (proto), Host (DI + actor registration), ExternalSystemGateway, InboundAPI, NotificationOutbox, HealthMonitoring, CentralUI, CLI, SiteRuntime (audit hook surface).
- Existing patterns to copy from:
  - Singleton wiring: src/ScadaLink.Host/Actors/AkkaHostedService.cs:272–280 (NotificationOutboxActor), using ClusterSingletonManager.Props + manager/proxy pair.
  - EF migration: src/ScadaLink.ConfigurationDatabase/Migrations/20260519050659_AddNotificationsTable.cs. Table create + indexes; no partitioning yet, so Audit Log will be the first.
  - Site SQLite hot-path: src/ScadaLink.SiteEventLogging/SiteEventLogger.cs:28–98. Single connection, write lock, Channel-based background writer.
  - Site-buffer + forwarder: src/ScadaLink.StoreAndForward/. StoreAndForwardStorage + NotificationForwarder show the Pending → Forwarded transition we'll mirror.
  - Actor + repo + test trio: src/ScadaLink.NotificationOutbox/NotificationOutboxActor.cs and tests/ScadaLink.NotificationOutbox.Tests/NotificationOutboxActorIngestTests.cs:20. TestKit base class, NSubstitute repo, Sys.ActorOf, ExpectMsg<T>.
  - gRPC additive: src/ScadaLink.Communication/Protos/sitestream.proto currently carries only AttributeValueUpdate and AlarmStateUpdate in a oneof; we extend it.
  - CLI command shape: src/ScadaLink.CLI/Commands/AuditLogCommands.cs:1–53. System.CommandLine pattern; new group will live alongside it (the file's existing commands are for the IAuditService config audit and stay).
  - Blazor listing page: src/ScadaLink.CentralUI/Components/Pages/Notifications/NotificationReport.razor. Filter bar + keyset paging + status badges idiom.
- AuditLog.razor and AuditLogCommands.cs already exist but they're the IAuditService config-change viewer. Per the design pass we renamed them in docs to "Configuration Audit Log Viewer"; in code they'll be renamed (file + URL + command name) so the new operational Audit Log can take the unqualified name.
- Test framework: xUnit + Akka.TestKit.Xunit2 + NSubstitute. Integration tests under tests/ScadaLink.IntegrationTests/. Playwright UI tests under tests/ScadaLink.CentralUI.PlaywrightTests/. A tests/ScadaLink.PerformanceTests/ exists for load tests.
Prerequisite: Site Call Audit (#22) + cached-call tracking are NOT implemented in code
The design for both is merged on main (alog.md cached-call tracking section; Component-SiteCallAudit.md), but grep finds zero references to TrackedOperationId or CachedCallTelemetry in src/. This matters because M3 (cached operations + dual-write transaction) cannot be built without them.
Three ways to handle this; pick before M3:
- Inline into M3 (Recommended): Implement just enough of Site Call Audit (#22) and cached-call tracking inside M3, specifically the CachedCallTelemetry message, the operational-tracking SQLite table at sites, and the SiteCalls table + repo + SiteCallAuditActor skeleton at central. This makes M3 the biggest milestone but ships a coherent slice (cached calls audited end-to-end).
- M0 prerequisite milestone: Implement #22 and cached-call tracking as a separate slice before M3 starts. Cleanest dependency story; slowest to first-audit-row.
- Ship Audit Log sync-only first, retrofit cached path later: M1, M2, M4 (sync-only emissions), M5, M6 (no cached features), M7, M8 ship as-is; cached audit is a separate follow-up. Lowest first-shippable scope but leaves cached calls unaudited until much later.
Default choice in this roadmap: (1). M3 absorbs the minimum #22 + cached-call tracking surface needed to make combined telemetry work; the rest of #22 (full reconciliation, KPIs, Retry/Discard relay) can be a follow-up.
Milestone index
| M | Title | Ships | Touches | Depends on |
|---|---|---|---|---|
| M1 | Foundation: schema, types, DB roles, partitioning | Migration deployed; Commons types exist; no observable behavior yet. | Commons, ConfigurationDatabase, ConfigurationDatabase.Tests | — |
| M2 | Site pipeline (sync-only path) | One emission path end-to-end (ESG sync Call() audited from script to central row). | Commons, AuditLog (new), Communication (proto), Host, ExternalSystemGateway, all Tests projects, IntegrationTests | M1 |
| M3 | Cached operations + dual-write transaction | Cached external calls and DB writes audited; SiteCalls table populated alongside; combined telemetry packet contract live. | Commons, AuditLog, SiteCallAudit (new), ConfigurationDatabase, ExternalSystemGateway, StoreAndForward, Host | M2; #22 + cached-call tracking inlined here per the prerequisite section |
| M4 | Remaining boundary emission | All four channels emitting: sync DB writes/reads, Notify dispatcher attempt/terminal, Inbound API middleware. | ExternalSystemGateway, InboundAPI, NotificationOutbox, SiteRuntime (Database surface) | M2; M3 (NotificationOutbox terminal/attempt uses ICentralAuditWriter pattern) |
| M5 | Payload + redaction policy | Header redaction, body redactor regex, SQL parameter redaction, safety net, configuration binding. | AuditLog, ExternalSystemGateway, InboundAPI, all emitter projects | M2 |
| M6 | Reconciliation, purge, partition maintenance, health metrics | Self-healing telemetry, monthly partition switch, the five new health metrics + their dashboard tiles. | AuditLog, ConfigurationDatabase (partition maintenance), HealthMonitoring | M2, M3 |
| M7 | Central UI: new Audit Log page + drill-ins + KPI tiles | User-visible Audit Log surface; existing AuditLog.razor renamed to ConfigurationAuditLog. | CentralUI, CentralUI.Tests, CentralUI.PlaywrightTests | M2, M4, M6 |
| M8 | CLI: scadalink audit query / export / verify-chain | Operator surface for query/export; verify-chain is a no-op stub until v1.x hash chain ships. | CLI, ManagementService (HTTP endpoint), CLI.Tests, IntegrationTests | M2 |
Ship-state at end of each milestone is the shippable slice — each milestone leaves the system in a working, testable, deployable state (no half-built actors mid-pipeline). M1 ships no user-visible behaviour but produces a clean foundation; from M2 onward each ships an observable audit capability.
Critical path: M1 → M2 → (M3 ∥ M4 ∥ M5) → M6 → (M7 ∥ M8). M3, M4, M5 can overlap once M2 is solid. M7 and M8 can overlap once M6 lands.
M1 — Foundation: schema, types, DB roles, partitioning
Goal: Land the new AuditLog table (partitioned) and DB roles in MS SQL, plus the Commons types every later milestone needs. After M1 the database is ready and types compile; nothing else changes.
Affected projects:
- src/ScadaLink.Commons/: entity, enums, interfaces, message DTOs.
- src/ScadaLink.ConfigurationDatabase/: EF mapping, DbContext registration, migration, DB role script, partition function/scheme, retention options.
- tests/ScadaLink.Commons.Tests/: enum + record tests.
- tests/ScadaLink.ConfigurationDatabase.Tests/: migration tests, repo tests.
Acceptance criteria:
- dotnet build of the solution succeeds.
- dotnet ef database update against a dev MS SQL applies the migration; AuditLog table exists, partitioned monthly on OccurredAtUtc, with PK on EventId and the five expected indexes.
- scadalink_audit_writer and scadalink_audit_purger SQL roles exist with the documented grants; a smoke test confirms UPDATE AuditLog from the writer role fails.
- AuditEvent record, AuditChannel / AuditKind / AuditStatus enums, IAuditWriter / ICentralAuditWriter interfaces, AuditTelemetryEnvelope / PullAuditEvents message DTOs all exist in Commons in the right folders.
- IAuditLogRepository interface (Commons) and EF implementation (ConfigurationDatabase) exist; the implementation only exposes InsertIfNotExistsAsync, paged read, and SwitchOutPartitionAsync, with no update or row-delete.
- All new tests pass; no existing tests regress.
M1 — Tasks (TDD-detail)
M1-T1: Add audit enums to Commons
Files:
- Create: src/ScadaLink.Commons/Types/Enums/AuditChannel.cs, AuditKind.cs, AuditStatus.cs.
- Create: tests/ScadaLink.Commons.Tests/Types/Enums/AuditEnumTests.cs.
Steps:
- Write failing test verifying AuditChannel has exactly ApiOutbound | DbOutbound | Notification | ApiInbound (asserting Enum.GetValues length and members).
- Same for AuditKind (10 members per Component-AuditLog.md).
- Same for AuditStatus (8 members).
- Run: tests fail (enums don't exist). Implement the three enums.
- Run tests: pass.
- Commit: feat(commons): add Audit{Channel,Kind,Status} enums for #23.
M1-T2: Add AuditEvent record + ForwardState enum
Files:
- Create: src/ScadaLink.Commons/Entities/Audit/AuditEvent.cs. Public record carrying all 20 central columns (per alog.md §4) plus a nullable ForwardState? for the site-local variant.
- Create: src/ScadaLink.Commons/Types/Enums/AuditForwardState.cs. Members: Pending | Forwarded | Reconciled.
- Create: tests/ScadaLink.Commons.Tests/Entities/Audit/AuditEventTests.cs.
Steps:
- Write failing test that constructs an AuditEvent, sets every property, and round-trips via with expressions; it asserts immutability and required-property behaviour.
- Run: fail (type doesn't exist). Implement the record.
- Run: pass.
- Commit: feat(commons): add AuditEvent record + ForwardState enum.
M1-T3: Add IAuditWriter and ICentralAuditWriter
Files:
- Create: src/ScadaLink.Commons/Interfaces/Services/IAuditWriter.cs, ICentralAuditWriter.cs.
- Create: tests/ScadaLink.Commons.Tests/Interfaces/Services/AuditWriterContractTests.cs (smoke: only that the interfaces exist and have the documented signatures).
Steps:
- Write failing reflection-based test asserting both interfaces expose Task WriteAsync(AuditEvent, CancellationToken).
- Run: fail. Implement both interfaces; document each with XML doc comments naming Audit Log #23 as the owner.
- Run: pass.
- Commit: feat(commons): add IAuditWriter and ICentralAuditWriter.
M1-T4: Add audit telemetry + pull message DTOs
Files:
- Create: src/ScadaLink.Commons/Messages/Integration/AuditTelemetryEnvelope.cs, PullAuditEventsRequest.cs, PullAuditEventsResponse.cs.
- Create: tests/ScadaLink.Commons.Tests/Messages/Integration/AuditTelemetryMessagesTests.cs.
Steps:
- Failing test: construct envelope with a batch of 3 events, assert immutability + batch enumerability.
- Failing test: pull request carries SinceUtc + BatchSize; response carries events + MoreAvailable.
- Implement.
- Run: pass.
- Commit: feat(commons): add audit telemetry + pull message DTOs.
M1-T5: Extend ScadaLinkDbContext with AuditLogs DbSet + entity config
Files:
- Modify: src/ScadaLink.ConfigurationDatabase/ScadaLinkDbContext.cs. Add public DbSet<AuditEvent> AuditLogs => Set<AuditEvent>(); at the appropriate position (after Notifications).
- Create: src/ScadaLink.ConfigurationDatabase/Entities/AuditLogEntityTypeConfiguration.cs. An IEntityTypeConfiguration<AuditEvent> mapping the columns, types, length constraints, and indexes per alog.md §4. Note: this is an EF mapping only; the partition function and scheme are created in the SQL migration (next task) since EF Core doesn't model them natively.
- Modify: OnModelCreating. Apply the new configuration.
- Create: tests/ScadaLink.ConfigurationDatabase.Tests/Entities/AuditLogEntityTypeConfigurationTests.cs. Use ModelBuilder directly to verify the entity is mapped to the AuditLog table, PK is EventId, and the expected columns + indexes are declared.
Steps:
- Failing test asserts mapped table name, PK column, and column count.
- Implement entity configuration; apply in OnModelCreating.
- Failing test asserts the five expected indexes exist on the model.
- Add HasIndex declarations.
- Run: pass.
- Commit: feat(configdb): map AuditEvent to AuditLog table with PK and indexes.
M1-T6: Generate and customize EF migration for AuditLog
Files:
- Create: src/ScadaLink.ConfigurationDatabase/Migrations/<timestamp>_AddAuditLogTable.cs via dotnet ef migrations add AddAuditLogTable --project ScadaLink.ConfigurationDatabase.
- Modify: the generated Up() / Down() to:
  - Create the partition function pf_AuditLog_Month and partition scheme ps_AuditLog_Month (raw SQL via migrationBuilder.Sql(...)), tied to a dedicated filegroup (or PRIMARY in dev; configurable via a migration setting).
  - Alter the CreateTable call (or follow up with Sql) to align the table to ps_AuditLog_Month(OccurredAtUtc).
  - Add the five indexes generated by EF; ensure each is also partition-aligned where appropriate.
- Create: tests/ScadaLink.ConfigurationDatabase.Tests/Migrations/AddAuditLogTableMigrationTests.cs. Applies the migration to an isolated MS SQL LocalDB instance (existing IntegrationTests harness), asserts table + partition function + scheme + indexes are present.
Steps:
- Run dotnet ef migrations add AddAuditLogTable.
- Failing integration test: apply migration, query sys.partition_functions and sys.partition_schemes for the expected names.
- Edit migration to add the partition function + scheme + alignment.
- Re-run test: pass.
- Failing test: query sys.indexes for the five expected named indexes.
- Adjust migration if any index name drifts.
- Run: pass.
- Commit: feat(configdb): add AuditLog migration with monthly partitioning.
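To make the monthly partitioning concrete, here is a minimal sketch of how boundary values map timestamps to partitions. It is written in Python only for brevity (the migration itself is raw SQL inside a C# EF migration). It assumes RANGE RIGHT boundaries at the first instant of each month, which this roadmap does not specify, and the helpers month_boundaries and partition_number are hypothetical names.

```python
from bisect import bisect_right
from datetime import datetime


def month_boundaries(start_year, start_month, count):
    """Return `count` consecutive first-of-month boundary values."""
    boundaries = []
    year, month = start_year, start_month
    for _ in range(count):
        boundaries.append(datetime(year, month, 1))
        month += 1
        if month > 12:  # roll over into the next year
            year, month = year + 1, 1
    return boundaries


def partition_number(boundaries, occurred_at_utc):
    """1-based partition number under the assumed RANGE RIGHT semantics.

    With RANGE RIGHT, a boundary value belongs to the partition on its right,
    so everything before the first boundary lands in partition 1.
    """
    return bisect_right(boundaries, occurred_at_utc) + 1


boundaries = month_boundaries(2026, 11, 3)  # Nov 2026, Dec 2026, Jan 2027
before_first = partition_number(boundaries, datetime(2026, 10, 31, 23, 59))
on_boundary = partition_number(boundaries, datetime(2026, 12, 1))
after_last = partition_number(boundaries, datetime(2027, 3, 15))
```

Three boundaries yield four partitions, which is why a partition function always needs a new boundary split in before the next month's rows start arriving.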
M1-T7: Add DB roles in migration
Files:
- Modify: the M1-T6 migration Up() to also create the scadalink_audit_writer (INSERT + SELECT only) and scadalink_audit_purger (ALTER PARTITION FUNCTION + ALTER TABLE … SWITCH PARTITION + SELECT) roles via raw SQL. Make role creation idempotent (IF NOT EXISTS).
- Modify: Down(). Drop the roles.
- Create: tests/ScadaLink.ConfigurationDatabase.Tests/Migrations/AuditLogRoleGrantsTests.cs. Applies migration, then runs SELECT on sys.database_role_members / sys.database_permissions to assert the role grants. Plus a smoke test: connect as a user mapped to scadalink_audit_writer, attempt UPDATE AuditLog SET Status = 'X' and expect a permission error.
Steps:
- Failing test asserts both roles exist with documented grants.
- Add migrationBuilder.Sql(...) blocks.
- Run: pass.
- Failing test: UPDATE AuditLog as audit writer → expect SqlException with permission error.
- Verify the role's permissions deny UPDATE (they should by default since only INSERT + SELECT granted).
- Run: pass.
- Commit: feat(configdb): add scadalink_audit_writer and scadalink_audit_purger roles.
M1-T8: Add IAuditLogRepository + EF implementation
Files:
- Create: src/ScadaLink.Commons/Interfaces/Repositories/IAuditLogRepository.cs. Members: InsertIfNotExistsAsync(AuditEvent, CancellationToken), QueryAsync(filter, paging, CancellationToken), SwitchOutPartitionAsync(monthBoundary, CancellationToken). Deliberately no UpdateAsync or row-level DeleteAsync.
- Create: src/ScadaLink.ConfigurationDatabase/Repositories/AuditLogRepository.cs. Implementation using the DbContext; InsertIfNotExistsAsync uses MERGE or raw INSERT … WHERE NOT EXISTS to satisfy idempotency without throwing on dupes.
- Modify: ServiceCollectionExtensions.cs. Register IAuditLogRepository → AuditLogRepository in DI.
- Create: tests/ScadaLink.ConfigurationDatabase.Tests/Repositories/AuditLogRepositoryTests.cs.
Steps:
- Failing test: InsertIfNotExistsAsync for a fresh EventId writes one row; calling again with the same EventId is a no-op (no exception, no second row).
- Implement; use a MERGE or INSERT … WHERE NOT EXISTS strategy that does NOT rely on EF change tracking.
- Run: pass.
- Failing test: paged QueryAsync returns rows in (OccurredAtUtc desc, EventId desc) order, respecting filter predicates (channel, kind, status, site, target, actor, correlation, time range).
- Implement filter projection + keyset paging.
- Run: pass.
- Failing test: SwitchOutPartitionAsync for the oldest partition removes its rows from the live table.
- Implement via migrationBuilder-style Sql("ALTER TABLE ... SWITCH PARTITION ... TO ...") (against a staging table the implementation creates and drops within the same transaction).
- Run: pass.
- Commit: feat(configdb): IAuditLogRepository + EF implementation (append-only, partition-switch purge).
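The idempotent-insert contract in this task can be sketched as follows. This is an illustration in Python against an in-memory SQLite database, not the C# / MS SQL implementation; the three-column table and the function name insert_if_not_exists are assumptions made for the example.

```python
import sqlite3


def insert_if_not_exists(conn, event_id, occurred_at_utc, status):
    """Insert one audit row; return True if inserted, False if it already existed.

    The INSERT ... SELECT ... WHERE NOT EXISTS form makes a repeat call a
    no-op instead of a primary-key violation.
    """
    cur = conn.execute(
        "INSERT INTO AuditLog (EventId, OccurredAtUtc, Status) "
        "SELECT ?, ?, ? "
        "WHERE NOT EXISTS (SELECT 1 FROM AuditLog WHERE EventId = ?)",
        (event_id, occurred_at_utc, status, event_id),
    )
    conn.commit()
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE AuditLog ("
    "EventId TEXT PRIMARY KEY, OccurredAtUtc TEXT NOT NULL, Status TEXT NOT NULL)"
)

first = insert_if_not_exists(conn, "e1", "2026-05-01T00:00:00Z", "Delivered")
second = insert_if_not_exists(conn, "e1", "2026-05-01T00:00:00Z", "Delivered")
row_count = conn.execute("SELECT COUNT(*) FROM AuditLog").fetchone()[0]
```

On a single connection this is safe; with several concurrent writers the existence check and the insert can interleave, so a real implementation also needs a lock hint or a duplicate-key catch.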
M1-T9: Add AuditLogOptions configuration class + binding
Files:
- Create: src/ScadaLink.AuditLog/Configuration/AuditLogOptions.cs (new project; see M1-T10). Owns DefaultCapBytes, ErrorCapBytes, HeaderRedactList, GlobalBodyRedactors, PerTargetOverrides, RetentionDays, validation attributes.
- Add: validation on startup (IValidateOptions<AuditLogOptions>).
- Test: ensure appsettings.json bind round-trips and validation rejects out-of-range RetentionDays.
Steps:
- Failing test: bind a valid section → values present.
- Implement options class + binding.
- Failing test: bind invalid RetentionDays → validator rejects.
- Implement validator.
- Run: pass.
- Commit: feat(auditlog): add AuditLogOptions config binding.
M1-T10: Add ScadaLink.AuditLog project skeleton
Files:
- Create: src/ScadaLink.AuditLog/ScadaLink.AuditLog.csproj. TargetFramework matches the rest of the solution; ProjectReferences to ScadaLink.Commons and ScadaLink.ConfigurationDatabase.
- Create: src/ScadaLink.AuditLog/ServiceCollectionExtensions.cs. AddAuditLog(this IServiceCollection, IConfiguration) registers AuditLogOptions, IAuditLogRepository, plus placeholders that later milestones will fill (writer impls, actors).
- Create: tests/ScadaLink.AuditLog.Tests/ScadaLink.AuditLog.Tests.csproj with one smoke test.
- Modify: ScadaLink.slnx. Add both projects to the solution.
- Modify: Directory.Packages.props if any new package versions are needed.
Steps:
- Create projects via dotnet new classlib / dotnet new xunit; add references; add to slnx.
- Failing test: smoke-test AddAuditLog() populates DI with IAuditLogRepository and IOptions<AuditLogOptions>.
- Implement ServiceCollectionExtensions.AddAuditLog.
- Run: pass.
- Commit: feat(auditlog): scaffold ScadaLink.AuditLog project.
M1-T11: Update Component-Host.md responsibilities + README component table
Files:
- Modify: docs/requirements/Component-Host.md. List ScadaLink.AuditLog in the central role's registration set.
- Modify: README.md. Confirm row #23 link reflects the new project (no functional change; this is a paper-trail update).
Steps:
- Edit, verify cross-refs, commit: docs(audit): register ScadaLink.AuditLog project in Host role.
M2 — Site pipeline (sync-only path)
Goal: First end-to-end audit emission: a script-initiated ExternalSystem.Call() produces an audit row in the central AuditLog table. No cached paths yet, no notifications, no inbound API, no UI. Just one channel + kind: ApiOutbound / ApiCall.
Affected projects: Commons, AuditLog (new), Communication, Host, ExternalSystemGateway, all matching *.Tests/, tests/ScadaLink.IntegrationTests/.
M1 realities to honor:
- Vocabulary: M1 enums use AuditKind.ApiCall (sync) and AuditStatus.Delivered|Failed. The original spec's SyncCall / Success names were superseded; alog.md + Component-AuditLog.md were reconciled in the M1 merge.
- Idempotent insert race: M1's AuditLogRepository.InsertIfNotExistsAsync uses non-locking IF NOT EXISTS … INSERT. M2 is the first concurrent writer (AuditLogIngestActor will receive batches from multiple sites). Harden the repo before relying on it: either add WITH (UPDLOCK, HOLDLOCK) to the existence check, or catch SqlException numbers 2601/2627 (duplicate key on UX_AuditLog_EventId) and swallow. Add a new task at the head of M2 for this fix and its concurrency test.
- Keyset tiebreaker test gap: M1's QueryAsync_Keyset_NextPageStartsAfterCursor test uses five rows with distinct OccurredAtUtc, so the Guid.CompareTo tiebreaker branch is never exercised. Add a same-OccurredAt test in M2 (Bundle D reviewer's deferred recommendation).
- Reusable MSSQL fixture: tests/ScadaLink.ConfigurationDatabase.Tests/Migrations/MsSqlMigrationFixture.cs + [SkippableFact] + Skip.IfNot(_fixture.Available, _fixture.SkipReason) is the established pattern. Consider promoting it to a [CollectionDefinition]-shared fixture when M2+ adds more MSSQL-dependent test classes.
- Project layout: src/ScadaLink.AuditLog/ is wired into the solution with Configuration/AuditLogOptions.cs + validator + ServiceCollectionExtensions.AddAuditLog(). M2's Site/ and Central/ subfolders attach to this project; the DI extension is the registration point.
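The keyset tiebreaker gap described above can be illustrated with a small sketch: every row shares one OccurredAtUtc, so paging only terminates correctly if the EventId tiebreaker is applied. This is Python over in-memory SQLite with text IDs, an assumption for readability; the real repository compares Guids on MS SQL, where ordering rules differ. The function name query_page is hypothetical.

```python
import sqlite3


def query_page(conn, page_size, cursor=None):
    """Return one page ordered by (OccurredAtUtc DESC, EventId DESC).

    `cursor` is the (OccurredAtUtc, EventId) pair of the last row already seen.
    """
    sql = "SELECT OccurredAtUtc, EventId FROM AuditLog"
    params = []
    if cursor is not None:
        occurred, event_id = cursor
        # Strictly older rows, or same instant with a smaller EventId.
        sql += " WHERE OccurredAtUtc < ? OR (OccurredAtUtc = ? AND EventId < ?)"
        params += [occurred, occurred, event_id]
    sql += " ORDER BY OccurredAtUtc DESC, EventId DESC LIMIT ?"
    params.append(page_size)
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE AuditLog (EventId TEXT PRIMARY KEY, OccurredAtUtc TEXT NOT NULL)")
same_instant = "2026-05-01T12:00:00Z"
conn.executemany(
    "INSERT INTO AuditLog (EventId, OccurredAtUtc) VALUES (?, ?)",
    [(f"id-{i}", same_instant) for i in range(5)],
)

page1 = query_page(conn, 2)
page2 = query_page(conn, 2, cursor=page1[-1])
page3 = query_page(conn, 2, cursor=page2[-1])
seen = [event_id for _, event_id in page1 + page2 + page3]
```

Without the second branch of the WHERE clause, page 2 would come back empty here, which is exactly the case the five-distinct-timestamp test cannot catch.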
Acceptance criteria:
- Site-local IAuditWriter writes to a per-site SQLite auditlog.db on the hot path with ForwardState = 'Pending'; durability is sub-millisecond; failures fall back to a bounded in-memory ring and surface a metric.
- SiteAuditTelemetryActor drains pending rows in batches via a new IngestAuditEvents RPC on the existing SiteStream gRPC service; on success flips ForwardState = 'Forwarded'.
- AuditLogIngestActor (central singleton) receives the batch, performs InsertIfNotExistsAsync per event, returns ack.
- ExternalSystem.Call() emits one ApiOutbound.ApiCall row via IAuditWriter on every call completion; audit-write failure does NOT abort the script.
- Integration test in tests/ScadaLink.IntegrationTests/ boots a site + central pair, executes a sync script that calls an external system, and asserts a corresponding row appears in the central AuditLog within N seconds.
- No regressions in existing ExternalSystemGateway or Communication tests.
M2 — Tasks (TDD-detail)
M2-T1: SqliteAuditWriter — schema + connection bootstrap
Files:
- Create: src/ScadaLink.AuditLog/Site/SqliteAuditWriter.cs. Implements IAuditWriter. Constructor takes a SqliteOptions (path); single SqliteConnection per instance gated by SemaphoreSlim(1,1). Calls InitializeSchema() on first use. Pattern from src/ScadaLink.SiteEventLogging/SiteEventLogger.cs:28–98.
- Create: tests/ScadaLink.AuditLog.Tests/Site/SqliteAuditWriterSchemaTests.cs.
Steps:
- Failing test: opening a writer against a :memory: SQLite produces an AuditLog table with the documented columns (the 20 central columns minus IngestedAtUtc, plus ForwardState).
- Run: fail (class doesn't exist).
- Implement InitializeSchema() with CREATE TABLE IF NOT EXISTS AuditLog (...). Use SQLite column types matching the EF mapping where reasonable (TEXT for IDs, INTEGER for status enums, BLOB not used).
- Run: pass.
- Commit: feat(auditlog): SqliteAuditWriter schema bootstrap.
M2-T2: SqliteAuditWriter — hot-path WriteAsync
Files:
- Modify: src/ScadaLink.AuditLog/Site/SqliteAuditWriter.cs.
- Create: tests/ScadaLink.AuditLog.Tests/Site/SqliteAuditWriterWriteTests.cs.
Steps:
- Failing test: WriteAsync(event) inserts one row with ForwardState = Pending.
- Failing test: 1,000 concurrent WriteAsync calls all complete without exception and produce exactly 1,000 rows (write-lock correctness).
- Run: fail.
- Implement using a parameterized INSERT under SemaphoreSlim lock.
- Run: pass.
- Commit: feat(auditlog): SqliteAuditWriter hot-path INSERT with write lock.
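The write-lock correctness test above can be sketched like this. The example is Python, with a threading.Lock standing in for SemaphoreSlim(1,1) and a thread pool standing in for concurrent WriteAsync callers; the class name LockedSqliteWriter and the two-column schema are assumptions for illustration.

```python
import sqlite3
import threading
from concurrent.futures import ThreadPoolExecutor


class LockedSqliteWriter:
    """One shared connection, every statement serialized by a single lock."""

    def __init__(self, path=":memory:"):
        # check_same_thread=False lets worker threads share the connection;
        # the lock below is what actually keeps access safe.
        self._conn = sqlite3.connect(path, check_same_thread=False)
        self._lock = threading.Lock()
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS AuditLog ("
            "EventId TEXT PRIMARY KEY, ForwardState TEXT NOT NULL)"
        )

    def write(self, event_id):
        with self._lock:
            self._conn.execute(
                "INSERT INTO AuditLog (EventId, ForwardState) VALUES (?, 'Pending')",
                (event_id,),
            )
            self._conn.commit()

    def count(self, forward_state):
        with self._lock:
            row = self._conn.execute(
                "SELECT COUNT(*) FROM AuditLog WHERE ForwardState = ?",
                (forward_state,),
            ).fetchone()
            return row[0]


writer = LockedSqliteWriter()
with ThreadPoolExecutor(max_workers=16) as pool:
    # list(...) forces every write to finish and re-raises any worker exception.
    list(pool.map(writer.write, [f"evt-{i}" for i in range(1000)]))
pending = writer.count("Pending")
```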
M2-T3: RingBufferFallback — in-memory fallback
Files:
- Create: src/ScadaLink.AuditLog/Site/RingBufferFallback.cs. Channel<AuditEvent> with BoundedChannelFullMode.DropOldest, default capacity 1024.
- Create: tests/ScadaLink.AuditLog.Tests/Site/RingBufferFallbackTests.cs.
Steps:
- Failing test: enqueueing 1,025 events into a 1,024-cap ring drops the oldest and emits a RingBufferOverflow notification (incrementing a passed-in counter).
- Failing test: DrainTo(writer) writes all buffered events in FIFO order and clears the ring.
- Implement.
- Run: pass.
- Commit: feat(auditlog): RingBufferFallback with drop-oldest overflow.
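The two behaviours under test (drop-oldest on overflow, FIFO drain) can be sketched with a deque. This Python version is only an illustration of the semantics; the planned C# class is Channel-based, and the on_overflow callback here is a stand-in for the overflow notification.

```python
from collections import deque


class RingBufferFallback:
    """Bounded FIFO that drops the oldest event when full."""

    def __init__(self, capacity=1024, on_overflow=None):
        self._items = deque()
        self._capacity = capacity
        self._on_overflow = on_overflow

    def __len__(self):
        return len(self._items)

    def enqueue(self, event):
        if len(self._items) >= self._capacity:
            dropped = self._items.popleft()  # oldest event is sacrificed
            if self._on_overflow is not None:
                self._on_overflow(dropped)
        self._items.append(event)

    def drain_to(self, write):
        """Write buffered events oldest-first; return how many were drained."""
        drained = 0
        while self._items:
            write(self._items[0])   # peek first so a failed write keeps the event
            self._items.popleft()
            drained += 1
        return drained


overflowed = []
ring = RingBufferFallback(capacity=1024, on_overflow=overflowed.append)
for i in range(1025):
    ring.enqueue(i)

written = []
drained = ring.drain_to(written.append)
```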
M2-T4: FallbackAuditWriter — compose primary + ring behind IAuditWriter
Files:
- Create: src/ScadaLink.AuditLog/Site/FallbackAuditWriter.cs. Primary writer is SqliteAuditWriter; on transient exception, enqueues into RingBufferFallback and increments SiteAuditWriteFailures (M2-T11). On the next successful primary write, drains the ring back through the primary.
- Create: tests/ScadaLink.AuditLog.Tests/Site/FallbackAuditWriterTests.cs.
Steps:
- Failing test: when the primary throws, the event lands in the ring and the call returns successfully.
- Failing test: when primary writes succeed again, the ring drains in FIFO order.
- Implement.
- Run: pass.
- Commit: feat(auditlog): FallbackAuditWriter composing SQLite + ring.
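A minimal sketch of the composition, in Python for brevity: the primary is any callable that may raise, and the ring is a deque with maxlen (which drops the oldest item when full). Treating every exception as transient is a simplification of this example, and the write_failures attribute stands in for the SiteAuditWriteFailures counter.

```python
from collections import deque


class FallbackAuditWriter:
    """Try the primary writer; buffer in a bounded ring when it fails."""

    def __init__(self, primary_write, capacity=1024):
        self._primary_write = primary_write
        self._ring = deque(maxlen=capacity)
        self.write_failures = 0

    def write(self, event):
        try:
            self._primary_write(event)
        except Exception:
            # Never propagate: the caller (a script) must not see audit failures.
            self.write_failures += 1
            self._ring.append(event)
            return
        self._drain()  # primary is healthy again, so replay the backlog

    def _drain(self):
        while self._ring:
            try:
                self._primary_write(self._ring[0])
            except Exception:
                return  # primary failed again; keep the rest buffered
            self._ring.popleft()


state = {"healthy": False}
stored = []


def primary(event):
    if not state["healthy"]:
        raise IOError("disk full (simulated)")
    stored.append(event)


writer = FallbackAuditWriter(primary)
writer.write("a")
writer.write("b")
state["healthy"] = True
writer.write("c")
```

Note the resulting order: the event that found the primary healthy is written first and the backlog follows in FIFO order, so readers should sort by OccurredAtUtc rather than by insertion order.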
M2-T5: Extend sitestream.proto with IngestAuditEvents RPC
Files:
- Modify: src/ScadaLink.Communication/Protos/sitestream.proto. Add message AuditEventDto { string event_id = 1; google.protobuf.Timestamp occurred_at_utc = 2; ... } (all 20 central fields), message AuditEventBatch { repeated AuditEventDto events = 1; }, message IngestAck { repeated string accepted_event_ids = 1; }, and rpc IngestAuditEvents(AuditEventBatch) returns (IngestAck); on SiteStreamService.
- Build: dotnet build src/ScadaLink.Communication/ regenerates the C# stubs.
- Create: tests/ScadaLink.Communication.Tests/Protos/AuditEventProtoTests.cs.
Steps:
- Failing test: round-trip serialize/deserialize a populated AuditEventDto; assert all fields survive.
- Edit proto; rebuild.
- Run: pass.
- Commit: feat(comms): add IngestAuditEvents RPC + AuditEvent proto messages.
M2-T6: AuditEvent ↔ AuditEventDto mapper
Files:
- Create: src/ScadaLink.AuditLog/Telemetry/AuditEventMapper.cs. Static ToDto(AuditEvent) and FromDto(AuditEventDto).
- Create: tests/ScadaLink.AuditLog.Tests/Telemetry/AuditEventMapperTests.cs.
Steps:
- Failing test: round-trip a populated AuditEvent through ToDto → FromDto; assert equality on all 20 columns.
- Implement.
- Run: pass.
- Commit: feat(auditlog): AuditEvent ↔ proto Dto mapper.
M2-T7: SiteAuditTelemetryActor — drain loop
Files:
- Create: src/ScadaLink.AuditLog/Site/Telemetry/SiteAuditTelemetryActor.cs. ReceiveActor with a Drain self-tick. On Drain: read up to BatchSize Pending rows from SQLite; send via gRPC; mark accepted rows Forwarded.
- Create: src/ScadaLink.AuditLog/Site/Telemetry/SiteAuditTelemetryOptions.cs. BatchSize = 256, BusyIntervalSeconds = 5, IdleIntervalSeconds = 30.
- Create: tests/ScadaLink.AuditLog.Tests/Site/Telemetry/SiteAuditTelemetryActorTests.cs using TestKit + NSubstitute for the gRPC client.
Steps:
- Failing test: when SQLite has 50 pending rows, a Drain tick sends one batch via the mocked gRPC client.
- Failing test: on ack, the corresponding rows flip to Forwarded in SQLite.
- Failing test: when gRPC throws, rows stay Pending and the next tick retries.
- Failing test: cadence is 5s after a tick that drained ≥1 row, 30s after a tick that drained 0.
- Implement.
- Run: pass.
- Commit: feat(auditlog): SiteAuditTelemetryActor drain loop.
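The drain tick and its cadence rule can be sketched as a single function that returns the delay until the next tick. This is Python with plain callables in place of the actor, SQLite, and gRPC client. One assumption to flag: a failed send is treated as having drained zero rows, so it schedules the idle interval; the roadmap does not say which interval applies after a failure.

```python
def drain_tick(read_pending, send_batch, mark_forwarded,
               batch_size=256, busy_interval_s=5, idle_interval_s=30):
    """Run one drain tick; return seconds to wait before the next one."""
    batch = read_pending(batch_size)
    forwarded = 0
    if batch:
        try:
            accepted_ids = send_batch(batch)
        except Exception:
            accepted_ids = []  # rows stay Pending; the next tick retries them
        mark_forwarded(accepted_ids)
        forwarded = len(accepted_ids)
    return busy_interval_s if forwarded >= 1 else idle_interval_s


# In-memory stand-ins for the site SQLite store and the gRPC client.
states = {f"e{i}": "Pending" for i in range(3)}


def read_pending(limit):
    return [eid for eid, state in states.items() if state == "Pending"][:limit]


def mark_forwarded(event_ids):
    for eid in event_ids:
        states[eid] = "Forwarded"


def failing_send(batch):
    raise ConnectionError("central unreachable (simulated)")


def ok_send(batch):
    return list(batch)  # central acknowledges every event


after_failure = drain_tick(read_pending, failing_send, mark_forwarded)
pending_after_failure = read_pending(256)
after_success = drain_tick(read_pending, ok_send, mark_forwarded)
after_idle = drain_tick(read_pending, ok_send, mark_forwarded)
```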
M2-T8: AuditLogIngestActor + gRPC server handler
Files:
- Create: src/ScadaLink.AuditLog/Central/AuditLogIngestActor.cs. ReceiveActor accepting IngestAuditEventsCommand(batch); calls IAuditLogRepository.InsertIfNotExistsAsync for each event inside a single DbContext transaction; replies with IngestAck(acceptedEventIds).
- Modify: src/ScadaLink.Communication/SiteStreamGrpc/SiteStreamGrpcServer.cs. Implement the new IngestAuditEvents method as a thin gRPC↔Akka adapter (Ask against the central singleton's proxy, mapped to the gRPC reply).
- Create: tests/ScadaLink.AuditLog.Tests/Central/AuditLogIngestActorTests.cs.
Steps:
- Failing test: actor receives a batch of 5 events; repo is called 5 times; reply lists all 5 EventIds as accepted.
- Failing test: when 2 of 5 events already exist (repo returns Inserted = false), the reply still lists all 5 as accepted (idempotent semantics).
- Failing test: gRPC handler routes to actor and returns its reply.
- Implement.
- Run: pass.
- Commit: feat(auditlog): AuditLogIngestActor + gRPC server handler.
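The idempotent ack semantics in the second test can be shown in a few lines. This Python sketch uses a dict as a stand-in for the central AuditLog table; ingest_batch is a hypothetical name, and events are plain dicts rather than AuditEvent records.

```python
def ingest_batch(store, batch):
    """Insert unseen events and acknowledge every event in the batch.

    A duplicate is acknowledged too: the row is already durable at central,
    so the site can safely mark it Forwarded and stop resending it.
    """
    accepted_ids = []
    for event in batch:
        event_id = event["event_id"]
        if event_id not in store:
            store[event_id] = event  # append-only: existing rows are never touched
        accepted_ids.append(event_id)
    return accepted_ids


store = {
    "e1": {"event_id": "e1", "status": "Delivered"},
    "e2": {"event_id": "e2", "status": "Delivered"},
}
batch = [{"event_id": f"e{i}", "status": "Failed"} for i in range(1, 6)]
ack = ingest_batch(store, batch)
```

The two pre-existing rows keep their original content even though the resent copies differ, which matches the append-only rule.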
M2-T9: Host registration with dedicated dispatcher
Files:
- Modify: src/ScadaLink.Host/Actors/AkkaHostedService.cs. Alongside the existing wiring at :272–280, register AuditLogIngestActor as central singleton and SiteAuditTelemetryActor as site singleton bound to audit-telemetry-dispatcher. Manager + proxy pair for both.
- Modify: Host HOCON (likely src/ScadaLink.Host/Configuration/akka.conf or similar). Add audit-telemetry-dispatcher { type = ForkJoinDispatcher; parallelism-min = 1; parallelism-max = 2; }.
- Modify: src/ScadaLink.AuditLog/ServiceCollectionExtensions.cs. Register actor Props factories so Host can resolve them.
- Create: tests/ScadaLink.Host.Tests/AkkaHostedServiceAuditWiringTests.cs.
Steps:
- Failing test: starting the host with the audit module loaded produces healthy IActorRef proxies for both singletons.
- Failing test: SiteAuditTelemetryActor is bound to audit-telemetry-dispatcher (assert via Akka actor cell inspection or via a known-good dispatcher-tagged behaviour).
- Implement.
- Run: pass.
- Commit: feat(host): register AuditLog singletons with dedicated dispatcher.
M2-T10: ESG ExternalSystemClient.CallAsync audit emission
Files:
- Modify: src/ScadaLink.ExternalSystemGateway/ExternalSystemClient.cs (sync CallAsync around line 45–70). Inject IAuditWriter via constructor. After the call completes (success OR exception), build an AuditEvent (channel=ApiOutbound, kind=ApiCall, status from outcome, DurationMs, HttpStatus, target = system+method, provenance from ScriptExecutionContext). Call _auditWriter.WriteAsync(evt) inside a try/catch that swallows + logs + increments SiteAuditWriteFailures.
- Modify: src/ScadaLink.ExternalSystemGateway/ServiceCollectionExtensions.cs. Accept IAuditWriter from DI.
- Create: tests/ScadaLink.ExternalSystemGateway.Tests/ExternalSystemClientAuditEmissionTests.cs.
Steps:
- Failing test: sync CallAsync success → exactly one event with Status=Delivered, Channel=ApiOutbound, Kind=ApiCall.
- Failing test: sync CallAsync HTTP 500 → Status=TransientFailure, HttpStatus=500.
- Failing test: sync CallAsync HTTP 400 → Status=PermanentFailure, HttpStatus=400.
- Failing test: when IAuditWriter.WriteAsync throws, the script call still completes normally and the script sees the original (non-audit) result.
- Implement.
- Run: pass.
- Commit: feat(esg): emit ApiOutbound.ApiCall audit event on every sync call.
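The emission pattern (audit after every completion, never let an audit failure reach the script) can be sketched as a wrapper. This is Python with dicts in place of AuditEvent and HTTP responses; call_with_audit and classify are hypothetical names. The kind and success status follow the M1 vocabulary (ApiCall, Delivered), the failure names follow the test steps in this task, and mapping exceptions to TransientFailure is an assumption of this sketch.

```python
import time


def classify(http_status):
    """Map an HTTP status to an audit status name."""
    if http_status is not None and 200 <= http_status < 300:
        return "Delivered"
    if http_status is not None and 400 <= http_status < 500:
        return "PermanentFailure"
    return "TransientFailure"  # 5xx, or no status because the call raised (assumed)


def call_with_audit(do_call, audit_write, target, on_audit_failure):
    """Run the call, then emit one audit event whether it returned or raised."""
    started = time.monotonic()
    http_status = None
    try:
        result = do_call()
        http_status = result["http_status"]
        return result
    finally:
        event = {
            "channel": "ApiOutbound",
            "kind": "ApiCall",
            "status": classify(http_status),
            "http_status": http_status,
            "target": target,
            "duration_ms": int((time.monotonic() - started) * 1000),
        }
        try:
            audit_write(event)
        except Exception:
            on_audit_failure()  # swallow: the script keeps its original outcome


events, failures = [], []


def count_failure():
    failures.append(1)


def broken_audit(event):
    raise IOError("audit store offline (simulated)")


call_with_audit(lambda: {"http_status": 200, "body": "pong"}, events.append, "Erp.Ping", count_failure)
call_with_audit(lambda: {"http_status": 500, "body": ""}, events.append, "Erp.Ping", count_failure)
call_with_audit(lambda: {"http_status": 400, "body": ""}, events.append, "Erp.Ping", count_failure)
survived = call_with_audit(lambda: {"http_status": 200, "body": "pong"}, broken_audit, "Erp.Ping", count_failure)
```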
M2-T11: SiteAuditWriteFailures health metric
Files:
- Modify: src/ScadaLink.HealthMonitoring/SiteHealthState.cs. Add a SiteAuditWriteFailures counter; expose it in the site health report payload.
- Modify: src/ScadaLink.AuditLog/Site/FallbackAuditWriter.cs (M2-T4). Accept IHealthMetrics (or the project's existing health counter abstraction) and increment per failed primary write.
- Create: tests/ScadaLink.AuditLog.Tests/Site/SiteAuditWriteFailuresMetricTests.cs.
Steps:
- Failing test: 3 simulated SQLite failures → counter reports 3 in the next snapshot.
- Implement.
- Run: pass.
- Commit: feat(health): SiteAuditWriteFailures metric.
M2-T12: End-to-end integration test
Files:
- Create: tests/ScadaLink.IntegrationTests/AuditLog/SyncCallEmissionTests.cs: boots a site + central pair via the existing IntegrationTests harness; deploys a tiny script that calls a stub external system; asserts the central AuditLog table has exactly one row with the expected channel/kind/status within 10s.
- Possibly modify: infra/reseed.sh if integration tests need a fresh AuditLog table per run.
Steps:
- Sketch the test using existing IntegrationTests fixtures.
- Run: fail somewhere (gaps in earlier tasks surface here).
- Iterate fixes back through M2-T1..M2-T11 until end-to-end passes.
- Commit: test(auditlog): end-to-end sync call emission integration test.
M2 — Risk callouts
- SiteStream proto evolution: adding a new top-level RPC is wire-compatible; confirm generated Sitestream.cs rebuilds cleanly and existing tests still pass.
- Dedicated dispatcher misconfiguration: if SiteAuditTelemetryActor lands on the script blocking-I/O dispatcher, scripts will starve during telemetry bursts. Add a runtime assertion in M2-T9 that the actor's dispatcher matches expectation.
- Script execution context plumbing: ESG emission (M2-T10) needs SourceInstanceId/SourceScript; confirm these are reachable via the existing ScriptExecutionContext (or equivalent in SiteRuntime) before starting M2-T10.
- Integration-test DB isolation: target an isolated MS SQL database (or a dedicated schema) so the test doesn't clash with other integration tests.
M3 — Cached operations + dual-write transaction + (inlined) Site Call Audit foundations
Goal: Cached external calls (ExternalSystem.CachedCall) and cached DB writes (Database.CachedWrite) produce four audit rows per operation (Kind=CachedSubmit Status=Submitted, Kind=ApiCallCached/DbWriteCached Status=Forwarded, Kind=ApiCallCached/DbWriteCached Status=Attempted × N, Kind=CachedResolve Status=Delivered|Failed|Parked|Discarded) AND populate the operational SiteCalls table at central — in one transaction at central, from a single combined telemetry packet.
M2 realities to honor:
- Vocabulary: use the M1-aligned enums. M3 will be the first code to populate AuditKind.ApiCallCached, DbWriteCached, CachedSubmit, CachedResolve. The locked spec (alog.md + Component-AuditLog.md) was reconciled in the M1 merge.
- Site→central gRPC client deferred to M6: M2 ships NoOpSiteStreamAuditClient as the production default. Site SQLite rows accumulate as Pending forever in production until M6. M3 component tests should use Bundle H's DirectActorSiteStreamAuditClient pattern (see tests/ScadaLink.AuditLog.Tests/Integration/SyncCallEmissionEndToEndTests.cs:277-340). Extract that helper into tests/ScadaLink.AuditLog.Tests/Integration/Infrastructure/ so M3 cached-call E2E tests can reuse it without re-defining.
- Mapper duplication: SiteStreamGrpcServer.IngestAuditEvents inlines DTO→entity decoding (intentional, to avoid the AuditLog→Communication project-ref cycle). The mapper lives at src/ScadaLink.AuditLog/Telemetry/AuditEventMapper.cs. M3 should add a comment in both spots tying them together, OR move the mapper into src/ScadaLink.Commons/ (project-ref clean) so both consumers can share it.
- AuditIngestAskTimeout = 30s is hardcoded in SiteStreamGrpcServer.cs:37. M3 may want to expose this via CommunicationOptions or AuditLogOptions as central reconciliation/dual-write traffic grows.
- CachedCallTelemetry message: per CLAUDE.md, the CachedCallTelemetry message that the docs describe as existing does not yet exist in code. M3 must create it from scratch (additively, per Commons REQ-COM-5a; DO NOT rename it CachedOperationTelemetry). It carries BOTH the AuditLog rows (4+) AND the SiteCalls upsert in one packet.
- Dual-write transaction: central writes AuditLog + SiteCalls in one MS SQL transaction. The repository's InsertIfNotExistsAsync swallows duplicates (M2 Bundle A fix); the SiteCalls upsert uses MERGE (or insert-if-not-exists then upsert-on-newer-status per CLAUDE.md). M3 must ensure the same Bundle A swallow pattern applies if a duplicate CachedCallId arrives.
- AuditEvent ForwardState semantics in M3: cached-operation telemetry rows are site-emitted just like sync M2 rows, so the same site SQLite hot-path + Pending→Forwarded lifecycle applies. The four lifecycle rows share a CorrelationId (the TrackedOperationId), but each is its own AuditEvent with a distinct EventId.
Affected projects: Commons, AuditLog, SiteCallAudit (new — minimum-viable surface), ConfigurationDatabase (new SiteCalls table migration), ExternalSystemGateway, StoreAndForward, Host. Tests across all of them + IntegrationTests.
Prerequisite call-out: This milestone implements the minimum-viable Site Call Audit (#22) surface and cached-call tracking pieces — TrackedOperationId, site-local operation tracking SQLite, SiteCalls table at central, the existing-message CachedCallTelemetry (must be created from scratch since it doesn't exist in code despite living in the docs). Full reconciliation, KPIs, and Retry/Discard relay for #22 are deferred — they're not on the critical path for the audit log's combined telemetry.
Acceptance criteria:
- New SiteCalls MS SQL table + repo (no partitioning needed; this is operational state, not audit).
- New CachedCallTelemetry message in Commons carrying BOTH the cached-call operational fields AND an AuditEvent payload.
- Site path: CachedCall writes the audit row to site SQLite (Kind = CachedEnqueued), creates the site operation-tracking row, and sends a combined telemetry packet.
- Central path: AuditLogIngestActor (extended) receives the combined packet, performs one transaction containing both the AuditLog insert and the SiteCalls upsert.
- Retry attempt → Kind = CachedAttempt audit row + SiteCalls status transition. Terminal → Kind = CachedTerminal audit row + SiteCalls terminal status.
- Integration test asserts: triggering a CachedCall that fails transient-then-succeeds produces 3 AuditLog rows + 1 SiteCalls row with Status = Delivered, all sharing the same TrackedOperationId correlation key.
M3 — Tasks (TDD-detail)
M3-T1: TrackedOperationId strong-typed ID
Files:
- Create: src/ScadaLink.Commons/Types/TrackedOperationId.cs: readonly record struct wrapping Guid; New() / Parse(string) / ToString().
- Create: tests/ScadaLink.Commons.Tests/Types/TrackedOperationIdTests.cs.
Steps:
- Failing test: round-trip via ToString() / Parse() and equality semantics.
- Implement.
- Run: pass.
- Commit: feat(commons): TrackedOperationId strong type.
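One possible shape for the strong-typed ID described in M3-T1, offered as a sketch only. The "D" string format and the Value property name are assumptions; the task only specifies a readonly record struct wrapping Guid with New(), Parse(string), and ToString().

```csharp
using System;

// Sketch of the M3-T1 type. The record struct supplies value equality,
// == / != and GetHashCode based on the wrapped Guid.
public readonly record struct TrackedOperationId(Guid Value)
{
    // Creates a fresh, random identifier.
    public static TrackedOperationId New() => new(Guid.NewGuid());

    // Parses the textual form produced by ToString(); throws FormatException on bad input.
    public static TrackedOperationId Parse(string text) => new(Guid.Parse(text));

    // Assumed canonical form: 32 hex digits with hyphens ("D" format).
    public override string ToString() => Value.ToString("D");
}
```

Declaring ToString() explicitly replaces the compiler-generated record output (which would include the type and property name), so the round-trip test in the steps above can compare Parse(id.ToString()) with id directly.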
M3-T2: Site-local operation-tracking SQLite table + repo
Files:
- Create: src/ScadaLink.SiteRuntime/Tracking/OperationTrackingStore.cs: SQLite-backed store with columns TrackedOperationId, Kind, TargetSummary, Status, RetryCount, LastError, CreatedAtUtc, UpdatedAtUtc, TerminalAtUtc, source provenance. Schema bootstrap on first use; uses the same write-lock pattern as SqliteAuditWriter. Implements IOperationTrackingStore (interface in Commons).
- Create: src/ScadaLink.Commons/Interfaces/IOperationTrackingStore.cs: RecordEnqueueAsync, RecordAttemptAsync, RecordTerminalAsync, GetStatusAsync(TrackedOperationId), PurgeTerminalAsync(olderThanUtc).
- Create: tests/ScadaLink.SiteRuntime.Tests/Tracking/OperationTrackingStoreTests.cs.
Steps:
- Failing test: schema bootstrap creates the table.
- Failing test: RecordEnqueueAsync inserts a Pending row; RecordAttemptAsync updates Status/RetryCount/LastError; RecordTerminalAsync finalises.
- Failing test: GetStatusAsync returns the latest snapshot (answers Tracking.Status(id) site-locally).
- Failing test: PurgeTerminalAsync removes terminal rows older than threshold; non-terminal rows are kept regardless of age.
- Implement.
- Run: pass.
- Commit: feat(siteruntime): site-local operation tracking SQLite store.
M3-T3: Tracking.Status(id) API surface in SiteRuntime
Files:
- Modify: src/ScadaLink.SiteRuntime/Scripting/TrackingApi.cs (new or existing; confirm via repo): public Status(TrackedOperationId) method routed through IOperationTrackingStore.
- Modify: script trust-model allow-list to include the new Tracking.* surface (confirm via grep).
- Create: tests/ScadaLink.SiteRuntime.Tests/Scripting/TrackingApiTests.cs.
Steps:
- Failing test: Tracking.Status(unknownId) returns a documented "not found" sentinel.
- Failing test: Tracking.Status(knownId) returns the latest snapshot.
- Implement.
- Run: pass.
- Commit: feat(siteruntime): Tracking.Status(id) script API.
M3-T4: CachedCallTelemetry Commons message — carries both operational + audit content
Files:
- Create: src/ScadaLink.Commons/Messages/Integration/CachedCallTelemetry.cs: fields are TrackedOperationId, Kind (CachedEnqueued/CachedAttempt/CachedTerminal audit kind), operational status, retry count, last error, timestamps, and a nested AuditEvent carrying the audit row content. Documented as additive-only per Commons REQ-COM-5a.
- Create: tests/ScadaLink.Commons.Tests/Messages/Integration/CachedCallTelemetryTests.cs.
Steps:
- Failing test: construct a telemetry packet for each of the three lifecycle kinds; verify the nested AuditEvent's channel/kind alignment (e.g., a CachedAttempt packet must carry an AuditEvent with Kind = CachedAttempt).
- Failing test: serialization round-trip preserves both layers.
- Implement.
- Run: pass.
- Commit: feat(commons): CachedCallTelemetry carrying combined operational + audit content.
M3-T5: SiteCalls MS SQL table — EF mapping
Files:
- Create: src/ScadaLink.Commons/Entities/Audit/SiteCall.cs: POCO record per Component-SiteCallAudit.md.
- Create: src/ScadaLink.ConfigurationDatabase/Entities/SiteCallEntityTypeConfiguration.cs: IEntityTypeConfiguration<SiteCall> with PK on TrackedOperationId, indexes on (SourceSite, CreatedAtUtc) and (Status, UpdatedAtUtc).
- Modify: ScadaLinkDbContext.cs: public DbSet<SiteCall> SiteCalls => Set<SiteCall>();.
- Create: tests/ScadaLink.ConfigurationDatabase.Tests/Entities/SiteCallEntityTypeConfigurationTests.cs.
Steps:
- Failing test: model exposes SiteCalls table with documented columns and indexes.
- Implement.
- Run: pass.
- Commit: feat(configdb): map SiteCall to SiteCalls table.
M3-T6: SiteCalls migration
Files:
- Create: src/ScadaLink.ConfigurationDatabase/Migrations/<ts>_AddSiteCallsTable.cs via dotnet ef migrations add AddSiteCallsTable.
- Create: tests/ScadaLink.ConfigurationDatabase.Tests/Migrations/AddSiteCallsTableMigrationTests.cs.
Steps:
- Failing test: applying the migration creates the SiteCalls table with PK + indexes.
- Generate + adjust migration.
- Run: pass.
- Commit: feat(configdb): add SiteCalls migration.
M3-T7: ISiteCallAuditRepository + EF impl
Files:
- Create: src/ScadaLink.Commons/Interfaces/Repositories/ISiteCallAuditRepository.cs: UpsertAsync(SiteCall) (insert-if-not-exists by TrackedOperationId, otherwise update-on-newer-status using monotonic status progression), GetAsync(TrackedOperationId), QueryAsync(filter, paging), PurgeTerminalAsync(olderThanUtc).
- Create: src/ScadaLink.ConfigurationDatabase/Repositories/SiteCallAuditRepository.cs.
- Modify: ServiceCollectionExtensions.cs: register the repository.
- Create: tests/ScadaLink.ConfigurationDatabase.Tests/Repositories/SiteCallAuditRepositoryTests.cs.
Steps:
- Failing test: first UpsertAsync inserts; second UpsertAsync with an advanced status updates; an UpsertAsync with an older status is a no-op (monotonic progression).
- Failing test: paged query supports the documented filter set.
- Implement.
- Run: pass.
- Commit: feat(configdb): ISiteCallAuditRepository + EF impl.
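The monotonic-progression rule behind UpsertAsync can be illustrated with an in-memory stand-in. Everything here is an assumption made for the sketch: the InMemorySiteCallStore class, the reduced SiteCall record, the Retrying member, and the rank ordering (Pending, then Retrying, then any terminal status). The real repository is EF-backed and its status set comes from Component-SiteCallAudit.md.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical status set; Retrying is an illustrative intermediate state.
public enum SiteCallStatus { Pending, Retrying, Delivered, Failed, Parked, Discarded }

// Reduced to the two fields the rule needs.
public sealed record SiteCall(Guid TrackedOperationId, SiteCallStatus Status);

public sealed class InMemorySiteCallStore
{
    private readonly Dictionary<Guid, SiteCall> _rows = new();

    // Assumed progression: Pending < Retrying < any terminal status.
    private static int Rank(SiteCallStatus status) => status switch
    {
        SiteCallStatus.Pending => 0,
        SiteCallStatus.Retrying => 1,
        _ => 2,
    };

    // Returns true when a row was inserted or advanced, false when the call was a no-op.
    public bool Upsert(SiteCall incoming)
    {
        if (!_rows.TryGetValue(incoming.TrackedOperationId, out var existing))
        {
            _rows[incoming.TrackedOperationId] = incoming; // insert-if-not-exists
            return true;
        }

        // Older or equal status: leave the stored row untouched.
        if (Rank(incoming.Status) <= Rank(existing.Status))
        {
            return false;
        }

        _rows[incoming.TrackedOperationId] = incoming; // update-on-newer-status
        return true;
    }

    public SiteCall? Get(Guid id) => _rows.TryGetValue(id, out var row) ? row : null;
}
```

Treating an equal rank as a no-op is what makes a replayed packet harmless; a real implementation may also want to compare retry counts or timestamps within the same rank, which this sketch deliberately leaves out.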
M3-T8: SiteCallAuditActor skeleton (central singleton)
Files:
- Create: src/ScadaLink.SiteCallAudit/ (new project): SiteCallAuditActor.cs + ScadaLink.SiteCallAudit.csproj + ServiceCollectionExtensions.cs. Actor handles UpsertSiteCallCommand messages by calling ISiteCallAuditRepository.UpsertAsync. Note: full reconciliation, KPIs, and Retry/Discard relay are explicitly deferred; this is the minimum-viable surface for M3.
- Modify: ScadaLink.slnx to include the new project.
- Create: tests/ScadaLink.SiteCallAudit.Tests/SiteCallAuditActorTests.cs.
Steps:
- Failing test: actor receives UpsertSiteCallCommand, calls repo, replies with ack.
- Failing test: actor swallows transient DB errors and surfaces them as health metrics (does NOT crash the central singleton).
- Implement.
- Run: pass.
- Commit: feat(scaudit): SiteCallAuditActor minimum viable surface.
M3-T9: Extend sitestream.proto with IngestCachedTelemetry RPC OR extend IngestAuditEvents
Files:
- Modify: src/ScadaLink.Communication/Protos/sitestream.proto. Preferred approach: add a new top-level RPC rpc IngestCachedTelemetry(CachedTelemetryBatch) returns (IngestAck); and a message CachedTelemetryPacket { AuditEventDto audit_event = 1; SiteCallOperationalDto operational = 2; } plus message CachedTelemetryBatch { repeated CachedTelemetryPacket packets = 1; }. Decision should be confirmed during M3's brainstorm.
- Build to regenerate.
- Create: tests/ScadaLink.Communication.Tests/Protos/CachedTelemetryProtoTests.cs.
Steps:
- Failing test: round-trip a populated CachedTelemetryPacket.
- Add proto + rebuild.
- Run: pass.
- Commit: feat(comms): IngestCachedTelemetry RPC + combined telemetry messages.
M3-T10: Extend AuditLogIngestActor for combined telemetry — dual-write transaction
Files:
- Modify: src/ScadaLink.AuditLog/Central/AuditLogIngestActor.cs: add a handler for the cached telemetry message. Inside a single DbContext transaction: (a) call IAuditLogRepository.InsertIfNotExistsAsync(auditEvent), then (b) call ISiteCallAuditRepository.UpsertAsync(operationalState). Both must succeed or both must roll back.
- Modify: src/ScadaLink.Communication/SiteStreamGrpc/SiteStreamGrpcServer.cs: route the new RPC to the central actor.
- Create: tests/ScadaLink.AuditLog.Tests/Central/AuditLogIngestActorCombinedTelemetryTests.cs.
Steps:
- Failing test: a single combined packet produces one AuditLog row AND one SiteCalls row (or upsert).
- Failing test: when the SiteCalls upsert throws, the AuditLog insert is rolled back (no orphan rows).
- Failing test: when the AuditLog insert is a no-op (duplicate EventId), the SiteCalls upsert still runs.
- Failing test: when both rows already exist with monotonic-equal statuses, the operation is a no-op overall (full idempotency).
- Implement.
- Run: pass.
- Commit: feat(auditlog): combined telemetry dual-write transaction.
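The four behaviours those tests pin down (both rows written, rollback on upsert failure, upsert still running after a duplicate insert, full idempotency on replay) can be modelled without a database. The sketch below is an in-memory analogy only: InMemoryCentralStore and its members are hypothetical, statuses are reduced to an integer rank, and the "rollback" is a manual undo rather than a real DbContext transaction.

```csharp
using System;
using System.Collections.Generic;

public sealed class InMemoryCentralStore
{
    private readonly HashSet<Guid> _auditEventIds = new();          // stands in for AuditLog, keyed by EventId
    private readonly Dictionary<Guid, int> _siteCallRanks = new();  // stands in for SiteCalls, keyed by TrackedOperationId

    public int AuditRowCount => _auditEventIds.Count;
    public int SiteCallCount => _siteCallRanks.Count;

    // Applies both writes or neither. simulateUpsertFailure forces step (b) to throw.
    public void IngestPacket(Guid eventId, Guid trackedOperationId, int statusRank, bool simulateUpsertFailure = false)
    {
        // Step (a): insert-if-not-exists. A duplicate EventId is a silent no-op.
        var inserted = _auditEventIds.Add(eventId);
        try
        {
            if (simulateUpsertFailure)
            {
                throw new InvalidOperationException("simulated SiteCalls failure");
            }

            // Step (b): monotonic upsert. Runs even when step (a) was a duplicate.
            if (!_siteCallRanks.TryGetValue(trackedOperationId, out var current) || statusRank > current)
            {
                _siteCallRanks[trackedOperationId] = statusRank;
            }
        }
        catch
        {
            // Undo step (a) only if this call performed it, so no orphan audit row remains.
            if (inserted)
            {
                _auditEventIds.Remove(eventId);
            }
            throw;
        }
    }
}
```

In the real actor the undo would come from the database transaction rolling back; the point of the model is only that step (b) is never skipped just because step (a) found a duplicate.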
M3-T11: ESG CachedCallAsync — emit CachedEnqueued on enqueue
Files:
- Modify: src/ScadaLink.ExternalSystemGateway/ExternalSystemClient.cs:75–136 (cached call): at the moment of buffering into S&F, build an AuditEvent (channel=ApiOutbound, kind=CachedEnqueued) AND a SiteCallOperationalDto (status=Pending); package as a CachedTelemetryPacket; hand to the combined-telemetry forwarder.
- Create: src/ScadaLink.ExternalSystemGateway/Cached/CachedCallTelemetryForwarder.cs (new): accumulates packets and posts to SiteAuditTelemetryActor (or a sibling actor; decision in milestone brainstorm).
- Create: tests/ScadaLink.ExternalSystemGateway.Tests/CachedCallEnqueueEmissionTests.cs.
Steps:
- Failing test: an enqueued cached call produces exactly one packet with kind=CachedEnqueued.
- Implement.
- Run: pass.
- Commit: feat(esg): CachedCall emits CachedEnqueued combined telemetry on buffering.
M3-T12: ESG CachedCallAsync — emit CachedAttempt per retry
Files:
- Modify: src/ScadaLink.StoreAndForward/ retry loop (locate the per-attempt callback site) to emit a CachedAttempt packet on each attempt (success OR transient failure).
- Create: tests/ScadaLink.StoreAndForward.Tests/CachedCallAttemptEmissionTests.cs.
Steps:
- Failing test: an attempt that returns HTTP 500 produces a packet with kind=CachedAttempt, status=TransientFailure, HttpStatus=500.
- Failing test: a successful attempt produces a packet with kind=CachedAttempt, status=Success, HttpStatus=200.
- Implement.
- Run: pass.
- Commit: feat(snf): CachedCall emits CachedAttempt per retry.
M3-T13: ESG CachedCallAsync — emit CachedTerminal on terminal state
Files:
- Modify: same retry-loop terminal-transition site: on Delivered/Failed/Parked/Discarded, emit one final CachedTerminal packet.
- Create: tests/ScadaLink.StoreAndForward.Tests/CachedCallTerminalEmissionTests.cs.
Steps:
- Failing test: a cached call that succeeds on attempt 3 produces (in order): 1 CachedEnqueued, 3 CachedAttempt, 1 CachedTerminal (with status=Delivered).
- Failing test: a cached call that exhausts retries produces a final CachedTerminal with status=Parked.
- Implement.
- Run: pass.
- Commit: feat(snf): CachedCall emits CachedTerminal on lifecycle terminal.
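The emission order expected across M3-T11 to M3-T13 can be sketched as a plain retry loop. This is a simplified model, assuming a synchronous attempt callback and string labels in place of real packets; CachedCallLifecycle and Run are hypothetical names, and only the Delivered and Parked terminals are modelled (Failed and Discarded come from paths this loop does not cover).

```csharp
using System;
using System.Collections.Generic;

public static class CachedCallLifecycle
{
    // attempt(n) returns true when attempt number n (1-based) succeeds,
    // false when it fails transiently. Returns the emitted labels in order.
    public static List<string> Run(Func<int, bool> attempt, int maxAttempts)
    {
        // One CachedEnqueued at the moment of buffering.
        var emitted = new List<string> { "CachedEnqueued" };

        for (var n = 1; n <= maxAttempts; n++)
        {
            var succeeded = attempt(n);

            // One CachedAttempt per attempt, success or transient failure.
            emitted.Add(succeeded ? "CachedAttempt:Success" : "CachedAttempt:TransientFailure");

            if (succeeded)
            {
                emitted.Add("CachedTerminal:Delivered");
                return emitted;
            }
        }

        // Retries exhausted: exactly one terminal row, parked.
        emitted.Add("CachedTerminal:Parked");
        return emitted;
    }
}
```

For example, Run(n => n == 3, maxAttempts: 5) yields one CachedEnqueued, three CachedAttempt entries, and a final CachedTerminal:Delivered, matching the first failing test in M3-T13.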
M3-T14: Database.CachedWrite — mirror the three-lifecycle emission for DB cached writes
Files:
- Modify: src/ScadaLink.ExternalSystemGateway/DatabaseGateway.cs (or equivalent; confirm via repo): same three-event emission pattern as ESG cached calls, but channel=DbOutbound.
- Create: tests/ScadaLink.ExternalSystemGateway.Tests/CachedWriteLifecycleEmissionTests.cs.
Steps:
- Failing test: a CachedWrite that succeeds first try produces CachedEnqueued + CachedAttempt(Success) + CachedTerminal(Delivered).
- Failing test: a CachedWrite with transient retry mirrors the ESG pattern.
- Implement.
- Run: pass.
- Commit: feat(esg): Database.CachedWrite emits three-lifecycle combined telemetry.
M3-T15: Host registration — SiteCallAuditActor central singleton
Files:
- Modify: src/ScadaLink.Host/Actors/AkkaHostedService.cs: register SiteCallAuditActor central singleton + proxy alongside AuditLogIngestActor.
- Modify: src/ScadaLink.SiteCallAudit/ServiceCollectionExtensions.cs: register actor props.
- Modify: tests/ScadaLink.Host.Tests/AkkaHostedServiceAuditWiringTests.cs: extend to assert SiteCallAuditActor proxy resolves.
Steps:
- Failing test: starting host produces the new singleton's proxy.
- Implement.
- Run: pass.
- Commit: feat(host): register SiteCallAuditActor central singleton.
M3-T16: Integration test — cached external call audit (end-to-end)
Files:
- Create: tests/ScadaLink.IntegrationTests/AuditLog/CachedCallCombinedTelemetryTests.cs: site + central; stub external system returns 500 twice then 200; script invokes ExternalSystem.CachedCall("System","Method", args); assert AuditLog has 5 rows (Enqueued + 3 Attempts + Terminal) AND SiteCalls has 1 row with Status=Delivered AND Tracking.Status(id) reports the same.
Steps:
- Sketch test against IntegrationTests harness.
- Run: fail (likely surfacing earlier-task gaps).
- Iterate fixes until pass.
- Commit: test(auditlog): cached call combined telemetry end-to-end.
M3-T17: Integration test — cached DB write audit (end-to-end)
Files:
- Create: tests/ScadaLink.IntegrationTests/AuditLog/CachedWriteCombinedTelemetryTests.cs: mirror M3-T16 against the DB cached path.
Steps:
- Sketch.
- Iterate.
- Commit: test(auditlog): cached DB write combined telemetry end-to-end.
M3-T18: Idempotency test — duplicate telemetry doesn't double-insert / double-upsert
Files:
- Create: tests/ScadaLink.IntegrationTests/AuditLog/CombinedTelemetryIdempotencyTests.cs: force the same packet to arrive twice (simulated telemetry retry); assert AuditLog still has exactly one row and SiteCalls upsert is monotonic.
Steps:
- Sketch.
- Pass.
- Commit: test(auditlog): combined telemetry idempotency on retried packets.
M3 — Risk callouts
- Combined telemetry packet evolution: design the proto so future cached audit-kind additions are non-breaking (avoid oneof for fields you'll extend; use sparse field numbers).
- Dual-write transaction failure modes: the single DbContext transaction at central spans two tables; ensure retry behaviour on transient connection errors works as expected (existing IDbExecutionStrategy patterns may apply).
- Idempotency cross-table: AuditLog dedups on EventId, SiteCalls dedups on TrackedOperationId with status-monotonic update. A retried packet whose AuditLog row exists must still upsert SiteCalls (no short-circuit).
- Scope discipline: M3 inlines the minimum surface for #22 and cached-call tracking. Full #22 reconciliation, KPIs, and Retry/Discard relay are deferred. Note in the milestone brainstorm whether any extra #22 surface is genuinely needed for M3 acceptance criteria; if not, defer aggressively.
- Tracking.Status semantics: confirmed authoritative site-locally per design; no central round-trip. Ensure the test in M3-T3 reflects this.
M4 — Remaining boundary emission
Goal: Every channel × kind from Component-AuditLog.md produces a row when its boundary call fires.
Affected projects: ExternalSystemGateway (sync DB writes/reads, cached DB writes), SiteRuntime (Database surface exposing them), NotificationOutbox (central direct-write of Attempt/Terminal), InboundAPI (middleware). Tests across all.
M3 realities to honor:
- Vocabulary: use the M1-aligned enums. The roadmap's old SyncWrite / SyncRead / Notification.Attempt / Notification.Terminal / Notification.Enqueued / ApiInbound.Completed / PermanentFailure strings are pre-M1 spec wording; DO NOT use those names in code. Translation:
  - sync DB write/read → AuditKind.DbWrite (Channel=DbOutbound); distinguish read vs write via Extra (e.g., {"op": "read", "rowsReturned": 42}).
  - notification delivery attempt → AuditKind.NotifyDeliver with AuditStatus.Attempted.
  - notification delivery terminal → AuditKind.NotifyDeliver with AuditStatus.Delivered|Failed|Parked|Discarded.
  - notification submit (site-emit) → AuditKind.NotifySend with AuditStatus.Submitted.
  - inbound API success → AuditKind.InboundRequest with AuditStatus.Delivered.
  - inbound API auth failure → AuditKind.InboundAuthFailure with AuditStatus.Failed.
  - "permanent failure" → AuditStatus.Failed. "Transient failure" never lands a terminal row.
- Mapper consolidation: M3 surfaced 4 DTO mappers (AuditEventMapper, SiteStreamGrpcServer inline, SiteCall DTO mapper, DirectActorSiteStreamAuditClient test stub). M4 should extract a single IntegrationMappers helper in src/ScadaLink.Commons/Messages/Integration/ or similar to consolidate before adding more channels. The project-ref cycle that motivated the inline duplication can be broken by moving the mapper into Commons (proto types are auto-generated in Communication; the mapper just needs the proto types reachable from Commons via a transitive ref).
- OnCachedTelemetryWithoutDualWriteAsync test-mode fallback: in AuditLogIngestActor for the single-repo ctor. M4 may deprecate the single-repo constructor entirely and migrate tests to the IServiceProvider+harness pattern.
- Site SQLite drain for OperationTrackingStore: M3 wrote the tracking half site-locally but no drain pipeline pushes it to central; central reads SiteCalls operational state via the dual-write transaction only. If M4 needs central visibility into in-flight (non-terminal) tracking entries, plan a drain.
- SiteCallAuditActor: wired in M3 as a cluster singleton + proxy but not on the M3 hot path. M4 (or M6 reconciliation) is the natural first direct caller; wire one production code path through it.
- Vocabulary correction in the body of M4 below: every M4-T1..N step that still says Status=PermanentFailure, Kind=SyncWrite/SyncRead/Completed/Attempt/Terminal/Enqueued is stale; apply the translation above when implementing.
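The vocabulary translation listed in these notes can be captured as a lookup, which may help when reading the stale task text further down. This is a sketch under assumptions: the enums below list only the members named in this section (the real AuditKind and AuditStatus are larger), and LegacyVocabulary, Translation, and the legacy key strings are hypothetical helpers invented for illustration.

```csharp
using System;

// Partial, illustrative copies of the M1-aligned enums.
public enum AuditKind { DbWrite, NotifyDeliver, NotifySend, InboundRequest, InboundAuthFailure }
public enum AuditStatus { Submitted, Attempted, Delivered, Failed, Parked, Discarded }

// FixedStatus is null when the status depends on the outcome; Op is the Extra.op hint for DB rows.
public sealed record Translation(AuditKind Kind, AuditStatus? FixedStatus, string? Op);

public static class LegacyVocabulary
{
    // Maps a pre-M1 roadmap name to the M1-aligned kind/status pair.
    public static Translation Translate(string legacyName) => legacyName switch
    {
        "SyncWrite" => new Translation(AuditKind.DbWrite, null, "write"),
        "SyncRead" => new Translation(AuditKind.DbWrite, null, "read"),
        "Notification.Attempt" => new Translation(AuditKind.NotifyDeliver, AuditStatus.Attempted, null),
        // Terminal status is Delivered, Failed, Parked, or Discarded, chosen by the caller.
        "Notification.Terminal" => new Translation(AuditKind.NotifyDeliver, null, null),
        "Notification.Enqueued" => new Translation(AuditKind.NotifySend, AuditStatus.Submitted, null),
        "ApiInbound.Completed" => new Translation(AuditKind.InboundRequest, AuditStatus.Delivered, null),
        "ApiInbound.AuthFailure" => new Translation(AuditKind.InboundAuthFailure, AuditStatus.Failed, null),
        _ => throw new ArgumentException($"No translation for '{legacyName}'.", nameof(legacyName)),
    };
}
```

Note that reads and writes share AuditKind.DbWrite and differ only in the Extra.op value, which is why the table carries an Op column instead of a second kind.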
Acceptance criteria:
- Sync Database.Connection().Execute() → DbOutbound.DbWrite row (with Extra.op = "write" and rowsAffected); ExecuteReader → DbOutbound.DbWrite row (with Extra.op = "read" and rowsReturned). Parameter values captured by default; per-connection redaction opt-in supported.
- Database.CachedWrite → three lifecycle rows via the combined telemetry built in M3.
- Notification Outbox dispatcher: every delivery attempt writes NotifyDeliver with Status=Attempted; terminal writes NotifyDeliver with Status={Delivered|Failed|Parked|Discarded}. Site-emitted NotifySend (Status=Submitted) flows through the standard site→central audit path. Audit-write failure never affects delivery.
- Inbound API middleware writes one ApiInbound.InboundRequest row per request, before await next() returns. API key NAME captured (never material). Audit-write failure does NOT change the HTTP response. Auth failures emit ApiInbound.InboundAuthFailure with Status=Failed.
M4 — Tasks (TDD-detail)
M4-T1: ESG Database.Connection().ExecuteAsync audit emission — DbOutbound.SyncWrite
Files:
- Modify: src/ScadaLink.ExternalSystemGateway/DatabaseGateway.cs (or wherever the script-facing Execute* lives; confirm via repo): wrap the call site to emit an AuditEvent (channel=DbOutbound, kind=SyncWrite) on every Execute/ExecuteScalar. Capture statement text, parameter values (default; redaction in M5), DurationMs, rowsAffected in Extra.
- Create: tests/ScadaLink.ExternalSystemGateway.Tests/DatabaseSyncWriteEmissionTests.cs.
Steps:
- Failing test: Execute("INSERT INTO ...", new {...}) emits one event with Channel=DbOutbound, Kind=SyncWrite, statement text + parameter values captured.
- Failing test: ExecuteScalar emits the same kind.
- Failing test: execute that throws → emission with Status=PermanentFailure, ErrorMessage populated.
- Failing test: audit-write failure does NOT abort the SQL call (script sees the original outcome).
- Implement.
- Run: pass.
- Commit: feat(esg): emit DbOutbound.SyncWrite on script-initiated Execute*.
M4-T2: ESG Database.Connection().ExecuteReaderAsync audit emission — DbOutbound.SyncRead
Files:
- Modify: src/ScadaLink.ExternalSystemGateway/DatabaseGateway.cs: wrap ExecuteReader to emit DbOutbound.SyncRead. Capture statement, parameter values, DurationMs, rowsReturned in Extra. Response body capture defaults to NOT including rows; opt-in via per-connection config (M5).
- Create: tests/ScadaLink.ExternalSystemGateway.Tests/DatabaseSyncReadEmissionTests.cs.
Steps:
- Failing test: Query<T>("SELECT ...") emits one event with Channel=DbOutbound, Kind=SyncRead.
- Failing test: rowsReturned appears in Extra.
- Implement.
- Run: pass.
- Commit: feat(esg): emit DbOutbound.SyncRead on script-initiated reads.
M4-T3: NotificationOutboxActor — inject ICentralAuditWriter
Files:
- Modify: src/ScadaLink.NotificationOutbox/NotificationOutboxActor.cs (lines 22–68): constructor accepts ICentralAuditWriter. Wire into DI in ServiceCollectionExtensions.cs.
- Create: tests/ScadaLink.NotificationOutbox.Tests/NotificationOutboxActorAuditInjectionTests.cs.
Steps:
- Failing test: actor's Props factory accepts an ICentralAuditWriter; constructor stores it.
- Implement.
- Run: pass.
- Commit: feat(notif): NotificationOutboxActor accepts ICentralAuditWriter.
M4-T4: NotificationOutboxActor — emit Notification.Attempt per dispatcher attempt
Files:
- Modify: src/ScadaLink.NotificationOutbox/NotificationOutboxActor.cs dispatcher attempt branch (after each delivery attempt resolves): emit Notification.Attempt row with Status mapped from attempt result (Success, TransientFailure, PermanentFailure).
- Create: tests/ScadaLink.NotificationOutbox.Tests/NotificationOutboxActorAttemptEmissionTests.cs.
Steps:
- Failing test: a successful attempt → exactly one event with Kind=Attempt, Status=Success.
- Failing test: a transient-failure attempt → Status=TransientFailure, ErrorMessage populated.
- Failing test: when ICentralAuditWriter.WriteAsync throws, the dispatcher's per-attempt Notifications row update STILL succeeds (audit must never block delivery).
- Implement.
- Run: pass.
- Commit: feat(notif): emit Notification.Attempt per dispatcher attempt.
M4-T5: NotificationOutboxActor — emit Notification.Terminal on terminal transition
Files:
- Modify: src/ScadaLink.NotificationOutbox/NotificationOutboxActor.cs terminal branches (Delivered/Parked/Discarded transitions): emit Notification.Terminal row.
- Create: tests/ScadaLink.NotificationOutbox.Tests/NotificationOutboxActorTerminalEmissionTests.cs.
Steps:
- Failing test: a notification that succeeds emits one Terminal event with Status=Delivered.
- Failing test: a Parked transition emits Status=Parked.
- Failing test: an operator Discard emits Status=Discarded.
- Implement.
- Run: pass.
- Commit: feat(notif): emit Notification.Terminal on terminal transitions.
M4-T6: Site-emitted Notification.Enqueued
Files:
- Modify: src/ScadaLink.NotificationService/ (or wherever the site-side Notify.To().Send() runs; confirm via repo): at the moment of buffering into the site S&F, emit a site-side AuditEvent (channel=Notification, kind=Enqueued) via IAuditWriter. Telemetry forwards as usual.
- Create: tests/ScadaLink.NotificationService.Tests/NotifyEnqueueAuditEmissionTests.cs.
Steps:
- Failing test: Notify.To("list").Send("subject", "body") emits one event with Channel=Notification, Kind=Enqueued, target=list name, body captured (subject too).
- Failing test: audit-write failure does not abort Send().
- Implement.
- Run: pass.
- Commit: feat(notif): emit Notification.Enqueued from site-side Notify.Send.
M4-T7: Inbound API — AuditWriteMiddleware
Files:
- Create: src/ScadaLink.InboundAPI/Middleware/AuditWriteMiddleware.cs: ASP.NET Core middleware. After await next() (so the response is fully resolved but BEFORE flush, using HttpResponse.OnStarting or buffered body), build an AuditEvent (channel=ApiInbound, kind=Completed, Actor=API key NAME from request context, Target=method name, HttpStatus, DurationMs, RequestSummary/ResponseSummary). Call ICentralAuditWriter.WriteAsync inside try/catch; failures never affect the response.
- Modify: src/ScadaLink.InboundAPI/Startup.cs (or wherever the pipeline is configured): register middleware.
- Create: tests/ScadaLink.InboundAPI.Tests/Middleware/AuditWriteMiddlewareTests.cs.
Steps:
- Failing test: a successful POST to /api/{method} produces one ApiInbound.Completed event with HttpStatus=200.
- Failing test: a 400/401/500 response produces an event with the matching HttpStatus and Status mapped (PermanentFailure for 4xx, TransientFailure for 5xx).
- Failing test: Actor carries the API key NAME (never the key material).
- Failing test: when ICentralAuditWriter.WriteAsync throws, the HTTP response is unchanged (success stays success).
- Failing test: request remote IP and User-Agent appear in Extra.
- Implement.
- Run: pass.
- Commit: feat(inbound): AuditWriteMiddleware emitting ApiInbound.Completed per request.
M4-T8: Register middleware in the ASP.NET pipeline
Files:
- Modify: src/ScadaLink.InboundAPI/Startup.cs / Program.cs: app.UseMiddleware<AuditWriteMiddleware>() placed AFTER auth (so Actor resolves) and BEFORE the script-execution handler.
- Create: tests/ScadaLink.InboundAPI.Tests/Middleware/MiddlewareOrderTests.cs.
Steps:
- Failing test: pipeline ordering puts AuditWrite after auth, before script execution.
- Implement.
- Run: pass.
- Commit: feat(inbound): register AuditWriteMiddleware in pipeline.
M4-T9: Integration test — DB sync emission
Files:
- Create: tests/ScadaLink.IntegrationTests/AuditLog/DatabaseSyncEmissionTests.cs: script invokes Database.Connection().Execute("INSERT ...") and Query<T>("SELECT ..."); assert central AuditLog has one DbOutbound.SyncWrite row and one DbOutbound.SyncRead row.
Steps:
- Sketch, iterate, commit: test(auditlog): DB sync emission integration test.
M4-T10: Integration test — Notify dispatcher audit trail
Files:
- Create: tests/ScadaLink.IntegrationTests/AuditLog/NotifyDispatcherAuditTrailTests.cs: script calls Notify.To(list).Send(...); stub SMTP returns transient then success; assert AuditLog has Enqueued + 2 Attempt (one transient, one success) + 1 Terminal (Delivered).
Steps:
- Sketch, iterate, commit: test(auditlog): Notify dispatcher audit trail end-to-end.
M4-T11: Integration test — Inbound API request audit
Files:
- Create: tests/ScadaLink.IntegrationTests/AuditLog/InboundApiAuditTests.cs: POST to /api/{method} with a valid API key; assert one ApiInbound.Completed row with the expected Actor (key name), HttpStatus=200, request/response bodies captured.
- Also test: POST with a bad API key → row with Actor=NULL (or ""), HttpStatus=401, Extra carries remoteIp.
Steps:
- Sketch, iterate, commit: test(auditlog): Inbound API request audit end-to-end.
M4-T12: Integration test — audit-write failure never aborts the action
Files:
- Create:
tests/ScadaLink.IntegrationTests/AuditLog/AuditWriteFailureSafetyTests.cs— inject a brokenICentralAuditWriter(always throws) for one test; assert that ESG sync calls, ESG cached calls, DB writes, Inbound API calls, and Notification dispatch all still complete successfully and the script/caller sees the normal outcome.
Steps:
- Sketch test with broken-writer DI override per scenario.
- Run, fix any spots where audit-write exceptions leak.
- Commit:
test(auditlog): audit failures never abort user-facing actions.
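The guarantee M4-T12 tests for is typically met by routing every audit write through a catch-all guard. The following is a minimal sketch only: IAuditWriterSketch, ThrowingWriter, and the onFailure callback are hypothetical stand-ins, because this roadmap does not show the real ICentralAuditWriter signature.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in; the real ICentralAuditWriter signature may differ.
public interface IAuditWriterSketch
{
    Task WriteAsync(string auditEvent);
}

// Mirrors the "broken writer" DI override described in M4-T12.
public sealed class ThrowingWriter : IAuditWriterSketch
{
    public Task WriteAsync(string auditEvent) =>
        Task.FromException(new InvalidOperationException("audit store unavailable"));
}

public static class SafeAudit
{
    // Attempts the audit write and swallows any failure so that the
    // user-facing action is never aborted by the audit path.
    public static async Task<bool> TryWriteAsync(
        IAuditWriterSketch writer, string auditEvent, Action<Exception> onFailure)
    {
        try
        {
            await writer.WriteAsync(auditEvent).ConfigureAwait(false);
            return true;
        }
        catch (Exception ex)
        {
            // The failure hook (e.g. a counter increment) must not throw either.
            try { onFailure(ex); } catch { /* deliberately ignored */ }
            return false;
        }
    }
}
```

The call to WriteAsync sits inside the try block, so both synchronous throws and faulted tasks are contained.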
M4 — Risk callouts
- Inbound API correlation IDs: if upstream tracing headers (W3C traceparent) are present, prefer them as CorrelationId; otherwise generate. Confirm whether existing middleware sets a request ID we can reuse.
- AuditWriteMiddleware placement: must run AFTER authentication so the API key NAME is in HttpContext.User. Verify with the middleware-order test in M4-T8.
- Notification dispatcher loop hot-path: audit emission must NOT extend per-attempt latency materially. Bench in M4-T10 if there's any concern.
- DB parameter capture: parameter values are captured verbatim by default (per design); redaction is opt-in (M5). For M4, just capture — don't try to second-guess what's sensitive.
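For the correlation-ID callout above, a minimal sketch of preferring the W3C traceparent trace-id and otherwise generating a value. It assumes CorrelationId can be carried as a 32-hex-digit string (the roadmap does not state its type); CorrelationIds and its methods are hypothetical names.

```csharp
using System;

public static class CorrelationIds
{
    // Extracts the trace-id field from a W3C traceparent header
    // ("version-traceid-parentid-flags"); false when absent or malformed.
    public static bool TryFromTraceparent(string header, out string traceId)
    {
        traceId = null;
        if (string.IsNullOrWhiteSpace(header)) return false;

        string[] parts = header.Trim().Split('-');
        if (parts.Length < 4) return false;

        string candidate = parts[1];
        if (candidate.Length != 32 || !IsLowerHex(candidate)) return false;
        if (candidate.Trim('0').Length == 0) return false; // all-zero trace-id is invalid

        traceId = candidate;
        return true;
    }

    private static bool IsLowerHex(string s)
    {
        foreach (char c in s)
        {
            bool hex = (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f');
            if (!hex) return false;
        }
        return true;
    }

    // Prefers the upstream trace-id; otherwise generates a fresh 32-hex-digit id.
    public static string Resolve(string traceparentHeader) =>
        TryFromTraceparent(traceparentHeader, out string id)
            ? id
            : Guid.NewGuid().ToString("N");
}
```

If CorrelationId turns out to be a Guid column, both branches produce a value that Guid.ParseExact(value, "N") accepts.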
M5 — Payload + redaction policy
M4 realities to honor:
- Decorator surfaces to filter: AuditingDbConnection / AuditingDbCommand / AuditingDbDataReader (Bundle A) emit RequestSummary as raw SQL + parameters today. M5's IAuditPayloadFilter runs between event construction and writer call; the AuditingDb decorators must call into the filter before WriteAsync.
- CentralAuditWriter wraps IAuditLogRepository.InsertIfNotExistsAsync (Bundle B). M5 should plug the filter into BOTH the site-side FallbackAuditWriter and the central-side CentralAuditWriter so direct-write paths (NotificationOutboxActor, AuditWriteMiddleware) are also filtered. Plugin location: in each writer's WriteAsync BEFORE the storage call.
- InboundAPI middleware: RequestSummary is already populated; ResponseSummary = null (Bundle D punted response-body capture). M5 should add response-body buffering OR document that ResponseSummary stays null for v1 (acceptable per the spec: captures are best-effort).
- AuditWriteMiddleware is path-scoped via UseWhen(/api/). M5 may want to introduce per-target redaction overrides; that path-scoped setup gives a natural hook for per-route redaction (e.g., /api/secrets/* has stricter caps).
- Error-row vocabulary: cap raised to 64 KB on rows with Status NOT IN ('Delivered', 'Submitted', 'Forwarded'). The new vocabulary (Failed/Parked/Discarded/Attempted/Skipped) is what triggers the elevated cap, NOT the "non-Success" wording from the original spec.
- InternalsVisibleTo precedent: AuditLog.Tests can reach internals of SiteRuntime + NotificationOutbox + (newly) AuditLog. M5 redaction tests can exercise internal helpers similarly.
Goal: Payload capture is bounded (8 KB / 64 KB on error), headers are redacted by default, SQL parameter values are captured by default with per-connection opt-out, body redactor regexes are configurable per target, and the safety net over-redacts on misconfiguration.
Affected projects: AuditLog (policy engine + options), ExternalSystemGateway (HTTP header redactors, SQL param redaction hook), InboundAPI (header redactors, body capture), NotificationOutbox (subject/body capture follows existing rules). Tests.
Acceptance criteria:
- An IAuditPayloadFilter service is invoked between event construction and write. Truncates to default cap; raises to error cap on error rows (Status NOT IN ('Delivered', 'Submitted', 'Forwarded'), per the M4 realities above); applies header redactors; applies body regex redactors; applies SQL parameter redactors (per-connection); over-redacts on regex error and increments AuditRedactionFailure.
- Configuration test: changing appsettings.json redactors changes runtime behaviour (no rebuild needed for regex changes).
- Bench: 95th-percentile audit emission latency on the hot path stays under N µs at default cap (target to be set during M5 brainstorm).
M5 — Tasks (TDD-detail)
M5-T1: IAuditPayloadFilter interface
Files:
- Create: src/ScadaLink.AuditLog/Payload/IAuditPayloadFilter.cs: single method AuditEvent Apply(AuditEvent rawEvent) that returns a filtered copy (truncation + redaction applied).
- Create: tests/ScadaLink.AuditLog.Tests/Payload/PayloadFilterContractTests.cs.
Steps:
- Failing test: interface exists, method signature matches.
- Implement.
- Run: pass.
- Commit:
feat(auditlog): IAuditPayloadFilter contract.
M5-T2: DefaultAuditPayloadFilter — truncation (default + error cap)
Files:
- Create: src/ScadaLink.AuditLog/Payload/DefaultAuditPayloadFilter.cs: composes TruncationStage + redactors (M5-T3/T4/T5). Truncation rule: default cap = AuditLogOptions.DefaultCapBytes (8 KB); error cap = ErrorCapBytes (64 KB) applied when Status is NOT in {Success, Delivered, Enqueued}. Note that the M4 realities above supersede this draft set: the elevated cap applies to Status NOT IN ('Delivered', 'Submitted', 'Forwarded'). UTF-8 byte-safe boundary (no mid-character cuts). Set PayloadTruncated = true when applied.
- Create: tests/ScadaLink.AuditLog.Tests/Payload/TruncationTests.cs.
Steps:
- Failing test: 10 KB success body → truncated to 8 KB; PayloadTruncated = true.
- Failing test: 10 KB body on Status=TransientFailure → not truncated (under 64 KB cap); PayloadTruncated = false.
- Failing test: 70 KB body on Status=PermanentFailure → truncated to 64 KB; PayloadTruncated = true.
- Failing test: multi-byte UTF-8 character that would straddle the cap is not split mid-character.
- Implement.
- Run: pass.
- Commit:
feat(auditlog): DefaultAuditPayloadFilter truncation with UTF-8 boundary safety.
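The UTF-8 boundary rule in M5-T2 can be sketched as below. This is an illustration under assumptions, not the planned TruncationStage: Utf8Truncation and its Truncate method are hypothetical names, and the sketch truncates a single string (the roadmap does not say whether the cap applies per field or per event).

```csharp
using System;
using System.Text;

public static class Utf8Truncation
{
    // Truncates value to at most capBytes of UTF-8 without splitting a character.
    // Returns the (possibly shortened) text and whether truncation was applied.
    public static (string Text, bool Truncated) Truncate(string value, int capBytes)
    {
        if (capBytes < 0) throw new ArgumentOutOfRangeException(nameof(capBytes));

        byte[] bytes = Encoding.UTF8.GetBytes(value);
        if (bytes.Length <= capBytes) return (value, false);

        // bytes[cut] is the first byte that would be dropped. UTF-8 continuation
        // bytes have the form 10xxxxxx, so back up until the cut lands on the
        // first byte of a character.
        int cut = capBytes;
        while (cut > 0 && (bytes[cut] & 0xC0) == 0x80) cut--;

        return (Encoding.UTF8.GetString(bytes, 0, cut), true);
    }
}
```

Because the cut only ever moves backwards, the result can be a few bytes under the cap but never over it.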
M5-T3: HTTP header redaction
Files:
- Modify: src/ScadaLink.AuditLog/Payload/DefaultAuditPayloadFilter.cs: add header-redaction stage. Strips header values for names in AuditLogOptions.HeaderRedactList (default: Authorization, Cookie, Set-Cookie, X-API-Key) and any matching configured regex. Replacement: <redacted>.
- Headers travel in RequestSummary / ResponseSummary (JSON of headers + body) OR in Extra; confirm format during M5 brainstorm and document.
- Create: tests/ScadaLink.AuditLog.Tests/Payload/HeaderRedactionTests.cs.
Steps:
- Failing test: Authorization: Bearer xyz in RequestSummary becomes Authorization: <redacted>.
- Failing test: case-insensitive match (authorization redacted too).
- Failing test: custom redact-list extension works (operator adds X-Custom-Token).
- Implement.
- Run: pass.
- Commit:
feat(auditlog): HTTP header redaction.
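A minimal sketch of the case-insensitive name match from M5-T3. Since the header carrier format is still open, the sketch assumes the headers have already been parsed into a name/value map; HeaderRedaction and Redact are hypothetical names, and the regex-based part of the stage is left out.

```csharp
using System;
using System.Collections.Generic;

public static class HeaderRedaction
{
    public const string Replacement = "<redacted>";

    // Returns a copy of the headers with values replaced for every name on the
    // redact list. Header names are compared case-insensitively.
    public static Dictionary<string, string> Redact(
        IReadOnlyDictionary<string, string> headers, IEnumerable<string> redactList)
    {
        var names = new HashSet<string>(redactList, StringComparer.OrdinalIgnoreCase);
        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

        foreach (KeyValuePair<string, string> header in headers)
        {
            result[header.Key] = names.Contains(header.Key) ? Replacement : header.Value;
        }
        return result;
    }
}
```

An operator-added name such as X-Custom-Token would simply be appended to the list passed in.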
M5-T4: Body regex redaction with safety net
Files:
- Modify: src/ScadaLink.AuditLog/Payload/DefaultAuditPayloadFilter.cs: add body-regex stage. Global redactors apply to all bodies; per-target redactors apply to matching Target. Patterns precompiled at startup; rejected if compile takes >100ms.
- Safety net: if a regex throws at runtime, replace the body with <redacted: redactor error> and increment AuditRedactionFailure (M5-T7).
- Create: tests/ScadaLink.AuditLog.Tests/Payload/BodyRegexRedactionTests.cs.
Steps:
- Failing test: "password":"hunter2" in a JSON body → "password":"<redacted>" when the default global redactor pattern matches.
- Failing test: per-target redactor only applies to matching Target.
- Failing test: a redactor that throws → body becomes <redacted: redactor error> AND the counter increments.
- Failing test: catastrophic backtracking regex rejected at startup.
- Implement.
- Run: pass.
- Commit:
feat(auditlog): body regex redaction with over-redaction safety net.
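The over-redaction safety net in M5-T4 can be sketched as follows. BodyRedactor, FromPattern, and the in-memory failure counter are hypothetical; the per-match timeout passed to the Regex constructor is an assumption added for illustration (the roadmap only specifies a startup compile check), and the password pattern is an example rather than the planned default.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Threading;

public sealed class BodyRedactor
{
    public const string ErrorMarker = "<redacted: redactor error>";
    private long _failures;

    // Stand-in for the AuditRedactionFailure counter.
    public long Failures => Interlocked.Read(ref _failures);

    // Applies each redactor in order. Any exception (including a regex match
    // timeout) replaces the whole body, so a failure over-redacts instead of
    // letting the raw body through.
    public string Apply(string body, IEnumerable<Func<string, string>> redactors)
    {
        try
        {
            foreach (Func<string, string> redactor in redactors) body = redactor(body);
            return body;
        }
        catch (Exception)
        {
            Interlocked.Increment(ref _failures);
            return ErrorMarker;
        }
    }

    // Builds a redactor from a pattern; the match timeout is an assumption.
    public static Func<string, string> FromPattern(string pattern, string replacement, TimeSpan matchTimeout)
    {
        var regex = new Regex(pattern, RegexOptions.Compiled | RegexOptions.CultureInvariant, matchTimeout);
        return input => regex.Replace(input, replacement);
    }
}
```

Replacing the entire body on failure is deliberately coarse: it loses diagnostic detail, but it cannot leak a secret that a later redactor would have removed.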
M5-T5: SQL parameter redaction (per-connection opt-in)
Files:
- Modify: src/ScadaLink.AuditLog/Payload/DefaultAuditPayloadFilter.cs: for Channel=DbOutbound events, parse Extra.params and redact parameter VALUES whose NAME matches the connection's configured regex (from AuditLogOptions.PerTargetOverrides[<connection name>].RedactSqlParamsMatching).
- Create: tests/ScadaLink.AuditLog.Tests/Payload/SqlParamRedactionTests.cs.
Steps:
- Failing test: no opt-in config → params captured verbatim (default behaviour).
- Failing test: opt-in regex @apikey|@token redacts those param VALUES but keeps OTHER param values intact.
- Failing test: regex applies to parameter NAMES (not values) and is case-insensitive.
- Implement.
- Run: pass.
- Commit:
feat(auditlog): per-connection SQL parameter redaction (opt-in).
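The three M5-T5 test cases amount to the behaviour sketched here. It assumes Extra.params has already been parsed into a name/value map (the actual shape is not specified in this roadmap); SqlParamRedaction and Redact are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SqlParamRedaction
{
    // Redacts the VALUES of parameters whose NAME matches nameRegex
    // (case-insensitive). A null or empty regex means no opt-in, so every
    // value is kept verbatim.
    public static Dictionary<string, object> Redact(
        IReadOnlyDictionary<string, object> parameters, string nameRegex)
    {
        Regex matcher = string.IsNullOrEmpty(nameRegex)
            ? null
            : new Regex(nameRegex, RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

        var result = new Dictionary<string, object>();
        foreach (KeyValuePair<string, object> pair in parameters)
        {
            bool redact = matcher != null && matcher.IsMatch(pair.Key);
            result[pair.Key] = redact ? "<redacted>" : pair.Value;
        }
        return result;
    }
}
```

An unanchored pattern such as @apikey|@token also matches names like @token_count; operators who want exact names would need anchors (for example ^@token$).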
M5-T6: Wire filter into emission paths
Files:
- Modify: ESG (M2-T10, M3-T11/12/13, M4-T1/T2), InboundAPI middleware (M4-T7), NotificationOutbox (M4-T4/T5), NotificationService site path (M4-T6): every emission site receives IAuditPayloadFilter from DI and calls filter.Apply(rawEvent) before handing to the writer.
- Modify: src/ScadaLink.AuditLog/ServiceCollectionExtensions.cs: register DefaultAuditPayloadFilter as IAuditPayloadFilter singleton.
- Create: tests/ScadaLink.AuditLog.Tests/Payload/FilterIntegrationTests.cs: assert each emitter calls through the filter before the writer.
Steps:
- Failing test: ESG emission writes the filter-applied event (not the raw one).
- Failing test: same for each other emitter.
- Implement by injecting the filter into each emitter and routing through it.
- Run: pass.
- Commit:
feat(auditlog): wire payload filter into all emission paths.
M5-T7: AuditRedactionFailure health metric
Files:
- Modify: src/ScadaLink.HealthMonitoring/CentralHealthAggregator.cs (or equivalent): add AuditRedactionFailure counter.
- Modify: src/ScadaLink.AuditLog/Payload/DefaultAuditPayloadFilter.cs: increment on every redactor exception.
- Create: tests/ScadaLink.AuditLog.Tests/Payload/AuditRedactionFailureMetricTests.cs.
Steps:
- Failing test: 5 redactor exceptions → counter shows 5.
- Implement.
- Run: pass.
- Commit:
feat(health): AuditRedactionFailure metric.
M5-T8: Configuration test — appsettings.json round-trip
Files:
- Create: tests/ScadaLink.AuditLog.Tests/Configuration/AuditLogOptionsBindingTests.cs: bind a realistic appsettings.json block (with header-redact list, body redactors, per-target overrides, retention) and assert values appear in IOptions<AuditLogOptions>. Re-bind with a hot-reload simulation and assert filter behaviour changes accordingly.
Steps:
- Failing test: bind + read → matches.
- Failing test: change config → filter behaviour updates without restart (IOptionsMonitor pattern).
- Implement (likely needs adjusting M1-T9 from IOptions to IOptionsMonitor).
- Run: pass.
- Commit:
feat(auditlog): hot-reloadable AuditLogOptions.
M5-T9: Performance test — hot-path latency budget
Files:
- Create: tests/ScadaLink.PerformanceTests/AuditLog/HotPathLatencyTests.cs: bench filter.Apply(event) for a 4 KB JSON body with the default redactor set; target P95 < 50 µs (number set during M5 brainstorm based on baseline measurements). Also bench SqliteAuditWriter.WriteAsync end-to-end; target P95 < 500 µs.
Steps:
- Sketch test using BenchmarkDotNet or the existing performance test harness.
- Run baseline; if over budget, profile + optimise.
- Commit:
test(auditlog): hot-path latency budget.
M5-T10: Safety-net test — bad regex over-redacts
Files:
- Create: tests/ScadaLink.AuditLog.Tests/Payload/RedactionSafetyNetTests.cs: register a deliberately bad regex that throws; assert the body is over-redacted (<redacted: redactor error>) rather than under-redacted (passing through unmodified).
Steps:
- Failing test.
- Verify the safety net from M5-T4 covers this.
- Commit:
test(auditlog): redaction safety net over-redacts on regex failure.
M5 — Risk callouts
- Regex catastrophic backtracking: validate patterns at startup with a short-running compile test; reject patterns that exceed a timeout. Document the rejection behaviour.
- Order of stages matters: truncation BEFORE redaction means a redaction target halfway through the cap could get cut. Confirm the chosen order during M5 brainstorm; current draft applies redaction FIRST, then truncation — that way the redacted-replacement text is what gets truncated, not a half-secret.
- Body capture format: decide whether headers travel in RequestSummary / ResponseSummary or Extra. Affects M5-T3's redaction strategy. Lock during the M5 brainstorm.
- Hot-reload semantics: IOptionsMonitor snapshots; ensure pre-compiled regex cache invalidates when config changes.
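For the hot-reload callout, one way to keep the pre-compiled regex cache in step with configuration is to rebuild it as an immutable snapshot and publish it atomically. The sketch below is deliberately free of the options library so that it stands alone; CompiledRedactorCache and Rebuild are hypothetical names, and wiring Rebuild to the IOptionsMonitor change callback is assumed rather than shown.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Threading;

public sealed class CompiledRedactorCache
{
    private Regex[] _current = Array.Empty<Regex>();

    // Readers always see one complete snapshot, never a half-built set.
    public IReadOnlyList<Regex> Current => Volatile.Read(ref _current);

    // Intended to run once at startup and again on every options change.
    // The new set is compiled fully before it is published, so an invalid
    // pattern throws and leaves the previous snapshot in place.
    public void Rebuild(IEnumerable<string> patterns, TimeSpan matchTimeout)
    {
        Regex[] fresh = patterns
            .Select(p => new Regex(p, RegexOptions.Compiled | RegexOptions.CultureInvariant, matchTimeout))
            .ToArray();
        Volatile.Write(ref _current, fresh);
    }
}
```

Whether a rejected reload should keep the old redactors (as here) or fall back to over-redacting everything is a policy choice worth settling in the M5 brainstorm.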
M6 — Reconciliation, purge, partition maintenance, health metrics
M5 realities to honor:
- IOptionsMonitor pattern works for hot reload (M5 verified via tests). M6's retention/partition cadence options can use the same pattern.
- AuditRedactionFailure counter is wired SITE-ONLY (M5 Bundle C deferred central wiring here). M6 ships the central wiring as part of "all five new health metrics live".
- Filter pattern is integrated at the THREE writer entry points (FallbackAuditWriter, CentralAuditWriter, AuditLogIngestActor). M6's AuditLogPurgeActor does NOT emit events; it only reads + partitions, so no filter integration required.
- SwitchOutPartitionAsync blocked by UX_AuditLog_EventId: per the M1 reality note, M6-T4 must replace the M1 NotSupportedException stub with the drop-and-rebuild dance around the non-aligned unique index (already in the roadmap M6-T4 section).
- Partition function pre-seeded with 24 monthly boundaries (Jan 2026 – Dec 2027). M6-T5's partition maintenance must SPLIT a new boundary for the upcoming month.
- Site→central gRPC client still NoOpSiteStreamAuditClient: M6's SiteAuditReconciliationActor is naturally the first central component that needs to pull from sites; alternatively, the production push path can ship here. EITHER (a) M6 implements the real ISiteStreamAuditClient to enable push telemetry (and reconciliation pulls leverage it bidirectionally), OR (b) M6 implements ONLY reconciliation pull (push stays NoOp until a later milestone). Recommended: (a), because push is more general and reconciliation is the catch-up.
Goal: Self-healing telemetry, monthly partition rollover, daily purge, all five new health metrics live and feeding the existing health-report pipeline.
Affected projects: AuditLog (3 new actors: SiteAuditReconciliationActor, AuditLogPurgeActor, partition-maintenance worker), Communication (the PullAuditEvents RPC), HealthMonitoring (5 new metrics), ConfigurationDatabase (partition-roll-forward SQL helper).
Acceptance criteria:
- SiteAuditReconciliationActor runs every 5 minutes per site; pulls events the site reports as Pending; central performs InsertIfNotExistsAsync then signals the site to flip those rows to Reconciled.
- AuditLogPurgeActor runs daily; for each partition older than RetentionDays, switches it out to a staging table and drops the staging table. Emits an AuditLog:Purged event with rowcount + duration.
- Partition-maintenance job runs at month boundary to add the next month's partition function range and ensure the scheme has a destination filegroup.
- 5 new health metrics published per site: SiteAuditBacklog (count + oldest + bytes), SiteAuditWriteFailures, SiteAuditTelemetryStalled; and per central node: CentralAuditWriteFailures, AuditRedactionFailure.
- Integration test: simulated 5-minute central outage → telemetry catches up after recovery via reconciliation, no rows lost; site backlog metric reflects the queue depth and drops as it drains.
M6 — Tasks (TDD-detail)
M6-T1: Extend sitestream.proto with PullAuditEvents RPC
Files:
- Modify: src/ScadaLink.Communication/Protos/sitestream.proto: add rpc PullAuditEvents(PullAuditEventsRequest) returns (PullAuditEventsResponse); and the corresponding request/response messages (sinceUtc, batchSize, events, more_available).
- Build: regenerate stubs.
- Create: tests/ScadaLink.Communication.Tests/Protos/PullAuditEventsProtoTests.cs.
Steps:
- Failing test: round-trip request and response messages.
- Add proto + rebuild.
- Run: pass.
- Commit:
feat(comms): PullAuditEvents RPC for audit reconciliation.
M6-T2: Site-side handler for PullAuditEvents
Files:
- Modify: src/ScadaLink.Communication/SiteStreamGrpc/SiteStreamGrpcServer.cs (the site-side server inside each site cluster): handle PullAuditEvents by reading Pending rows older than SinceUtc from SqliteAuditWriter (read-only path) and streaming them back. After ack, mark them Reconciled.
- Create: tests/ScadaLink.Communication.Tests/SiteStreamPullAuditEventsTests.cs.
Steps:
- Failing test: a pull request with N pending rows returns those rows; rows flip to Reconciled after the response is acked.
- Implement.
- Run: pass.
- Commit:
feat(comms): site-side PullAuditEvents handler.
M6-T3: SiteAuditReconciliationActor — central, timer-driven
Files:
- Create: src/ScadaLink.AuditLog/Central/SiteAuditReconciliationActor.cs: central singleton; on a 5-minute timer (configurable), for each known site, ask: "what's your oldest Pending row?" If the site reports a non-draining backlog (compared with the previous tick), issue a PullAuditEvents and ingest the returned rows via IAuditLogRepository.InsertIfNotExistsAsync. Keeps a per-site LastReconciledAt cursor.
- Create: tests/ScadaLink.AuditLog.Tests/Central/SiteAuditReconciliationActorTests.cs.
Steps:
- Failing test: actor's timer fires every 5 minutes (test via TestKit virtual scheduler).
- Failing test: when site reports non-draining backlog over two consecutive ticks, the actor issues a pull and ingests results.
- Failing test: idempotency: re-running the pull doesn't double-insert (relies on the UX_AuditLog_EventId unique index that backs first-write-wins in InsertIfNotExistsAsync).
- Implement.
- Run: pass.
- Commit:
feat(auditlog): SiteAuditReconciliationActor.
M6-T4: AuditLogPurgeActor — daily partition-switch purge
M1 reality: IAuditLogRepository.SwitchOutPartitionAsync ships in M1 as a NotSupportedException stub because the non-aligned UX_AuditLog_EventId unique index (necessary for first-write-wins idempotency without including OccurredAtUtc in the unique key) blocks ALTER TABLE … SWITCH PARTITION. M6 must replace the stub with the drop-and-rebuild dance: (1) DROP INDEX UX_AuditLog_EventId ON dbo.AuditLog; (2) create the staging table on [PRIMARY] with identical schema; (3) ALTER TABLE dbo.AuditLog SWITCH PARTITION <n> TO dbo.<staging>; (4) DROP TABLE dbo.<staging>; (5) CREATE UNIQUE NONCLUSTERED INDEX UX_AuditLog_EventId ON dbo.AuditLog(EventId) ON [PRIMARY];. The small unique-index outage window during the switch is acceptable: partition switches are O(seconds) and InsertIfNotExistsAsync callers will see a transient retry surface; document this in the actor.
Files:
- Create: src/ScadaLink.AuditLog/Central/AuditLogPurgeActor.cs: central singleton; daily timer. For each partition whose latest OccurredAtUtc is older than AuditLogOptions.RetentionDays, call IAuditLogRepository.SwitchOutPartitionAsync(partitionBoundary). Emit an AuditLogPurged event (logged + metricked) with partition range, row count, and duration.
- Modify: src/ScadaLink.ConfigurationDatabase/Repositories/AuditLogRepository.cs: replace the M1 NotSupportedException stub with the drop-and-rebuild dance described above. Wrap in a transaction. Add a regression test asserting the unique index is rebuilt and the data left behind matches the un-switched partitions.
- Create: tests/ScadaLink.AuditLog.Tests/Central/AuditLogPurgeActorTests.cs.
Steps:
- Failing test: with retention = 30 days, partitions older than 30 days are switched out; newer partitions are kept.
- Failing test: purge emits the AuditLogPurged event with correct row count.
- Failing test: partition switch under the scadalink_audit_purger role completes successfully (requires the role to ALSO be granted permission to DROP/CREATE the UX_AuditLog_EventId index; extend the role grants in this milestone if not in M1's role definition; M1 granted ALTER ON SCHEMA::dbo which should cover this).
- Failing test: post-switch, InsertIfNotExistsAsync continues to enforce first-write-wins (unique index successfully rebuilt).
- Implement.
- Run: pass.
- Commit:
feat(auditlog): AuditLogPurgeActor with partition-switch purge (drop-and-rebuild around UX_AuditLog_EventId).
M6-T5: AuditLogPartitionMaintenanceService — monthly roll-forward
M1 reality: the partition function pf_AuditLog_Month ships with 24 explicit monthly boundaries (Jan 2026 through Dec 2027) on filegroup [PRIMARY]. M6's hosted service must keep this rolling: split a new boundary for the upcoming month and (if a separate hot/cold filegroup strategy is adopted later) drop oldest boundaries via MERGE after purge.
Files:
- Create: src/ScadaLink.AuditLog/Central/AuditLogPartitionMaintenanceService.cs: IHostedService that runs on startup AND every month; ensures the next month's partition range exists on pf_AuditLog_Month and the partition scheme has a destination filegroup. Implemented via raw SQL (ALTER PARTITION FUNCTION pf_AuditLog_Month() SPLIT RANGE (<next-month-boundary>); note that T-SQL requires the parentheses after the function name); ensure the scheme stays ALL TO ([PRIMARY]) unless production deployment overrides per-filegroup.
- Create: tests/ScadaLink.AuditLog.Tests/Central/PartitionMaintenanceServiceTests.cs (integration via MsSqlMigrationFixture; runs against a temp DB).
Steps:
- Failing test: against a DB seeded with the M1 migration (covering through Dec 2027), running the service in Apr 2028 splits a Jan 2028 boundary so the function has a range for "current month + at least the next month".
- Implement.
- Failing test: subsequent monthly runs add successive future boundaries (idempotent: already-split boundaries are no-ops, not errors).
- Run: pass.
- Commit:
feat(auditlog): partition maintenance HostedService (SPLIT RANGE roll-forward).
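The idempotent roll-forward in M6-T5 reduces to computing which monthly boundaries are missing and emitting one SPLIT per gap. The sketch below makes several assumptions: PartitionRollForward and its methods are hypothetical, boundaries are first-of-month values, gaps are filled contiguously from the latest existing boundary, and the boundary is written as a 'yyyy-MM-dd' literal (the partition column's exact type is not stated in this roadmap). It does not cover the scheme/filegroup step.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class PartitionRollForward
{
    // Returns the first-of-month boundaries still missing so that the function
    // covers the current month plus monthsAhead further months. An empty result
    // means there is nothing to do, which makes repeated runs no-ops.
    public static IReadOnlyList<DateTime> MissingBoundaries(
        IEnumerable<DateTime> existing, DateTime nowUtc, int monthsAhead = 1)
    {
        var firstOfCurrent = new DateTime(nowUtc.Year, nowUtc.Month, 1, 0, 0, 0, DateTimeKind.Utc);
        DateTime target = firstOfCurrent.AddMonths(monthsAhead);

        List<DateTime> known = existing.ToList();
        DateTime next = known.Count == 0 ? firstOfCurrent : known.Max().AddMonths(1);

        var missing = new List<DateTime>();
        for (DateTime b = next; b <= target; b = b.AddMonths(1)) missing.Add(b);
        return missing;
    }

    // T-SQL requires the empty parentheses after the partition function name.
    public static string SplitStatement(DateTime boundary) =>
        "ALTER PARTITION FUNCTION pf_AuditLog_Month() SPLIT RANGE ('"
        + boundary.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture) + "');";
}
```

In the real service the existing boundaries would be read from the database's partition metadata rather than passed in, and each SPLIT would run only for values this method returns.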
M6-T6: Health metric SiteAuditBacklog
Files:
- Modify: src/ScadaLink.AuditLog/Site/SqliteAuditWriter.cs: expose GetBacklogStatsAsync() returning (pendingCount, oldestPendingUtc, onDiskBytes).
- Modify: src/ScadaLink.HealthMonitoring/SiteHealthState.cs: add SiteAuditBacklog metric (3-tuple), populated per site-health-report tick.
- Create: tests/ScadaLink.HealthMonitoring.Tests/SiteAuditBacklogMetricTests.cs.
Steps:
- Failing test: with 100 pending rows in SQLite, the metric reports pendingCount=100.
- Failing test: oldest pending age is reported in seconds since OccurredAtUtc.
- Failing test: on-disk bytes ≈ SQLite file size.
- Implement.
- Run: pass.
- Commit:
feat(health): SiteAuditBacklog metric (count + age + bytes).
M6-T7: Health metric SiteAuditTelemetryStalled
Files:
- Modify: src/ScadaLink.HealthMonitoring/SiteHealthState.cs: add boolean SiteAuditTelemetryStalled.
- Modify: src/ScadaLink.AuditLog/Central/SiteAuditReconciliationActor.cs: set the flag when reconciliation detects a non-draining backlog over two consecutive cycles.
- Create: tests/ScadaLink.HealthMonitoring.Tests/SiteAuditTelemetryStalledTests.cs.
Steps:
- Failing test: two consecutive non-draining cycles → flag set.
- Failing test: a subsequent draining cycle → flag cleared.
- Implement.
- Run: pass.
- Commit:
feat(health): SiteAuditTelemetryStalled flag.
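The stalled-flag rule from M6-T7 can be sketched as a small per-site tracker. The roadmap does not define "non-draining" precisely, so this sketch assumes it means the pending count is above zero and has not fallen since the previous cycle; BacklogStallTracker and Observe are hypothetical names.

```csharp
using System.Collections.Generic;

public sealed class BacklogStallTracker
{
    private readonly Dictionary<string, (long LastPending, int NonDrainingCycles)> _sites
        = new Dictionary<string, (long, int)>();

    // Records one reconciliation cycle for a site and returns whether the site
    // counts as stalled (two or more consecutive non-draining cycles).
    public bool Observe(string siteId, long pendingCount)
    {
        int streak = 0;
        if (_sites.TryGetValue(siteId, out var previous))
        {
            // Assumption: non-draining = backlog present and not smaller than last cycle.
            bool nonDraining = pendingCount > 0 && pendingCount >= previous.LastPending;
            streak = nonDraining ? previous.NonDrainingCycles + 1 : 0;
        }

        _sites[siteId] = (pendingCount, streak);
        return streak >= 2;
    }
}
```

Under this definition the first observation only establishes a baseline, so the flag can be set at the third cycle at the earliest, and a single draining cycle clears it.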
M6-T8: Health metric CentralAuditWriteFailures
Files:
- Modify: src/ScadaLink.HealthMonitoring/CentralHealthAggregator.cs: add CentralAuditWriteFailures counter.
- Modify: every ICentralAuditWriter call site (Inbound API middleware M4-T7, NotificationOutboxActor M4-T4/T5): increment on caught exceptions.
- Create: tests/ScadaLink.HealthMonitoring.Tests/CentralAuditWriteFailuresTests.cs.
Steps:
- Failing test: 3 forced central direct-write failures → counter reports 3.
- Implement.
- Run: pass.
- Commit:
feat(health): CentralAuditWriteFailures metric.
M6-T9: Surface AuditRedactionFailure in central health
Files:
- Modify: src/ScadaLink.HealthMonitoring/CentralHealthAggregator.cs: register the counter created in M5-T7 so it appears in the central health report payload.
- Create: tests/ScadaLink.HealthMonitoring.Tests/AuditRedactionFailureSurfacingTests.cs.
Steps:
- Failing test: incrementing the counter is visible in the next central health snapshot.
- Implement.
- Run: pass.
- Commit:
feat(health): surface AuditRedactionFailure in central health.
M6-T10: Integration test — central outage + reconciliation recovery
Files:
- Create: tests/ScadaLink.IntegrationTests/AuditLog/OutageReconciliationTests.cs: site + central; simulate a 5-minute central gRPC outage; during outage, site emits 200 events; restore central; assert reconciliation pulls catch up within one cycle and all 200 events land in central AuditLog with no duplicates.
Steps:
- Sketch, iterate, commit:
test(auditlog): outage + reconciliation recovery end-to-end.
M6-T11: Integration test — partition switch purge
Files:
- Create: tests/ScadaLink.IntegrationTests/AuditLog/PartitionPurgeTests.cs: pre-populate AuditLog with rows in three monthly partitions (one older than retention, two newer); trigger AuditLogPurgeActor; assert the oldest partition's rows are gone and newer partitions are untouched.
Steps:
- Sketch, iterate, commit:
test(auditlog): partition-switch purge end-to-end.
M6-T12: Integration test — partition maintenance roll-forward
Files:
- Create: tests/ScadaLink.IntegrationTests/AuditLog/PartitionMaintenanceTests.cs: assert that after AuditLogPartitionMaintenanceService runs, the partition function covers the next month's range.
Steps:
- Sketch, iterate, commit:
test(auditlog): partition maintenance roll-forward end-to-end.
M6 — Risk callouts
- Partition switch on a live table: SQL Server ALTER TABLE ... SWITCH PARTITION is metadata-only when source and target match in structure and filegroup; verify with a load test that ingest isn't paused during purge.
- Pull cadence vs ingest rate: a site producing > BatchSize/5s sustained may never let telemetry catch up, so reconciliation must close the gap. The non-draining detection in M6-T3 is the safety net.
- Site SQLite ForwardState flip after reconciliation: must be atomic with the central ack; otherwise a site crash mid-flip can re-send rows (idempotent at central, harmless but worth noting).
- HostedService scheduling: ensure the partition maintenance service runs on the ACTIVE central node only (not both, which would cause SQL errors trying to add the same range twice).
M7 — Central UI: new Audit Log page + drill-ins + KPI tiles
Goal: User-visible Audit Log: filter bar, results grid (custom Blazor + Bootstrap, no third-party grid), drilldown drawer with cURL / "show all events" / redaction indicators / pretty-printed payloads. 6 drill-in entry points from existing pages. 3 KPI tiles on Health dashboard.
Affected projects: CentralUI, CentralUI.Tests, CentralUI.PlaywrightTests.
Acceptance criteria:
- New Components/Pages/Audit/AuditLogPage.razor exists; new "Audit" nav group sibling to Notifications.
- All 10 filter elements, 10 grid columns, keyset pagination + default page 100, drilldown drawer per Component-AuditLog.md §10.
- Existing Components/Pages/Monitoring/AuditLog.razor (the IAuditService config-change viewer) renamed in code to ConfigurationAuditLog.razor, with URL /audit/configuration to match the doc-renaming we did. Drill-ins from existing pages (Notifications, Site Calls, External Systems, Inbound API Keys, Sites, Instances) added.
- 3 KPI tiles added to the Health dashboard; data sourced from HealthMonitoring.
- Playwright tests cover: filter narrowing, drilldown drawer, "Copy as cURL" on ApiInbound rows, drill-in from Notifications to filtered Audit Log.
- OperationalAudit read permission gating + AuditExport for the Export button.
M7 — Tasks (TDD-detail)
M7-T1: New AuditLogPage.razor scaffold + route + Audit nav group
Files:
- Create: src/ScadaLink.CentralUI/Components/Pages/Audit/AuditLogPage.razor + .razor.cs + .razor.css. Route /audit/log. Empty body for now beyond <h1>Audit Log</h1>.
- Modify: src/ScadaLink.CentralUI/Components/Layout/NavMenu.razor (or equivalent): add a new top-level Audit nav group sibling to Notifications, containing this page.
- Create: tests/ScadaLink.CentralUI.Tests/Pages/AuditLogPageScaffoldTests.cs: Blazor component test (bUnit if it's used in the codebase; else Playwright).
Steps:
- Failing test: navigating to /audit/log renders the page (heading present).
- Failing test: nav menu shows the Audit group.
- Implement.
- Run: pass.
- Commit:
feat(ui): scaffold Audit Log page + Audit nav group.
M7-T2: <AuditFilterBar> component
Files:
- Create: src/ScadaLink.CentralUI/Components/Audit/AuditFilterBar.razor + .razor.cs: 10 filter elements per Component-AuditLog.md §10. Multi-select chips for Channel/Kind/Status/Site (Bootstrap custom; NO third-party UI library). Time-range relative dropdown + custom date picker. Text search for Instance/Script/Target/Actor/CorrelationId. "Errors only" toggle.
- Create: tests/ScadaLink.CentralUI.Tests/Components/Audit/AuditFilterBarTests.cs.
Steps:
- Failing test: rendering shows all 10 elements.
- Failing test: selecting filters and clicking "Apply" raises a FilterChanged event with the right AuditQuery payload.
- Failing test: Kind options narrow when Channels are selected.
- Implement.
- Run: pass.
- Commit:
feat(ui): AuditFilterBar component.
M7-T3: <AuditResultsGrid> component with keyset paging
Files:
- Create: src/ScadaLink.CentralUI/Components/Audit/AuditResultsGrid.razor + .razor.cs: custom Bootstrap table (no third-party grid). 10 columns per Component-AuditLog.md. Resizable + reorderable + persistable-per-user (persistence via existing user-settings store).
- Keyset paging via (OccurredAtUtc desc, EventId desc) cursor; default page 100.
- Data source: server-side via IAuditLogRepository.QueryAsync (M1-T8). Wire through an IAuditLogQueryService (new) that the page injects.
- Create: tests/ScadaLink.CentralUI.Tests/Components/Audit/AuditResultsGridTests.cs.
Steps:
- Failing test: grid renders rows from a stub query service; columns match the documented set.
- Failing test: clicking "next page" calls the service with the keyset cursor of the last row.
- Failing test: column reordering persists across navigations (user-settings).
- Failing test: row click emits a RowSelected event with the selected AuditEvent.
- Implement.
- Run: pass.
- Commit:
feat(ui): AuditResultsGrid with keyset paging.
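The (OccurredAtUtc desc, EventId desc) cursor from M7-T3 can be illustrated in memory as below. AuditRow, KeysetCursor, and KeysetPaging are hypothetical types, and EventId is assumed to be a Guid; the real query runs server-side through IAuditLogRepository.QueryAsync.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed record AuditRow(DateTime OccurredAtUtc, Guid EventId);
public sealed record KeysetCursor(DateTime OccurredAtUtc, Guid EventId);

public static class KeysetPaging
{
    // Returns the page that follows the cursor in (OccurredAtUtc desc, EventId desc)
    // order. A null cursor yields the first page.
    public static IReadOnlyList<AuditRow> NextPage(
        IEnumerable<AuditRow> rows, KeysetCursor after, int pageSize = 100)
    {
        IEnumerable<AuditRow> ordered = rows
            .OrderByDescending(r => r.OccurredAtUtc)
            .ThenByDescending(r => r.EventId);

        if (after != null)
        {
            // Strictly "after" the cursor: an older timestamp, or the same
            // timestamp with a smaller EventId as the tie-breaker.
            ordered = ordered.Where(r =>
                r.OccurredAtUtc < after.OccurredAtUtc ||
                (r.OccurredAtUtc == after.OccurredAtUtc && r.EventId.CompareTo(after.EventId) < 0));
        }

        return ordered.Take(pageSize).ToList();
    }

    public static KeysetCursor CursorFor(AuditRow lastRow) =>
        new KeysetCursor(lastRow.OccurredAtUtc, lastRow.EventId);
}
```

One caveat if EventId is stored as uniqueidentifier: SQL Server orders that type differently from Guid.CompareTo, so the database should perform both the ORDER BY and the cursor comparison rather than mixing server-side ordering with client-side comparison.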
M7-T4: <AuditDrilldownDrawer> — JSON pretty-print
Files:
- Create: src/ScadaLink.CentralUI/Components/Audit/AuditDrilldownDrawer.razor + .razor.cs: slide-in drawer triggered by RowSelected. Renders all fields of the selected AuditEvent. JSON detection: if RequestSummary or ResponseSummary is valid JSON, pretty-print with indentation.
- Create: tests/ScadaLink.CentralUI.Tests/Components/Audit/AuditDrilldownDrawerJsonTests.cs.
Steps:
- Failing test: opening drawer with an event whose RequestSummary is valid JSON renders an indented version.
- Failing test: non-JSON body renders verbatim.
- Implement.
- Run: pass.
- Commit:
feat(ui): drilldown drawer JSON pretty-print.
M7-T5: Drilldown — SQL syntax highlighting
Files:
- Modify: src/ScadaLink.CentralUI/Components/Audit/AuditDrilldownDrawer.razor.cs: for Channel=DbOutbound events, treat RequestSummary as SQL; apply syntax highlighting via a lightweight client-side library (Prism.js or Highlight.js if already in the project; else a small custom highlighter; confirm during M7 brainstorm).
- Modify: src/ScadaLink.CentralUI/wwwroot/: add the highlighter assets if needed.
Steps:
- Failing test: a DbOutbound event's RequestSummary is rendered inside a <code class="language-sql"> block.
- Implement.
- Run: pass.
- Commit:
feat(ui): drilldown SQL syntax highlighting.
M7-T6: Drilldown — "Copy as cURL" for ApiOutbound / ApiInbound
Files:
- Modify: src/ScadaLink.CentralUI/Components/Audit/AuditDrilldownDrawer.razor.cs: for Channel ∈ {ApiOutbound, ApiInbound} events, render a "Copy as cURL" button. Clicking generates a cURL command from the event's URL/headers/body and copies to clipboard via IJSRuntime.
Steps:
- Failing test: button appears only for HTTP-bearing events.
- Failing test: clicking generates the correct cURL string (verified against a known event fixture).
- Implement.
- Run: pass.
- Commit:
feat(ui): drilldown Copy as cURL action.
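The string generation behind "Copy as cURL" is mostly a quoting problem. A minimal sketch, assuming the event exposes method, URL, headers, and body as separate values (how they are extracted from RequestSummary is not specified here); CurlCommand and its methods are hypothetical names, and the clipboard call via IJSRuntime is omitted.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CurlCommand
{
    // Wraps a value in single quotes for a POSIX shell; each embedded single
    // quote is emitted as '\'' (close quote, escaped quote, reopen quote).
    public static string Quote(string value) => "'" + value.Replace("'", "'\\''") + "'";

    public static string Build(
        string method, string url,
        IEnumerable<KeyValuePair<string, string>> headers, string body)
    {
        var sb = new StringBuilder("curl -X ")
            .Append(method.ToUpperInvariant())
            .Append(' ')
            .Append(Quote(url));

        foreach (KeyValuePair<string, string> header in headers)
        {
            sb.Append(" -H ").Append(Quote(header.Key + ": " + header.Value));
        }

        if (!string.IsNullOrEmpty(body))
        {
            sb.Append(" --data-raw ").Append(Quote(body));
        }
        return sb.ToString();
    }
}
```

Because headers are stored redacted, a copied command will carry values such as <redacted> for Authorization and will not authenticate as-is, which is worth stating in the button's tooltip.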
M7-T7: Drilldown — "Show all events for this operation"
Files:
- Modify: src/ScadaLink.CentralUI/Components/Audit/AuditDrilldownDrawer.razor.cs: when the event has a non-null CorrelationId, render a link "Show all events for this operation" that re-applies the page's filter set with CorrelationId = <value> (other filters cleared).
Steps:
- Failing test: link appears only when CorrelationId is non-null.
- Failing test: clicking re-navigates to the Audit Log page with the filter applied.
- Implement.
- Run: pass.
- Commit:
feat(ui): drilldown "Show all events" by CorrelationId.
M7-T8: Drilldown — redaction indicators
Files:
- Modify: src/ScadaLink.CentralUI/Components/Audit/AuditDrilldownDrawer.razor: wherever a payload contains the string <redacted> or <redacted: redactor error>, render a small badge indicating the field was redacted. Show a tooltip linking to "Payload Capture Policy" in the Component-AuditLog docs.
Steps:
- Failing test: a payload with <redacted> shows the badge.
- Implement.
- Run: pass.
- Commit:
feat(ui): drilldown redaction indicators.
M7-T9: Rename AuditLog.razor → ConfigurationAuditLog.razor
Files:
- Rename: src/ScadaLink.CentralUI/Components/Pages/Monitoring/AuditLog.razor → Components/Pages/Audit/ConfigurationAuditLog.razor.
- Update: the file's @page directive to /audit/configuration.
- Update: all <NavLink> and any other inbound references to the old path.
- Update: tests referencing the old name.
- Modify: nav menu — sit ConfigurationAuditLog under the Audit group as a sibling to the new Audit Log page.
Steps:
- Failing test: navigating to /audit/configuration renders the (renamed) page.
- Failing test: the old /monitoring/auditlog returns 404 (or a redirect — choose during M7 brainstorm; redirect is safer for any external bookmarks).
- Implement rename + path updates.
- Run: pass.
- Commit: refactor(ui): rename Audit Log Viewer to Configuration Audit Log Viewer.
M7-T10: Drill-in from Notifications page
Files:
- Modify: src/ScadaLink.CentralUI/Components/Pages/Notifications/NotificationReport.razor (or row-action panel) — add a "View audit history" action to each row. Navigates to /audit/log?correlationId={NotificationId}.
Steps:
- Failing test: row action exists.
- Failing test: click navigates with the right query string.
- Implement.
- Run: pass.
- Commit: feat(ui): drill-in from Notifications to Audit Log.
M7-T11: Drill-in from Site Calls page
Files:
- Modify: the Site Calls listing page. If it does not exist yet, defer the drill-in to a follow-up instead of creating the page here (Site Call Audit #22 UI work is mostly out of scope). For M7 acceptance, drill-in is only required from pages that exist.
- If the page exists, mirror M7-T10's pattern with ?correlationId={TrackedOperationId}.
Steps:
- Conditional on page existence — confirm during M7 brainstorm.
- Implement.
- Commit: feat(ui): drill-in from Site Calls to Audit Log.
M7-T12: Drill-in from External Systems / Inbound API Keys / Sites / Instances detail pages
Files:
- Modify (per page): External Systems detail, Inbound API Keys detail, Sites detail, Instances detail. Each gets a "Recent activity" / "Recent calls" / "Audit feed" link or tab navigating to /audit/log with the appropriate pre-filter (target=<system> / actor=<key name> AND channel=ApiInbound / site=<site> / instance=<instance>).
- Tests: one per drill-in.
Steps:
- Failing tests per page.
- Implement.
- Run: pass.
- Commit: feat(ui): drill-ins from detail pages to Audit Log.
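The drill-ins in M7-T10 through M7-T12 all reduce to "navigate to /audit/log with a pre-filter in the query string", so one small link builder could serve every page. The sketch below assumes the query parameter names match the filter names used above (target, actor, channel, site, instance, correlationId); the class name `AuditLogLinks` is hypothetical.

```csharp
using System;
using System.Linq;

// Hypothetical helper: builds a pre-filtered Audit Log URL.
// Parameter names are assumed to mirror the filter names in this roadmap.
public static class AuditLogLinks
{
    public static string Build(params (string Key, string Value)[] filters)
    {
        if (filters.Length == 0)
        {
            return "/audit/log";
        }

        // Escape both keys and values so names with spaces or '&' survive.
        var query = string.Join("&", filters.Select(f =>
            Uri.EscapeDataString(f.Key) + "=" + Uri.EscapeDataString(f.Value)));

        return "/audit/log?" + query;
    }
}
```

For example, the Inbound API Keys detail page could call `AuditLogLinks.Build(("actor", keyName), ("channel", "ApiInbound"))`, while the Notifications row action would pass a single `correlationId` pair. Keeping the builder in one place means each per-page test only has to assert on the returned string.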
M7-T13: 3 KPI tiles on the Health dashboard
Files:
- Modify: src/ScadaLink.CentralUI/Components/Pages/Health/HealthDashboard.razor (or equivalent) — add three tiles under a new "Audit" group: Audit volume, Audit error rate, Audit backlog. Data fed from the metrics defined in M5-T7 and M6-T6/T7/T8/T9.
- Create: tests/ScadaLink.CentralUI.Tests/Pages/Health/AuditKpiTilesTests.cs.
Steps:
- Failing test: tiles render with stub data; clicking each navigates to the relevant Audit Log filtered view (or to a per-site breakdown for the backlog tile).
- Implement.
- Run: pass.
- Commit: feat(ui): Audit KPI tiles on Health dashboard.
M7-T14: Server-side CSV export streaming
Files:
- Create: src/ScadaLink.CentralUI/Services/AuditLogExportService.cs — accepts the current filter, streams server-side CSV via IAuditLogRepository.QueryAsync paged enumeration; writes to the HTTP response without buffering the whole result in memory.
- Modify: AuditLogPage.razor — Export button calls the service. Requires AuditExport permission (M7-T15).
- Create: tests/ScadaLink.CentralUI.Tests/Services/AuditLogExportServiceTests.cs.
Steps:
- Failing test: exporting 10,000 rows streams as CSV; memory usage stays bounded.
- Failing test: default cap of 100k rows enforced; larger requests get a "use the CLI" error.
- Implement.
- Run: pass.
- Commit: feat(ui): server-side streaming CSV export of Audit Log.
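The "stream without buffering" and "row cap" requirements can be illustrated with a row-at-a-time writer over an `IAsyncEnumerable`. This is a sketch only: the three-column `AuditCsvRow` record, the `AuditCsvWriter` name, and the exception type are assumptions, and the real service would enumerate pages from IAuditLogRepository.QueryAsync rather than an arbitrary sequence.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

// Hypothetical, cut-down row shape; the real audit event has more columns.
public sealed record AuditCsvRow(string OccurredAtUtc, string Channel, string Target);

public static class AuditCsvWriter
{
    // RFC 4180 style quoting: quote fields containing a comma, quote, or line break.
    public static string Escape(string field)
    {
        if (field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) < 0)
        {
            return field;
        }
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }

    // Writes one row at a time, so memory use does not grow with the result size.
    // Returns the number of data rows written.
    public static async Task<int> WriteAsync(
        IAsyncEnumerable<AuditCsvRow> rows, TextWriter output, int maxRows)
    {
        await output.WriteAsync("OccurredAtUtc,Channel,Target\r\n");
        var written = 0;

        await foreach (var row in rows)
        {
            if (written == maxRows)
            {
                throw new InvalidOperationException(
                    $"Export exceeds {maxRows} rows; use the CLI for larger exports.");
            }

            var line = string.Join(",",
                Escape(row.OccurredAtUtc), Escape(row.Channel), Escape(row.Target));
            await output.WriteAsync(line + "\r\n");
            written++;
        }

        return written;
    }
}
```

Note that this sketch only notices the cap mid-stream, after the header and earlier rows have already been written. If the "use the CLI" error has to reach the user as a clean response, the service may need to check the row count before it starts writing; that choice is left to the M7 plan.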
M7-T15: OperationalAudit + AuditExport permission gating
Files:
- Modify: src/ScadaLink.Security/ (or wherever the role/permission model lives) — add OperationalAudit and AuditExport permissions; map them to the Audit role (existing) by default.
- Modify: AuditLogPage.razor — gate page access on OperationalAudit; gate the Export button on AuditExport.
- Create: tests/ScadaLink.CentralUI.Tests/Pages/AuditLogPagePermissionTests.cs.
Steps:
- Failing test: a user without OperationalAudit gets a 403 / hidden page.
- Failing test: a user with OperationalAudit but no AuditExport can read but Export button is hidden.
- Implement permission checks.
- Run: pass.
- Commit: feat(security): OperationalAudit + AuditExport permissions for the Audit Log surface.
M7-T16: Playwright E2E tests
Files:
- Create: tests/ScadaLink.CentralUI.PlaywrightTests/Audit/AuditLogPageTests.cs — covers: filter narrowing, drilldown drawer JSON pretty-print, "Copy as cURL" on ApiInbound, drill-in from Notifications to filtered Audit Log, CSV export end-to-end, permission gating.
Steps:
- Sketch tests using the existing Playwright harness.
- Iterate until all green.
- Commit: test(ui): Audit Log Playwright E2E coverage.
M7 — Risk callouts
- Custom data grid scope: keyset paging + reorderable columns + per-user persistence is non-trivial. Bench the existing NotificationReport.razor grid to see whether it can be generalised vs forking it. Decision during M7 brainstorm.
- SignalR + large drawer payloads: the drilldown payload (up to 64 KB on errors) is rendered server-side via SignalR. Confirm the SignalR message size limit (MaximumReceiveMessageSize on the hub options) is large enough; bump if needed.
- Permission infrastructure assumptions: confirm during M7 brainstorm that the codebase already supports per-permission gates at the page level, not just role-level. If only role-level, fall back to gating via the existing Audit role with a feature flag for the export.
- The rename to ConfigurationAuditLog.razor breaks any external bookmarks. Decide redirect vs 404 explicitly during M7 brainstorm.
M8 — CLI: scadalink audit query | export | verify-chain
Goal: Operator surface for the centralized Audit Log.
Affected projects: CLI, CLI.Tests, ManagementService (new HTTP endpoint), IntegrationTests.
Acceptance criteria:
- scadalink audit query mirrors the UI filter set; results stream as JSON (default) or table.
- scadalink audit export streams server-side to CSV / JSONL / Parquet; requires AuditExport permission.
- scadalink audit verify-chain --month YYYY-MM is a no-op stub returning a "hash-chain not yet enabled in this release" message and exit code 0 (per v1.x deferral).
- Existing audit-log query (IAuditService config-change viewer) renamed in code to audit-config query to disambiguate; old name kept as a deprecated alias for one minor version.
- Permissions: audit query and audit verify-chain require OperationalAudit; audit export additionally requires AuditExport.
M8 — Tasks (TDD-detail)
M8-T1: Create AuditCommands.cs (separate from existing AuditLogCommands.cs)
Files:
- Create: src/ScadaLink.CLI/Commands/AuditCommands.cs — static class AuditCommands { public static Command Build() } following the System.CommandLine pattern from AuditLogCommands.cs:1–53. Sets up the audit parent command with three subcommands (T2/T3/T4).
- Modify: src/ScadaLink.CLI/Program.cs — register AuditCommands.Build() alongside the existing command groups.
- Create: tests/ScadaLink.CLI.Tests/Commands/AuditCommandsScaffoldTests.cs.
Steps:
- Failing test: scadalink audit --help lists three subcommands (query, export, verify-chain).
- Implement.
- Run: pass.
- Commit: feat(cli): scaffold scadalink audit command group.
M8-T2: audit query subcommand
Files:
- Modify: src/ScadaLink.CLI/Commands/AuditCommands.cs — add query subcommand with the flag set matching the Central UI Audit Log filter set (post-Bundle-D fix): --since, --until, --channel, --kind, --status, --site, --instance, --target, --actor, --correlation-id, --errors-only, --page, --page-size. Output JSON by default; --format table opt-in.
- Create: src/ScadaLink.Commons/Messages/Cli/QueryAuditLogCommand.cs (or wherever the CLI↔Management messages live — confirm via repo).
- Create: tests/ScadaLink.CLI.Tests/Commands/AuditQueryCommandTests.cs.
Steps:
- Failing test: parsing the documented flag set produces a QueryAuditLogCommand with the expected fields.
- Failing test: --format table switches the output formatter.
- Failing test: unknown flag returns non-zero exit code with a helpful error.
- Implement.
- Run: pass.
- Commit: feat(cli): scadalink audit query subcommand.
M8-T3: audit export subcommand
Files:
- Modify: src/ScadaLink.CLI/Commands/AuditCommands.cs — add export subcommand with flags --since (required), --until (required), --format csv|jsonl|parquet (required), --output <path> (required), --channel, --kind, --status, --site, --target, --actor.
- Create: src/ScadaLink.Commons/Messages/Cli/ExportAuditLogCommand.cs.
- Create: tests/ScadaLink.CLI.Tests/Commands/AuditExportCommandTests.cs.
Steps:
- Failing test: missing required flag returns helpful error.
- Failing test: valid invocation creates an ExportAuditLogCommand with all fields.
- Failing test: streams results to --output; doesn't buffer entire export in memory (test with 100k+ rows).
- Implement.
- Run: pass.
- Commit: feat(cli): scadalink audit export subcommand (csv|jsonl|parquet).
M8-T4: audit verify-chain subcommand (no-op stub)
Files:
- Modify: src/ScadaLink.CLI/Commands/AuditCommands.cs — add verify-chain --month <YYYY-MM> subcommand. In v1, returns a documented "hash chain not yet enabled in this release; see Component-AuditLog.md Security & Tamper-Evidence for the v1.x roadmap" message with exit code 0.
- Create: tests/ScadaLink.CLI.Tests/Commands/AuditVerifyChainCommandTests.cs.
Steps:
- Failing test: scadalink audit verify-chain --month 2026-05 exits 0 with the documented message.
- Failing test: malformed month string (e.g., 2026-13) exits non-zero with a parse error.
- Implement.
- Run: pass.
- Commit: feat(cli): scadalink audit verify-chain subcommand (v1 no-op).
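The two verify-chain tests above pin down a small amount of behaviour that can be sketched independently of System.CommandLine: validate the month, then either print the documented message and return 0 or print a parse error and return non-zero. The class name `VerifyChainSketch`, the error wording, and the specific non-zero code (1) are assumptions; the roadmap only requires "non-zero".

```csharp
using System;
using System.Globalization;
using System.IO;

// Hypothetical handler body for the v1 no-op; wiring into the command tree is omitted.
public static class VerifyChainSketch
{
    // Message text taken from the M8-T4 task description.
    public const string NotEnabledMessage =
        "hash chain not yet enabled in this release; see Component-AuditLog.md " +
        "Security & Tamper-Evidence for the v1.x roadmap";

    // Accepts exactly YYYY-MM; "2026-13" fails because 13 is not a valid month.
    public static bool TryParseMonth(string text, out DateTime monthStartUtc) =>
        DateTime.TryParseExact(
            text,
            "yyyy-MM",
            CultureInfo.InvariantCulture,
            DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal,
            out monthStartUtc);

    public static int Run(string month, TextWriter stdout, TextWriter stderr)
    {
        if (!TryParseMonth(month, out _))
        {
            stderr.WriteLine($"Invalid --month value '{month}'; expected YYYY-MM.");
            return 1; // assumption: any non-zero code satisfies the test
        }

        stdout.WriteLine(NotEnabledMessage);
        return 0;
    }
}
```

Validating the month even though the command does nothing else keeps the flag contract stable, so scripts written against v1 should not need to change when real verification lands.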
M8-T5: ManagementService HTTP endpoints
Files:
- Create: src/ScadaLink.ManagementService/Controllers/AuditController.cs — REST endpoints GET /api/audit/query (paged) and GET /api/audit/export (streaming). Query is gated on OperationalAudit; export is gated on OperationalAudit plus AuditExport (matching the UI's permission split from M7-T15).
- Create: tests/ScadaLink.ManagementService.Tests/Controllers/AuditControllerTests.cs.
Steps:
- Failing test: GET /api/audit/query with valid params returns JSON page of audit events.
- Failing test: GET /api/audit/export streams CSV/JSONL/Parquet without buffering.
- Failing test: a request without OperationalAudit returns 403.
- Failing test: /export without AuditExport returns 403.
- Implement.
- Run: pass.
- Commit: feat(mgmt): /api/audit/{query,export} endpoints with permission gates.
M8-T6: Output formatters (JSON + table)
Files:
- Modify: src/ScadaLink.CLI/Output/ — add an AuditEventTableFormatter that renders results as an aligned table with sensible defaults (truncate long fields with …).
- The JSON formatter follows existing CLI patterns (one event per line for streaming, or array for paged results — confirm during M8 brainstorm).
- Create: tests/ScadaLink.CLI.Tests/Output/AuditEventFormatterTests.cs.
Steps:
- Failing test: table format includes columns: OccurredAtUtc, Channel, Kind, Status, Target, Actor, DurationMs.
- Failing test: JSON format is one event per line.
- Implement.
- Run: pass.
- Commit: feat(cli): JSON + table formatters for audit events.
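The "aligned table, truncate long fields with …" behaviour can be sketched with plain string operations. The `AuditTableSketch` name, the two-space column gap, and the single `maxWidth` applied to every column are assumptions; the real AuditEventTableFormatter may well use per-column widths.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical formatter core; works on pre-stringified cells.
public static class AuditTableSketch
{
    // Keeps values that fit; otherwise cuts to width-1 characters plus an ellipsis.
    public static string Truncate(string value, int width) =>
        value.Length <= width ? value : value.Substring(0, width - 1) + "…";

    public static string Render(string[] headers, IEnumerable<string[]> rows, int maxWidth)
    {
        // Truncate first so column widths are computed from what is actually printed.
        var cells = rows
            .Select(r => r.Select(c => Truncate(c, maxWidth)).ToArray())
            .ToList();

        var widths = headers
            .Select((h, i) => Math.Max(
                h.Length,
                cells.Count == 0 ? 0 : cells.Max(r => r[i].Length)))
            .ToArray();

        var sb = new StringBuilder();

        // Pads each cell to its column width and drops trailing spaces on the line.
        void AppendRow(string[] row) =>
            sb.AppendLine(
                string.Join("  ", row.Select((c, i) => c.PadRight(widths[i]))).TrimEnd());

        AppendRow(headers);
        foreach (var row in cells)
        {
            AppendRow(row);
        }

        return sb.ToString();
    }
}
```

A formatter test could then pass the seven documented column names as `headers` and assert that the first output line contains each of them in order.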
M8-T7: Rename existing audit-log query → audit-config query with deprecation alias
Files:
- Modify: src/ScadaLink.CLI/Commands/AuditLogCommands.cs — rename the top-level command from audit-log to audit-config (clearer disambiguation from the new audit group). Add an alias audit-log that prints a deprecation warning and forwards to audit-config for one minor version.
- Modify: src/ScadaLink.CLI/README.md and CLI help text to document the rename and the deprecation timeline.
- Create: tests/ScadaLink.CLI.Tests/Commands/AuditConfigDeprecationTests.cs.
Steps:
- Failing test: scadalink audit-config query --user alice works.
- Failing test: scadalink audit-log query --user alice works but emits a deprecation warning to stderr.
- Failing test: scadalink audit query --since ... (the NEW operational command) and scadalink audit-config query --user ... (the renamed config command) are clearly different surfaces and do not conflict.
- Implement.
- Run: pass.
- Commit: refactor(cli): rename audit-log → audit-config with deprecation alias.
M8-T8: CLI README + help text updates
Files:
- Modify: src/ScadaLink.CLI/README.md — document the new audit group, the renamed audit-config group, the permission requirements, the verify-chain no-op note, and the CLI ↔ UI filter parity.
- Modify: each subcommand's --help description for clarity.
Steps:
- Inline doc edits.
- Verify scadalink audit --help and scadalink audit-config --help produce the documented output.
- Commit: docs(cli): document new scadalink audit group and audit-config rename.
M8-T9: CLI integration test — end-to-end query + export
Files:
- Create: tests/ScadaLink.IntegrationTests/Cli/AuditCliEndToEndTests.cs — boots central with a populated AuditLog table; invokes scadalink audit query --since ... against the running ManagementService; asserts results match the database. Same for export.
Steps:
- Sketch test using existing IntegrationTests harness.
- Iterate until all flag combinations work end-to-end.
- Commit: test(cli): scadalink audit end-to-end against running ManagementService.
M8 — Risk callouts
- Operator script breakage from the audit-log rename: the deprecation alias is the safety net but only for one minor version; document the deprecation timeline clearly in the CLI README. Coordinate with anyone running audit-log in CI/cron.
- Parquet output: requires a Parquet writer library. If one isn't already in Directory.Packages.props, add the smallest viable dependency (ParquetSharp or Parquet.Net). Decide during M8 brainstorm.
- Streaming export from CLI: the CLI invokes the ManagementService HTTP endpoint, which itself streams. Confirm HttpClient.SendAsync with HttpCompletionOption.ResponseHeadersRead is used so the CLI doesn't buffer the whole response.
- Permission model parity: ensure the CLI's permission errors mirror the UI's (HTTP 403 → CLI exit code 2 with a clear message).
Cross-cutting concerns (apply at every milestone)
- Branching: every milestone gets its own feature/audit-log-mN-<slice> branch; merged with --no-ff to main on milestone completion. No pushes without explicit user authorization.
- Tests: every task adds tests first (failing test → impl → passing test). Existing tests must keep passing.
- Commits: small and frequent. Bite-sized per writing-plans skill.
- Reviews: per the bundling cadence in user memory — group small adjacent tasks into a single implementer dispatch, run one combined spec+quality review per bundle, then a final cross-bundle review at end of milestone.
- Docs: if implementation reveals a design gap, fix the design doc FIRST (in docs/requirements/Component-AuditLog.md and/or alog.md), commit, then implement. Don't let the code and docs drift.
- Infra: the 3 infra/* working-tree modifications still uncommitted on main are unrelated and stay that way unless the user explicitly addresses them. Use explicit git add <path> throughout, never git commit -am.
Per-milestone execution flow (template)
When a milestone is about to start, run this sequence:
- Brainstorm: short skill invocation to nail any code-level decisions not fixed in the spec (test fixture placement, migration helper choice, etc.).
- Writing-plans: produce a milestone-specific plan with TDD detail per task — saved to docs/plans/2026-XX-XX-auditlog-mN-<slice>.md + peer .tasks.json.
- Subagent-driven execution: bundle small tasks per cadence preference; per-bundle implementer + combined reviewer; cross-milestone review at end; merge to main with --no-ff.
The roadmap is the contract for what each milestone ships; the per-milestone plan is the contract for how it gets built.