Audit Log (#23) Code Implementation Roadmap
For Claude: REQUIRED SUB-SKILL FLOW per milestone: brainstorming→writing-plans→subagent-driven-development. Use docs/requirements/Component-AuditLog.md + alog.md as the spec; this document is the roadmap that sequences milestones and locks acceptance criteria for each. M1 carries full TDD-level task detail; M2–M8 are milestone-shape detail and will be expanded into bite-sized plans by their own writing-plans pass when their turn comes.
Goal: Implement central component #23 Audit Log — append-only forensic + operational record across every script-trust-boundary action — into the existing ScadaLink codebase.
Architecture: Layered alongside (not replacing) the future Notifications/SiteCalls operational stores. Site-local SQLite hot-path append + gRPC telemetry batches + reconciliation pulls; central direct-write for Inbound API and Notification Outbox dispatch; monthly-partitioned MS SQL with single global retention; strict append-only enforced via DB roles. See alog.md for the locked design decisions and Component-AuditLog.md for the component spec.
Tech Stack: Akka.NET (clustering, singletons, ClusterClient), EF Core (MS SQL provider, code-first migrations), Microsoft.Data.SqlClient, Microsoft.Data.Sqlite, gRPC (HTTP/2 server-streaming on the existing SiteStream channel), ASP.NET Core (Inbound API middleware), Blazor Server + Bootstrap (Central UI), System.CommandLine (CLI), xUnit + Akka.TestKit.Xunit2 + NSubstitute (tests).
Spec: /Users/dohertj2/Desktop/scadalink-design/alog.md (validated, immutable; commit fec0bb1). Component design at /Users/dohertj2/Desktop/scadalink-design/docs/requirements/Component-AuditLog.md.
Codebase Reality Check (what already exists)
- All 22 prior components have source + tests. Audit Log slots in as a new src/ScadaLink.AuditLog/ project plus changes to: Commons, ConfigurationDatabase, Communication (proto), Host (DI + actor registration), ExternalSystemGateway, InboundAPI, NotificationOutbox, HealthMonitoring, CentralUI, CLI, SiteRuntime (audit hook surface).
- Existing patterns to copy from:
  - Singleton wiring: src/ScadaLink.Host/Actors/AkkaHostedService.cs:272–280 (NotificationOutboxActor): ClusterSingletonManager.Props + manager/proxy pair.
  - EF migration: src/ScadaLink.ConfigurationDatabase/Migrations/20260519050659_AddNotificationsTable.cs: table create + indexes; no partitioning yet, so Audit Log will be the first.
  - Site SQLite hot-path: src/ScadaLink.SiteEventLogging/SiteEventLogger.cs:28–98: single connection, write lock, Channel-based background writer.
  - Site-buffer + forwarder: src/ScadaLink.StoreAndForward/: StoreAndForwardStorage + NotificationForwarder show the Pending → Forwarded transition we'll mirror.
  - Actor + repo + test trio: src/ScadaLink.NotificationOutbox/NotificationOutboxActor.cs and tests/ScadaLink.NotificationOutbox.Tests/NotificationOutboxActorIngestTests.cs:20: TestKit base class, NSubstitute repo, Sys.ActorOf, ExpectMsg<T>.
  - gRPC additive: src/ScadaLink.Communication/Protos/sitestream.proto currently carries only AttributeValueUpdate and AlarmStateUpdate in a oneof; we extend it.
  - CLI command shape: src/ScadaLink.CLI/Commands/AuditLogCommands.cs:1–53: System.CommandLine pattern; the new group will live alongside it (the file's existing commands are for the IAuditService config audit and stay).
  - Blazor listing page: src/ScadaLink.CentralUI/Components/Pages/Notifications/NotificationReport.razor: filter bar + keyset paging + status badges idiom.
- AuditLog.razor and AuditLogCommands.cs already exist but they're the IAuditService config-change viewer. Per the design pass we renamed them in docs to "Configuration Audit Log Viewer"; in code they'll be renamed (file + URL + command name) so the new operational Audit Log can take the unqualified name.
- Test framework: xUnit + Akka.TestKit.Xunit2 + NSubstitute. Integration tests under tests/ScadaLink.IntegrationTests/. Playwright UI tests under tests/ScadaLink.CentralUI.PlaywrightTests/. A tests/ScadaLink.PerformanceTests/ exists for load tests.
Prerequisite: Site Call Audit (#22) + cached-call tracking are NOT implemented in code
The design for both is merged on main (alog.md cached-call tracking section; Component-SiteCallAudit.md), but grep finds zero references to TrackedOperationId or CachedCallTelemetry in src/. This matters because M3 (cached operations + dual-write transaction) cannot be built without them.
Three ways to handle this — pick before M3:
1. Inline into M3 (Recommended): Implement just enough of Site Call Audit (#22) and cached-call tracking inside M3, specifically the CachedCallTelemetry message, the operational-tracking SQLite table at sites, and the SiteCalls table + repo + SiteCallAuditActor skeleton at central. This makes M3 the biggest milestone but ships a coherent slice (cached calls audited end-to-end).
2. M0 prerequisite milestone: Implement #22 and cached-call tracking as a separate slice before M3 starts. Cleanest dependency story; slowest to first-audit-row.
3. Ship Audit Log sync-only first, retrofit cached path later: M1, M2, M4 (sync-only emissions), M5, M6 (no cached features), M7, M8 ship as-is; cached audit is a separate follow-up. Lowest first-shippable scope but leaves cached calls unaudited until much later.
Default choice in this roadmap: (1). M3 absorbs the minimum #22 + cached-call tracking surface needed to make combined telemetry work; the rest of #22 (full reconciliation, KPIs, Retry/Discard relay) can be a follow-up.
Milestone index
| M | Title | Ships | Touches | Depends on |
|---|---|---|---|---|
| M1 | Foundation: schema, types, DB roles, partitioning | Migration deployed; Commons types exist; no observable behavior yet. | Commons, ConfigurationDatabase, ConfigurationDatabase.Tests | — |
| M2 | Site pipeline (sync-only path) | One emission path end-to-end (ESG sync Call() audited from script to central row). | Commons, AuditLog (new), Communication (proto), Host, ExternalSystemGateway, all Tests projects, IntegrationTests | M1 |
| M3 | Cached operations + dual-write transaction | Cached external calls and DB writes audited; SiteCalls table populated alongside; combined telemetry packet contract live. | Commons, AuditLog, SiteCallAudit (new), ConfigurationDatabase, ExternalSystemGateway, StoreAndForward, Host | M2; #22 + cached-call tracking inlined here per the prerequisite section |
| M4 | Remaining boundary emission | All four channels emitting: sync DB writes/reads, Notify dispatcher attempt/terminal, Inbound API middleware. | ExternalSystemGateway, InboundAPI, NotificationOutbox, SiteRuntime (Database surface) | M2; M3 (NotificationOutbox terminal/attempt uses ICentralAuditWriter pattern) |
| M5 | Payload + redaction policy | Header redaction, body redactor regex, SQL parameter redaction, safety net, configuration binding. | AuditLog, ExternalSystemGateway, InboundAPI, all emitter projects | M2 |
| M6 | Reconciliation, purge, partition maintenance, health metrics | Self-healing telemetry, monthly partition switch, the five new health metrics + their dashboard tiles. | AuditLog, ConfigurationDatabase (partition maintenance), HealthMonitoring | M2, M3 |
| M7 | Central UI: new Audit Log page + drill-ins + KPI tiles | User-visible Audit Log surface; existing AuditLog.razor renamed to ConfigurationAuditLog. | CentralUI, CentralUI.Tests, CentralUI.PlaywrightTests | M2, M4, M6 |
| M8 | CLI: scadalink audit query / export / verify-chain | Operator surface for query/export; verify-chain is a no-op stub until v1.x hash chain ships. | CLI, ManagementService (HTTP endpoint), CLI.Tests, IntegrationTests | M2 |
Ship-state at end of each milestone is the shippable slice — each milestone leaves the system in a working, testable, deployable state (no half-built actors mid-pipeline). M1 ships no user-visible behaviour but produces a clean foundation; from M2 onward each ships an observable audit capability.
Critical path: M1 → M2 → (M3 ∥ M4 ∥ M5) → M6 → (M7 ∥ M8). M3, M4, M5 can overlap once M2 is solid. M7 and M8 can overlap once M6 lands.
M1 — Foundation: schema, types, DB roles, partitioning
Goal: Land the new AuditLog table (partitioned) and DB roles in MS SQL, plus the Commons types every later milestone needs. After M1 the database is ready and types compile; nothing else changes.
Affected projects:
- src/ScadaLink.Commons/: entity, enums, interfaces, message DTOs.
- src/ScadaLink.ConfigurationDatabase/: EF mapping, DbContext registration, migration, DB role script, partition function/scheme, retention options.
- tests/ScadaLink.Commons.Tests/: enum + record tests.
- tests/ScadaLink.ConfigurationDatabase.Tests/: migration tests, repo tests.
Acceptance criteria:
- dotnet build of the solution succeeds.
- dotnet ef database update against a dev MS SQL applies the migration; the AuditLog table exists, partitioned monthly on OccurredAtUtc, with PK on EventId and the five expected indexes.
- scadalink_audit_writer and scadalink_audit_purger SQL roles exist with the documented grants; a smoke test confirms UPDATE AuditLog from the writer role fails.
- AuditEvent record, AuditChannel / AuditKind / AuditStatus enums, IAuditWriter / ICentralAuditWriter interfaces, and AuditTelemetryEnvelope / PullAuditEvents message DTOs all exist in Commons in the right folders.
- IAuditLogRepository interface (Commons) and EF implementation (ConfigurationDatabase) exist; the implementation only exposes InsertIfNotExistsAsync, paged read, and SwitchOutPartitionAsync, with no update or row-delete.
- All new tests pass; no existing tests regress.
M1 — Tasks (TDD-detail)
M1-T1: Add audit enums to Commons
Files:
- Create: src/ScadaLink.Commons/Types/Enums/AuditChannel.cs, AuditKind.cs, AuditStatus.cs.
- Create: tests/ScadaLink.Commons.Tests/Types/Enums/AuditEnumTests.cs.
Steps:
- Write failing test verifying AuditChannel has exactly ApiOutbound | DbOutbound | Notification | ApiInbound (asserting Enum.GetValues length and members).
- Same for AuditKind (10 members per Component-AuditLog.md).
- Same for AuditStatus (8 members).
- Run: tests fail (enums don't exist). Implement the three enums.
- Run tests: pass.
- Commit: feat(commons): add Audit{Channel,Kind,Status} enums for #23.
M1-T2: Add AuditEvent record + ForwardState enum
Files:
- Create: src/ScadaLink.Commons/Entities/Audit/AuditEvent.cs: public record carrying all 20 central columns (per alog.md §4) plus a nullable ForwardState? for the site-local variant.
- Create: src/ScadaLink.Commons/Types/Enums/AuditForwardState.cs: Pending | Forwarded | Reconciled.
- Create: tests/ScadaLink.Commons.Tests/Entities/Audit/AuditEventTests.cs.
Steps:
- Write failing test that constructs an AuditEvent, sets every property, and round-trips via with expressions; it asserts immutability and required-property behaviour.
- Run: fail (type doesn't exist). Implement the record.
- Run: pass.
- Commit: feat(commons): add AuditEvent record + ForwardState enum.
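The immutability check in M1-T2 could look roughly like the sketch below. It is a minimal illustration, not the planned type: AuditEventSketch and ForwardStateSketch are hypothetical, trimmed to four properties instead of the 20 central columns, and the `required` modifier assumes C# 11 (.NET 7 or later).

```csharp
using System;

// Hypothetical, trimmed-down shape; the real record carries all 20 central columns.
var original = new AuditEventSketch
{
    EventId = Guid.NewGuid(),
    OccurredAtUtc = DateTime.UtcNow,
    Channel = "ApiOutbound",
};

// `with` copies the record and changes only the named property.
var forwarded = original with { ForwardState = ForwardStateSketch.Forwarded };

Console.WriteLine(original.ForwardState is null);          // the original is untouched
Console.WriteLine(forwarded.EventId == original.EventId);  // other properties carry over

public enum ForwardStateSketch { Pending, Forwarded, Reconciled }

public sealed record AuditEventSketch
{
    // `required` makes the compiler reject object initializers that omit these.
    public required Guid EventId { get; init; }
    public required DateTime OccurredAtUtc { get; init; }
    public required string Channel { get; init; }

    // Null at central; populated only in the site-local variant.
    public ForwardStateSketch? ForwardState { get; init; }
}
```

Init-only setters are what make the `with` round-trip meaningful: the only way to "change" a value is to produce a new record.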
M1-T3: Add IAuditWriter and ICentralAuditWriter
Files:
- Create: src/ScadaLink.Commons/Interfaces/Services/IAuditWriter.cs, ICentralAuditWriter.cs.
- Create: tests/ScadaLink.Commons.Tests/Interfaces/Services/AuditWriterContractTests.cs (smoke: only that the interfaces exist and have the documented signatures).
Steps:
- Write failing reflection-based test asserting both interfaces expose Task WriteAsync(AuditEvent, CancellationToken).
- Run: fail. Implement both interfaces; document each with XML doc comments naming Audit Log #23 as the owner.
- Run: pass.
- Commit: feat(commons): add IAuditWriter and ICentralAuditWriter.
M1-T4: Add audit telemetry + pull message DTOs
Files:
- Create: src/ScadaLink.Commons/Messages/Integration/AuditTelemetryEnvelope.cs, PullAuditEventsRequest.cs, PullAuditEventsResponse.cs.
- Create: tests/ScadaLink.Commons.Tests/Messages/Integration/AuditTelemetryMessagesTests.cs.
Steps:
- Failing test: construct envelope with a batch of 3 events, assert immutability + batch enumerability.
- Failing test: pull request carries SinceUtc + BatchSize; response carries events + MoreAvailable.
- Implement.
- Run: pass.
- Commit: feat(commons): add audit telemetry + pull message DTOs.
M1-T5: Extend ScadaLinkDbContext with AuditLogs DbSet + entity config
Files:
- Modify: src/ScadaLink.ConfigurationDatabase/ScadaLinkDbContext.cs: add public DbSet<AuditEvent> AuditLogs => Set<AuditEvent>(); at the appropriate position (after Notifications).
- Create: src/ScadaLink.ConfigurationDatabase/Entities/AuditLogEntityTypeConfiguration.cs: IEntityTypeConfiguration<AuditEvent> mapping the columns, types, length constraints, and indexes per alog.md §4. Note: this is an EF mapping only; the partition function and scheme are created in the SQL migration (next task) since EF Core doesn't model them natively.
- Modify: OnModelCreating: apply the new configuration.
- Create: tests/ScadaLink.ConfigurationDatabase.Tests/Entities/AuditLogEntityTypeConfigurationTests.cs: use ModelBuilder directly to verify the entity is mapped to the AuditLog table, PK is EventId, and the expected columns + indexes are declared.
Steps:
- Failing test asserts mapped table name, PK column, and column count.
- Implement entity configuration; apply in OnModelCreating.
- Failing test asserts the five expected indexes exist on the model.
- Add HasIndex declarations.
- Run: pass.
- Commit: feat(configdb): map AuditEvent to AuditLog table with PK and indexes.
M1-T6: Generate and customize EF migration for AuditLog
Files:
- Create: src/ScadaLink.ConfigurationDatabase/Migrations/<timestamp>_AddAuditLogTable.cs via dotnet ef migrations add AddAuditLogTable --project ScadaLink.ConfigurationDatabase.
- Modify: the generated Up() / Down() to:
  - Create the partition function pf_AuditLog_Month and partition scheme ps_AuditLog_Month (raw SQL via migrationBuilder.Sql(...)), tied to a dedicated filegroup (or PRIMARY in dev; configurable via a migration setting).
  - Alter the CreateTable call (or follow up with Sql) to align the table to ps_AuditLog_Month(OccurredAtUtc).
  - Add the five indexes generated by EF; ensure each is also partition-aligned where appropriate.
- Create: tests/ScadaLink.ConfigurationDatabase.Tests/Migrations/AddAuditLogTableMigrationTests.cs: applies the migration to an isolated MS SQL LocalDB instance (existing IntegrationTests harness), asserts table + partition function + scheme + indexes are present.
Steps:
- Run dotnet ef migrations add AddAuditLogTable.
- Failing integration test: apply migration, query sys.partition_functions and sys.partition_schemes for the expected names.
- Edit migration to add the partition function + scheme + alignment.
- Re-run test: pass.
- Failing test: query sys.indexes for the five expected named indexes.
- Adjust migration if any index name drifts.
- Run: pass.
- Commit: feat(configdb): add AuditLog migration with monthly partitioning.
M1-T7: Add DB roles in migration
Files:
- Modify: the M1-T6 migration Up() to also create the scadalink_audit_writer (INSERT + SELECT only) and scadalink_audit_purger (ALTER PARTITION FUNCTION + ALTER TABLE … SWITCH PARTITION + SELECT) roles via raw SQL. Make role creation idempotent (IF NOT EXISTS).
- Modify: Down(): drop the roles.
- Create: tests/ScadaLink.ConfigurationDatabase.Tests/Migrations/AuditLogRoleGrantsTests.cs: applies migration, then runs SELECT on sys.database_role_members / sys.database_permissions to assert the role grants. Plus a smoke test: connect as a user mapped to scadalink_audit_writer, attempt UPDATE AuditLog SET Status = 'X' and expect a permission error.
Steps:
- Failing test asserts both roles exist with documented grants.
- Add migrationBuilder.Sql(...) blocks.
- Run: pass.
- Failing test: UPDATE AuditLog as audit writer → expect SqlException with permission error.
- Verify the role's permissions deny UPDATE (they should by default since only INSERT + SELECT are granted).
- Run: pass.
- Commit: feat(configdb): add scadalink_audit_writer and scadalink_audit_purger roles.
M1-T8: Add IAuditLogRepository + EF implementation
Files:
- Create: src/ScadaLink.Commons/Interfaces/Repositories/IAuditLogRepository.cs: InsertIfNotExistsAsync(AuditEvent, CancellationToken), QueryAsync(filter, paging, CancellationToken), SwitchOutPartitionAsync(monthBoundary, CancellationToken). Deliberately no UpdateAsync or row-level DeleteAsync.
- Create: src/ScadaLink.ConfigurationDatabase/Repositories/AuditLogRepository.cs: implementation using the DbContext; InsertIfNotExistsAsync uses MERGE or raw INSERT … WHERE NOT EXISTS to satisfy idempotency without throwing on dupes.
- Modify: ServiceCollectionExtensions.cs: register IAuditLogRepository → AuditLogRepository in DI.
- Create: tests/ScadaLink.ConfigurationDatabase.Tests/Repositories/AuditLogRepositoryTests.cs.
Steps:
- Failing test: InsertIfNotExistsAsync for a fresh EventId writes one row; calling again with the same EventId is a no-op (no exception, no second row).
- Implement; use a MERGE or INSERT … WHERE NOT EXISTS strategy that does NOT rely on EF change tracking.
- Run: pass.
- Failing test: paged QueryAsync returns rows in (OccurredAtUtc desc, EventId desc) order, respecting filter predicates (channel, kind, status, site, target, actor, correlation, time range).
- Implement filter projection + keyset paging.
- Run: pass.
- Failing test: SwitchOutPartitionAsync for the oldest partition removes its rows from the live table.
- Implement via migrationBuilder-style Sql("ALTER TABLE ... SWITCH PARTITION ... TO ...") (against a staging table the implementation creates and drops within the same transaction).
- Run: pass.
- Commit: feat(configdb): IAuditLogRepository + EF implementation (append-only, partition-switch purge).
M1-T9: Add AuditLogOptions configuration class + binding
Files:
- Create: src/ScadaLink.AuditLog/Configuration/AuditLogOptions.cs (new project; see M1-T10): owns DefaultCapBytes, ErrorCapBytes, HeaderRedactList, GlobalBodyRedactors, PerTargetOverrides, RetentionDays, validation attributes.
- Add: validation on startup (IValidateOptions<AuditLogOptions>).
- Test: ensure appsettings.json bind round-trips and validation rejects out-of-range RetentionDays.
Steps:
- Failing test: bind a valid section → values present.
- Implement options class + binding.
- Failing test: bind invalid RetentionDays → validator rejects.
- Implement validator.
- Run: pass.
- Commit: feat(auditlog): add AuditLogOptions config binding.
M1-T10: Add ScadaLink.AuditLog project skeleton
Files:
- Create: src/ScadaLink.AuditLog/ScadaLink.AuditLog.csproj: TargetFramework matches the rest of the solution; ProjectReferences to ScadaLink.Commons and ScadaLink.ConfigurationDatabase.
- Create: src/ScadaLink.AuditLog/ServiceCollectionExtensions.cs: AddAuditLog(this IServiceCollection, IConfiguration) that registers AuditLogOptions, IAuditLogRepository, plus placeholders that later milestones will fill (writer impls, actors).
- Create: tests/ScadaLink.AuditLog.Tests/ScadaLink.AuditLog.Tests.csproj with one smoke test.
- Modify: ScadaLink.slnx: add both projects to the solution.
- Modify: Directory.Packages.props if any new package versions are needed.
Steps:
- Create projects via dotnet new classlib / dotnet new xunit; add references; add to slnx.
- Failing test: smoke-test AddAuditLog() populates DI with IAuditLogRepository and IOptions<AuditLogOptions>.
- Implement ServiceCollectionExtensions.AddAuditLog.
- Run: pass.
- Commit: feat(auditlog): scaffold ScadaLink.AuditLog project.
M1-T11: Update Component-Host.md responsibilities + README component table
Files:
- Modify: docs/requirements/Component-Host.md: list ScadaLink.AuditLog in the central role's registration set.
- Modify: README.md: confirm row #23 link reflects the new project (no functional change; this is a paper-trail update).
Steps:
- Edit, verify cross-refs, commit: docs(audit): register ScadaLink.AuditLog project in Host role.
M2 — Site pipeline (sync-only path)
Goal: First end-to-end audit emission: a script-initiated ExternalSystem.Call() produces an audit row in the central AuditLog table. No cached paths yet, no notifications, no inbound API, no UI. Just one channel + kind: ApiOutbound / ApiCall.
Affected projects: Commons, AuditLog (new), Communication, Host, ExternalSystemGateway, all matching *.Tests/, tests/ScadaLink.IntegrationTests/.
M1 realities to honor:
- Vocabulary: M1 enums use AuditKind.ApiCall (sync) and AuditStatus.Delivered|Failed. The original spec's SyncCall / Success names were superseded; alog.md + Component-AuditLog.md were reconciled in the M1 merge.
- Idempotent insert race: M1's AuditLogRepository.InsertIfNotExistsAsync uses non-locking IF NOT EXISTS … INSERT. M2 is the first concurrent writer (AuditLogIngestActor will receive batches from multiple sites). Harden the repo before relying on it: either add WITH (UPDLOCK, HOLDLOCK) to the existence check, or catch SqlException numbers 2601/2627 (duplicate key on UX_AuditLog_EventId) and swallow. Add a new task at the head of M2 for this fix and its concurrency test.
- Keyset tiebreaker test gap: M1's QueryAsync_Keyset_NextPageStartsAfterCursor test uses five rows with distinct OccurredAtUtc, so the Guid.CompareTo tiebreaker branch is never exercised. Add a same-OccurredAt test in M2 (Bundle D reviewer's deferred recommendation).
- Reusable MSSQL fixture: tests/ScadaLink.ConfigurationDatabase.Tests/Migrations/MsSqlMigrationFixture.cs + [SkippableFact] + Skip.IfNot(_fixture.Available, _fixture.SkipReason) is the established pattern. Consider promoting it to a [CollectionDefinition]-shared fixture when M2+ adds more MSSQL-dependent test classes.
- Project layout: src/ScadaLink.AuditLog/ is wired into the solution with Configuration/AuditLogOptions.cs + validator + ServiceCollectionExtensions.AddAuditLog(). M2's Site/ and Central/ subfolders attach to this project; the DI extension is the registration point.
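The keyset tiebreaker gap noted above can be pictured with an in-memory sketch. It is only an illustration of the (OccurredAtUtc desc, EventId desc) cursor logic: Row and NextPage are hypothetical names, and the real test would go through AuditLogRepository.QueryAsync against MS SQL. One caveat worth keeping in mind is that SQL Server orders uniqueidentifier values differently from Guid.CompareTo, so a server-side comparison may not match this in-memory ordering.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var t = new DateTime(2026, 5, 1, 12, 0, 0, DateTimeKind.Utc);
var rows = new List<Row>
{
    // Three rows share one timestamp, so only the EventId tiebreaker separates them.
    new(t, new Guid("00000000-0000-0000-0000-000000000003")),
    new(t, new Guid("00000000-0000-0000-0000-000000000001")),
    new(t, new Guid("00000000-0000-0000-0000-000000000002")),
    new(t.AddSeconds(-1), new Guid("00000000-0000-0000-0000-000000000009")),
};

// Page 1: newest first, EventId descending as the tiebreaker.
var page1 = NextPage(rows, cursor: null, pageSize: 2);
// Page 2 starts strictly after the last row of page 1.
var page2 = NextPage(rows, cursor: page1[^1], pageSize: 2);

foreach (var r in page1.Concat(page2))
    Console.WriteLine($"{r.OccurredAtUtc:O} {r.EventId}");

static List<Row> NextPage(IEnumerable<Row> source, Row? cursor, int pageSize) =>
    source
        .Where(r => cursor is null
            || r.OccurredAtUtc < cursor.OccurredAtUtc
            || (r.OccurredAtUtc == cursor.OccurredAtUtc
                && r.EventId.CompareTo(cursor.EventId) < 0))
        .OrderByDescending(r => r.OccurredAtUtc)
        .ThenByDescending(r => r.EventId)
        .Take(pageSize)
        .ToList();

public sealed record Row(DateTime OccurredAtUtc, Guid EventId);
```

With distinct timestamps the second branch of the predicate never runs, which is exactly why the existing five-row test cannot catch a broken tiebreaker.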
Acceptance criteria:
- Site-local IAuditWriter writes to a per-site SQLite auditlog.db on the hot path with ForwardState = 'Pending'; durability is sub-millisecond; failures fall back to a bounded in-memory ring and surface a metric.
- SiteAuditTelemetryActor drains pending rows in batches via a new IngestAuditEvents RPC on the existing SiteStream gRPC service; on success it flips ForwardState = 'Forwarded'.
- AuditLogIngestActor (central singleton) receives the batch, performs InsertIfNotExistsAsync per event, returns ack.
- ExternalSystem.Call() emits one ApiOutbound.ApiCall row via IAuditWriter on every call completion; audit-write failure does NOT abort the script.
- Integration test in tests/ScadaLink.IntegrationTests/ boots a site + central pair, executes a sync script that calls an external system, and asserts a corresponding row appears in the central AuditLog within N seconds.
- No regressions in existing ExternalSystemGateway or Communication tests.
M2 — Tasks (TDD-detail)
M2-T1: SqliteAuditWriter — schema + connection bootstrap
Files:
- Create: src/ScadaLink.AuditLog/Site/SqliteAuditWriter.cs: implements IAuditWriter. Constructor takes a SqliteOptions (path); single SqliteConnection per instance gated by SemaphoreSlim(1,1). Calls InitializeSchema() on first use. Pattern from src/ScadaLink.SiteEventLogging/SiteEventLogger.cs:28–98.
- Create: tests/ScadaLink.AuditLog.Tests/Site/SqliteAuditWriterSchemaTests.cs.
Steps:
- Failing test: opening a writer against a :memory: SQLite produces an AuditLog table with the documented columns (the 20 central columns minus IngestedAtUtc, plus ForwardState).
- Run: fail (class doesn't exist).
- Implement InitializeSchema() with CREATE TABLE IF NOT EXISTS AuditLog (...). Use SQLite column types matching the EF mapping where reasonable (TEXT for IDs, INTEGER for status enums, BLOB not used).
- Run: pass.
- Commit: feat(auditlog): SqliteAuditWriter schema bootstrap.
M2-T2: SqliteAuditWriter — hot-path WriteAsync
Files:
- Modify: src/ScadaLink.AuditLog/Site/SqliteAuditWriter.cs.
- Create: tests/ScadaLink.AuditLog.Tests/Site/SqliteAuditWriterWriteTests.cs.
Steps:
- Failing test: WriteAsync(event) inserts one row with ForwardState = Pending.
- Failing test: 1,000 concurrent WriteAsync calls all complete without exception and produce exactly 1,000 rows (write-lock correctness).
- Run: fail.
- Implement using a parameterized INSERT under SemaphoreSlim lock.
- Run: pass.
- Commit: feat(auditlog): SqliteAuditWriter hot-path INSERT with write lock.
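The write-lock correctness test in M2-T2 comes down to one SemaphoreSlim(1,1) serializing every write. The sketch below shows that pattern under a simplifying assumption: LockedWriterSketch is hypothetical, and a List<string> stands in for the single SqliteConnection so the example runs without a database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

var writer = new LockedWriterSketch();

// 1,000 concurrent writers, mirroring the test described in the steps above.
await Task.WhenAll(Enumerable.Range(0, 1000).Select(i => writer.WriteAsync($"event-{i}")));
Console.WriteLine(writer.Count);

// Hypothetical stand-in: the list plays the role of the single SqliteConnection.
public sealed class LockedWriterSketch
{
    private readonly SemaphoreSlim _gate = new(1, 1);
    private readonly List<string> _rows = new();

    public int Count => _rows.Count;

    public async Task WriteAsync(string evt, CancellationToken ct = default)
    {
        await _gate.WaitAsync(ct);
        try
        {
            await Task.Yield();   // stand-in for the awaited parameterized INSERT
            _rows.Add(evt);       // not thread-safe on its own; the gate makes it safe
        }
        finally
        {
            _gate.Release();      // always release, even if the write throws
        }
    }
}
```

SemaphoreSlim is used rather than a lock statement because the guarded section awaits, and a lock cannot span an await.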
M2-T3: RingBufferFallback — in-memory fallback
Files:
- Create: src/ScadaLink.AuditLog/Site/RingBufferFallback.cs: Channel<AuditEvent> with BoundedChannelFullMode.DropOldest, default capacity 1024.
- Create: tests/ScadaLink.AuditLog.Tests/Site/RingBufferFallbackTests.cs.
Steps:
- Failing test: enqueueing 1,025 events into a 1,024-cap ring drops the oldest and emits a RingBufferOverflow notification (incrementing a passed-in counter).
- Failing test: DrainTo(writer) writes all buffered events in FIFO order and clears the ring.
- Implement.
- Run: pass.
- Commit: feat(auditlog): RingBufferFallback with drop-oldest overflow.
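One possible way to get the drop-oldest behaviour and the overflow signal from M2-T3 is the itemDropped callback on Channel.CreateBounded (available since .NET 6). The sketch below uses plain integers in place of AuditEvent and a local counter in place of the RingBufferOverflow notification; both substitutions are assumptions made to keep it self-contained.

```csharp
using System;
using System.Threading.Channels;

long overflowCount = 0;

// Bounded channel that evicts the oldest buffered item when full and
// reports each eviction through the itemDropped callback.
var ring = Channel.CreateBounded<int>(
    new BoundedChannelOptions(capacity: 1024) { FullMode = BoundedChannelFullMode.DropOldest },
    itemDropped: _ => overflowCount++);

// Write one more item than the ring can hold.
for (var i = 0; i < 1025; i++)
    ring.Writer.TryWrite(i);

// The first readable item is the oldest one that survived the overflow.
ring.Reader.TryRead(out var oldestSurvivor);
Console.WriteLine($"dropped={overflowCount}, oldestSurvivor={oldestSurvivor}");
```

In this mode TryWrite never blocks and never fails for a full buffer, which suits a hot path that must not stall the script; the cost is that the evicted event is lost, so the counter is the only trace of it.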
M2-T4: FallbackAuditWriter — compose primary + ring behind IAuditWriter
Files:
- Create: src/ScadaLink.AuditLog/Site/FallbackAuditWriter.cs: primary writer is SqliteAuditWriter; on transient exception, enqueues into RingBufferFallback and increments SiteAuditWriteFailures (M2-T11). On the next successful primary write, drains the ring back through the primary.
- Create: tests/ScadaLink.AuditLog.Tests/Site/FallbackAuditWriterTests.cs.
Steps:
- Failing test: when the primary throws, the event lands in the ring and the call returns successfully.
- Failing test: when primary writes succeed again, the ring drains in FIFO order.
- Implement.
- Run: pass.
- Commit: feat(auditlog): FallbackAuditWriter composing SQLite + ring.
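The composition described in M2-T4 might be sketched as follows. Everything here is illustrative: IWriterSketch, FlakyWriter, and FallbackWriterSketch are hypothetical names, strings stand in for AuditEvent, a plain Queue stands in for RingBufferFallback, and the sketch catches every exception rather than only transient ones. It is also not thread-safe, which the real writer would need to be.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

var primary = new FlakyWriter { Failing = true };
var writer = new FallbackWriterSketch(primary);

await writer.WriteAsync("a");   // primary throws, "a" is buffered
await writer.WriteAsync("b");   // still failing, "b" is buffered
primary.Failing = false;
await writer.WriteAsync("c");   // primary recovers: "c" is written, then "a" and "b" drain

Console.WriteLine(string.Join(",", primary.Written));
Console.WriteLine(writer.Failures);

public interface IWriterSketch
{
    Task WriteAsync(string evt, CancellationToken ct = default);
}

public sealed class FlakyWriter : IWriterSketch
{
    public bool Failing { get; set; }
    public List<string> Written { get; } = new();

    public Task WriteAsync(string evt, CancellationToken ct = default)
    {
        if (Failing) throw new InvalidOperationException("simulated SQLite failure");
        Written.Add(evt);
        return Task.CompletedTask;
    }
}

public sealed class FallbackWriterSketch : IWriterSketch
{
    private readonly IWriterSketch _primary;
    private readonly Queue<string> _ring = new();

    public int Failures { get; private set; }

    public FallbackWriterSketch(IWriterSketch primary) => _primary = primary;

    public async Task WriteAsync(string evt, CancellationToken ct = default)
    {
        try
        {
            await _primary.WriteAsync(evt, ct);
        }
        catch (Exception)
        {
            Failures++;            // stand-in for the SiteAuditWriteFailures metric
            _ring.Enqueue(evt);    // the caller still sees a successful return
            return;
        }

        // Primary is healthy again: replay buffered events in FIFO order.
        while (_ring.TryPeek(out var buffered))
        {
            try { await _primary.WriteAsync(buffered, ct); }
            catch (Exception) { return; }   // unhealthy again; keep the rest buffered
            _ring.Dequeue();
        }
    }
}
```

In this sketch the recovered write lands before the replayed ones (c, a, b), so physical row order differs from event order; queries would presumably rely on OccurredAtUtc rather than insertion order.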
M2-T5: Extend sitestream.proto with IngestAuditEvents RPC
Files:
- Modify: src/ScadaLink.Communication/Protos/sitestream.proto: add message AuditEventDto { string event_id = 1; google.protobuf.Timestamp occurred_at_utc = 2; ... } (all 20 central fields), message AuditEventBatch { repeated AuditEventDto events = 1; }, message IngestAck { repeated string accepted_event_ids = 1; }, and rpc IngestAuditEvents(AuditEventBatch) returns (IngestAck); on SiteStreamService.
- Build: dotnet build src/ScadaLink.Communication/ regenerates the C# stubs.
- Create: tests/ScadaLink.Communication.Tests/Protos/AuditEventProtoTests.cs.
Steps:
- Failing test: round-trip serialize/deserialize a populated AuditEventDto; assert all fields survive.
- Edit proto; rebuild.
- Run: pass.
- Commit: feat(comms): add IngestAuditEvents RPC + AuditEvent proto messages.
M2-T6: AuditEvent ↔ AuditEventDto mapper
Files:
- Create: src/ScadaLink.AuditLog/Telemetry/AuditEventMapper.cs: static ToDto(AuditEvent) and FromDto(AuditEventDto).
- Create: tests/ScadaLink.AuditLog.Tests/Telemetry/AuditEventMapperTests.cs.
Steps:
- Failing test: round-trip a populated AuditEvent through ToDto → FromDto; assert equality on all 20 columns.
- Implement.
- Run: pass.
- Commit: feat(auditlog): AuditEvent ↔ proto Dto mapper.
M2-T7: SiteAuditTelemetryActor — drain loop
Files:
- Create: src/ScadaLink.AuditLog/Site/Telemetry/SiteAuditTelemetryActor.cs: ReceiveActor with a Drain self-tick. On Drain: read up to BatchSize Pending rows from SQLite; send via gRPC; mark accepted rows Forwarded.
- Create: src/ScadaLink.AuditLog/Site/Telemetry/SiteAuditTelemetryOptions.cs: BatchSize = 256, BusyIntervalSeconds = 5, IdleIntervalSeconds = 30.
- Create: tests/ScadaLink.AuditLog.Tests/Site/Telemetry/SiteAuditTelemetryActorTests.cs using TestKit + NSubstitute for the gRPC client.
Steps:
- Failing test: when SQLite has 50 pending rows, a Drain tick sends one batch via the mocked gRPC client.
- Failing test: on ack, the corresponding rows flip to Forwarded in SQLite.
- Failing test: when gRPC throws, rows stay Pending and the next tick retries.
- Failing test: cadence is 5s after a tick that drained ≥1 row, 30s after a tick that drained 0.
- Implement.
- Run: pass.
- Commit: feat(auditlog): SiteAuditTelemetryActor drain loop.
M2-T8: AuditLogIngestActor + gRPC server handler
Files:
- Create: src/ScadaLink.AuditLog/Central/AuditLogIngestActor.cs: ReceiveActor accepting IngestAuditEventsCommand(batch); calls IAuditLogRepository.InsertIfNotExistsAsync for each event inside a single DbContext transaction; replies with IngestAck(acceptedEventIds).
- Modify: src/ScadaLink.Communication/SiteStreamGrpc/SiteStreamGrpcServer.cs: implement the new IngestAuditEvents method as a thin gRPC↔Akka adapter (Ask against the central singleton's proxy, mapped to the gRPC reply).
- Create: tests/ScadaLink.AuditLog.Tests/Central/AuditLogIngestActorTests.cs.
Steps:
- Failing test: actor receives a batch of 5 events; repo is called 5 times; reply lists all 5 EventIds as accepted.
- Failing test: when 2 of 5 events already exist (repo returns Inserted = false), the reply still lists all 5 as accepted (idempotent semantics).
- Failing test: gRPC handler routes to actor and returns its reply.
- Implement.
- Run: pass.
- Commit: feat(auditlog): AuditLogIngestActor + gRPC server handler.
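The idempotent-ack rule in M2-T8 (a duplicate still counts as accepted) is easy to get backwards, so here is a tiny in-memory sketch of it. The HashSet stands in for the central AuditLog table and Ingest is a hypothetical helper, not the actor itself.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var store = new HashSet<Guid>();   // stand-in for the central AuditLog table
var batch = Enumerable.Range(0, 5).Select(_ => Guid.NewGuid()).ToList();
store.Add(batch[0]);               // two of the five events were ingested earlier
store.Add(batch[1]);

var accepted = Ingest(store, batch);
Console.WriteLine($"{accepted.Count} accepted, {store.Count} stored");

static List<Guid> Ingest(HashSet<Guid> table, IReadOnlyList<Guid> events)
{
    var acked = new List<Guid>();
    foreach (var id in events)
    {
        // Add returns false for a duplicate; either way the row now exists
        // centrally, so the site may mark it Forwarded.
        table.Add(id);
        acked.Add(id);
    }
    return acked;
}
```

Acking only newly inserted rows would leave the duplicates Pending at the site, and they would be resent on every drain tick.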
M2-T9: Host registration with dedicated dispatcher
Files:
- Modify: src/ScadaLink.Host/Actors/AkkaHostedService.cs: alongside the existing wiring at :272–280, register AuditLogIngestActor as central singleton and SiteAuditTelemetryActor as site singleton bound to audit-telemetry-dispatcher. Manager + proxy pair for both.
- Modify: Host HOCON (likely src/ScadaLink.Host/Configuration/akka.conf or similar): add audit-telemetry-dispatcher { type = ForkJoinDispatcher, dedicated-thread-pool { thread-count = 2 } }. Akka.NET sizes a ForkJoinDispatcher through dedicated-thread-pool.thread-count; parallelism-min / parallelism-max are JVM Akka fork-join-executor keys.
- Modify: src/ScadaLink.AuditLog/ServiceCollectionExtensions.cs: register actor Props factories so Host can resolve them.
- Create: tests/ScadaLink.Host.Tests/AkkaHostedServiceAuditWiringTests.cs.
Steps:
- Failing test: starting the host with the audit module loaded produces healthy IActorRef proxies for both singletons.
- Failing test: SiteAuditTelemetryActor is bound to audit-telemetry-dispatcher (assert via Akka actor cell inspection or via a known-good dispatcher-tagged behaviour).
- Implement.
- Run: pass.
- Commit: feat(host): register AuditLog singletons with dedicated dispatcher.
M2-T10: ESG ExternalSystemClient.CallAsync audit emission
Files:
- Modify: src/ScadaLink.ExternalSystemGateway/ExternalSystemClient.cs (sync CallAsync around line 45–70): inject IAuditWriter via constructor. After the call completes (success OR exception), build an AuditEvent (channel=ApiOutbound, kind=ApiCall, status from outcome, DurationMs, HttpStatus, target = system+method, provenance from ScriptExecutionContext). Call _auditWriter.WriteAsync(evt) inside a try/catch that swallows + logs + increments SiteAuditWriteFailures.
- Modify: src/ScadaLink.ExternalSystemGateway/ServiceCollectionExtensions.cs: accept IAuditWriter from DI.
- Create: tests/ScadaLink.ExternalSystemGateway.Tests/ExternalSystemClientAuditEmissionTests.cs.
Steps:
- Failing test: sync CallAsync success → exactly one event with Status=Delivered, Channel=ApiOutbound, Kind=ApiCall (the M1 enum names; the spec's earlier SyncCall / Success were superseded).
- Failing test: sync CallAsync HTTP 500 → Status=TransientFailure, HttpStatus=500.
- Failing test: sync CallAsync HTTP 400 → Status=Failed, HttpStatus=400.
- Failing test: when IAuditWriter.WriteAsync throws, the script call still completes normally and the script sees the original (non-audit) result.
- Implement.
- Run: pass.
- Commit: feat(esg): emit ApiOutbound.ApiCall audit event on every sync call.
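The "audit failure must not abort the script" rule from M2-T10 can be sketched as a wrapper that audits in a finally block and swallows only audit errors. This is an assumption-laden illustration: CallWithAuditAsync is a hypothetical helper, the audit event is reduced to a string, and the status mapping is simplified to Delivered / Failed rather than derived from the HTTP status.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// The audit writer is deliberately broken here to show the call still succeeds.
var result = await CallWithAuditAsync(
    call: () => Task.FromResult(200),
    writeAudit: _ => throw new InvalidOperationException("audit store unavailable"));

Console.WriteLine(result);   // the caller still sees the call's own result

static async Task<int> CallWithAuditAsync(Func<Task<int>> call, Func<string, Task> writeAudit)
{
    var stopwatch = Stopwatch.StartNew();
    var status = "Delivered";
    try
    {
        return await call();
    }
    catch
    {
        status = "Failed";
        throw;   // the original failure still reaches the script unchanged
    }
    finally
    {
        try
        {
            await writeAudit(
                $"ApiOutbound/ApiCall status={status} durationMs={stopwatch.ElapsedMilliseconds}");
        }
        catch (Exception ex)
        {
            // Swallow and log; the planned code also increments SiteAuditWriteFailures here.
            Console.Error.WriteLine($"audit write failed: {ex.Message}");
        }
    }
}
```

Putting the audit write in finally gives one emission point for both the success and the exception path, which matches the "exactly one event per call completion" expectation in the tests above.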
M2-T11: SiteAuditWriteFailures health metric
Files:
- Modify: src/ScadaLink.HealthMonitoring/SiteHealthState.cs: add a SiteAuditWriteFailures counter; expose it in the site health report payload.
- Modify: src/ScadaLink.AuditLog/Site/FallbackAuditWriter.cs (M2-T4): accept IHealthMetrics (or the project's existing health counter abstraction) and increment per failed primary write.
- Create: tests/ScadaLink.AuditLog.Tests/Site/SiteAuditWriteFailuresMetricTests.cs.
Steps:
- Failing test: 3 simulated SQLite failures → counter reports 3 in the next snapshot.
- Implement.
- Run: pass.
- Commit: feat(health): SiteAuditWriteFailures metric.
M2-T12: End-to-end integration test
Files:
- Create: tests/ScadaLink.IntegrationTests/AuditLog/SyncCallEmissionTests.cs — boots a site + central pair via the existing IntegrationTests harness; deploys a tiny script that calls a stub external system; asserts the central AuditLog table has exactly one row with the expected channel/kind/status within 10s.
- Possibly modify: infra/reseed.sh if integration tests need a fresh AuditLog table per run.
Steps:
- Sketch the test using existing IntegrationTests fixtures.
- Run: fail somewhere (gaps in earlier tasks surface here).
- Iterate fixes back through M2-T1..M2-T11 until end-to-end passes.
- Commit: test(auditlog): end-to-end sync call emission integration test.
M2 — Risk callouts
- SiteStream proto evolution: adding a new top-level RPC is wire-compatible; confirm generated Sitestream.cs rebuilds cleanly and existing tests still pass.
- Dedicated dispatcher misconfiguration: if SiteAuditTelemetryActor lands on the script blocking-I/O dispatcher, scripts will starve during telemetry bursts. Add a runtime assertion in M2-T9 that the actor's dispatcher matches expectation.
- Script execution context plumbing: ESG emission (M2-T10) needs SourceInstanceId/SourceScript; confirm these are reachable via the existing ScriptExecutionContext (or equivalent in SiteRuntime) before starting M2-T10.
- Integration-test DB isolation: target an isolated MS SQL database (or a dedicated schema) so the test doesn't clash with other integration tests.
M3 — Cached operations + dual-write transaction + (inlined) Site Call Audit foundations
Goal: Cached external calls (ExternalSystem.CachedCall) and cached DB writes (Database.CachedWrite) produce four audit rows per operation (Kind=CachedSubmit Status=Submitted, Kind=ApiCallCached/DbWriteCached Status=Forwarded, Kind=ApiCallCached/DbWriteCached Status=Attempted × N, Kind=CachedResolve Status=Delivered|Failed|Parked|Discarded) AND populate the operational SiteCalls table at central — in one transaction at central, from a single combined telemetry packet.
M2 realities to honor:
- Vocabulary: use the M1-aligned enums. M3 will be the first code to populate AuditKind.ApiCallCached, DbWriteCached, CachedSubmit, CachedResolve. The locked spec (alog.md + Component-AuditLog.md) was reconciled in the M1 merge.
- Site→central gRPC client deferred to M6: M2 ships NoOpSiteStreamAuditClient as the production default. Site SQLite rows accumulate as Pending forever in production until M6. M3 component tests should use Bundle H's DirectActorSiteStreamAuditClient pattern (see tests/ScadaLink.AuditLog.Tests/Integration/SyncCallEmissionEndToEndTests.cs:277-340). Extract that helper into tests/ScadaLink.AuditLog.Tests/Integration/Infrastructure/ so M3 cached-call E2E tests can reuse it without redefining it.
- Mapper duplication: SiteStreamGrpcServer.IngestAuditEvents inlines DTO→entity decoding (intentional, to avoid the AuditLog→Communication project-ref cycle). The mapper lives at src/ScadaLink.AuditLog/Telemetry/AuditEventMapper.cs. M3 should add a comment in both spots tying them together, OR move the mapper into src/ScadaLink.Commons/ (project-ref clean) so both consumers can share it.
- Ask timeout: AuditIngestAskTimeout = 30s is hardcoded in SiteStreamGrpcServer.cs:37. M3 may want to expose this via CommunicationOptions or AuditLogOptions as central reconciliation/dual-write traffic grows.
- CachedCallTelemetry message: per CLAUDE.md, the CachedCallTelemetry message described in the docs does not yet exist in code. M3 must create it from scratch (additively, per Commons REQ-COM-5a — DO NOT rename it CachedOperationTelemetry). It carries BOTH the AuditLog rows (4+) AND the SiteCalls upsert in one packet.
- Dual-write transaction: central writes AuditLog + SiteCalls in one MS SQL transaction. The repository's InsertIfNotExistsAsync swallows duplicates (M2 Bundle A fix); the SiteCalls upsert uses MERGE (or insert-if-not-exists then upsert-on-newer-status per CLAUDE.md). M3 must ensure the same Bundle A swallow pattern applies if a duplicate CachedCallId arrives.
- AuditEvent ForwardState semantics in M3: cached-operation telemetry rows are site-emitted just like sync M2 rows, so the same site SQLite hot-path + Pending→Forwarded lifecycle applies. The four lifecycle rows share a CorrelationId (the TrackedOperationId), but each is its own AuditEvent with a distinct EventId.
Affected projects: Commons, AuditLog, SiteCallAudit (new — minimum-viable surface), ConfigurationDatabase (new SiteCalls table migration), ExternalSystemGateway, StoreAndForward, Host. Tests across all of them + IntegrationTests.
Prerequisite call-out: This milestone implements the minimum-viable Site Call Audit (#22) surface and cached-call tracking pieces — TrackedOperationId, site-local operation tracking SQLite, SiteCalls table at central, the existing-message CachedCallTelemetry (must be created from scratch since it doesn't exist in code despite living in the docs). Full reconciliation, KPIs, and Retry/Discard relay for #22 are deferred — they're not on the critical path for the audit log's combined telemetry.
Acceptance criteria:
- New SiteCalls MS SQL table + repo (no partitioning needed; this is operational state, not audit).
- New CachedCallTelemetry message in Commons carrying BOTH the cached-call operational fields AND an AuditEvent payload.
- Site path: CachedCall writes the audit row to site SQLite (Kind = CachedEnqueued), creates the site operation-tracking row, and sends a combined telemetry packet.
- Central path: AuditLogIngestActor (extended) receives the combined packet, performs one transaction containing both the AuditLog insert and the SiteCalls upsert.
- Retry attempt → Kind = CachedAttempt audit row + SiteCalls status transition. Terminal → Kind = CachedTerminal audit row + SiteCalls terminal status.
- Integration test asserts: triggering a CachedCall that fails transient-then-succeeds produces 4 AuditLog rows (1 CachedEnqueued + 2 CachedAttempt + 1 CachedTerminal, matching the per-attempt emission in M3-T12/M3-T13) + 1 SiteCalls row with Status = Delivered, all sharing the same TrackedOperationId correlation key.
M3 — Tasks (TDD-detail)
M3-T1: TrackedOperationId strong-typed ID
Files:
- Create: src/ScadaLink.Commons/Types/TrackedOperationId.cs — readonly record struct wrapping Guid; New()/Parse(string)/ToString().
- Create: tests/ScadaLink.Commons.Tests/Types/TrackedOperationIdTests.cs.
Steps:
- Failing test: round-trip via ToString()/Parse() and equality semantics.
- Implement.
- Run: pass.
- Commit: feat(commons): TrackedOperationId strong type.
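For orientation, the strong type described in M3-T1 might look something like the following. This is only a sketch under assumptions: the Value property name and the "D" (hyphenated) string format are illustrative choices, not decisions recorded in the spec.

```csharp
using System;

// Sketch of a strongly typed ID: a readonly record struct wrapping a Guid.
// Record structs provide value equality (==, !=, Equals, GetHashCode) for free.
public readonly record struct TrackedOperationId(Guid Value)
{
    // Creates a fresh random identifier.
    public static TrackedOperationId New() => new(Guid.NewGuid());

    // Parses the textual form produced by ToString(); throws FormatException on bad input.
    public static TrackedOperationId Parse(string text) => new(Guid.Parse(text));

    // Overrides the record's default "TrackedOperationId { Value = ... }" rendering
    // so the string form round-trips through Parse.
    public override string ToString() => Value.ToString("D");
}
```

Overriding ToString matters here: without it the compiler-generated record text would not be accepted by Parse, and the round-trip test in the steps above would fail.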
M3-T2: Site-local operation-tracking SQLite table + repo
Files:
- Create: src/ScadaLink.SiteRuntime/Tracking/OperationTrackingStore.cs — SQLite-backed store with columns: TrackedOperationId, Kind, TargetSummary, Status, RetryCount, LastError, CreatedAtUtc, UpdatedAtUtc, TerminalAtUtc, source provenance. Schema bootstrap on first use; uses the same write-lock pattern as SqliteAuditWriter. Implements IOperationTrackingStore (interface in Commons).
- Create: src/ScadaLink.Commons/Interfaces/IOperationTrackingStore.cs — RecordEnqueueAsync, RecordAttemptAsync, RecordTerminalAsync, GetStatusAsync(TrackedOperationId), PurgeTerminalAsync(olderThanUtc).
- Create: tests/ScadaLink.SiteRuntime.Tests/Tracking/OperationTrackingStoreTests.cs.
Steps:
- Failing test: schema bootstrap creates the table.
- Failing test: RecordEnqueueAsync inserts a Pending row; RecordAttemptAsync updates Status/RetryCount/LastError; RecordTerminalAsync finalises.
- Failing test: GetStatusAsync returns the latest snapshot (answers Tracking.Status(id) site-locally).
- Failing test: PurgeTerminalAsync removes terminal rows older than threshold; non-terminal rows are kept regardless of age.
- Implement.
- Run: pass.
- Commit: feat(siteruntime): site-local operation tracking SQLite store.
M3-T3: Tracking.Status(id) API surface in SiteRuntime
Files:
- Modify: src/ScadaLink.SiteRuntime/Scripting/TrackingApi.cs (new or existing — confirm via repo) — public Status(TrackedOperationId) method routed through IOperationTrackingStore.
- Modify: script trust-model allow-list to include the new Tracking.* surface (confirm via grep).
- Create: tests/ScadaLink.SiteRuntime.Tests/Scripting/TrackingApiTests.cs.
Steps:
- Failing test: Tracking.Status(unknownId) returns a documented "not found" sentinel.
- Failing test: Tracking.Status(knownId) returns the latest snapshot.
- Implement.
- Run: pass.
- Commit: feat(siteruntime): Tracking.Status(id) script API.
M3-T4: CachedCallTelemetry Commons message — carries both operational + audit content
Files:
- Create: src/ScadaLink.Commons/Messages/Integration/CachedCallTelemetry.cs — fields: TrackedOperationId, Kind (CachedEnqueued/CachedAttempt/CachedTerminal audit kind), operational status, retry count, last error, timestamps, and a nested AuditEvent carrying the audit row content. Documented as additive-only per Commons REQ-COM-5a.
- Create: tests/ScadaLink.Commons.Tests/Messages/Integration/CachedCallTelemetryTests.cs.
Steps:
- Failing test: construct a telemetry packet for each of the three lifecycle kinds; verify the nested AuditEvent's channel/kind alignment (e.g., a CachedAttempt packet must carry an AuditEvent with Kind = CachedAttempt).
- Failing test: serialization round-trip preserves both layers.
- Implement.
- Run: pass.
- Commit: feat(commons): CachedCallTelemetry carrying combined operational + audit content.
M3-T5: SiteCalls MS SQL table — EF mapping
Files:
- Create: src/ScadaLink.Commons/Entities/Audit/SiteCall.cs — POCO record per Component-SiteCallAudit.md.
- Create: src/ScadaLink.ConfigurationDatabase/Entities/SiteCallEntityTypeConfiguration.cs — IEntityTypeConfiguration<SiteCall> with PK on TrackedOperationId, indexes on (SourceSite, CreatedAtUtc) and (Status, UpdatedAtUtc).
- Modify: ScadaLinkDbContext.cs — public DbSet<SiteCall> SiteCalls => Set<SiteCall>();.
- Create: tests/ScadaLink.ConfigurationDatabase.Tests/Entities/SiteCallEntityTypeConfigurationTests.cs.
Steps:
- Failing test: model exposes SiteCalls table with documented columns and indexes.
- Implement.
- Run: pass.
- Commit: feat(configdb): map SiteCall to SiteCalls table.
M3-T6: SiteCalls migration
Files:
- Create: src/ScadaLink.ConfigurationDatabase/Migrations/<ts>_AddSiteCallsTable.cs via dotnet ef migrations add AddSiteCallsTable.
- Create: tests/ScadaLink.ConfigurationDatabase.Tests/Migrations/AddSiteCallsTableMigrationTests.cs.
Steps:
- Failing test: applying the migration creates the SiteCalls table with PK + indexes.
- Generate + adjust migration.
- Run: pass.
- Commit: feat(configdb): add SiteCalls migration.
M3-T7: ISiteCallAuditRepository + EF impl
Files:
- Create: src/ScadaLink.Commons/Interfaces/Repositories/ISiteCallAuditRepository.cs — UpsertAsync(SiteCall) (insert-if-not-exists by TrackedOperationId, otherwise update-on-newer-status using monotonic status progression), GetAsync(TrackedOperationId), QueryAsync(filter, paging), PurgeTerminalAsync(olderThanUtc).
- Create: src/ScadaLink.ConfigurationDatabase/Repositories/SiteCallAuditRepository.cs.
- Modify: ServiceCollectionExtensions.cs — register.
- Create: tests/ScadaLink.ConfigurationDatabase.Tests/Repositories/SiteCallAuditRepositoryTests.cs.
Steps:
- Failing test: first UpsertAsync inserts; second UpsertAsync with an advanced status updates; an UpsertAsync with an older status is a no-op (monotonic progression).
- Failing test: paged query supports the documented filter set.
- Implement.
- Run: pass.
- Commit: feat(configdb): ISiteCallAuditRepository + EF impl.
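The "monotonic status progression" rule in M3-T7 can be illustrated with a small in-memory model. Everything here is hypothetical: the SiteCallStatus members, the three-level rank (Pending, Attempted, terminal) and the InMemorySiteCallStore class are assumptions made for the sketch, and the real repository would express the same comparison in SQL or EF Core rather than a dictionary.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical status set; the authoritative list lives in Component-SiteCallAudit.md.
public enum SiteCallStatus { Pending, Attempted, Delivered, Failed, Parked, Discarded }

public sealed record SiteCall(Guid TrackedOperationId, SiteCallStatus Status);

public sealed class InMemorySiteCallStore
{
    private readonly Dictionary<Guid, SiteCall> _rows = new();

    // Assumed ordering: Pending < Attempted < any terminal status.
    public static int Rank(SiteCallStatus status) => status switch
    {
        SiteCallStatus.Pending => 0,
        SiteCallStatus.Attempted => 1,
        _ => 2, // Delivered, Failed, Parked, Discarded
    };

    // Returns true when a row was inserted or advanced, false when the call was a no-op.
    public bool Upsert(SiteCall incoming)
    {
        if (!_rows.TryGetValue(incoming.TrackedOperationId, out var current))
        {
            _rows[incoming.TrackedOperationId] = incoming; // insert-if-not-exists
            return true;
        }

        // Older or equal-rank statuses never overwrite the stored row.
        if (Rank(incoming.Status) <= Rank(current.Status)) return false;

        _rows[incoming.TrackedOperationId] = incoming;
        return true;
    }

    public SiteCall? Get(Guid id) => _rows.TryGetValue(id, out var row) ? row : null;
}
```

One limitation of comparing rank alone: repeated Attempted updates would be dropped, so a real implementation that needs RetryCount to advance would probably also compare RetryCount or UpdatedAtUtc. That choice is left open here.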
M3-T8: SiteCallAuditActor skeleton (central singleton)
Files:
- Create: src/ScadaLink.SiteCallAudit/ (new project) — SiteCallAuditActor.cs + ScadaLink.SiteCallAudit.csproj + ServiceCollectionExtensions.cs. Actor handles UpsertSiteCallCommand messages by calling ISiteCallAuditRepository.UpsertAsync. Note: full reconciliation, KPIs, and Retry/Discard relay are explicitly deferred — this is the minimum-viable surface for M3.
- Modify: ScadaLink.slnx to include the new project.
- Create: tests/ScadaLink.SiteCallAudit.Tests/SiteCallAuditActorTests.cs.
Steps:
- Failing test: actor receives UpsertSiteCallCommand, calls repo, replies with ack.
- Failing test: actor swallows transient DB errors and surfaces them as health metrics (does NOT crash the central singleton).
- Implement.
- Run: pass.
- Commit: feat(scaudit): SiteCallAuditActor minimum viable surface.
M3-T9: Extend sitestream.proto with IngestCachedTelemetry RPC OR extend IngestAuditEvents
Files:
- Modify: src/ScadaLink.Communication/Protos/sitestream.proto — preferred approach: add a new top-level RPC rpc IngestCachedTelemetry(CachedTelemetryBatch) returns (IngestAck); and a message CachedTelemetryPacket { AuditEventDto audit_event = 1; SiteCallOperationalDto operational = 2; } plus message CachedTelemetryBatch { repeated CachedTelemetryPacket packets = 1; }. Decision should be confirmed during M3's brainstorm.
- Build to regenerate.
- Create: tests/ScadaLink.Communication.Tests/Protos/CachedTelemetryProtoTests.cs.
Steps:
- Failing test: round-trip a populated CachedTelemetryPacket.
- Add proto + rebuild.
- Run: pass.
- Commit: feat(comms): IngestCachedTelemetry RPC + combined telemetry messages.
M3-T10: Extend AuditLogIngestActor for combined telemetry — dual-write transaction
Files:
- Modify: src/ScadaLink.AuditLog/Central/AuditLogIngestActor.cs — add a handler for the cached telemetry message. Inside a single DbContext transaction: (a) call IAuditLogRepository.InsertIfNotExistsAsync(auditEvent), then (b) call ISiteCallAuditRepository.UpsertAsync(operationalState). Both must succeed or both must roll back.
- Modify: src/ScadaLink.Communication/SiteStreamGrpc/SiteStreamGrpcServer.cs — route the new RPC to the central actor.
- Create: tests/ScadaLink.AuditLog.Tests/Central/AuditLogIngestActorCombinedTelemetryTests.cs.
Steps:
- Failing test: a single combined packet produces one AuditLog row AND one SiteCalls row (or upsert).
- Failing test: when the SiteCalls upsert throws, the AuditLog insert is rolled back (no orphan rows).
- Failing test: when the AuditLog insert is a no-op (duplicate EventId), the SiteCalls upsert still runs.
- Failing test: when both rows already exist with monotonic-equal statuses, the operation is a no-op overall (full idempotency).
- Implement.
- Run: pass.
- Commit: feat(auditlog): combined telemetry dual-write transaction.
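The four behaviours tested in M3-T10 can be modelled without a database to make the intended semantics concrete. The class below is a toy model under stated assumptions, not the EF Core transaction: DualWriteModel, the integer status rank and the upsertFails switch are all invented for illustration, and "rollback" is simulated by undoing the in-memory insert.

```csharp
using System;
using System.Collections.Generic;

// Toy in-memory model of the central dual-write; NOT a real transaction.
public sealed class DualWriteModel
{
    private readonly HashSet<Guid> _auditEventIds = new();
    private readonly Dictionary<Guid, int> _siteCallRank = new();

    public int AuditRowCount => _auditEventIds.Count;

    public int? RankOf(Guid trackedOperationId) =>
        _siteCallRank.TryGetValue(trackedOperationId, out var rank) ? rank : (int?)null;

    // upsertFails simulates a SiteCalls failure so the rollback path can be observed.
    public void Ingest(Guid eventId, Guid trackedOperationId, int statusRank, bool upsertFails = false)
    {
        // (a) insert-if-not-exists: a duplicate EventId is silently a no-op.
        bool inserted = _auditEventIds.Add(eventId);
        try
        {
            if (upsertFails) throw new InvalidOperationException("simulated SiteCalls failure");

            // (b) always runs, even when (a) was a duplicate (no short-circuit);
            // only a strictly newer status overwrites the stored one.
            if (!_siteCallRank.TryGetValue(trackedOperationId, out var current) || statusRank > current)
                _siteCallRank[trackedOperationId] = statusRank;
        }
        catch
        {
            // Undo the audit insert so no orphan row survives a failed upsert.
            if (inserted) _auditEventIds.Remove(eventId);
            throw;
        }
    }
}
```

Replaying the same packet through Ingest leaves both collections unchanged, which is the "full idempotency" case; a packet with a known EventId but a higher status rank still advances the SiteCalls side.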
M3-T11: ESG CachedCallAsync — emit CachedEnqueued on enqueue
Files:
- Modify: src/ScadaLink.ExternalSystemGateway/ExternalSystemClient.cs:75–136 (cached call) — at the moment of buffering into S&F: build an AuditEvent (channel=ApiOutbound, kind=CachedEnqueued) AND a SiteCallOperationalDto (status=Pending); package as a CachedTelemetryPacket; hand to the combined-telemetry forwarder.
- Modify: src/ScadaLink.ExternalSystemGateway/Cached/CachedCallTelemetryForwarder.cs (new) — accumulates packets and posts to SiteAuditTelemetryActor (or a sibling actor — decision in milestone brainstorm).
- Create: tests/ScadaLink.ExternalSystemGateway.Tests/CachedCallEnqueueEmissionTests.cs.
Steps:
- Failing test: an enqueued cached call produces exactly one packet with kind=CachedEnqueued.
- Implement.
- Run: pass.
- Commit: feat(esg): CachedCall emits CachedEnqueued combined telemetry on buffering.
M3-T12: ESG CachedCallAsync — emit CachedAttempt per retry
Files:
- Modify: src/ScadaLink.StoreAndForward/ retry loop (locate the per-attempt callback site) to emit a CachedAttempt packet on each attempt (success OR transient failure).
- Create: tests/ScadaLink.StoreAndForward.Tests/CachedCallAttemptEmissionTests.cs.
Steps:
- Failing test: an attempt that returns HTTP 500 produces a packet with kind=CachedAttempt, status=TransientFailure, HttpStatus=500.
- Failing test: a successful attempt produces a packet with kind=CachedAttempt, status=Success, HttpStatus=200.
- Implement.
- Run: pass.
- Commit: feat(snf): CachedCall emits CachedAttempt per retry.
M3-T13: ESG CachedCallAsync — emit CachedTerminal on terminal state
Files:
- Modify: same retry-loop terminal-transition site — on Delivered/Failed/Parked/Discarded, emit one final CachedTerminal packet.
- Create: tests/ScadaLink.StoreAndForward.Tests/CachedCallTerminalEmissionTests.cs.
Steps:
- Failing test: a cached call that succeeds on attempt 3 produces (in order): 1 CachedEnqueued, 3 CachedAttempt, 1 CachedTerminal (with status=Delivered).
- Failing test: a cached call that exhausts retries produces a final CachedTerminal with status=Parked.
- Implement.
- Run: pass.
- Commit: feat(snf): CachedCall emits CachedTerminal on lifecycle terminal.
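To make the expected packet ordering from M3-T11 through M3-T13 easy to check, the sequence can be derived from a list of attempt outcomes. The helper below is a hypothetical test aid, not production code: the CachedLifecycle name and the string labels are assumptions, and it only models the two terminal outcomes the steps above describe (Delivered on first success, Parked when every attempt fails).

```csharp
using System;
using System.Collections.Generic;

public static class CachedLifecycle
{
    // attemptSucceeded[i] is the outcome of attempt i + 1.
    // Assumption: processing stops at the first success; if none succeeds the call is Parked.
    public static IReadOnlyList<string> ExpectedPackets(IReadOnlyList<bool> attemptSucceeded)
    {
        var packets = new List<string> { "CachedEnqueued" };
        var terminal = "Parked";

        foreach (var succeeded in attemptSucceeded)
        {
            packets.Add(succeeded ? "CachedAttempt(Success)" : "CachedAttempt(TransientFailure)");
            if (succeeded)
            {
                terminal = "Delivered";
                break;
            }
        }

        packets.Add($"CachedTerminal({terminal})");
        return packets;
    }
}
```

For example, `CachedLifecycle.ExpectedPackets(new[] { false, false, true })` yields five entries: the enqueue, three attempts and a Delivered terminal, which is the row count the M3-T16 integration test asserts.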
M3-T14: Database.CachedWrite — mirror the three-lifecycle emission for DB cached writes
Files:
- Modify: src/ScadaLink.ExternalSystemGateway/DatabaseGateway.cs (or equivalent — confirm via repo) — same three-event emission pattern as ESG cached calls, but channel=DbOutbound.
- Create: tests/ScadaLink.ExternalSystemGateway.Tests/CachedWriteLifecycleEmissionTests.cs.
Steps:
- Failing test: a CachedWrite that succeeds first try produces CachedEnqueued + CachedAttempt(Success) + CachedTerminal(Delivered).
- Failing test: a CachedWrite with transient retry mirrors the ESG pattern.
- Implement.
- Run: pass.
- Commit: feat(esg): Database.CachedWrite emits three-lifecycle combined telemetry.
M3-T15: Host registration — SiteCallAuditActor central singleton
Files:
- Modify: src/ScadaLink.Host/Actors/AkkaHostedService.cs — register SiteCallAuditActor central singleton + proxy alongside AuditLogIngestActor.
- Modify: src/ScadaLink.SiteCallAudit/ServiceCollectionExtensions.cs — register actor props.
- Modify: tests/ScadaLink.Host.Tests/AkkaHostedServiceAuditWiringTests.cs — extend to assert SiteCallAuditActor proxy resolves.
Steps:
- Failing test: starting host produces the new singleton's proxy.
- Implement.
- Run: pass.
- Commit: feat(host): register SiteCallAuditActor central singleton.
M3-T16: Integration test — cached external call audit (end-to-end)
Files:
- Create: tests/ScadaLink.IntegrationTests/AuditLog/CachedCallCombinedTelemetryTests.cs — site + central; stub external system returns 500 twice then 200; script invokes ExternalSystem.CachedCall("System","Method", args); assert AuditLog has 5 rows (Enqueued + 3 Attempts + Terminal) AND SiteCalls has 1 row with Status=Delivered AND Tracking.Status(id) reports the same.
Steps:
- Sketch test against IntegrationTests harness.
- Run: fail (likely surfacing earlier-task gaps).
- Iterate fixes until pass.
- Commit: test(auditlog): cached call combined telemetry end-to-end.
M3-T17: Integration test — cached DB write audit (end-to-end)
Files:
- Create: tests/ScadaLink.IntegrationTests/AuditLog/CachedWriteCombinedTelemetryTests.cs — mirror M3-T16 against the DB cached path.
Steps:
- Sketch.
- Iterate.
- Commit: test(auditlog): cached DB write combined telemetry end-to-end.
M3-T18: Idempotency test — duplicate telemetry doesn't double-insert / double-upsert
Files:
- Create: tests/ScadaLink.IntegrationTests/AuditLog/CombinedTelemetryIdempotencyTests.cs — force the same packet to arrive twice (simulated telemetry retry); assert AuditLog still has exactly one row and SiteCalls upsert is monotonic.
Steps:
- Sketch.
- Pass.
- Commit: test(auditlog): combined telemetry idempotency on retried packets.
M3 — Risk callouts
- Combined telemetry packet evolution: design the proto so future cached audit-kind additions are non-breaking (avoid oneof for fields you'll extend; use sparse field numbers).
- Dual-write transaction failure modes: the single DbContext transaction at central spans two tables; ensure retry behaviour on transient connection errors works as expected (existing IDbExecutionStrategy patterns may apply).
- Idempotency cross-table: AuditLog dedups on EventId, SiteCalls dedups on TrackedOperationId with status-monotonic update. A retried packet whose AuditLog row exists must still upsert SiteCalls (no short-circuit).
- Scope discipline: M3 inlines the minimum surface for #22 and cached-call tracking. Full #22 reconciliation, KPIs, and Retry/Discard relay are deferred. Note in the milestone brainstorm whether any extra #22 surface is genuinely needed for M3 acceptance criteria — if not, defer aggressively.
- Tracking.Status semantics: confirmed authoritative site-locally per design; no central round-trip. Ensure the test in M3-T3 reflects this.
M4 — Remaining boundary emission
Goal: Every channel × kind from Component-AuditLog.md produces a row when its boundary call fires.
Affected projects: ExternalSystemGateway (sync DB writes/reads, cached DB writes), SiteRuntime (Database surface exposing them), NotificationOutbox (central direct-write of Attempt/Terminal), InboundAPI (middleware). Tests across all.
M3 realities to honor:
- Vocabulary: use the M1-aligned enums. The roadmap's old SyncWrite/SyncRead/Notification.Attempt/Notification.Terminal/Notification.Enqueued/ApiInbound.Completed/PermanentFailure strings are pre-M1 spec wording — DO NOT use those names in code. Translation:
  - sync DB write/read → AuditKind.DbWrite (Channel=DbOutbound); distinguish read vs write via Extra (e.g., {"op": "read", "rowsReturned": 42}).
  - notification delivery attempt → AuditKind.NotifyDeliver with AuditStatus.Attempted.
  - notification delivery terminal → AuditKind.NotifyDeliver with AuditStatus.Delivered|Failed|Parked|Discarded.
  - notification submit (site-emit) → AuditKind.NotifySend with AuditStatus.Submitted.
  - inbound API success → AuditKind.InboundRequest with AuditStatus.Delivered.
  - inbound API auth failure → AuditKind.InboundAuthFailure with AuditStatus.Failed.
  - "permanent failure" → AuditStatus.Failed. "Transient failure" never lands a terminal row.
- Mapper consolidation: M3 surfaced 4 DTO mappers (AuditEventMapper, SiteStreamGrpcServer inline, SiteCall DTO mapper, DirectActorSiteStreamAuditClient test stub). M4 should extract a single IntegrationMappers helper in src/ScadaLink.Commons/Messages/Integration/ or similar to consolidate before adding more channels. The project-ref cycle that motivated the inline duplication can be broken by moving the mapper into Commons (proto types are auto-generated in Communication; the mapper just needs the proto types reachable from Commons via a transitive ref).
- OnCachedTelemetryWithoutDualWriteAsync test-mode fallback: lives in AuditLogIngestActor for the single-repo ctor. M4 may deprecate the single-repo constructor entirely and migrate tests to the IServiceProvider+harness pattern.
- Site SQLite drain for OperationTrackingStore: M3 wrote the tracking half site-locally but no drain pipeline pushes it to central — central reads SiteCalls operational state via the dual-write transaction only. If M4 needs central visibility into in-flight (non-terminal) tracking entries, plan a drain.
- SiteCallAuditActor: wired in M3 as a cluster singleton + proxy but not on the M3 hot path. M4 (or M6 reconciliation) is the natural first direct caller — wire one production code path through it.
- Vocabulary correction in the body of M4 below: every M4-T1..M4-T12 step that still says Status=PermanentFailure, Kind=SyncWrite/SyncRead/Completed/Attempt/Terminal/Enqueued is stale; apply the translation above when implementing.
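The translation list above could be captured as a small lookup so stale names are caught in review or in tests. This is a sketch only: LegacyAuditVocabulary is a hypothetical helper, names are kept as strings rather than the real AuditKind/AuditStatus enums so the block stands alone, and only the mappings stated in the list are encoded. Statuses that depend on the outcome (for example which terminal status a NotifyDeliver row carries) are deliberately not modelled.

```csharp
using System;
using System.Collections.Generic;

public static class LegacyAuditVocabulary
{
    // Pre-M1 kind label -> M1-aligned AuditKind name.
    private static readonly IReadOnlyDictionary<string, string> KindMap = new Dictionary<string, string>
    {
        ["SyncWrite"] = "DbWrite",
        ["SyncRead"] = "DbWrite",                  // read vs write is carried in Extra.op
        ["Notification.Attempt"] = "NotifyDeliver",
        ["Notification.Terminal"] = "NotifyDeliver",
        ["Notification.Enqueued"] = "NotifySend",
        ["ApiInbound.Completed"] = "InboundRequest",
    };

    // Throws for labels with no recorded translation instead of guessing.
    public static string TranslateKind(string legacy) =>
        KindMap.TryGetValue(legacy, out var kind)
            ? kind
            : throw new ArgumentException($"No M1 translation recorded for '{legacy}'.", nameof(legacy));

    // Only the one status rename stated in the roadmap is applied; other values pass through unchanged.
    public static string TranslateStatus(string legacy) =>
        legacy == "PermanentFailure" ? "Failed" : legacy;
}
```

Note that both SyncWrite and SyncRead collapse onto DbWrite, so any test that previously distinguished them by kind would need to assert on Extra.op instead.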
Acceptance criteria:
- Sync Database.Connection().Execute() → DbOutbound.DbWrite row (with Extra.op = "write" and rowsAffected); ExecuteReader → DbOutbound.DbWrite row (with Extra.op = "read" and rowsReturned). Parameter values captured by default; per-connection redaction opt-in supported.
- Database.CachedWrite → three lifecycle rows via the combined telemetry built in M3.
- Notification Outbox dispatcher: every delivery attempt writes NotifyDeliver with Status=Attempted; terminal writes NotifyDeliver with Status={Delivered|Failed|Parked|Discarded}. Site-emitted NotifySend (Status=Submitted) flows through the standard site→central audit path. Audit-write failure never affects delivery.
- Inbound API middleware writes one ApiInbound.InboundRequest row per request, before await next() returns. API key NAME captured (never material). Audit-write failure does NOT change the HTTP response. Auth failures emit ApiInbound.InboundAuthFailure with Status=Failed.
M4 — Tasks (TDD-detail)
M4-T1: ESG Database.Connection().ExecuteAsync audit emission — DbOutbound.SyncWrite
Files:
- Modify: src/ScadaLink.ExternalSystemGateway/DatabaseGateway.cs (or wherever the script-facing Execute* lives — confirm via repo) — wrap the call site to emit an AuditEvent (channel=DbOutbound, kind=SyncWrite) on every Execute/ExecuteScalar. Capture statement text, parameter values (default; redaction in M5), DurationMs, rowsAffected in Extra.
- Create: tests/ScadaLink.ExternalSystemGateway.Tests/DatabaseSyncWriteEmissionTests.cs.
Steps:
- Failing test: Execute("INSERT INTO ...", new {...}) emits one event with Channel=DbOutbound, Kind=SyncWrite, statement text + parameter values captured.
- Failing test: ExecuteScalar emits the same kind.
- Failing test: execute that throws → emission with Status=PermanentFailure, ErrorMessage populated.
- Failing test: audit-write failure does NOT abort the SQL call (script sees the original outcome).
- Implement.
- Run: pass.
- Commit: feat(esg): emit DbOutbound.SyncWrite on script-initiated Execute*.
M4-T2: ESG Database.Connection().ExecuteReaderAsync audit emission — DbOutbound.SyncRead
Files:
- Modify: src/ScadaLink.ExternalSystemGateway/DatabaseGateway.cs — wrap ExecuteReader to emit DbOutbound.SyncRead. Capture statement, parameter values, DurationMs, rowsReturned in Extra. Response body capture defaults to NOT including rows; opt-in via per-connection config (M5).
- Create: tests/ScadaLink.ExternalSystemGateway.Tests/DatabaseSyncReadEmissionTests.cs.
Steps:
- Failing test: Query<T>("SELECT ...") emits one event with Channel=DbOutbound, Kind=SyncRead.
- Failing test: rowsReturned appears in Extra.
- Implement.
- Run: pass.
- Commit: feat(esg): emit DbOutbound.SyncRead on script-initiated reads.
M4-T3: NotificationOutboxActor — inject ICentralAuditWriter
Files:
- Modify: src/ScadaLink.NotificationOutbox/NotificationOutboxActor.cs:22–68 — constructor accepts ICentralAuditWriter. Wire into DI in ServiceCollectionExtensions.cs.
- Create: tests/ScadaLink.NotificationOutbox.Tests/NotificationOutboxActorAuditInjectionTests.cs.
Steps:
- Failing test: actor's Props factory accepts an ICentralAuditWriter; constructor stores it.
- Implement.
- Run: pass.
- Commit: feat(notif): NotificationOutboxActor accepts ICentralAuditWriter.
M4-T4: NotificationOutboxActor — emit Notification.Attempt per dispatcher attempt
Files:
- Modify: src/ScadaLink.NotificationOutbox/NotificationOutboxActor.cs dispatcher attempt branch (after each delivery attempt resolves) — emit Notification.Attempt row with Status mapped from attempt result (Success, TransientFailure, PermanentFailure).
- Create: tests/ScadaLink.NotificationOutbox.Tests/NotificationOutboxActorAttemptEmissionTests.cs.
Steps:
- Failing test: a successful attempt → exactly one event with Kind=Attempt, Status=Success.
- Failing test: a transient-failure attempt → Status=TransientFailure, ErrorMessage populated.
- Failing test: when ICentralAuditWriter.WriteAsync throws, the dispatcher's per-attempt Notifications row update STILL succeeds (audit must never block delivery).
- Implement.
- Run: pass.
- Commit: feat(notif): emit Notification.Attempt per dispatcher attempt.
M4-T5: NotificationOutboxActor — emit Notification.Terminal on terminal transition
Files:
- Modify: src/ScadaLink.NotificationOutbox/NotificationOutboxActor.cs terminal branches (Delivered/Parked/Discarded transitions) — emit Notification.Terminal row.
- Create: tests/ScadaLink.NotificationOutbox.Tests/NotificationOutboxActorTerminalEmissionTests.cs.
Steps:
- Failing test: a notification that succeeds emits one Terminal event with Status=Delivered.
- Failing test: a Parked transition emits Status=Parked.
- Failing test: an operator Discard emits Status=Discarded.
- Implement.
- Run: pass.
- Commit: feat(notif): emit Notification.Terminal on terminal transitions.
M4-T6: Site-emitted Notification.Enqueued
Files:
- Modify: src/ScadaLink.NotificationService/ (or wherever the site-side Notify.To().Send() runs — confirm via repo) — at the moment of buffering into the site S&F: emit a site-side AuditEvent (channel=Notification, kind=Enqueued) via IAuditWriter. Telemetry forwards as usual.
- Create: tests/ScadaLink.NotificationService.Tests/NotifyEnqueueAuditEmissionTests.cs.
Steps:
- Failing test: Notify.To("list").Send("subject", "body") emits one event with Channel=Notification, Kind=Enqueued, target=list name, body captured (subject too).
- Failing test: audit-write failure does not abort Send().
- Implement.
- Run: pass.
- Commit: feat(notif): emit Notification.Enqueued from site-side Notify.Send.
M4-T7: Inbound API — AuditWriteMiddleware
Files:
- Create: src/ScadaLink.InboundAPI/Middleware/AuditWriteMiddleware.cs — ASP.NET Core middleware. After await next() (so the response is fully resolved but BEFORE flush — using HttpResponse.OnStarting or buffered body), build an AuditEvent (channel=ApiInbound, kind=Completed, Actor=API key NAME from request context, Target=method name, HttpStatus, DurationMs, RequestSummary/ResponseSummary). Call ICentralAuditWriter.WriteAsync inside try/catch — failures never affect the response.
- Modify: src/ScadaLink.InboundAPI/Startup.cs (or wherever the pipeline is configured) — register middleware.
- Create: tests/ScadaLink.InboundAPI.Tests/Middleware/AuditWriteMiddlewareTests.cs.
Steps:
- Failing test: a successful POST to /api/{method} produces one ApiInbound.Completed event with HttpStatus=200.
- Failing test: a 400/401/500 response produces an event with the matching HttpStatus and Status mapped (PermanentFailure for 4xx, TransientFailure for 5xx).
- Failing test: Actor carries the API key NAME (never the key material).
- Failing test: when ICentralAuditWriter.WriteAsync throws, the HTTP response is unchanged (success stays success).
- Failing test: request remote IP and User-Agent appear in Extra.
- Implement.
- Run: pass.
- Commit: feat(inbound): AuditWriteMiddleware emitting ApiInbound.Completed per request.
M4-T8: Register middleware in the ASP.NET pipeline
Files:
- Modify: src/ScadaLink.InboundAPI/Startup.cs / Program.cs — app.UseMiddleware<AuditWriteMiddleware>() placed AFTER auth (so Actor resolves) and BEFORE the script-execution handler.
- Create: tests/ScadaLink.InboundAPI.Tests/Middleware/MiddlewareOrderTests.cs.
Steps:
- Failing test: pipeline ordering puts AuditWrite after auth, before script execution.
- Implement.
- Run: pass.
- Commit: feat(inbound): register AuditWriteMiddleware in pipeline.
M4-T9: Integration test — DB sync emission
Files:
- Create: tests/ScadaLink.IntegrationTests/AuditLog/DatabaseSyncEmissionTests.cs — script invokes Database.Connection().Execute("INSERT ...") and Query<T>("SELECT ..."); assert central AuditLog has one DbOutbound.SyncWrite row and one DbOutbound.SyncRead row.
Steps:
- Sketch, iterate, commit: test(auditlog): DB sync emission integration test.
M4-T10: Integration test — Notify dispatcher audit trail
Files:
- Create: tests/ScadaLink.IntegrationTests/AuditLog/NotifyDispatcherAuditTrailTests.cs — script calls Notify.To(list).Send(...); stub SMTP returns transient then success; assert AuditLog has Enqueued + 2 Attempt (one transient, one success) + 1 Terminal (Delivered).
Steps:
- Sketch, iterate, commit: test(auditlog): Notify dispatcher audit trail end-to-end.
M4-T11: Integration test — Inbound API request audit
Files:
- Create: tests/ScadaLink.IntegrationTests/AuditLog/InboundApiAuditTests.cs — POST to /api/{method} with a valid API key; assert one ApiInbound.Completed row with the expected Actor (key name), HttpStatus=200, request/response bodies captured.
- Also test: POST with a bad API key → row with Actor=NULL (or ""), HttpStatus=401, Extra carries remoteIp.
Steps:
- Sketch, iterate, commit: test(auditlog): Inbound API request audit end-to-end.
M4-T12: Integration test — audit-write failure never aborts the action
Files:
- Create: `tests/ScadaLink.IntegrationTests/AuditLog/AuditWriteFailureSafetyTests.cs`: inject a broken `ICentralAuditWriter` (always throws) for one test; assert that ESG sync calls, ESG cached calls, DB writes, Inbound API calls, and Notification dispatch all still complete successfully and the script/caller sees the normal outcome.
Steps:
- Sketch test with broken-writer DI override per scenario.
- Run, fix any spots where audit-write exceptions leak.
- Commit:
test(auditlog): audit failures never abort user-facing actions.
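The guarantee this task tests can be pictured with a small language-neutral sketch (Python here for brevity; the real emitters are C#). The names `run_with_audit`, `Counter`, and `broken_writer` are hypothetical and only illustrate the shape: the action's outcome is computed first, and any exception from the audit write is counted and logged rather than propagated.

```python
import logging
from typing import Callable, TypeVar

T = TypeVar("T")
log = logging.getLogger("audit-sketch")


class Counter:
    """Hypothetical stand-in for a health counter such as CentralAuditWriteFailures."""

    def __init__(self) -> None:
        self.value = 0

    def increment(self) -> None:
        self.value += 1


def run_with_audit(action: Callable[[], T],
                   write_audit: Callable[[T], None],
                   failures: Counter) -> T:
    """Run the user-facing action, then attempt a best-effort audit write."""
    result = action()  # a failure of the action itself propagates unchanged
    try:
        write_audit(result)
    except Exception:
        # The audit failure is recorded but never surfaced to the caller.
        failures.increment()
        log.warning("audit write failed; action outcome preserved", exc_info=True)
    return result


def broken_writer(_result: object) -> None:
    """Simulates the always-throwing writer injected by the test."""
    raise RuntimeError("audit store unavailable")


failures = Counter()
outcome = run_with_audit(lambda: "200 OK", broken_writer, failures)
```

With the broken writer in place the caller still receives `"200 OK"`, and the counter records one failure.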
M4 — Risk callouts
- Inbound API correlation IDs: if upstream tracing headers (W3C `traceparent`) are present, prefer them as `CorrelationId`; otherwise generate. Confirm whether existing middleware sets a request ID we can reuse.
- `AuditWriteMiddleware` placement: must run AFTER authentication so the API key NAME is in `HttpContext.User`. Verify with the middleware-order test in M4-T8.
- Notification dispatcher loop hot-path: audit emission must NOT extend per-attempt latency materially. Bench in M4-T10 if there's any concern.
- DB parameter capture: parameter values are captured verbatim by default (per design); redaction is opt-in (M5). For M4, just capture — don't try to second-guess what's sensitive.
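For the correlation-ID callout above, a minimal sketch of "prefer `traceparent`, otherwise generate" might look like the following. It assumes the W3C Trace Context version-00 layout (`version-traceid-parentid-flags`, lowercase hex) and uses the 32-hex-digit trace-id as the correlation value; the function name and the choice of a random 128-bit fallback are illustrative assumptions, not decisions recorded in this roadmap.

```python
import re
import secrets
from typing import Mapping

# version "-" trace-id (32 hex) "-" parent-id (16 hex) "-" flags (2 hex)
_TRACEPARENT = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def correlation_id_from_headers(headers: Mapping[str, str]) -> str:
    """Return the upstream trace-id when a valid traceparent exists, else a new id."""
    raw = next((v for k, v in headers.items() if k.lower() == "traceparent"), None)
    if raw is not None:
        match = _TRACEPARENT.match(raw.strip())
        # Version "ff" and an all-zero trace-id are invalid per the spec.
        if match and match.group(1) != "ff" and match.group(2) != "0" * 32:
            return match.group(2)
    return secrets.token_hex(16)  # hypothetical fallback: random 128-bit id


upstream = correlation_id_from_headers(
    {"TraceParent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}
)
generated = correlation_id_from_headers({"User-Agent": "curl/8.0"})
```

Malformed headers fall through to the generated value, so a bad upstream header never blocks the request.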
M5 — Payload + redaction policy
Goal: Payload capture is bounded (8 KB by default, 64 KB on error rows), headers are redacted by default, SQL parameter values are captured verbatim by default with per-connection opt-in redaction, body redactor regexes are configurable per target, and the safety net over-redacts on misconfiguration.
Affected projects: AuditLog (policy engine + options), ExternalSystemGateway (HTTP header redactors, SQL param redaction hook), InboundAPI (header redactors, body capture), NotificationOutbox (subject/body capture follows existing rules). Tests.
Acceptance criteria:
- An `IAuditPayloadFilter` service is invoked between event construction and write. It truncates to the default cap; raises to the error cap on non-`Success` rows; applies header redactors; applies body regex redactors; applies SQL parameter redactors (per-connection); over-redacts on regex error and increments `AuditRedactionFailure`.
- Configuration test: changing `appsettings.json` redactors changes runtime behaviour (no rebuild needed for regex changes).
- Bench: 95th-percentile audit emission latency on the hot path stays under N µs at default cap (target to be set during M5 brainstorm).
M5 — Tasks (TDD-detail)
M5-T1: IAuditPayloadFilter interface
Files:
- Create: `src/ScadaLink.AuditLog/Payload/IAuditPayloadFilter.cs`: single method `AuditEvent Apply(AuditEvent rawEvent)` that returns a filtered copy (truncation + redaction applied).
- Create: `tests/ScadaLink.AuditLog.Tests/Payload/PayloadFilterContractTests.cs`.
Steps:
- Failing test: interface exists, method signature matches.
- Implement.
- Run: pass.
- Commit:
feat(auditlog): IAuditPayloadFilter contract.
M5-T2: DefaultAuditPayloadFilter — truncation (default + error cap)
Files:
- Create: `src/ScadaLink.AuditLog/Payload/DefaultAuditPayloadFilter.cs`: composes `TruncationStage` + redactors (M5-T3/T4/T5). Truncation rule: default cap = `AuditLogOptions.DefaultCapBytes` (8 KB); error cap = `ErrorCapBytes` (64 KB) applied when `Status` is NOT in {`Success`, `Delivered`, `Enqueued`}. UTF-8 byte-safe boundary (no mid-character cuts). Set `PayloadTruncated = true` when applied.
- Create: `tests/ScadaLink.AuditLog.Tests/Payload/TruncationTests.cs`.
Steps:
- Failing test: 10 KB success body → truncated to 8 KB; `PayloadTruncated = true`.
- Failing test: 10 KB body on `Status=TransientFailure` → not truncated (under 64 KB cap); `PayloadTruncated = false`.
- Failing test: 70 KB body on `Status=PermanentFailure` → truncated to 64 KB; `PayloadTruncated = true`.
- Failing test: multi-byte UTF-8 character that would straddle the cap is not split mid-character.
- Implement.
- Run: pass.
- Commit:
feat(auditlog): DefaultAuditPayloadFilter truncation with UTF-8 boundary safety.
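The truncation rule in M5-T2 can be sketched as follows (Python for illustration only; `cap_for` and `truncate_utf8` are hypothetical names). The cut point backs up while the first excluded byte is a UTF-8 continuation byte (`10xxxxxx`), so a multi-byte character is either kept whole or dropped whole.

```python
from typing import Tuple

DEFAULT_CAP_BYTES = 8 * 1024   # AuditLogOptions.DefaultCapBytes in the plan
ERROR_CAP_BYTES = 64 * 1024    # ErrorCapBytes in the plan
NON_ERROR_STATUSES = {"Success", "Delivered", "Enqueued"}


def cap_for(status: str) -> int:
    """Error rows get the larger cap; everything else gets the default cap."""
    return DEFAULT_CAP_BYTES if status in NON_ERROR_STATUSES else ERROR_CAP_BYTES


def truncate_utf8(text: str, cap_bytes: int) -> Tuple[str, bool]:
    """Return (possibly truncated text, PayloadTruncated flag)."""
    data = text.encode("utf-8")
    if len(data) <= cap_bytes:
        return text, False
    cut = cap_bytes
    # data[cut] is the first byte that would be dropped; if it is a
    # continuation byte, the cut would split a character, so back up.
    while cut > 0 and (data[cut] & 0xC0) == 0x80:
        cut -= 1
    return data[:cut].decode("utf-8"), True


body, truncated = truncate_utf8("a" * 10240, cap_for("Success"))
straddle, _ = truncate_utf8("a" * 8191 + "é" + "b" * 100, cap_for("Success"))
```

In the straddling case the two-byte `é` would occupy bytes 8191 and 8192, so the result stops one byte short of the cap instead of emitting half a character.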
M5-T3: HTTP header redaction
Files:
- Modify: `src/ScadaLink.AuditLog/Payload/DefaultAuditPayloadFilter.cs`: add header-redaction stage. Strips header values for names in `AuditLogOptions.HeaderRedactList` (default: `Authorization`, `Cookie`, `Set-Cookie`, `X-API-Key`) and any matching configured regex. Replacement: `<redacted>`.
- Headers travel in `RequestSummary`/`ResponseSummary` (JSON of headers + body) OR in `Extra`; confirm format during M5 brainstorm and document.
- Create: `tests/ScadaLink.AuditLog.Tests/Payload/HeaderRedactionTests.cs`.
Steps:
- Failing test: `Authorization: Bearer xyz` in `RequestSummary` becomes `Authorization: <redacted>`.
- Failing test: case-insensitive match (`authorization` redacted too).
- Failing test: custom redact-list extension works (operator adds `X-Custom-Token`).
- Implement.
- Run: pass.
- Commit:
feat(auditlog): HTTP header redaction.
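A minimal sketch of the header-redaction stage, assuming headers are available as a name-to-value map (the actual carrier, `RequestSummary` or `Extra`, is still to be decided in the M5 brainstorm). The function name is hypothetical; the default list mirrors the one given in M5-T3.

```python
from typing import Dict, Iterable

DEFAULT_HEADER_REDACT_LIST = ("Authorization", "Cookie", "Set-Cookie", "X-API-Key")


def redact_headers(headers: Dict[str, str],
                   redact_list: Iterable[str] = DEFAULT_HEADER_REDACT_LIST) -> Dict[str, str]:
    """Replace values of listed headers; names are compared case-insensitively."""
    blocked = {name.lower() for name in redact_list}
    return {
        name: ("<redacted>" if name.lower() in blocked else value)
        for name, value in headers.items()
    }


redacted = redact_headers({"authorization": "Bearer xyz", "Accept": "application/json"})
extended = redact_headers(
    {"X-Custom-Token": "t0k3n"},
    redact_list=DEFAULT_HEADER_REDACT_LIST + ("X-Custom-Token",),
)
```

The header name is preserved as sent, so the audit row still shows which header was present even though its value is gone.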
M5-T4: Body regex redaction with safety net
Files:
- Modify: `src/ScadaLink.AuditLog/Payload/DefaultAuditPayloadFilter.cs`: add body-regex stage. Global redactors apply to all bodies; per-target redactors apply to matching `Target`. Patterns precompiled at startup; rejected if compile takes >100ms.
- Safety net: if a regex throws at runtime, replace the body with `<redacted: redactor error>` and increment `AuditRedactionFailure` (M5-T7).
- Create: `tests/ScadaLink.AuditLog.Tests/Payload/BodyRegexRedactionTests.cs`.
Steps:
- Failing test: `"password":"hunter2"` in a JSON body → `"password":"<redacted>"` when the default global redactor pattern matches.
- Failing test: per-target redactor only applies to matching `Target`.
- Failing test: a redactor that throws → body becomes `<redacted: redactor error>` AND the counter increments.
- Failing test: catastrophic backtracking regex rejected at startup.
- Implement.
- Run: pass.
- Commit:
feat(auditlog): body regex redaction with over-redaction safety net.
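The over-redaction safety net from M5-T4 can be sketched like this. The password pattern is an illustrative assumption (the plan does not specify the default global pattern), and `redact_body`, `redact_password`, and `exploding_redactor` are hypothetical names. The key property is that any redactor exception replaces the whole body instead of letting the unredacted text through.

```python
import re
from typing import Callable, List

REDACTOR_ERROR = "<redacted: redactor error>"

# Hypothetical default pattern: redact the value of a JSON "password" field.
_PASSWORD = re.compile(r'("password"\s*:\s*")[^"]*(")')


def redact_password(body: str) -> str:
    return _PASSWORD.sub(r"\1<redacted>\2", body)


def exploding_redactor(body: str) -> str:
    """Simulates a redactor that fails at runtime."""
    raise RuntimeError("redactor blew up")


def redact_body(body: str,
                redactors: List[Callable[[str], str]],
                on_failure: Callable[[], None]) -> str:
    """Apply each redactor in order; over-redact if any of them throws."""
    try:
        for redactor in redactors:
            body = redactor(body)
        return body
    except Exception:
        on_failure()  # e.g. increment AuditRedactionFailure
        return REDACTOR_ERROR


failure_count = []
clean = redact_body('{"user":"bob","password":"hunter2"}', [redact_password],
                    lambda: failure_count.append(1))
unsafe = redact_body('{"password":"hunter2"}', [redact_password, exploding_redactor],
                     lambda: failure_count.append(1))
```

Note that the second call discards even the partially redacted body: once any stage fails, nothing derived from the original payload is written.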
M5-T5: SQL parameter redaction (per-connection opt-in)
Files:
- Modify: `src/ScadaLink.AuditLog/Payload/DefaultAuditPayloadFilter.cs`: for `Channel=DbOutbound` events, parse `Extra.params` and redact parameter VALUES whose NAME matches the connection's configured regex (from `AuditLogOptions.PerTargetOverrides[<connection name>].RedactSqlParamsMatching`).
- Create: `tests/ScadaLink.AuditLog.Tests/Payload/SqlParamRedactionTests.cs`.
Steps:
- Failing test: no opt-in config → params captured verbatim (default behaviour).
- Failing test: opt-in regex `@apikey|@token` redacts those param VALUES but keeps OTHER param values intact.
- Failing test: regex applies to parameter NAMES (not values) and is case-insensitive.
- Implement.
- Run: pass.
- Commit:
feat(auditlog): per-connection SQL parameter redaction (opt-in).
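As a sketch of the opt-in behaviour, assuming `Extra.params` deserialises to a name-to-value map (the function name is hypothetical): with no pattern configured the parameters pass through untouched, and with a pattern the match is made against parameter names, case-insensitively.

```python
import re
from typing import Any, Dict, Optional


def redact_sql_params(params: Dict[str, Any],
                      name_pattern: Optional[str]) -> Dict[str, Any]:
    """Redact values whose parameter NAME matches the per-connection pattern."""
    if not name_pattern:
        return dict(params)  # default: capture verbatim
    matcher = re.compile(name_pattern, re.IGNORECASE)
    return {
        name: ("<redacted>" if matcher.search(name) else value)
        for name, value in params.items()
    }


params = {"@ApiKey": "k-123", "@id": 7, "@token": "t-456", "@note": "contains @token text"}
verbatim = redact_sql_params(params, None)
opted_in = redact_sql_params(params, "@apikey|@token")
```

The `@note` value survives even though it contains the text `@token`, because only names are matched.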
M5-T6: Wire filter into emission paths
Files:
- Modify: ESG (M2-T10, M3-T11/12/13, M4-T1/T2), InboundAPI middleware (M4-T7), NotificationOutbox (M4-T4/T5), NotificationService site path (M4-T6): every emission site receives `IAuditPayloadFilter` from DI and calls `filter.Apply(rawEvent)` before handing to the writer.
- Modify: `src/ScadaLink.AuditLog/ServiceCollectionExtensions.cs`: register `DefaultAuditPayloadFilter` as `IAuditPayloadFilter` singleton.
- Create: `tests/ScadaLink.AuditLog.Tests/Payload/FilterIntegrationTests.cs`: assert each emitter calls through the filter before the writer.
Steps:
- Failing test: ESG emission writes the filter-applied event (not the raw one).
- Failing test: same for each other emitter.
- Implement by injecting the filter into each emitter and routing through it.
- Run: pass.
- Commit:
feat(auditlog): wire payload filter into all emission paths.
M5-T7: AuditRedactionFailure health metric
Files:
- Modify: `src/ScadaLink.HealthMonitoring/CentralHealthAggregator.cs` (or equivalent): add `AuditRedactionFailure` counter.
- Modify: `src/ScadaLink.AuditLog/Payload/DefaultAuditPayloadFilter.cs`: increment on every redactor exception.
- Create: `tests/ScadaLink.AuditLog.Tests/Payload/AuditRedactionFailureMetricTests.cs`.
Steps:
- Failing test: 5 redactor exceptions → counter shows 5.
- Implement.
- Run: pass.
- Commit:
feat(health): AuditRedactionFailure metric.
M5-T8: Configuration test — appsettings.json round-trip
Files:
- Create: `tests/ScadaLink.AuditLog.Tests/Configuration/AuditLogOptionsBindingTests.cs`: bind a realistic `appsettings.json` block (with header-redact list, body redactors, per-target overrides, retention) and assert values appear in `IOptions<AuditLogOptions>`. Re-bind with a hot-reload simulation and assert filter behaviour changes accordingly.
Steps:
- Failing test: bind + read → matches.
- Failing test: change config → filter behaviour updates without restart (`IOptionsMonitor` pattern).
- Implement (likely needs adjusting M1-T9 from `IOptions` to `IOptionsMonitor`).
- Run: pass.
- Commit:
feat(auditlog): hot-reloadable AuditLogOptions.
M5-T9: Performance test — hot-path latency budget
Files:
- Create: `tests/ScadaLink.PerformanceTests/AuditLog/HotPathLatencyTests.cs`: bench `filter.Apply(event)` for a 4 KB JSON body with the default redactor set; target P95 < 50 µs (provisional; the final number is set during the M5 brainstorm from baseline measurements). Also bench `SqliteAuditWriter.WriteAsync` end-to-end; target P95 < 500 µs.
Steps:
- Sketch test using BenchmarkDotNet or the existing performance test harness.
- Run baseline; if over budget, profile + optimise.
- Commit:
test(auditlog): hot-path latency budget.
M5-T10: Safety-net test — bad regex over-redacts
Files:
- Create: `tests/ScadaLink.AuditLog.Tests/Payload/RedactionSafetyNetTests.cs`: register a deliberately bad regex that throws; assert the body is over-redacted (`<redacted: redactor error>`) rather than under-redacted (passing through unmodified).
Steps:
- Failing test.
- Verify the safety net from M5-T4 covers this.
- Commit:
test(auditlog): redaction safety net over-redacts on regex failure.
M5 — Risk callouts
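- Regex catastrophic backtracking: validate patterns at startup and reject any that exceed a timeout. Catastrophic backtracking shows up when a pattern is matched rather than when it is compiled, so the startup check likely needs to run each pattern against a probe input under a match timeout in addition to the compile test, and the runtime filter likely needs a match timeout too so that a slow pattern falls into the M5-T4 safety net. Document the rejection behaviour.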
- Order of stages matters: truncation BEFORE redaction means a redaction target halfway through the cap could get cut. Confirm the chosen order during M5 brainstorm; current draft applies redaction FIRST, then truncation — that way the redacted-replacement text is what gets truncated, not a half-secret.
- Body capture format: decide whether headers travel in `RequestSummary`/`ResponseSummary` or `Extra`. Affects M5-T3's redaction strategy. Lock during the M5 brainstorm.
- Hot-reload semantics: `IOptionsMonitor` snapshots; ensure the pre-compiled regex cache invalidates when config changes.
M6 — Reconciliation, purge, partition maintenance, health metrics
Goal: Self-healing telemetry, monthly partition rollover, daily purge, all five new health metrics live and feeding the existing health-report pipeline.
Affected projects: AuditLog (3 new actors: SiteAuditReconciliationActor, AuditLogPurgeActor, partition-maintenance worker), Communication (the PullAuditEvents RPC), HealthMonitoring (5 new metrics), ConfigurationDatabase (partition-roll-forward SQL helper).
Acceptance criteria:
- `SiteAuditReconciliationActor` runs every 5 minutes per site; pulls events the site reports as `Pending`; central performs `InsertIfNotExistsAsync` then signals the site to flip those rows to `Reconciled`.
- `AuditLogPurgeActor` runs daily; for each partition older than `RetentionDays`, switches it out to a staging table and drops the staging table. Emits an `AuditLog:Purged` event with rowcount + duration.
- Partition-maintenance job runs at month boundary to add the next month's partition function range and ensure the scheme has a destination filegroup.
- 5 new health metrics published. Per site: `SiteAuditBacklog` (count + oldest + bytes), `SiteAuditWriteFailures`, `SiteAuditTelemetryStalled`. Per central node: `CentralAuditWriteFailures`, `AuditRedactionFailure`.
- Integration test: simulated 5-minute central outage → telemetry catches up after recovery via reconciliation, no rows lost; site backlog metric reflects the queue depth and drops as it drains.
M6 — Tasks (TDD-detail)
M6-T1: Extend sitestream.proto with PullAuditEvents RPC
Files:
- Modify: `src/ScadaLink.Communication/Protos/sitestream.proto`: add `rpc PullAuditEvents(PullAuditEventsRequest) returns (PullAuditEventsResponse);` and the corresponding request/response messages (`sinceUtc`, `batchSize`, `events`, `more_available`).
- Build: regenerate stubs.
- Create: `tests/ScadaLink.Communication.Tests/Protos/PullAuditEventsProtoTests.cs`.
Steps:
- Failing test: round-trip request and response messages.
- Add proto + rebuild.
- Run: pass.
- Commit:
feat(comms): PullAuditEvents RPC for audit reconciliation.
M6-T2: Site-side handler for PullAuditEvents
Files:
- Modify: `src/ScadaLink.Communication/SiteStreamGrpc/SiteStreamGrpcServer.cs` (the site-side server inside each site cluster): handle `PullAuditEvents` by reading `Pending` rows older than `SinceUtc` from `SqliteAuditWriter` (read-only path) and streaming them back. After ack, mark them `Reconciled`.
- Create: `tests/ScadaLink.Communication.Tests/SiteStreamPullAuditEventsTests.cs`.
Steps:
- Failing test: a pull request with N pending rows returns those rows; rows flip to `Reconciled` after the response is acked.
- Implement.
- Run: pass.
- Commit:
feat(comms): site-side PullAuditEvents handler.
M6-T3: SiteAuditReconciliationActor — central, timer-driven
Files:
- Create: `src/ScadaLink.AuditLog/Central/SiteAuditReconciliationActor.cs`: central singleton; on a 5-minute timer (configurable), for each known site, ask: "what's your oldest `Pending` row?" If the site reports a non-draining backlog (compared with the previous tick), issue a `PullAuditEvents` and ingest the returned rows via `IAuditLogRepository.InsertIfNotExistsAsync`. Keeps a per-site `LastReconciledAt` cursor.
- Create: `tests/ScadaLink.AuditLog.Tests/Central/SiteAuditReconciliationActorTests.cs`.
Steps:
- Failing test: actor's timer fires every 5 minutes (test via `TestKit` virtual scheduler).
- Failing test: when site reports non-draining backlog over two consecutive ticks, the actor issues a pull and ingests results.
- Failing test: idempotency: re-running the pull doesn't double-insert (relies on AuditLog PK).
- Implement.
- Run: pass.
- Commit:
feat(auditlog): SiteAuditReconciliationActor.
M6-T4: AuditLogPurgeActor — daily partition-switch purge
M1 reality: `IAuditLogRepository.SwitchOutPartitionAsync` ships in M1 as a `NotSupportedException` stub because the non-aligned `UX_AuditLog_EventId` unique index (necessary for first-write-wins idempotency without including `OccurredAtUtc` in the unique key) blocks `ALTER TABLE … SWITCH PARTITION`. M6 must replace the stub with the drop-and-rebuild dance: (1) `DROP INDEX UX_AuditLog_EventId ON dbo.AuditLog;` (2) create the staging table on `[PRIMARY]` with identical schema; (3) `ALTER TABLE dbo.AuditLog SWITCH PARTITION <n> TO dbo.<staging>;` (4) `DROP TABLE dbo.<staging>;` (5) `CREATE UNIQUE NONCLUSTERED INDEX UX_AuditLog_EventId ON dbo.AuditLog(EventId) ON [PRIMARY];`. The small unique-index outage window during the switch is acceptable: partition switches are O(seconds) and `InsertIfNotExistsAsync` callers will see a transient retry surface; document this in the actor.
Files:
- Create: `src/ScadaLink.AuditLog/Central/AuditLogPurgeActor.cs`: central singleton; daily timer. For each partition whose latest `OccurredAtUtc` is older than `AuditLogOptions.RetentionDays`, call `IAuditLogRepository.SwitchOutPartitionAsync(partitionBoundary)`. Emit an `AuditLogPurged` event (logged + metricked) with partition range, row count, and duration.
- Modify: `src/ScadaLink.ConfigurationDatabase/Repositories/AuditLogRepository.cs`: replace the M1 `NotSupportedException` stub with the drop-and-rebuild dance described above. Wrap in a transaction. Add a regression test asserting the unique index is rebuilt and the data left behind matches the un-switched partitions.
- Create: `tests/ScadaLink.AuditLog.Tests/Central/AuditLogPurgeActorTests.cs`.
Steps:
- Failing test: with retention = 30 days, partitions older than 30 days are switched out; newer partitions are kept.
- Failing test: purge emits the `AuditLogPurged` event with correct row count.
- Failing test: partition switch under the `scadalink_audit_purger` role completes successfully (requires the role to ALSO be granted permission to DROP/CREATE the `UX_AuditLog_EventId` index; extend the role grants in this milestone if not in M1's role definition; M1 granted `ALTER ON SCHEMA::dbo` which should cover this).
- Failing test: post-switch, `InsertIfNotExistsAsync` continues to enforce first-write-wins (unique index successfully rebuilt).
- Implement.
- Run: pass.
- Commit:
feat(auditlog): AuditLogPurgeActor with partition-switch purge (drop-and-rebuild around UX_AuditLog_EventId).
M6-T5: AuditLogPartitionMaintenanceService — monthly roll-forward
M1 reality: the partition function `pf_AuditLog_Month` ships with 24 explicit monthly boundaries (Jan 2026 through Dec 2027) on filegroup `[PRIMARY]`. M6's hosted service must keep this rolling: split a new boundary for the upcoming month and (if a separate hot/cold filegroup strategy is adopted later) drop oldest boundaries via MERGE after purge.
Files:
- Create: `src/ScadaLink.AuditLog/Central/AuditLogPartitionMaintenanceService.cs`: `IHostedService` that runs on startup AND every month: ensures the next month's partition range exists on `pf_AuditLog_Month` and the partition scheme has a destination filegroup. Implemented via raw SQL (`ALTER PARTITION FUNCTION pf_AuditLog_Month SPLIT RANGE (<next-month-boundary>)`); ensure the scheme stays `ALL TO ([PRIMARY])` unless production deployment overrides per-filegroup.
- Create: `tests/ScadaLink.AuditLog.Tests/Central/PartitionMaintenanceServiceTests.cs` (integration via `MsSqlMigrationFixture`; runs against a temp DB).
Steps:
- Failing test: against a DB seeded with the M1 migration (covering through Dec 2027), running the service in Apr 2028 splits a Jan 2028 boundary so the function has a range for "current month + at least the next month".
- Implement.
- Failing test: subsequent monthly runs add successive future boundaries (idempotent: already-split boundaries are no-ops, not errors).
- Run: pass.
- Commit:
feat(auditlog): partition maintenance HostedService (SPLIT RANGE roll-forward).
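The roll-forward decision in M6-T5 reduces to "which month-start boundaries are missing between the latest existing one and next month". A hedged sketch of that calculation follows; it assumes each boundary is the first day of a month and that the service should cover through the month after `today`. The function names are hypothetical, and the real service would issue one `SPLIT RANGE` per returned date.

```python
from datetime import date
from typing import List, Set


def first_of_next_month(d: date) -> date:
    """First day of the month following d."""
    return date(d.year + 1, 1, 1) if d.month == 12 else date(d.year, d.month + 1, 1)


def missing_boundaries(existing: Set[date], today: date) -> List[date]:
    """Boundaries to split so the function covers through next month."""
    target = first_of_next_month(today)
    boundary = first_of_next_month(max(existing))
    missing = []
    while boundary <= target:
        missing.append(boundary)
        boundary = first_of_next_month(boundary)
    return missing


# Seed matching the M1 migration: Jan 2026 through Dec 2027.
seed = {date(year, month, 1) for year in (2026, 2027) for month in range(1, 13)}
first_run = missing_boundaries(seed, date(2027, 12, 15))
second_run = missing_boundaries(seed | set(first_run), date(2027, 12, 15))
```

Because the result is derived from the boundaries that already exist, a repeated run in the same month returns an empty list, which is the idempotency the second failing test asks for.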
M6-T6: Health metric SiteAuditBacklog
Files:
- Modify: `src/ScadaLink.AuditLog/Site/SqliteAuditWriter.cs`: expose `GetBacklogStatsAsync()` returning `(pendingCount, oldestPendingUtc, onDiskBytes)`.
- Modify: `src/ScadaLink.HealthMonitoring/SiteHealthState.cs`: add `SiteAuditBacklog` metric (3-tuple), populated per site-health-report tick.
- Create: `tests/ScadaLink.HealthMonitoring.Tests/SiteAuditBacklogMetricTests.cs`.
Steps:
- Failing test: with 100 pending rows in SQLite, the metric reports `pendingCount=100`.
- Failing test: oldest pending age is reported in seconds since `OccurredAtUtc`.
- Failing test: on-disk bytes ≈ SQLite file size.
- Implement.
- Run: pass.
- Commit:
feat(health): SiteAuditBacklog metric (count + age + bytes).
M6-T7: Health metric SiteAuditTelemetryStalled
Files:
- Modify: `src/ScadaLink.HealthMonitoring/SiteHealthState.cs`: add boolean `SiteAuditTelemetryStalled`.
- Modify: `src/ScadaLink.AuditLog/Central/SiteAuditReconciliationActor.cs`: set the flag when reconciliation detects a non-draining backlog over two consecutive cycles.
- Create: `tests/ScadaLink.HealthMonitoring.Tests/SiteAuditTelemetryStalledTests.cs`.
Steps:
- Failing test: two consecutive non-draining cycles → flag set.
- Failing test: a subsequent draining cycle → flag cleared.
- Implement.
- Run: pass.
- Commit:
feat(health): SiteAuditTelemetryStalled flag.
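The roadmap does not pin down what "non-draining" means, so the following sketch makes an explicit assumption: a cycle is non-draining when the pending count is non-zero and has not decreased since the previous cycle. The `StallDetector` class is hypothetical and only illustrates the two-consecutive-cycles rule and the clearing behaviour.

```python
from typing import Dict


class StallDetector:
    """Tracks per-site backlog and flags two consecutive non-draining cycles."""

    def __init__(self, threshold: int = 2) -> None:
        self._threshold = threshold
        self._previous: Dict[str, int] = {}
        self._streak: Dict[str, int] = {}

    def observe(self, site: str, pending: int) -> bool:
        """Record one reconciliation cycle; return the stalled flag for the site."""
        previous = self._previous.get(site)
        # Assumption: non-draining = backlog exists and did not shrink.
        non_draining = previous is not None and pending > 0 and pending >= previous
        self._streak[site] = (self._streak.get(site, 0) + 1) if non_draining else 0
        self._previous[site] = pending
        return self._streak[site] >= self._threshold


detector = StallDetector()
flags = [detector.observe("site-a", n) for n in (100, 100, 120, 50)]
```

The first observation only establishes a baseline, the next two are non-draining and set the flag, and the final draining cycle clears it.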
M6-T8: Health metric CentralAuditWriteFailures
Files:
- Modify: `src/ScadaLink.HealthMonitoring/CentralHealthAggregator.cs`: add `CentralAuditWriteFailures` counter.
- Modify: every `ICentralAuditWriter` call site (Inbound API middleware M4-T7, NotificationOutboxActor M4-T4/T5): increment on caught exceptions.
- Create: `tests/ScadaLink.HealthMonitoring.Tests/CentralAuditWriteFailuresTests.cs`.
Steps:
- Failing test: 3 forced central direct-write failures → counter reports 3.
- Implement.
- Run: pass.
- Commit:
feat(health): CentralAuditWriteFailures metric.
M6-T9: Surface AuditRedactionFailure in central health
Files:
- Modify: `src/ScadaLink.HealthMonitoring/CentralHealthAggregator.cs`: register the counter created in M5-T7 so it appears in the central health report payload.
- Create: `tests/ScadaLink.HealthMonitoring.Tests/AuditRedactionFailureSurfacingTests.cs`.
Steps:
- Failing test: incrementing the counter is visible in the next central health snapshot.
- Implement.
- Run: pass.
- Commit:
feat(health): surface AuditRedactionFailure in central health.
M6-T10: Integration test — central outage + reconciliation recovery
Files:
- Create:
tests/ScadaLink.IntegrationTests/AuditLog/OutageReconciliationTests.cs— site + central; simulate a 5-minute central gRPC outage; during outage, site emits 200 events; restore central; assert reconciliation pulls catch up within one cycle and all 200 events land in central AuditLog with no duplicates.
Steps:
- Sketch, iterate, commit:
test(auditlog): outage + reconciliation recovery end-to-end.
M6-T11: Integration test — partition switch purge
Files:
- Create: `tests/ScadaLink.IntegrationTests/AuditLog/PartitionPurgeTests.cs`: pre-populate AuditLog with rows in three monthly partitions (one older than retention, two newer); trigger `AuditLogPurgeActor`; assert the oldest partition's rows are gone and newer partitions are untouched.
Steps:
- Sketch, iterate, commit:
test(auditlog): partition-switch purge end-to-end.
M6-T12: Integration test — partition maintenance roll-forward
Files:
- Create: `tests/ScadaLink.IntegrationTests/AuditLog/PartitionMaintenanceTests.cs`: assert that after `AuditLogPartitionMaintenanceService` runs, the partition function covers the next month's range.
Steps:
- Sketch, iterate, commit:
test(auditlog): partition maintenance roll-forward end-to-end.
M6 — Risk callouts
- Partition switch on a live table: SQL Server `ALTER TABLE ... SWITCH PARTITION` is metadata-only when source and target match in structure and filegroup; verify with a load test that ingest isn't paused during purge.
- Pull cadence vs ingest rate: a site producing > `BatchSize`/5s sustained may never let telemetry catch up, so reconciliation must close the gap. The non-draining detection in M6-T3 is the safety net.
- Site SQLite `ForwardState` flip after reconciliation: must be atomic with the central ack; otherwise a site crash mid-flip can re-send rows (idempotent at central, harmless but worth noting).
- HostedService scheduling: ensure the partition maintenance service runs on the ACTIVE central node only (not both, which would cause SQL errors trying to add the same range twice).
M7 — Central UI: new Audit Log page + drill-ins + KPI tiles
Goal: User-visible Audit Log: filter bar, results grid (custom Blazor + Bootstrap, no third-party grid), drilldown drawer with cURL / "show all events" / redaction indicators / pretty-printed payloads. 6 drill-in entry points from existing pages. 3 KPI tiles on Health dashboard.
Affected projects: CentralUI, CentralUI.Tests, CentralUI.PlaywrightTests.
Acceptance criteria:
- New `Components/Pages/Audit/AuditLogPage.razor` exists; new "Audit" nav group sibling to Notifications.
- All 10 filter elements, 10 grid columns, keyset pagination + default page 100, drilldown drawer per `Component-AuditLog.md` §10.
- Existing `Components/Pages/Monitoring/AuditLog.razor` (the IAuditService config-change viewer) renamed in code to `ConfigurationAuditLog.razor`, with URL `/audit/configuration` to match the doc-renaming we did.
- Drill-ins from existing pages (Notifications, Site Calls, External Systems, Inbound API Keys, Sites, Instances) added.
- 3 KPI tiles added to the Health dashboard; data sourced from `HealthMonitoring`.
- Playwright tests cover: filter narrowing, drilldown drawer, "Copy as cURL" on `ApiInbound` rows, drill-in from Notifications to filtered Audit Log.
- `OperationalAudit` read permission gating + `AuditExport` for the Export button.
M7 — Tasks (TDD-detail)
M7-T1: New AuditLogPage.razor scaffold + route + Audit nav group
Files:
- Create: `src/ScadaLink.CentralUI/Components/Pages/Audit/AuditLogPage.razor` + `.razor.cs` + `.razor.css`. Route `/audit/log`. Empty body for now beyond `<h1>Audit Log</h1>`.
- Modify: `src/ScadaLink.CentralUI/Components/Layout/NavMenu.razor` (or equivalent): add a new top-level Audit nav group sibling to Notifications, containing this page.
- Create: `tests/ScadaLink.CentralUI.Tests/Pages/AuditLogPageScaffoldTests.cs`: Blazor component test (bUnit if it's used in the codebase; else Playwright).
Steps:
- Failing test: navigating to `/audit/log` renders the page (heading present).
- Failing test: nav menu shows the Audit group.
- Implement.
- Run: pass.
- Commit:
feat(ui): scaffold Audit Log page + Audit nav group.
M7-T2: <AuditFilterBar> component
Files:
- Create: `src/ScadaLink.CentralUI/Components/Audit/AuditFilterBar.razor` + `.razor.cs`: 10 filter elements per `Component-AuditLog.md` §10. Multi-select chips for Channel/Kind/Status/Site (Bootstrap custom; NO third-party UI library). Time-range relative dropdown + custom date picker. Text search for Instance/Script/Target/Actor/CorrelationId. "Errors only" toggle.
- Create: `tests/ScadaLink.CentralUI.Tests/Components/Audit/AuditFilterBarTests.cs`.
Steps:
- Failing test: rendering shows all 10 elements.
- Failing test: selecting filters and clicking "Apply" raises a `FilterChanged` event with the right `AuditQuery` payload.
- Failing test: Kind options narrow when Channels are selected.
- Implement.
- Run: pass.
- Commit:
feat(ui): AuditFilterBar component.
M7-T3: <AuditResultsGrid> component with keyset paging
Files:
- Create: `src/ScadaLink.CentralUI/Components/Audit/AuditResultsGrid.razor` + `.razor.cs`: custom Bootstrap table (no third-party grid). 10 columns per `Component-AuditLog.md`. Resizable + reorderable + persistable-per-user (persistence via existing user-settings store).
- Keyset paging via `(OccurredAtUtc desc, EventId desc)` cursor; default page 100.
- Data source: server-side via `IAuditLogRepository.QueryAsync` (M1-T8). Wire through an `IAuditLogQueryService` (new) that the page injects.
- Create: `tests/ScadaLink.CentralUI.Tests/Components/Audit/AuditResultsGridTests.cs`.
Steps:
- Failing test: grid renders rows from a stub query service; columns match the documented set.
- Failing test: clicking "next page" calls the service with the keyset cursor of the last row.
- Failing test: column reordering persists across navigations (user-settings).
- Failing test: row click emits a `RowSelected` event with the selected `AuditEvent`.
- Implement.
- Run: pass.
- Commit:
feat(ui): AuditResultsGrid with keyset paging.
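Keyset paging over `(OccurredAtUtc desc, EventId desc)` can be illustrated with an in-memory stand-in for the repository query. The sketch assumes rows are `(timestamp, event id)` pairs with ISO-8601 timestamps in a single fixed format, so tuple comparison matches the SQL predicate "timestamp earlier than the cursor, or equal timestamp and smaller event id"; `query_page` is a hypothetical name.

```python
from typing import List, Optional, Tuple

Row = Tuple[str, str]  # (occurred_at_utc as ISO-8601 text, event_id)


def query_page(rows: List[Row],
               cursor: Optional[Row],
               page_size: int = 100) -> Tuple[List[Row], Optional[Row]]:
    """Return one page in descending key order plus the cursor for the next page."""
    ordered = sorted(rows, reverse=True)
    if cursor is not None:
        # Strictly "after" the cursor in descending order: smaller composite key.
        ordered = [row for row in ordered if row < cursor]
    page = ordered[:page_size]
    next_cursor = page[-1] if len(page) == page_size else None
    return page, next_cursor


events = [
    ("2026-01-01T00:00:00Z", "a"),
    ("2026-01-01T00:00:00Z", "b"),
    ("2026-01-02T00:00:00Z", "c"),
    ("2026-01-03T00:00:00Z", "d"),
    ("2026-01-03T00:00:00Z", "e"),
]
page1, cursor1 = query_page(events, None, page_size=2)
page2, cursor2 = query_page(events, cursor1, page_size=2)
page3, cursor3 = query_page(events, cursor2, page_size=2)
```

Including `EventId` in the cursor is what keeps rows with identical timestamps from being skipped or repeated across page boundaries.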
M7-T4: <AuditDrilldownDrawer> — JSON pretty-print
Files:
- Create: `src/ScadaLink.CentralUI/Components/Audit/AuditDrilldownDrawer.razor` + `.razor.cs`: slide-in drawer triggered by `RowSelected`. Renders all fields of the selected `AuditEvent`. JSON detection: if `RequestSummary` or `ResponseSummary` is valid JSON, pretty-print with indentation.
- Create: `tests/ScadaLink.CentralUI.Tests/Components/Audit/AuditDrilldownDrawerJsonTests.cs`.
Steps:
- Failing test: opening drawer with an event whose `RequestSummary` is valid JSON renders an indented version.
- Failing test: non-JSON body renders verbatim.
- Implement.
- Run: pass.
- Commit:
feat(ui): drilldown drawer JSON pretty-print.
M7-T5: Drilldown — SQL syntax highlighting
Files:
- Modify: `src/ScadaLink.CentralUI/Components/Audit/AuditDrilldownDrawer.razor.cs`: for `Channel=DbOutbound` events, treat `RequestSummary` as SQL; apply syntax highlighting via a lightweight client-side library (Prism.js or Highlight.js if already in the project; else a small custom highlighter; confirm during M7 brainstorm).
- Modify: `src/ScadaLink.CentralUI/wwwroot/`: add the highlighter assets if needed.
Steps:
- Failing test: a `DbOutbound` event's `RequestSummary` is rendered inside a `<code class="language-sql">` block.
- Implement.
- Run: pass.
- Commit:
feat(ui): drilldown SQL syntax highlighting.
M7-T6: Drilldown — "Copy as cURL" for ApiOutbound / ApiInbound
Files:
- Modify: `src/ScadaLink.CentralUI/Components/Audit/AuditDrilldownDrawer.razor.cs`: for `Channel ∈ {ApiOutbound, ApiInbound}` events, render a "Copy as cURL" button. Clicking generates a cURL command from the event's URL/headers/body and copies to clipboard via `IJSRuntime`.
Steps:
- Failing test: button appears only for HTTP-bearing events.
- Failing test: clicking generates the correct cURL string (verified against a known event fixture).
- Implement.
- Run: pass.
- Commit:
feat(ui): drilldown Copy as cURL action.
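Generating the cURL string is mostly a quoting problem. The sketch below assumes the event exposes a method, URL, header map, and optional body (the real field layout depends on the M5 capture format), and uses POSIX shell quoting; `to_curl` is a hypothetical helper name. Redacted header values are emitted as captured, so the copied command carries `<redacted>` rather than the original secret.

```python
import shlex
from typing import Dict, Optional


def to_curl(method: str, url: str, headers: Dict[str, str], body: Optional[str]) -> str:
    """Build a shell-safe cURL command from captured request fields."""
    parts = ["curl", "-X", method, shlex.quote(url)]
    for name, value in headers.items():
        parts += ["-H", shlex.quote(f"{name}: {value}")]
    if body is not None:
        parts += ["--data-raw", shlex.quote(body)]
    return " ".join(parts)


command = to_curl(
    "POST",
    "https://example.test/api/ping",
    {"Authorization": "<redacted>"},
    '{"a":1}',
)
```

A known-event fixture test, as called for in the steps above, can compare this output against a stored expected string.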
M7-T7: Drilldown — "Show all events for this operation"
Files:
- Modify: `src/ScadaLink.CentralUI/Components/Audit/AuditDrilldownDrawer.razor.cs`: when the event has a non-null `CorrelationId`, render a link "Show all events for this operation" that re-applies the page's filter set with `CorrelationId = <value>` (other filters cleared).
Steps:
- Failing test: link appears only when CorrelationId is non-null.
- Failing test: clicking re-navigates to the Audit Log page with the filter applied.
- Implement.
- Run: pass.
- Commit:
feat(ui): drilldown "Show all events" by CorrelationId.
M7-T8: Drilldown — redaction indicators
Files:
- Modify: `src/ScadaLink.CentralUI/Components/Audit/AuditDrilldownDrawer.razor`: wherever a payload contains the string `<redacted>` or `<redacted: redactor error>`, render a small badge indicating the field was redacted. Show a tooltip linking to "Payload Capture Policy" in the Component-AuditLog docs.
Steps:
- Failing test: a payload with `<redacted>` shows the badge.
- Implement.
- Run: pass.
- Commit:
feat(ui): drilldown redaction indicators.
M7-T9: Rename AuditLog.razor → ConfigurationAuditLog.razor
Files:
- Rename: `src/ScadaLink.CentralUI/Components/Pages/Monitoring/AuditLog.razor` → `Components/Pages/Audit/ConfigurationAuditLog.razor`.
- Update: the file's `@page` directive to `/audit/configuration`.
- Update: all `<NavLink>` and any other inbound references to the old path.
- Update: tests referencing the old name.
- Modify: nav menu, placing `ConfigurationAuditLog` under the Audit group as a sibling to the new Audit Log page.
Steps:
- Failing test: navigating to `/audit/configuration` renders the (renamed) page.
- Failing test: the old `/monitoring/auditlog` returns 404 (or a redirect; choose during M7 brainstorm, noting that a redirect is safer for any external bookmarks).
- Implement rename + path updates.
- Run: pass.
- Commit:
refactor(ui): rename Audit Log Viewer to Configuration Audit Log Viewer.
M7-T10: Drill-in from Notifications page
Files:
- Modify: `src/ScadaLink.CentralUI/Components/Pages/Notifications/NotificationReport.razor` (or row-action panel): add "View audit history" action to each row. Navigates to `/audit/log?correlationId={NotificationId}`.
Steps:
- Failing test: row action exists.
- Failing test: click navigates with the right query string.
- Implement.
- Run: pass.
- Commit:
feat(ui): drill-in from Notifications to Audit Log.
M7-T11: Drill-in from Site Calls page
Files:
- Modify: the Site Calls listing page, if it exists. If it does not exist yet, defer the drill-in to a follow-up rather than creating the page here; Site Call Audit #22 UI work is mostly out of scope. For M7 acceptance: drill-in only required from pages that exist.
- If the page exists, mirror M7-T10's pattern with `?correlationId={TrackedOperationId}`.
Steps:
- Conditional on page existence — confirm during M7 brainstorm.
- Implement.
- Commit:
feat(ui): drill-in from Site Calls to Audit Log.
M7-T12: Drill-in from External Systems / Inbound API Keys / Sites / Instances detail pages
Files:
- Modify (per page): External Systems detail, Inbound API Keys detail, Sites detail, Instances detail. Each gets a "Recent activity" / "Recent calls" / "Audit feed" link or tab navigating to `/audit/log` with the appropriate pre-filter (`target=<system>` / `actor=<key name> AND channel=ApiInbound` / `site=<site>` / `instance=<instance>`).
- Tests: one per drill-in.
Steps:
- Failing tests per page.
- Implement.
- Run: pass.
- Commit:
feat(ui): drill-ins from detail pages to Audit Log.
M7-T13: 3 KPI tiles on the Health dashboard
Files:
- Modify: `src/ScadaLink.CentralUI/Components/Pages/Health/HealthDashboard.razor` (or equivalent): add three tiles under a new "Audit" group: Audit volume, Audit error rate, Audit backlog. Data fed from the metrics defined in M5-T7 and M6-T6/T7/T8/T9.
- Create: `tests/ScadaLink.CentralUI.Tests/Pages/Health/AuditKpiTilesTests.cs`.
Steps:
- Failing test: tiles render with stub data; clicking each navigates to the relevant Audit Log filtered view (or to a per-site breakdown for the backlog tile).
- Implement.
- Run: pass.
- Commit:
feat(ui): Audit KPI tiles on Health dashboard.
M7-T14: Server-side CSV export streaming
Files:
- Create: src/ScadaLink.CentralUI/Services/AuditLogExportService.cs. Accepts the current filter and streams server-side CSV via IAuditLogRepository.QueryAsync paged enumeration; writes to the HTTP response without buffering the whole result in memory.
- Modify: AuditLogPage.razor. The Export button calls the service and requires the AuditExport permission (M7-T15).
- Create: tests/ScadaLink.CentralUI.Tests/Services/AuditLogExportServiceTests.cs.
Steps:
- Failing test: exporting 10,000 rows streams as CSV; memory usage stays bounded.
- Failing test: default cap of 100k rows enforced; larger requests get a "use the CLI" error.
- Implement.
- Run: pass.
- Commit: feat(ui): server-side streaming CSV export of Audit Log.
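The bounded-memory requirement in M7-T14 can be illustrated with a row-at-a-time writer over an async sequence. This is a sketch only: AuditRow, RowCapExceededException, and CsvExportSketch are hypothetical names, the column set is invented for illustration, and the real service would pull rows from IAuditLogRepository.QueryAsync, whose signature is not shown in this roadmap.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical row shape; the real audit event type lives elsewhere in the codebase.
public sealed record AuditRow(DateTime OccurredAtUtc, string Channel, string Status, string Target);

// Hypothetical exception used to surface the "use the CLI" error.
public sealed class RowCapExceededException : Exception
{
    public RowCapExceededException(int cap)
        : base($"Export exceeds {cap} rows; use the CLI for larger exports.") { }
}

public static class CsvExportSketch
{
    // Quotes a field (RFC 4180 style) when it contains a comma, quote, or line break.
    public static string Escape(string field) =>
        field.IndexOfAny(new[] { ',', '"', '\n', '\r' }) >= 0
            ? "\"" + field.Replace("\"", "\"\"") + "\""
            : field;

    // Writes each row as it arrives, so only one row is held in memory at a time.
    // Returns the number of data rows written.
    public static async Task<int> WriteAsync(
        IAsyncEnumerable<AuditRow> rows, TextWriter output, int maxRows = 100_000)
    {
        await output.WriteLineAsync("OccurredAtUtc,Channel,Status,Target");
        var written = 0;
        await foreach (var row in rows)
        {
            // The cap is detected mid-stream here; this is an assumption of the sketch.
            if (written == maxRows) throw new RowCapExceededException(maxRows);

            var fields = new[] { row.OccurredAtUtc.ToString("O"), row.Channel, row.Status, row.Target };
            await output.WriteLineAsync(string.Join(",", fields.Select(Escape)));
            written++;
        }
        return written;
    }
}
```

One design point to settle in the M7 brainstorm: this sketch only notices the cap after rows have already been written, so a real implementation may prefer a count query up front so the "use the CLI" error arrives before any bytes reach the response.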
M7-T15: OperationalAudit + AuditExport permission gating
Files:
- Modify: src/ScadaLink.Security/ (or wherever the role/permission model lives). Add OperationalAudit and AuditExport permissions; map them to the existing Audit role by default.
- Modify: AuditLogPage.razor. Gate page access on OperationalAudit; gate the Export button on AuditExport.
- Create: tests/ScadaLink.CentralUI.Tests/Pages/AuditLogPagePermissionTests.cs.
Steps:
- Failing test: a user without OperationalAudit gets a 403 / hidden page.
- Failing test: a user with OperationalAudit but no AuditExport can read, but the Export button is hidden.
- Implement permission checks.
- Run: pass.
- Commit: feat(security): OperationalAudit + AuditExport permissions for the Audit Log surface.
M7-T16: Playwright E2E tests
Files:
- Create: tests/ScadaLink.CentralUI.PlaywrightTests/Audit/AuditLogPageTests.cs. Covers: filter narrowing, drilldown drawer JSON pretty-print, "Copy as cURL" on ApiInbound, drill-in from Notifications to filtered Audit Log, CSV export end-to-end, permission gating.
Steps:
- Sketch tests using the existing Playwright harness.
- Iterate until all green.
- Commit: test(ui): Audit Log Playwright E2E coverage.
M7 — Risk callouts
- Custom data grid scope: keyset paging + reorderable columns + per-user persistence is non-trivial. Bench the existing NotificationReport.razor grid to see whether it can be generalised vs forking it. Decision during M7 brainstorm.
- SignalR + large drawer payloads: the drilldown payload (up to 64 KB on errors) is rendered server-side via SignalR. Confirm the hub's MaximumReceiveMessageSize (the SignalR HubOptions setting, 32 KB by default) is large enough; bump if needed.
- Permission infrastructure assumptions: confirm during M7 brainstorm that the codebase already supports per-permission gates at the page level, not just role-level. If only role-level, fall back to gating via the existing Audit role with a feature flag for the export.
- The rename to ConfigurationAuditLog.razor breaks any external bookmarks. Decide redirect vs 404 explicitly during M7 brainstorm.
M8 — CLI: scadalink audit query | export | verify-chain
Goal: Operator surface for the centralized Audit Log.
Affected projects: CLI, CLI.Tests, ManagementService (new HTTP endpoint), IntegrationTests.
Acceptance criteria:
- scadalink audit query mirrors the UI filter set; results stream as JSON (default) or table.
- scadalink audit export streams server-side to CSV / JSONL / Parquet; requires AuditExport permission.
- scadalink audit verify-chain --month YYYY-MM is a no-op stub returning a "hash-chain not yet enabled in this release" message and exit code 0 (per v1.x deferral).
- Existing audit-log query (IAuditService config-change viewer) renamed in code to audit-config query to disambiguate; old name kept as a deprecated alias for one minor version.
- Permissions: audit query and audit verify-chain require OperationalAudit; audit export additionally requires AuditExport.
M8 — Tasks (TDD-detail)
M8-T1: Create AuditCommands.cs (separate from existing AuditLogCommands.cs)
Files:
- Create: src/ScadaLink.CLI/Commands/AuditCommands.cs. Declares static class AuditCommands { public static Command Build() }, following the System.CommandLine pattern from AuditLogCommands.cs:1–53. Sets up the audit parent command with three subcommands (T2/T3/T4).
- Modify: src/ScadaLink.CLI/Program.cs. Register AuditCommands.Build() alongside the existing command groups.
- Create: tests/ScadaLink.CLI.Tests/Commands/AuditCommandsScaffoldTests.cs.
Steps:
- Failing test: scadalink audit --help lists three subcommands (query, export, verify-chain).
- Implement.
- Run: pass.
- Commit: feat(cli): scaffold scadalink audit command group.
M8-T2: audit query subcommand
Files:
- Modify: src/ScadaLink.CLI/Commands/AuditCommands.cs. Add the query subcommand with the flag set matching the Central UI Audit Log filter set (post-Bundle-D fix): --since, --until, --channel, --kind, --status, --site, --instance, --target, --actor, --correlation-id, --errors-only, --page, --page-size. Output JSON by default; --format table opt-in.
- Create: src/ScadaLink.Commons/Messages/Cli/QueryAuditLogCommand.cs (or wherever the CLI↔Management messages live; confirm via repo).
- Create: tests/ScadaLink.CLI.Tests/Commands/AuditQueryCommandTests.cs.
Steps:
- Failing test: parsing the documented flag set produces a QueryAuditLogCommand with the expected fields.
- Failing test: --format table switches the output formatter.
- Failing test: unknown flag returns non-zero exit code with a helpful error.
- Implement.
- Run: pass.
- Commit: feat(cli): scadalink audit query subcommand.
M8-T3: audit export subcommand
Files:
- Modify: src/ScadaLink.CLI/Commands/AuditCommands.cs. Add the export subcommand with flags --since (required), --until (required), --format csv|jsonl|parquet (required), --output <path> (required), --channel, --kind, --status, --site, --target, --actor.
- Create: src/ScadaLink.Commons/Messages/Cli/ExportAuditLogCommand.cs.
- Create: tests/ScadaLink.CLI.Tests/Commands/AuditExportCommandTests.cs.
Steps:
- Failing test: missing required flag returns helpful error.
- Failing test: valid invocation creates an ExportAuditLogCommand with all fields.
- Failing test: streams results to --output; doesn't buffer entire export in memory (test with 100k+ rows).
- Implement.
- Run: pass.
- Commit: feat(cli): scadalink audit export subcommand (csv|jsonl|parquet).
M8-T4: audit verify-chain subcommand (no-op stub)
Files:
- Modify: src/ScadaLink.CLI/Commands/AuditCommands.cs. Add the verify-chain --month <YYYY-MM> subcommand. In v1, it returns a documented "hash chain not yet enabled in this release; see Component-AuditLog.md Security & Tamper-Evidence for the v1.x roadmap" message with exit code 0.
- Create: tests/ScadaLink.CLI.Tests/Commands/AuditVerifyChainCommandTests.cs.
Steps:
- Failing test: scadalink audit verify-chain --month 2026-05 exits 0 with the documented message.
- Failing test: malformed month string (e.g., 2026-13) exits non-zero with a parse error.
- Implement.
- Run: pass.
- Commit: feat(cli): scadalink audit verify-chain subcommand (v1 no-op).
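The two M8-T4 tests hinge on strict YYYY-MM validation. One way to get that from the BCL alone is DateTime.TryParseExact with the "yyyy-MM" format, sketched below. VerifyChainSketch, its Run method, and the exit code 1 for parse errors are assumptions for illustration (the roadmap only requires "non-zero"), and the real subcommand would be wired through System.CommandLine rather than called directly.

```csharp
using System;
using System.Globalization;
using System.IO;

// Hypothetical stand-in for the verify-chain handler body.
public static class VerifyChainSketch
{
    public const string NotEnabledMessage =
        "hash chain not yet enabled in this release; see Component-AuditLog.md " +
        "Security & Tamper-Evidence for the v1.x roadmap";

    // Returns the process exit code: 0 for a well-formed month, 1 (assumed) for a parse error.
    public static int Run(string month, TextWriter stdout, TextWriter stderr)
    {
        // "yyyy-MM" with the invariant culture rejects out-of-range months such as 13.
        var wellFormed = DateTime.TryParseExact(
            month, "yyyy-MM", CultureInfo.InvariantCulture, DateTimeStyles.None, out _);

        if (!wellFormed)
        {
            stderr.WriteLine($"error: '{month}' is not a valid month; expected YYYY-MM.");
            return 1;
        }

        stdout.WriteLine(NotEnabledMessage);
        return 0;
    }
}
```

Passing the writers in, rather than using Console directly, keeps the stub easy to assert on in AuditVerifyChainCommandTests.cs.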
M8-T5: ManagementService HTTP endpoints
Files:
- Create: src/ScadaLink.ManagementService/Controllers/AuditController.cs (new). REST endpoints GET /api/audit/query (paged) and GET /api/audit/export (streaming). Query is gated on OperationalAudit; export additionally requires AuditExport (matching the UI's permission split from M7-T15).
- Create: tests/ScadaLink.ManagementService.Tests/Controllers/AuditControllerTests.cs.
Steps:
- Failing test: GET /api/audit/query with valid params returns JSON page of audit events.
- Failing test: GET /api/audit/export streams CSV/JSONL/Parquet without buffering.
- Failing test: a request without OperationalAudit returns 403.
- Failing test: /export without AuditExport returns 403.
- Implement.
- Run: pass.
- Commit: feat(mgmt): /api/audit/{query,export} endpoints with permission gates.
M8-T6: Output formatters (JSON + table)
Files:
- Modify: src/ScadaLink.CLI/Output/. Add an AuditEventTableFormatter that renders results as an aligned table with sensible defaults (truncate long fields with …).
- The JSON formatter follows existing CLI patterns (one event per line for streaming, or array for paged results; confirm during M8 brainstorm).
- Create: tests/ScadaLink.CLI.Tests/Output/AuditEventFormatterTests.cs.
Steps:
- Failing test: table format includes columns: OccurredAtUtc, Channel, Kind, Status, Target, Actor, DurationMs.
- Failing test: JSON format is one event per line.
- Implement.
- Run: pass.
- Commit: feat(cli): JSON + table formatters for audit events.
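The "aligned table, truncate long fields with …" behaviour in M8-T6 comes down to two small string operations. A minimal sketch under stated assumptions: TableFormatSketch is a hypothetical name, the two-space column gap and caller-supplied widths are invented defaults, and the real AuditEventTableFormatter would follow whatever conventions src/ScadaLink.CLI/Output/ already has.

```csharp
using System;
using System.Linq;

// Hypothetical helper illustrating truncation and column alignment.
public static class TableFormatSketch
{
    // Shortens a value to at most `width` characters, ending in "…" when cut.
    public static string Truncate(string value, int width)
    {
        if (width < 1) throw new ArgumentOutOfRangeException(nameof(width));
        return value.Length <= width ? value : value.Substring(0, width - 1) + "…";
    }

    // Truncates and right-pads each cell to its column width, then joins with two spaces.
    public static string FormatRow(string[] cells, int[] widths)
    {
        if (cells.Length != widths.Length)
            throw new ArgumentException("cells and widths must have the same length.");

        var padded = cells.Select((cell, i) => Truncate(cell, widths[i]).PadRight(widths[i]));
        return string.Join("  ", padded).TrimEnd();
    }
}
```

With widths of 6 and 8, FormatRow(new[] { "ApiInbound", "Failed" }, new[] { 6, 8 }) produces "ApiIn…  Failed": the first cell is cut to five characters plus the ellipsis, and trailing padding on the last column is trimmed.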
M8-T7: Rename existing audit-log query → audit-config query with deprecation alias
Files:
- Modify: src/ScadaLink.CLI/Commands/AuditLogCommands.cs. Rename the top-level command from audit-log to audit-config (clearer disambiguation from the new audit group). Add an alias audit-log that prints a deprecation warning and forwards to audit-config for one minor version.
- Modify: src/ScadaLink.CLI/README.md and CLI help text to document the rename and the deprecation timeline.
- Create: tests/ScadaLink.CLI.Tests/Commands/AuditConfigDeprecationTests.cs.
Steps:
- Failing test: scadalink audit-config query --user alice works.
- Failing test: scadalink audit-log query --user alice works but emits a deprecation warning to stderr.
- Failing test: scadalink audit query --since ... (the NEW operational command) and scadalink audit-config query --user ... (the renamed config command) are clearly different surfaces and do not conflict.
- Implement.
- Run: pass.
- Commit: refactor(cli): rename audit-log → audit-config with deprecation alias.
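The "warn on stderr, then forward" behaviour of the deprecated alias can be shown independently of the command framework. The sketch below rewrites the argument vector before parsing; DeprecatedAliasSketch and the warning wording are hypothetical, and the actual implementation may well use System.CommandLine's own alias support instead of this pre-parse step.

```csharp
using System;
using System.IO;

// Hypothetical pre-parse shim; deliberately avoids any System.CommandLine API.
public static class DeprecatedAliasSketch
{
    // If the first token is the old command name, warn on stderr and return a copy
    // of the arguments with the new name substituted. Otherwise return args unchanged.
    public static string[] Rewrite(string[] args, TextWriter stderr)
    {
        if (args.Length == 0 || args[0] != "audit-log") return args;

        stderr.WriteLine("warning: 'audit-log' is deprecated; use 'audit-config' instead.");
        var forwarded = (string[])args.Clone();
        forwarded[0] = "audit-config";
        return forwarded;
    }
}
```

Only the first token is inspected, so scadalink audit query ... (the new operational group) passes through untouched, which is the non-conflict property the third test in M8-T7 checks.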
M8-T8: CLI README + help text updates
Files:
- Modify: src/ScadaLink.CLI/README.md. Document the new audit group, the renamed audit-config group, the permission requirements, the verify-chain no-op note, and the CLI ↔ UI filter parity.
- Modify: each subcommand's --help description for clarity.
Steps:
- Inline doc edits.
- Verify scadalink audit --help and scadalink audit-config --help produce the documented output.
- Commit: docs(cli): document new scadalink audit group and audit-config rename.
M8-T9: CLI integration test — end-to-end query + export
Files:
- Create: tests/ScadaLink.IntegrationTests/Cli/AuditCliEndToEndTests.cs. Boots central with a populated AuditLog table; invokes scadalink audit query --since ... against the running ManagementService; asserts results match the database. Same for export.
Steps:
- Sketch test using existing IntegrationTests harness.
- Iterate until all flag combinations work end-to-end.
- Commit: test(cli): scadalink audit end-to-end against running ManagementService.
M8 — Risk callouts
- Operator script breakage from the audit-log rename: the deprecation alias is the safety net but only for one minor version; document the deprecation timeline clearly in the CLI README. Coordinate with anyone running audit-log in CI/cron.
- Parquet output: requires a Parquet writer library. If one isn't already in Directory.Packages.props, add the smallest viable dependency (ParquetSharp or Parquet.Net). Decide during M8 brainstorm.
- Streaming export from CLI: the CLI invokes the ManagementService HTTP endpoint, which itself streams. Confirm HttpClient.SendAsync with HttpCompletionOption.ResponseHeadersRead is used so the CLI doesn't buffer the whole response.
- Permission model parity: ensure the CLI's permission errors mirror the UI's (HTTP 403 → CLI exit code 2 with a clear message).
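The last two callouts (unbuffered download and 403 → exit code 2) can be sketched together on the CLI side. ExportClientSketch, its method names, and the generic failure code 1 are assumptions for illustration; only the 403 → 2 mapping and the use of HttpCompletionOption.ResponseHeadersRead come from the callouts above.

```csharp
using System;
using System.IO;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical CLI-side export client.
public static class ExportClientSketch
{
    public const int ExitOk = 0;
    public const int ExitError = 1;            // assumed generic failure code
    public const int ExitPermissionDenied = 2; // from the roadmap: HTTP 403 -> exit code 2

    // Maps an HTTP status to a CLI exit code.
    public static int ExitCodeFor(HttpStatusCode status) => status switch
    {
        HttpStatusCode.Forbidden => ExitPermissionDenied,
        _ when (int)status is >= 200 and < 300 => ExitOk,
        _ => ExitError,
    };

    // Streams the export body straight to disk without holding it in memory.
    public static async Task<int> DownloadAsync(HttpClient client, Uri exportUri, string outputPath)
    {
        using var request = new HttpRequestMessage(HttpMethod.Get, exportUri);

        // ResponseHeadersRead completes once headers arrive, leaving the body unread.
        using var response = await client.SendAsync(request, HttpCompletionOption.ResponseHeadersRead);

        var exitCode = ExitCodeFor(response.StatusCode);
        if (exitCode != ExitOk)
        {
            Console.Error.WriteLine($"error: export failed with HTTP {(int)response.StatusCode}.");
            return exitCode;
        }

        await using var body = await response.Content.ReadAsStreamAsync();
        await using var file = File.Create(outputPath);
        await body.CopyToAsync(file); // copies in chunks, so memory use stays bounded
        return ExitOk;
    }
}
```

Keeping ExitCodeFor as a pure function means the 403 parity rule can be unit-tested without a running ManagementService, leaving the streaming path to the integration test in M8-T9.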
Cross-cutting concerns (apply at every milestone)
- Branching: every milestone gets its own feature/audit-log-mN-<slice> branch; merged with --no-ff to main on milestone completion. No pushes without explicit user authorization.
- Tests: every task adds tests first (failing test → impl → passing test). Existing tests must keep passing.
- Commits: small and frequent. Bite-sized per writing-plans skill.
- Reviews: per the bundling cadence in user memory, group small adjacent tasks into a single implementer dispatch, run one combined spec+quality review per bundle, then a final cross-bundle review at end of milestone.
- Docs: if implementation reveals a design gap, fix the design doc FIRST (in docs/requirements/Component-AuditLog.md and/or alog.md), commit, then implement. Don't let the code and docs drift.
- Infra: the 3 infra/* working-tree modifications still uncommitted on main are unrelated and stay that way unless the user explicitly addresses them. Use explicit git add <path> throughout, never git commit -am.
Per-milestone execution flow (template)
When a milestone is about to start, run this sequence:
- Brainstorm: short skill invocation to nail any code-level decisions not fixed in the spec (test fixture placement, migration helper choice, etc.).
- Writing-plans: produce a milestone-specific plan with TDD detail per task, saved to docs/plans/2026-XX-XX-auditlog-mN-<slice>.md + peer .tasks.json.
- Subagent-driven execution: bundle small tasks per cadence preference; per-bundle implementer + combined reviewer; cross-milestone review at end; merge to main with --no-ff.
The roadmap is the contract for what each milestone ships; the per-milestone plan is the contract for how it gets built.