Roadmap covering Audit Log (#23) code implementation across 8 milestones
(M1 Foundation → M8 CLI). Reflects the actual state of the codebase —
all 22 prior components have source + tests, but Site Call Audit (#22)
and cached-call tracking are design-only despite being on main; their
minimum surface is inlined into M3.
M1 is laid out at full TDD-level task detail (11 bite-sized tasks).
M2–M8 are at milestone-shape detail (goals, files, task headlines,
acceptance criteria, risk callouts). Per-milestone bite-sized plans
will be generated by brainstorm + writing-plans when each milestone is
about to execute — locking 80 task cards now would mostly be stale by
M5 as M1 reveals codebase realities.
Critical path: M1 → M2 → (M3 ∥ M4 ∥ M5) → M6 → (M7 ∥ M8).
Spec: docs/requirements/Component-AuditLog.md + alog.md (commit
fec0bb1).
Audit Log (#23) Code Implementation Roadmap
For Claude: REQUIRED SUB-SKILL FLOW per milestone: brainstorming → writing-plans → subagent-driven-development. Use docs/requirements/Component-AuditLog.md + alog.md as the spec; this document is the roadmap that sequences milestones and locks acceptance criteria for each. M1 carries full TDD-level task detail; M2–M8 are milestone-shape detail and will be expanded into bite-sized plans by their own writing-plans pass when their turn comes.
Goal: Implement central component #23 Audit Log — append-only forensic + operational record across every script-trust-boundary action — into the existing ScadaLink codebase.
Architecture: Layered alongside (not replacing) the future Notifications/SiteCalls operational stores. Site-local SQLite hot-path append + gRPC telemetry batches + reconciliation pulls; central direct-write for Inbound API and Notification Outbox dispatch; monthly-partitioned MS SQL with single global retention; strict append-only enforced via DB roles. See alog.md for the locked design decisions and Component-AuditLog.md for the component spec.
Tech Stack: Akka.NET (clustering, singletons, ClusterClient), EF Core (MS SQL provider, code-first migrations), Microsoft.Data.SqlClient, Microsoft.Data.Sqlite, gRPC (HTTP/2 server-streaming on the existing SiteStream channel), ASP.NET Core (Inbound API middleware), Blazor Server + Bootstrap (Central UI), System.CommandLine (CLI), xUnit + Akka.TestKit.Xunit2 + NSubstitute (tests).
Spec: /Users/dohertj2/Desktop/scadalink-design/alog.md (validated, immutable; commit fec0bb1). Component design at /Users/dohertj2/Desktop/scadalink-design/docs/requirements/Component-AuditLog.md.
Codebase Reality Check (what already exists)
- All 22 prior components have source + tests. Audit Log slots in as a new src/ScadaLink.AuditLog/ project plus changes to: Commons, ConfigurationDatabase, Communication (proto), Host (DI + actor registration), ExternalSystemGateway, InboundAPI, NotificationOutbox, HealthMonitoring, CentralUI, CLI, SiteRuntime (audit hook surface).
- Existing patterns to copy from:
  - Singleton wiring: src/ScadaLink.Host/Actors/AkkaHostedService.cs:272–280 (NotificationOutboxActor), using ClusterSingletonManager.Props + manager/proxy pair.
  - EF migration: src/ScadaLink.ConfigurationDatabase/Migrations/20260519050659_AddNotificationsTable.cs (table create + indexes; no partitioning yet, so Audit Log will be the first).
  - Site SQLite hot-path: src/ScadaLink.SiteEventLogging/SiteEventLogger.cs:28–98 (single connection, write lock, Channel-based background writer).
  - Site-buffer + forwarder: src/ScadaLink.StoreAndForward/, where StoreAndForwardStorage + NotificationForwarder show the Pending → Forwarded transition we'll mirror.
  - Actor + repo + test trio: src/ScadaLink.NotificationOutbox/NotificationOutboxActor.cs and tests/ScadaLink.NotificationOutbox.Tests/NotificationOutboxActorIngestTests.cs:20 (TestKit base class, NSubstitute repo, Sys.ActorOf, ExpectMsg<T>).
  - gRPC additive: src/ScadaLink.Communication/Protos/sitestream.proto currently carries only AttributeValueUpdate and AlarmStateUpdate in a oneof; we extend it.
  - CLI command shape: src/ScadaLink.CLI/Commands/AuditLogCommands.cs:1–53 (System.CommandLine pattern); the new group will live alongside it (the file's existing commands are for the IAuditService config audit and stay).
  - Blazor listing page: src/ScadaLink.CentralUI/Components/Pages/Notifications/NotificationReport.razor (filter bar + keyset paging + status badges idiom).
- AuditLog.razor and AuditLogCommands.cs already exist but they're the IAuditService config-change viewer. Per the design pass we renamed them in docs to "Configuration Audit Log Viewer"; in code they'll be renamed (file + URL + command name) so the new operational Audit Log can take the unqualified name.
- Test framework: xUnit + Akka.TestKit.Xunit2 + NSubstitute. Integration tests under tests/ScadaLink.IntegrationTests/. Playwright UI tests under tests/ScadaLink.CentralUI.PlaywrightTests/. A tests/ScadaLink.PerformanceTests/ project exists for load tests.
Prerequisite: Site Call Audit (#22) + cached-call tracking are NOT implemented in code
The design for both is merged on main (alog.md cached-call tracking section; Component-SiteCallAudit.md), but grep finds zero references to TrackedOperationId or CachedCallTelemetry in src/. This matters because M3 (cached operations + dual-write transaction) cannot be built without them.
Three ways to handle this — pick before M3:
1. Inline into M3 (Recommended): Implement just enough of Site Call Audit (#22) and cached-call tracking inside M3, specifically the CachedCallTelemetry message, the operational-tracking SQLite table at sites, and the SiteCalls table + repo + SiteCallAuditActor skeleton at central. This makes M3 the biggest milestone but ships a coherent slice (cached calls audited end-to-end).
2. M0 prerequisite milestone: Implement #22 and cached-call tracking as a separate slice before M3 starts. Cleanest dependency story; slowest to first-audit-row.
3. Ship Audit Log sync-only first, retrofit cached path later: M1, M2, M4 (sync-only emissions), M5, M6 (no cached features), M7, M8 ship as-is; cached audit is a separate follow-up. Lowest first-shippable scope but leaves cached calls unaudited until much later.
Default choice in this roadmap: (1). M3 absorbs the minimum #22 + cached-call tracking surface needed to make combined telemetry work; the rest of #22 (full reconciliation, KPIs, Retry/Discard relay) can be a follow-up.
Milestone index
| M | Title | Ships | Touches | Depends on |
|---|---|---|---|---|
| M1 | Foundation: schema, types, DB roles, partitioning | Migration deployed; Commons types exist; no observable behavior yet. | Commons, ConfigurationDatabase, ConfigurationDatabase.Tests | none |
| M2 | Site pipeline (sync-only path) | One emission path end-to-end (ESG sync Call() audited from script to central row). | Commons, AuditLog (new), Communication (proto), Host, ExternalSystemGateway, all Tests projects, IntegrationTests | M1 |
| M3 | Cached operations + dual-write transaction | Cached external calls and DB writes audited; SiteCalls table populated alongside; combined telemetry packet contract live. | Commons, AuditLog, SiteCallAudit (new), ConfigurationDatabase, ExternalSystemGateway, StoreAndForward, Host | M2; #22 + cached-call tracking inlined here per the prerequisite section |
| M4 | Remaining boundary emission | All four channels emitting: sync DB writes/reads, Notify dispatcher attempt/terminal, Inbound API middleware. | ExternalSystemGateway, InboundAPI, NotificationOutbox, SiteRuntime (Database surface) | M2; M3 (NotificationOutbox terminal/attempt uses ICentralAuditWriter pattern) |
| M5 | Payload + redaction policy | Header redaction, body redactor regex, SQL parameter redaction, safety net, configuration binding. | AuditLog, ExternalSystemGateway, InboundAPI, all emitter projects | M2 |
| M6 | Reconciliation, purge, partition maintenance, health metrics | Self-healing telemetry, monthly partition switch, the five new health metrics + their dashboard tiles. | AuditLog, ConfigurationDatabase (partition maintenance), HealthMonitoring | M2, M3 |
| M7 | Central UI: new Audit Log page + drill-ins + KPI tiles | User-visible Audit Log surface; existing AuditLog.razor renamed to ConfigurationAuditLog. | CentralUI, CentralUI.Tests, CentralUI.PlaywrightTests | M2, M4, M6 |
| M8 | CLI: scadalink audit query / export / verify-chain | Operator surface for query/export; verify-chain is a no-op stub until v1.x hash chain ships. | CLI, ManagementService (HTTP endpoint), CLI.Tests, IntegrationTests | M2 |
Ship-state at end of each milestone is the shippable slice — each milestone leaves the system in a working, testable, deployable state (no half-built actors mid-pipeline). M1 ships no user-visible behaviour but produces a clean foundation; from M2 onward each ships an observable audit capability.
Critical path: M1 → M2 → (M3 ∥ M4 ∥ M5) → M6 → (M7 ∥ M8). M3, M4, M5 can overlap once M2 is solid. M7 and M8 can overlap once M6 lands.
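As a cross-check on the sequencing, the "Depends on" column of the milestone index can be turned into execution waves with a topological sort. The sketch below is illustrative only: it uses Python's standard graphlib rather than anything in the ScadaLink codebase, and it treats every listed dependency as a hard one, including the M4 → M3 entry that the table qualifies as a pattern dependency.

```python
from graphlib import TopologicalSorter

# "Depends on" column of the milestone index, read strictly.
# Assumption: every listed dependency is hard, including M4 -> M3.
DEPENDS_ON = {
    "M1": [],
    "M2": ["M1"],
    "M3": ["M2"],
    "M4": ["M2", "M3"],
    "M5": ["M2"],
    "M6": ["M2", "M3"],
    "M7": ["M2", "M4", "M6"],
    "M8": ["M2"],
}


def execution_waves(depends_on):
    """Group milestones into waves; everything in one wave can run in parallel."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


WAVES = execution_waves(DEPENDS_ON)
for number, wave in enumerate(WAVES, start=1):
    print(number, " ∥ ".join(wave))
```

Read this strictly and M4 lands one wave behind M3, while M8 is unblocked as soon as M2 lands. The stated critical path is the planning view; the difference comes from how hard the M4 → M3 and M8 → M6 couplings are taken to be.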
M1 — Foundation: schema, types, DB roles, partitioning
Goal: Land the new AuditLog table (partitioned) and DB roles in MS SQL, plus the Commons types every later milestone needs. After M1 the database is ready and types compile; nothing else changes.
Affected projects:
- src/ScadaLink.Commons/: entity, enums, interfaces, message DTOs.
- src/ScadaLink.ConfigurationDatabase/: EF mapping, DbContext registration, migration, DB role script, partition function/scheme, retention options.
- tests/ScadaLink.Commons.Tests/: enum + record tests.
- tests/ScadaLink.ConfigurationDatabase.Tests/: migration tests, repo tests.
Acceptance criteria:
- dotnet build of the solution succeeds.
- dotnet ef database update against a dev MS SQL applies the migration; AuditLog table exists, partitioned monthly on OccurredAtUtc, with PK on EventId and the five expected indexes.
- scadalink_audit_writer and scadalink_audit_purger SQL roles exist with the documented grants; a smoke test confirms UPDATE AuditLog from the writer role fails.
- AuditEvent record, AuditChannel / AuditKind / AuditStatus enums, IAuditWriter / ICentralAuditWriter interfaces, AuditTelemetryEnvelope / PullAuditEvents message DTOs all exist in Commons in the right folders.
- IAuditLogRepository interface (Commons) and EF implementation (ConfigurationDatabase) exist; the implementation only exposes InsertIfNotExistsAsync, paged read, and SwitchOutPartitionAsync, with no update or row-delete.
- All new tests pass; no existing tests regress.
M1 — Tasks (TDD-detail)
M1-T1: Add audit enums to Commons
Files:
- Create: src/ScadaLink.Commons/Types/Enums/AuditChannel.cs, AuditKind.cs, AuditStatus.cs.
- Create: tests/ScadaLink.Commons.Tests/Types/Enums/AuditEnumTests.cs.
Steps:
- Write failing test verifying AuditChannel has exactly ApiOutbound | DbOutbound | Notification | ApiInbound (asserting Enum.GetValues length and members).
- Same for AuditKind (10 members per Component-AuditLog.md).
- Same for AuditStatus (8 members).
- Run: tests fail (enums don't exist). Implement the three enums.
- Run tests: pass.
- Commit: feat(commons): add Audit{Channel,Kind,Status} enums for #23.
M1-T2: Add AuditEvent record + ForwardState enum
Files:
- Create: src/ScadaLink.Commons/Entities/Audit/AuditEvent.cs, a public record carrying all 20 central columns (per alog.md §4) plus a nullable ForwardState? for the site-local variant.
- Create: src/ScadaLink.Commons/Types/Enums/AuditForwardState.cs with members Pending | Forwarded | Reconciled.
- Create: tests/ScadaLink.Commons.Tests/Entities/Audit/AuditEventTests.cs.
Steps:
- Write failing test that constructs an AuditEvent, sets every property, and round-trips via with expressions; it asserts immutability and required-property behaviour.
- Run: fail (type doesn't exist). Implement the record.
- Run: pass.
- Commit: feat(commons): add AuditEvent record + ForwardState enum.
M1-T3: Add IAuditWriter and ICentralAuditWriter
Files:
- Create: src/ScadaLink.Commons/Interfaces/Services/IAuditWriter.cs, ICentralAuditWriter.cs.
- Create: tests/ScadaLink.Commons.Tests/Interfaces/Services/AuditWriterContractTests.cs (smoke test: only that the interfaces exist and have the documented signatures).
Steps:
- Write failing reflection-based test asserting both interfaces expose Task WriteAsync(AuditEvent, CancellationToken).
- Run: fail. Implement both interfaces; document each with XML doc comments naming Audit Log #23 as the owner.
- Run: pass.
- Commit: feat(commons): add IAuditWriter and ICentralAuditWriter.
M1-T4: Add audit telemetry + pull message DTOs
Files:
- Create: src/ScadaLink.Commons/Messages/Integration/AuditTelemetryEnvelope.cs, PullAuditEventsRequest.cs, PullAuditEventsResponse.cs.
- Create: tests/ScadaLink.Commons.Tests/Messages/Integration/AuditTelemetryMessagesTests.cs.
Steps:
- Failing test: construct envelope with a batch of 3 events, assert immutability + batch enumerability.
- Failing test: pull request carries SinceUtc + BatchSize; response carries events + MoreAvailable.
- Implement.
- Run: pass.
- Commit: feat(commons): add audit telemetry + pull message DTOs.
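The pull contract in M1-T4 (SinceUtc + BatchSize in, events + MoreAvailable out) can be illustrated with a small language-neutral sketch. This is Python rather than the C# records the task will produce. The snake_case field names, the (timestamp, id) tuples standing in for AuditEvent, and the choice to treat SinceUtc as an exclusive lower bound are all assumptions made for illustration.

```python
from dataclasses import dataclass, FrozenInstanceError
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class PullAuditEventsRequest:
    since_utc: datetime   # mirrors SinceUtc
    batch_size: int       # mirrors BatchSize


@dataclass(frozen=True)
class PullAuditEventsResponse:
    events: tuple         # (occurred_at_utc, event_id) pairs stand in for AuditEvent
    more_available: bool  # mirrors MoreAvailable


def pull_page(store, request):
    """Serve one page from `store`, a list sorted ascending by timestamp.

    Assumption: SinceUtc is an exclusive lower bound.
    """
    newer = [event for event in store if event[0] > request.since_utc]
    page = tuple(newer[: request.batch_size])
    return PullAuditEventsResponse(events=page, more_available=len(newer) > len(page))


T0 = datetime(2026, 1, 1, tzinfo=timezone.utc)
STORE = [(T0 + timedelta(minutes=i), f"evt-{i}") for i in range(1, 4)]
FIRST = pull_page(STORE, PullAuditEventsRequest(since_utc=T0, batch_size=2))
```

A caller would keep pulling with since_utc set to the timestamp of the last event received until more_available comes back false.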
M1-T5: Extend ScadaLinkDbContext with AuditLogs DbSet + entity config
Files:
- Modify: src/ScadaLink.ConfigurationDatabase/ScadaLinkDbContext.cs, adding public DbSet<AuditEvent> AuditLogs => Set<AuditEvent>(); at the appropriate position (after Notifications).
- Create: src/ScadaLink.ConfigurationDatabase/Entities/AuditLogEntityTypeConfiguration.cs, an IEntityTypeConfiguration<AuditEvent> mapping the columns, types, length constraints, and indexes per alog.md §4. Note: this is an EF mapping only; the partition function and scheme are created in the SQL migration (next task) since EF Core doesn't model them natively.
- Modify: OnModelCreating to apply the new configuration.
- Create: tests/ScadaLink.ConfigurationDatabase.Tests/Entities/AuditLogEntityTypeConfigurationTests.cs, using ModelBuilder directly to verify the entity is mapped to the AuditLog table, PK is EventId, and the expected columns + indexes are declared.
Steps:
- Failing test asserts mapped table name, PK column, and column count.
- Implement entity configuration; apply in OnModelCreating.
- Failing test asserts the five expected indexes exist on the model.
- Add HasIndex declarations.
- Run: pass.
- Commit: feat(configdb): map AuditEvent to AuditLog table with PK and indexes.
M1-T6: Generate and customize EF migration for AuditLog
Files:
- Create: src/ScadaLink.ConfigurationDatabase/Migrations/<timestamp>_AddAuditLogTable.cs via dotnet ef migrations add AddAuditLogTable --project ScadaLink.ConfigurationDatabase.
- Modify: the generated Up() / Down() to:
  - Create the partition function pf_AuditLog_Month and partition scheme ps_AuditLog_Month (raw SQL via migrationBuilder.Sql(...)), tied to a dedicated filegroup (or PRIMARY in dev, configurable via a migration setting).
  - Alter the CreateTable call (or follow up with Sql) to align the table to ps_AuditLog_Month(OccurredAtUtc).
  - Add the five indexes generated by EF; ensure each is also partition-aligned where appropriate.
- Create: tests/ScadaLink.ConfigurationDatabase.Tests/Migrations/AddAuditLogTableMigrationTests.cs, which applies the migration to an isolated MS SQL LocalDB instance (existing IntegrationTests harness) and asserts table + partition function + scheme + indexes are present.
Steps:
- Run dotnet ef migrations add AddAuditLogTable.
- Failing integration test: apply migration, query sys.partition_functions and sys.partition_schemes for the expected names.
- Edit migration to add the partition function + scheme + alignment.
- Re-run test: pass.
- Failing test: query sys.indexes for the five expected named indexes.
- Adjust migration if any index name drifts.
- Run: pass.
- Commit: feat(configdb): add AuditLog migration with monthly partitioning.
M1-T7: Add DB roles in migration
Files:
- Modify: the M1-T6 migration Up() to also create the scadalink_audit_writer (INSERT + SELECT only) and scadalink_audit_purger (ALTER PARTITION FUNCTION + ALTER TABLE … SWITCH PARTITION + SELECT) roles via raw SQL. Make role creation idempotent (IF NOT EXISTS).
- Modify: Down() to drop the roles.
- Create: tests/ScadaLink.ConfigurationDatabase.Tests/Migrations/AuditLogRoleGrantsTests.cs, which applies the migration, then runs SELECT on sys.database_role_members / sys.database_permissions to assert the role grants. Plus a smoke test: connect as a user mapped to scadalink_audit_writer, attempt UPDATE AuditLog SET Status = 'X' and expect a permission error.
Steps:
- Failing test asserts both roles exist with documented grants.
- Add migrationBuilder.Sql(...) blocks.
- Run: pass.
- Failing test: UPDATE AuditLog as audit writer → expect SqlException with permission error.
- Verify the role's permissions deny UPDATE (they should by default since only INSERT + SELECT granted).
- Run: pass.
- Commit: feat(configdb): add scadalink_audit_writer and scadalink_audit_purger roles.
M1-T8: Add IAuditLogRepository + EF implementation
Files:
- Create: src/ScadaLink.Commons/Interfaces/Repositories/IAuditLogRepository.cs with InsertIfNotExistsAsync(AuditEvent, CancellationToken), QueryAsync(filter, paging, CancellationToken), SwitchOutPartitionAsync(monthBoundary, CancellationToken). Deliberately no UpdateAsync or row-level DeleteAsync.
- Create: src/ScadaLink.ConfigurationDatabase/Repositories/AuditLogRepository.cs, the implementation using the DbContext; InsertIfNotExistsAsync uses MERGE or raw INSERT … WHERE NOT EXISTS to satisfy idempotency without throwing on dupes.
- Modify: ServiceCollectionExtensions.cs to register IAuditLogRepository → AuditLogRepository in DI.
- Create: tests/ScadaLink.ConfigurationDatabase.Tests/Repositories/AuditLogRepositoryTests.cs.
Steps:
- Failing test: InsertIfNotExistsAsync for a fresh EventId writes one row; calling again with the same EventId is a no-op (no exception, no second row).
- Implement; use a MERGE or INSERT … WHERE NOT EXISTS strategy that does NOT rely on EF change tracking.
- Run: pass.
- Failing test: paged QueryAsync returns rows in (OccurredAtUtc desc, EventId desc) order, respecting filter predicates (channel, kind, status, site, target, actor, correlation, time range).
- Implement filter projection + keyset paging.
- Run: pass.
- Failing test: SwitchOutPartitionAsync for the oldest partition removes its rows from the live table.
- Implement via migrationBuilder-style Sql("ALTER TABLE ... SWITCH PARTITION ... TO ...") (against a staging table the implementation creates and drops within the same transaction).
- Run: pass.
- Commit: feat(configdb): IAuditLogRepository + EF implementation (append-only, partition-switch purge).
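The two repository behaviours tested in M1-T8, an idempotent insert and keyset paging in (OccurredAtUtc desc, EventId desc) order, can be sketched against an in-memory SQLite database. This is a minimal illustration, not the EF/MS SQL implementation: the three-column table, the function names, and the use of SQLite are all stand-ins chosen so the example is runnable on its own.

```python
import sqlite3


def open_db():
    # Reduced stand-in for the AuditLog table; the real one has 20 columns.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE AuditLog ("
        "EventId TEXT PRIMARY KEY, OccurredAtUtc TEXT NOT NULL, Channel TEXT NOT NULL)"
    )
    return conn


def insert_if_not_exists(conn, event_id, occurred_at_utc, channel):
    """INSERT ... WHERE NOT EXISTS: a duplicate EventId is a silent no-op."""
    before = conn.total_changes
    conn.execute(
        "INSERT INTO AuditLog (EventId, OccurredAtUtc, Channel) "
        "SELECT ?, ?, ? WHERE NOT EXISTS (SELECT 1 FROM AuditLog WHERE EventId = ?)",
        (event_id, occurred_at_utc, channel, event_id),
    )
    conn.commit()
    return conn.total_changes - before == 1  # True only when a row was written


def query_page(conn, page_size, after=None, channel=None):
    """Keyset paging: `after` is the (OccurredAtUtc, EventId) of the last row seen."""
    sql = "SELECT OccurredAtUtc, EventId FROM AuditLog WHERE 1 = 1"
    params = []
    if channel is not None:
        sql += " AND Channel = ?"
        params.append(channel)
    if after is not None:
        # Strictly "older than the cursor" in (OccurredAtUtc, EventId) order.
        sql += " AND (OccurredAtUtc < ? OR (OccurredAtUtc = ? AND EventId < ?))"
        params += [after[0], after[0], after[1]]
    sql += " ORDER BY OccurredAtUtc DESC, EventId DESC LIMIT ?"
    params.append(page_size)
    return conn.execute(sql, params).fetchall()
```

Including EventId as the tie-breaker in both the ORDER BY and the cursor predicate is what keeps pages stable when several events share one timestamp.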
M1-T9: Add AuditLogOptions configuration class + binding
Files:
- Create: src/ScadaLink.AuditLog/Configuration/AuditLogOptions.cs (new project; see M1-T10), which owns DefaultCapBytes, ErrorCapBytes, HeaderRedactList, GlobalBodyRedactors, PerTargetOverrides, RetentionDays, and validation attributes.
- Add: validation on startup (IValidateOptions<AuditLogOptions>).
- Test: ensure appsettings.json bind round-trips and validation rejects out-of-range RetentionDays.
Steps:
- Failing test: bind a valid section → values present.
- Implement options class + binding.
- Failing test: bind invalid RetentionDays → validator rejects.
- Implement validator.
- Run: pass.
- Commit: feat(auditlog): add AuditLogOptions config binding.
M1-T10: Add ScadaLink.AuditLog project skeleton
Files:
- Create: src/ScadaLink.AuditLog/ScadaLink.AuditLog.csproj; TargetFramework matches the rest of the solution; ProjectReferences to ScadaLink.Commons and ScadaLink.ConfigurationDatabase.
- Create: src/ScadaLink.AuditLog/ServiceCollectionExtensions.cs with AddAuditLog(this IServiceCollection, IConfiguration) that registers AuditLogOptions, IAuditLogRepository, plus placeholders that later milestones will fill (writer impls, actors).
- Create: tests/ScadaLink.AuditLog.Tests/ScadaLink.AuditLog.Tests.csproj with one smoke test.
- Modify: ScadaLink.slnx to add both projects to the solution.
- Modify: Directory.Packages.props if any new package versions are needed.
Steps:
- Create projects via dotnet new classlib / dotnet new xunit; add references; add to slnx.
- Failing test: smoke-test AddAuditLog() populates DI with IAuditLogRepository and IOptions<AuditLogOptions>.
- Implement ServiceCollectionExtensions.AddAuditLog.
- Run: pass.
- Commit: feat(auditlog): scaffold ScadaLink.AuditLog project.
M1-T11: Update Component-Host.md responsibilities + README component table
Files:
- Modify: docs/requirements/Component-Host.md to list ScadaLink.AuditLog in the central role's registration set.
- Modify: README.md to confirm row #23 link reflects the new project (no functional change; this is a paper-trail update).
Steps:
- Edit, verify cross-refs, commit: docs(audit): register ScadaLink.AuditLog project in Host role.
M2 — Site pipeline (sync-only path)
Goal: First end-to-end audit emission: a script-initiated ExternalSystem.Call() produces an audit row in the central AuditLog table. No cached paths yet, no notifications, no inbound API, no UI. Just one channel + kind: ApiOutbound.SyncCall.
Affected projects: Commons, AuditLog (new), Communication, Host, ExternalSystemGateway, all matching *.Tests/, tests/ScadaLink.IntegrationTests/.
Acceptance criteria:
- Site-local IAuditWriter writes to a per-site SQLite auditlog.db on the hot path with ForwardState = 'Pending'; the durable write completes in under a millisecond; failures fall back to a bounded in-memory ring and surface a metric.
- SiteAuditTelemetryActor drains pending rows in batches via a new IngestAuditEvents RPC on the existing SiteStream gRPC service; on success it flips ForwardState = 'Forwarded'.
- AuditLogIngestActor (central singleton) receives the batch, performs InsertIfNotExistsAsync per event, returns ack.
- ExternalSystem.Call() emits one ApiOutbound.SyncCall row via IAuditWriter on every call completion; audit-write failure does NOT abort the script.
- Integration test in tests/ScadaLink.IntegrationTests/ boots a site + central pair, executes a sync script that calls an external system, and asserts a corresponding row appears in the central AuditLog within N seconds.
- No regressions in existing ExternalSystemGateway or Communication tests.
Task headlines (each expanded to TDD detail in its own writing-plans pass before execution):
- Site-local SqliteAuditWriter implementing IAuditWriter: schema bootstrap, hot-path INSERT, write lock, ring-buffer fallback. Pattern from SiteEventLogger.cs:28–98.
- Bounded in-memory RingBufferFallback that drains into the SQLite writer when health returns.
- SiteAuditTelemetryActor: periodic drain loop (5s busy / 30s idle), batch INSERT-IF-NOT-EXISTS via gRPC, ForwardState transitions.
- Extend sitestream.proto: add IngestAuditEvents(stream AuditEventBatch) returns (IngestAck). Regenerate. Update SiteStreamGrpcServer.cs to handle the new RPC.
- AuditLogIngestActor (central singleton): handles ingest message, calls IAuditLogRepository.InsertIfNotExistsAsync per event in a single transaction.
- Host wiring: register SiteAuditTelemetryActor as a site singleton on a dedicated dispatcher (per alog.md §6.2); register AuditLogIngestActor as a central singleton. Reference pattern at AkkaHostedService.cs:272–280.
- ESG sync Call() emission hook: add IAuditWriter injection; emit AuditEvent (channel=ApiOutbound, kind=SyncCall) before returning. Audit-write failures never throw to the script.
- End-to-end integration test in IntegrationTests/AuditLog/SyncCallEmissionTests.cs: site + central wired, script invokes ESG Call(), central row appears.
- Health metric SiteAuditWriteFailures (this milestone defines it; M6 surfaces the tile).
- Update docker/deploy.sh / infra/reseed.sh if needed so dev clusters can verify locally.
Risk callouts:
- Site SQLite write throughput under load: bench against existing SiteEventLogger numbers.
- gRPC additive evolution: the existing proto uses a oneof. Adding a new top-level RPC is safe; embedding new oneof variants is also safe. Confirm message-ordering guarantees aren't violated.
- Don't accidentally bind SiteAuditTelemetryActor to the same dispatcher used by script blocking I/O; that's a real perf issue (per spec).
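The hot-path behaviour M2 asks for (write to SQLite as Pending, fall back to a bounded in-memory ring on failure, drain the ring once health returns, never throw to the caller) can be sketched as follows. This is a single-threaded Python illustration, not the planned SqliteAuditWriter: the class name, the `healthy` failure-injection flag, the two-column payload, and the drop-oldest overflow policy are assumptions, and the write lock and background Channel writer from the SiteEventLogger pattern are left out.

```python
import sqlite3
from collections import deque


class SiteAuditWriter:
    """Hypothetical stand-in for the site-local writer; names are illustrative."""

    def __init__(self, conn, ring_capacity=1000):
        self._conn = conn
        # Assumption: when the ring is full the oldest entry is dropped.
        self._ring = deque(maxlen=ring_capacity)
        self.write_failures = 0  # would feed the SiteAuditWriteFailures metric
        self.healthy = True      # test hook used to simulate a failing disk
        conn.execute(
            "CREATE TABLE IF NOT EXISTS AuditLog ("
            "EventId TEXT PRIMARY KEY, Payload TEXT, ForwardState TEXT NOT NULL)"
        )

    def ring_depth(self):
        return len(self._ring)

    def _insert(self, event_id, payload):
        if not self.healthy:
            raise sqlite3.OperationalError("simulated disk failure")
        self._conn.execute(
            "INSERT OR IGNORE INTO AuditLog VALUES (?, ?, 'Pending')",
            (event_id, payload),
        )
        self._conn.commit()

    def _drain_ring(self):
        # Oldest first; an entry leaves the ring only after its insert succeeded.
        while self._ring:
            event_id, payload = self._ring[0]
            self._insert(event_id, payload)
            self._ring.popleft()

    def write(self, event_id, payload):
        """Never raises: a failed write is counted and parked in the ring."""
        try:
            self._drain_ring()
            self._insert(event_id, payload)
        except sqlite3.Error:
            self.write_failures += 1
            self._ring.append((event_id, payload))
```

Draining before each new insert keeps events in arrival order once the database recovers; a real implementation would more likely drain from a background loop than on the caller's thread.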
M3 — Cached operations + dual-write transaction + (inlined) Site Call Audit foundations
Goal: Cached external calls (ExternalSystem.CachedCall) and cached DB writes (Database.CachedWrite) produce three kinds of audit row per operation (CachedEnqueued, CachedAttempt × N, CachedTerminal) AND populate the operational SiteCalls table at central, in one transaction at central, from a single combined telemetry packet.
Affected projects: Commons, AuditLog, SiteCallAudit (new — minimum-viable surface), ConfigurationDatabase (new SiteCalls table migration), ExternalSystemGateway, StoreAndForward, Host. Tests across all of them + IntegrationTests.
Prerequisite call-out: This milestone implements the minimum-viable Site Call Audit (#22) surface and cached-call tracking pieces: TrackedOperationId, site-local operation tracking SQLite, the SiteCalls table at central, and the CachedCallTelemetry message (the docs describe it as existing, but it must be created from scratch because it is absent from the code). Full reconciliation, KPIs, and Retry/Discard relay for #22 are deferred; they're not on the critical path for the audit log's combined telemetry.
Acceptance criteria:
- New SiteCalls MS SQL table + repo (no partitioning needed; this is operational state, not audit).
- New CachedCallTelemetry message in Commons carrying BOTH the cached-call operational fields AND an AuditEvent payload.
- Site path: CachedCall writes the audit row to site SQLite (Kind = CachedEnqueued), creates the site operation-tracking row, and sends a combined telemetry packet.
- Central path: AuditLogIngestActor (extended) receives the combined packet, performs one transaction containing both the AuditLog insert and the SiteCalls upsert.
- Retry attempt → Kind = CachedAttempt audit row + SiteCalls status transition. Terminal → Kind = CachedTerminal audit row + SiteCalls terminal status.
- Integration test asserts: triggering a CachedCall that fails transient-then-succeeds produces 3 AuditLog rows + 1 SiteCalls row with Status = Delivered, all sharing the same TrackedOperationId correlation key.
Task headlines:
- TrackedOperationId GUID newtype in Commons.
- Site-local SQLite operation-tracking table + repo (matches alog.md cached-call tracking design).
- CachedCallTelemetry Commons message carrying both operational fields and AuditEvent payload.
- SiteCalls MS SQL table + EF mapping + migration + ISiteCallAuditRepository + repo impl.
- SiteCallAuditActor skeleton (singleton, central): receives telemetry, owns SiteCalls upsert via repo.
- Extend AuditLogIngestActor to detect combined telemetry and execute both writes (AuditLog insert + SiteCalls upsert) in a single DbContext transaction.
- ESG CachedCall() emission: produce combined telemetry on every lifecycle transition (enqueue, attempt, terminal).
- Extend gRPC proto with the combined-telemetry RPC if it's distinct from IngestAuditEvents, or fold it into the existing one with a discriminator field (decision in milestone brainstorm).
- Integration test in IntegrationTests/AuditLog/CachedCallCombinedTelemetryTests.cs.
Risk callouts:
- Combined telemetry packet evolution: design the packet so future cached audit-kind additions are non-breaking (oneof or open-field map).
- Single transaction at central spans two tables; ensure connection retry behaviour is correct.
- Idempotency: AuditLog dedups on EventId; SiteCalls dedups on TrackedOperationId. If telemetry retries and AuditLog already has the row, ensure SiteCalls upsert still runs (no short-circuit).
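The idempotency callout above is easy to get wrong, so here is a minimal sketch of the intended shape: both writes share one transaction, the audit insert ignores a duplicate EventId, and the SiteCalls upsert runs regardless. It uses an in-memory SQLite database instead of MS SQL and EF; the reduced schemas, the function name, and the intermediate status values "Pending" and "Retrying" are assumptions (only Delivered appears in the spec text quoted here).

```python
import sqlite3


def open_central():
    conn = sqlite3.connect(":memory:")
    # Reduced stand-ins for the two central tables.
    conn.executescript(
        """
        CREATE TABLE AuditLog (
            EventId TEXT PRIMARY KEY,
            TrackedOperationId TEXT NOT NULL,
            Kind TEXT NOT NULL
        );
        CREATE TABLE SiteCalls (
            TrackedOperationId TEXT PRIMARY KEY,
            Status TEXT NOT NULL
        );
        """
    )
    return conn


def ingest_combined(conn, event_id, tracked_operation_id, kind, status):
    """Apply one combined telemetry packet atomically."""
    with conn:  # commits on success, rolls back both writes on any exception
        # Dedup on EventId: a retried packet does not add a second audit row.
        conn.execute(
            "INSERT OR IGNORE INTO AuditLog VALUES (?, ?, ?)",
            (event_id, tracked_operation_id, kind),
        )
        # No short-circuit: the upsert runs even when the audit row already existed.
        conn.execute(
            "INSERT INTO SiteCalls (TrackedOperationId, Status) VALUES (?, ?) "
            "ON CONFLICT(TrackedOperationId) DO UPDATE SET Status = excluded.Status",
            (tracked_operation_id, status),
        )
```

A real upsert would presumably also need a guard so that a late retry of an older packet cannot overwrite a newer status; the sketch leaves that ordering question open.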
M4 — Remaining boundary emission
Goal: Every channel × kind from Component-AuditLog.md produces a row when its boundary call fires.
Affected projects: ExternalSystemGateway (sync DB writes/reads, cached DB writes), SiteRuntime (Database surface exposing them), NotificationOutbox (central direct-write of Attempt/Terminal), InboundAPI (middleware). Tests across all.
Acceptance criteria:
- Sync Database.Connection().Execute() → DbOutbound.SyncWrite row; ExecuteReader → DbOutbound.SyncRead. Parameter values captured by default; per-connection redaction opt-in supported.
- Database.CachedWrite → three lifecycle rows via the combined telemetry built in M3.
- Notification Outbox dispatcher: every delivery attempt writes Notification.Attempt; terminal writes Notification.Terminal. Site-emitted Notification.Enqueued flows through the standard site→central audit path. Audit-write failure never affects delivery.
- Inbound API middleware writes one ApiInbound.Completed row per request, before await next() returns. API key NAME captured (never material). Audit-write failure does NOT change the HTTP response.
Task headlines:
- ESG Database.Connection() execute hook: wrap Execute* / ExecuteScalar / ExecuteReader to emit before/after audit events.
- Database.CachedWrite combined-telemetry emission (mirror M3's ESG cached path).
- NotificationOutboxActor extension: inject ICentralAuditWriter; write Notification.Attempt per dispatcher attempt; write Notification.Terminal on terminal transitions; never abort on failure.
- Site-emitted Notification.Enqueued: when a script calls Notify.To().Send() (site-side via Store-and-Forward), emit a site audit row (Notification.Enqueued); telemetry forwards as usual.
- Inbound API middleware: new AuditWriteMiddleware in src/ScadaLink.InboundAPI/Middleware/ writing ApiInbound.Completed before response flush; register in the ASP.NET pipeline.
- Tests: emission unit tests per call mode, plus 4 integration tests (one per channel).
Risk callouts:
- Inbound API: correlation-id generation needs to be consistent with any upstream tracing headers (W3C traceparent if present).
- Notification dispatcher: confirm ICentralAuditWriter errors are logged but don't block the dispatch loop.
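The rule that recurs across M2 and M4, that an audit-write failure is logged and counted but never changes the outcome of the audited call, can be sketched as a wrapper around a request handler. This is a Python illustration of the pattern, not the ASP.NET AuditWriteMiddleware: the class name, the dict-shaped request and response, and the event field names are hypothetical.

```python
import logging

logger = logging.getLogger("audit")


class AuditingHandler:
    """Wraps a request handler; audit failures are logged, counted, and swallowed."""

    def __init__(self, inner, audit_writer):
        self._inner = inner                # callable: request dict -> response dict
        self._audit_writer = audit_writer  # callable: event dict -> None, may raise
        self.audit_write_failures = 0

    def __call__(self, request):
        response = self._inner(request)
        try:
            self._audit_writer(
                {
                    "channel": "ApiInbound",
                    "kind": "Completed",
                    # The key's NAME only, never the key material.
                    "api_key_name": request.get("api_key_name"),
                    "status": response["status"],
                }
            )
        except Exception:
            # The response is already decided; the failure only moves a metric.
            self.audit_write_failures += 1
            logger.exception("audit write failed; response left unchanged")
        return response
```

The same shape applies to the Notification Outbox dispatcher: the attempt proceeds, and the audit write sits in its own try block next to it.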
M5 — Payload + redaction policy
Goal: Payload capture is bounded (8 KB / 64 KB on error), headers are redacted by default, SQL parameter values are captured by default with per-connection opt-out, body redactor regexes are configurable per target, and the safety net over-redacts on misconfiguration.
Affected projects: AuditLog (policy engine + options), ExternalSystemGateway (HTTP header redactors, SQL param redaction hook), InboundAPI (header redactors, body capture), NotificationOutbox (subject/body capture follows existing rules). Tests.
Acceptance criteria:
- An IAuditPayloadFilter service is invoked between event construction and write. Truncates to default cap; raises to error cap on non-Success rows; applies header redactors; applies body regex redactors; applies SQL parameter redactors (per-connection); over-redacts on regex error and increments AuditRedactionFailure.
- Configuration test: changing appsettings.json redactors changes runtime behaviour (no rebuild needed for regex changes).
- Bench: 95th-percentile audit emission latency on the hot path stays under N µs at default cap (target to be set during M5 brainstorm).
Task headlines:
- IAuditPayloadFilter + default implementation (header redaction, body regex, SQL parameter redaction, safety net).
- Wire the filter into the emission paths (M2, M3, M4 emitters all call through the filter before handing the AuditEvent to the writer).
- appsettings.json schema for the filter (already prepared in M1-T9; M5 plugs the runtime in).
- Tests: redaction unit tests with known-bad payloads (passwords in JSON, Authorization headers, SQL params named @apikey).
- Performance test in tests/ScadaLink.PerformanceTests/ for the hot-path latency budget.
Risk callouts:
- Regex performance: pre-compile and cache patterns; reject patterns that take too long to compile.
- Redact before truncating: if truncation runs first and cuts a redaction target in half, the redactor no longer matches and the partial secret survives.
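The filter ordering described in this milestone (redact headers, redact the body, over-redact if a pattern is broken, then truncate to the cap) can be sketched in a few lines. This is a Python illustration under assumptions, not the IAuditPayloadFilter implementation: the function signature, the "[REDACTED]" marker, and the choice to blank the whole body on a bad pattern are hypothetical, and SQL parameter redaction and the error cap are left out.

```python
import re

REDACTED = "[REDACTED]"


def filter_payload(headers, body, header_redact_list, body_patterns, cap_bytes):
    """Return (headers, body, truncated, redaction_failures)."""
    # 1. Header redaction, matched case-insensitively on the header name.
    redact = {name.lower() for name in header_redact_list}
    safe_headers = {
        key: (REDACTED if key.lower() in redact else value)
        for key, value in headers.items()
    }

    # 2. Body redaction. Safety net: an invalid pattern blanks the whole body
    #    instead of letting an unredacted payload through.
    redaction_failures = 0
    try:
        for pattern in body_patterns:
            body = re.sub(pattern, REDACTED, body)
    except re.error:
        body = REDACTED
        redaction_failures += 1  # would feed the AuditRedactionFailure metric

    # 3. Truncation runs last, so it can never split a redaction target.
    encoded = body.encode("utf-8")
    truncated = len(encoded) > cap_bytes
    if truncated:
        body = encoded[:cap_bytes].decode("utf-8", errors="ignore")
    return safe_headers, body, truncated, redaction_failures
```

Python's re module caches compiled patterns internally, which stands in for the pre-compile-and-cache step the risk callout asks for; a production filter would compile once at configuration load and reject bad patterns there.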
M6 — Reconciliation, purge, partition maintenance, health metrics
Goal: Self-healing telemetry, monthly partition rollover, daily purge, all five new health metrics live and feeding the existing health-report pipeline.
Affected projects: AuditLog (3 new actors: SiteAuditReconciliationActor, AuditLogPurgeActor, partition-maintenance worker), Communication (the PullAuditEvents RPC), HealthMonitoring (5 new metrics), ConfigurationDatabase (partition-roll-forward SQL helper).
Acceptance criteria:
- SiteAuditReconciliationActor runs every 5 minutes per site; pulls events the site reports as Pending; central performs InsertIfNotExistsAsync then signals the site to flip those rows to Reconciled.
- AuditLogPurgeActor runs daily; for each partition older than RetentionDays, switches it out to a staging table and drops the staging table. Emits an AuditLog:Purged event with rowcount + duration.
- Partition-maintenance job runs at month boundary to add the next month's partition function range and ensure the scheme has a destination filegroup.
- 5 new health metrics published per site: SiteAuditBacklog (count + oldest + bytes), SiteAuditWriteFailures, SiteAuditTelemetryStalled; and per central node: CentralAuditWriteFailures, AuditRedactionFailure.
- Integration test: simulated 5-minute central outage → telemetry catches up after recovery via reconciliation, no rows lost; site backlog metric reflects the queue depth and drops as it drains.
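The reconciliation criterion above can be sketched as an in-memory model. This is a hedged Python illustration, not the actor implementation: `CentralStore`, `reconcile`, and the row dictionaries are hypothetical, and keying the insert-if-not-exists check on an event id is an assumption about how duplicates are detected.

```python
class CentralStore:
    """In-memory stand-in for the central audit table."""

    def __init__(self):
        self.rows = {}

    def insert_if_not_exists(self, event):
        """Insert keyed on event id; return True only when a new row was written."""
        if event["id"] in self.rows:
            return False
        self.rows[event["id"]] = event
        return True


def reconcile(site_rows, central):
    """One pass: pull Pending rows, insert centrally, then flip them to Reconciled."""
    pending = [r for r in site_rows if r["status"] == "Pending"]
    for row in pending:
        central.insert_if_not_exists({"id": row["id"], "payload": row["payload"]})
    acked = {r["id"] for r in pending}
    for row in site_rows:
        if row["id"] in acked:
            row["status"] = "Reconciled"
    return len(pending)
```

Because the central insert is a no-op for rows that already arrived via telemetry, a pass after an outage fills only the gaps, and a second pass finds nothing left to pull.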
Task headlines:
- PullAuditEvents RPC on the existing SiteStream gRPC server.
- SiteAuditReconciliationActor actor with timer + per-site LastReconciledAt cursor.
- AuditLogPurgeActor actor with daily schedule, partition-switch logic via IAuditLogRepository.SwitchOutPartitionAsync.
- Partition-roll-forward helper (raw SQL migrationBuilder.Sql equivalent at runtime; likely a HostedService that runs once at startup and once per month).
- Health metric publishing per emitter; integrate with the existing SiteHealthState/CentralHealthAggregator plumbing.
- Integration tests for outage/recovery + purge.
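The date arithmetic behind the roll-forward helper and the purge actor might look like the sketch below. It is Python for brevity, with hypothetical function names, and it assumes a partition counts as "older than RetentionDays" only once its entire monthly range falls on or before the cutoff; the roadmap does not pin down that boundary rule.

```python
from datetime import date, timedelta


def next_month_boundary(day):
    """First day of the month after `day`: the next range boundary to add."""
    if day.month == 12:
        return date(day.year + 1, 1, 1)
    return date(day.year, day.month + 1, 1)


def purgeable_partitions(partition_starts, today, retention_days):
    """Monthly partitions (given by start date) whose whole range has aged out."""
    cutoff = today - timedelta(days=retention_days)
    return [start for start in sorted(partition_starts)
            if next_month_boundary(start) <= cutoff]
```

Treating the partition's upper bound as the comparison point errs on the side of keeping data: a month is never switched out while any of its rows could still be inside the retention window.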
Risk callouts:
- Partition switch on an active table — ensure online schema operations don't block ingest; document the window if a brief lock is unavoidable.
- Reconciliation can produce duplicate Forwarded↔Reconciled state flips; ensure idempotency at the site SQLite layer.
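One way to make the state flips idempotent is to allow only forward transitions. The Python sketch below assumes a strict ordering Pending → Forwarded → Reconciled, which the roadmap does not state explicitly; `STATE_RANK` and `apply_flip` are hypothetical names, and a dictionary stands in for the site SQLite table.

```python
# Assumed ordering of site-row states (an assumption, not taken from the spec).
STATE_RANK = {"Pending": 0, "Forwarded": 1, "Reconciled": 2}


def apply_flip(rows, event_id, new_state):
    """Advance a row's state only forward; repeated or stale flips are no-ops."""
    current = rows.get(event_id)
    if current is None:
        return False  # unknown row: nothing to flip
    if STATE_RANK[new_state] <= STATE_RANK[current]:
        return False  # duplicate or out-of-order signal
    rows[event_id] = new_state
    return True
```

Under this rule a late Forwarded acknowledgement arriving after reconciliation leaves the row as Reconciled, and replaying the same signal changes nothing.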
M7 — Central UI: new Audit Log page + drill-ins + KPI tiles
Goal: User-visible Audit Log: filter bar, results grid (custom Blazor + Bootstrap, no third-party grid), drilldown drawer with cURL / "show all events" / redaction indicators / pretty-printed payloads. 6 drill-in entry points from existing pages. 3 KPI tiles on Health dashboard.
Affected projects: CentralUI, CentralUI.Tests, CentralUI.PlaywrightTests.
Acceptance criteria:
- New Components/Pages/Audit/AuditLogPage.razor exists; new "Audit" nav group sibling to Notifications.
- All 10 filter elements, 10 grid columns, keyset pagination + default page 100, drilldown drawer per Component-AuditLog.md §10.
- Existing Components/Pages/Monitoring/AuditLog.razor (the IAuditService config-change viewer) renamed in code to ConfigurationAuditLog.razor, with URL /audit/configuration to match the doc-renaming we did. Drill-ins from existing pages (Notifications, Site Calls, External Systems, Inbound API Keys, Sites, Instances) added.
- 3 KPI tiles added to the Health dashboard; data sourced from HealthMonitoring.
- Playwright tests cover: filter narrowing, drilldown drawer, "Copy as cURL" on ApiInbound rows, drill-in from Notifications to filtered Audit Log.
- OperationalAudit read permission gating + AuditExport for the Export button.
Task headlines:
- New Components/Pages/Audit/AuditLogPage.razor + matching .razor.cs code-behind + .razor.css.
- Custom Blazor <AuditFilterBar> component (multi-select chips for Channel/Kind/Status, autocomplete for Instance/Script).
- Custom Blazor <AuditResultsGrid> component; keyset paging via QueryAsync repository method (M1-T8).
- <AuditDrilldownDrawer> component: JSON pretty-print, SQL syntax highlight, "Copy as cURL", "Show all events" CorrelationId filter.
- Rename existing AuditLog.razor → ConfigurationAuditLog.razor + update routes + update internal links.
- Drill-in additions to 6 existing pages.
- 3 KPI tile components on Health dashboard.
- Server-side CSV export (streaming) with AuditExport permission check.
- Playwright E2E tests.
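The "Copy as cURL" action in the drawer comes down to shell-safe quoting of the stored request fields. The following Python sketch shows that idea only; `to_curl` and its parameters are hypothetical, and it assumes the audit row already carries redacted header values, so nothing sensitive is reintroduced.

```python
import shlex


def to_curl(method, url, headers, body=None):
    """Build a shell-safe cURL command from an audit row's request fields."""
    parts = ["curl", "-X", method.upper(), shlex.quote(url)]
    for name, value in headers.items():
        parts += ["-H", shlex.quote(f"{name}: {value}")]
    if body is not None:
        parts += ["--data-raw", shlex.quote(body)]
    return " ".join(parts)
```

Quoting every user-controlled segment matters here because payloads captured from inbound requests can contain quotes, spaces, or shell metacharacters.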
Risk callouts:
- Permission check at the page level needs to align with the existing role/permission infrastructure (Security #10).
- Keyset paging across partitioned table needs the right index; M1's IX_AuditLog_OccurredAtUtc is the supporting index.
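Keyset paging can be modelled in a few lines. The Python sketch below pages an in-memory list newest-first; `query_page`, the `occurred_at_utc` and `id` field names, and the use of `id` as a tie-breaker for rows sharing a timestamp are all assumptions for illustration, not the repository's actual QueryAsync contract.

```python
def query_page(rows, page_size=100, after=None):
    """Return one page ordered newest-first plus the cursor for the next page.

    `after` is the (occurred_at_utc, id) key of the last row already returned.
    """
    def key(row):
        return (row["occurred_at_utc"], row["id"])

    ordered = sorted(rows, key=key, reverse=True)
    if after is not None:
        # Seek strictly past the last key seen instead of skipping by offset.
        ordered = [r for r in ordered if key(r) < after]
    page = ordered[:page_size]
    cursor = key(page[-1]) if len(page) == page_size else None
    return page, cursor
```

The tie-breaker is what keeps pages stable when several events share the same timestamp; without it, rows at a page boundary could be skipped or repeated, which is why the supporting index needs to cover the full sort key.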
M8 — CLI: scadalink audit query | export | verify-chain
Goal: Operator surface for the centralized Audit Log.
Affected projects: CLI, CLI.Tests, ManagementService (new HTTP endpoint), IntegrationTests.
Acceptance criteria:
- scadalink audit query mirrors the UI filter set; results stream as JSON (default) or table.
- scadalink audit export streams server-side to CSV / JSONL / Parquet; requires AuditExport permission.
- scadalink audit verify-chain --month YYYY-MM is a no-op stub returning a "hash-chain not yet enabled in this release" message and exit code 0 (per v1.x deferral).
- Existing audit-log query (IAuditService config-change viewer) renamed in code to audit-config query to disambiguate; old name kept as a deprecated alias for one minor version.
- Permissions: audit query and audit verify-chain require OperationalAudit; audit export additionally requires AuditExport.
Task headlines:
- New AuditCommands.cs (separate file from AuditLogCommands.cs; the latter stays for the renamed config audit).
- Build the three subcommands with their flag sets (per CLI doc & alog.md §15.1, post-Bundle-D fix).
- ManagementService HTTP endpoints backing each subcommand.
- Output formatters (JSON, table) reused from existing CLI patterns.
- CLI integration tests in tests/ScadaLink.CLI.Tests/ + tests/ScadaLink.IntegrationTests/.
- Update CLI README + help text.
Risk callouts:
- The CLI rename (audit-log query → audit-config query) breaks any operator scripts; provide a deprecation alias and document the migration.
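The deprecation alias amounts to rewriting the old command group before dispatch and warning the operator. This Python sketch shows the shape of that logic only; the actual CLI is built on System.CommandLine, and `DEPRECATED_GROUPS`, `resolve_command`, and the warning text are hypothetical.

```python
import sys

# Old command group -> replacement, kept for one minor version per the roadmap.
DEPRECATED_GROUPS = {"audit-log": "audit-config"}


def resolve_command(argv, warn=None):
    """Map a deprecated command group to its new name, warning when it is used."""
    if warn is None:
        def warn(msg):
            print(msg, file=sys.stderr)
    if argv and argv[0] in DEPRECATED_GROUPS:
        new_name = DEPRECATED_GROUPS[argv[0]]
        warn(f"'{argv[0]}' is deprecated; use '{new_name}' instead")
        return [new_name] + list(argv[1:])
    return list(argv)
```

Writing the warning to stderr keeps existing scripts that parse stdout working during the alias window while still making the migration visible.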
Cross-cutting concerns (apply at every milestone)
- Branching: every milestone gets its own feature/audit-log-mN-<slice> branch; merged with --no-ff to main on milestone completion. No pushes without explicit user authorization.
- Tests: Every task adds tests first (failing test → impl → passing test). Existing tests must keep passing.
- Commits: small and frequent. Bite-sized per writing-plans skill.
- Reviews: per the bundling cadence in user memory, group small adjacent tasks into a single implementer dispatch, run one combined spec+quality review per bundle, then a final cross-bundle review at end of milestone.
- Docs: if implementation reveals a design gap, fix the design doc FIRST (in docs/requirements/Component-AuditLog.md and/or alog.md), commit, then implement. Don't let the code and docs drift.
- Infra: the 3 infra/* working-tree modifications still uncommitted on main are unrelated and stay that way unless the user explicitly addresses them. Use explicit git add <path> throughout, never git commit -am.
Per-milestone execution flow (template)
When a milestone is about to start, run this sequence:
- Brainstorm: short skill invocation to nail any code-level decisions not fixed in the spec (test fixture placement, migration helper choice, etc.).
- Writing-plans: produce a milestone-specific plan with TDD detail per task, saved to docs/plans/2026-XX-XX-auditlog-mN-<slice>.md + peer .tasks.json.
- Subagent-driven execution: bundle small tasks per cadence preference; per-bundle implementer + combined reviewer; cross-milestone review at end; merge to main with --no-ff.
The roadmap is the contract for what each milestone ships; the per-milestone plan is the contract for how it gets built.