docs(audit): add 8-milestone code implementation roadmap

Roadmap covering Audit Log (#23) code implementation across 8 milestones
(M1 Foundation → M8 CLI). Reflects the actual state of the codebase —
all 22 prior components have source + tests, but Site Call Audit (#22)
and cached-call tracking are design-only despite being on main; their
minimum surface is inlined into M3.

M1 is laid out at full TDD-level task detail (11 bite-sized tasks).
M2–M8 are at milestone-shape detail (goals, files, task headlines,
acceptance criteria, risk callouts). Per-milestone bite-sized plans
will be generated by brainstorm + writing-plans when each milestone is
about to execute; 80 task cards locked now would mostly be stale by
M5 as M1 reveals codebase realities.

Critical path: M1 → M2 → (M3 ∥ M4 ∥ M5) → M6 → (M7 ∥ M8).

Spec: docs/requirements/Component-AuditLog.md + alog.md (commit
fec0bb1).
docs/plans/2026-05-20-audit-log-code-roadmap.md (new file, 471 lines)
# Audit Log (#23) Code Implementation Roadmap

> **For Claude:** REQUIRED SUB-SKILL FLOW per milestone: `brainstorming` → `writing-plans` → `subagent-driven-development`. Use `docs/requirements/Component-AuditLog.md` + `alog.md` as the spec; this document is the roadmap that sequences milestones and locks acceptance criteria for each. **M1 carries full TDD-level task detail; M2–M8 are milestone-shape detail and will be expanded into bite-sized plans by their own writing-plans pass when their turn comes.**

**Goal:** Implement central component #23 Audit Log — append-only forensic + operational record across every script-trust-boundary action — into the existing ScadaLink codebase.

**Architecture:** Layered alongside (not replacing) the future Notifications/SiteCalls operational stores. Site-local SQLite hot-path append + gRPC telemetry batches + reconciliation pulls; central direct-write for Inbound API and Notification Outbox dispatch; monthly-partitioned MS SQL with single global retention; strict append-only enforced via DB roles. See `alog.md` for the locked design decisions and `Component-AuditLog.md` for the component spec.

**Tech Stack:** Akka.NET (clustering, singletons, ClusterClient), EF Core (MS SQL provider, code-first migrations), Microsoft.Data.SqlClient, Microsoft.Data.Sqlite, gRPC (HTTP/2 server-streaming on the existing `SiteStream` channel), ASP.NET Core (Inbound API middleware), Blazor Server + Bootstrap (Central UI), System.CommandLine (CLI), xUnit + Akka.TestKit.Xunit2 + NSubstitute (tests).

**Spec:** `/Users/dohertj2/Desktop/scadalink-design/alog.md` (validated, immutable; commit `fec0bb1`). Component design at `/Users/dohertj2/Desktop/scadalink-design/docs/requirements/Component-AuditLog.md`.

---
## Codebase Reality Check (what already exists)

- **All 22 prior components have source + tests.** Audit Log slots in as a new `src/ScadaLink.AuditLog/` project plus changes to: Commons, ConfigurationDatabase, Communication (proto), Host (DI + actor registration), ExternalSystemGateway, InboundAPI, NotificationOutbox, HealthMonitoring, CentralUI, CLI, SiteRuntime (audit hook surface).
- **Existing patterns to copy from:**
  - Singleton wiring: `src/ScadaLink.Host/Actors/AkkaHostedService.cs:272–280` (NotificationOutboxActor) — `ClusterSingletonManager.Props` + manager/proxy pair.
  - EF migration: `src/ScadaLink.ConfigurationDatabase/Migrations/20260519050659_AddNotificationsTable.cs` — table create + indexes; **no partitioning yet — Audit Log will be the first.**
  - Site SQLite hot-path: `src/ScadaLink.SiteEventLogging/SiteEventLogger.cs:28–98` — single connection, write lock, Channel-based background writer.
  - Site-buffer + forwarder: `src/ScadaLink.StoreAndForward/` — `StoreAndForwardStorage` + `NotificationForwarder` show the Pending → Forwarded transition we'll mirror.
  - Actor + repo + test trio: `src/ScadaLink.NotificationOutbox/NotificationOutboxActor.cs` and `tests/ScadaLink.NotificationOutbox.Tests/NotificationOutboxActorIngestTests.cs:20` — TestKit base class, NSubstitute repo, `Sys.ActorOf`, `ExpectMsg<T>`.
  - gRPC additive: `src/ScadaLink.Communication/Protos/sitestream.proto` — currently carries only `AttributeValueUpdate` and `AlarmStateUpdate` in a `oneof`; we extend it.
  - CLI command shape: `src/ScadaLink.CLI/Commands/AuditLogCommands.cs:1–53` — System.CommandLine pattern; new group will live alongside it (the file's existing commands are for the IAuditService config audit and stay).
  - Blazor listing page: `src/ScadaLink.CentralUI/Components/Pages/Notifications/NotificationReport.razor` — filter bar + keyset paging + status badges idiom.
- **`AuditLog.razor` and `AuditLogCommands.cs` already exist** but they're the **IAuditService config-change viewer**. Per the design pass we renamed them in docs to "Configuration Audit Log Viewer"; in code they'll be renamed (file + URL + command name) so the new operational Audit Log can take the unqualified name.
- **Test framework:** xUnit + Akka.TestKit.Xunit2 + NSubstitute. Integration tests under `tests/ScadaLink.IntegrationTests/`. Playwright UI tests under `tests/ScadaLink.CentralUI.PlaywrightTests/`. A `tests/ScadaLink.PerformanceTests/` exists for load tests.

---
## Prerequisite: Site Call Audit (#22) + cached-call tracking are NOT implemented in code

The design for both is merged on `main` (`alog.md` cached-call tracking section; `Component-SiteCallAudit.md`), but `grep` finds zero references to `TrackedOperationId` or `CachedCallTelemetry` in `src/`. This matters because **M3 (cached operations + dual-write transaction) cannot be built without them**.

**Three ways to handle this — pick before M3:**

1. **Inline into M3 (Recommended):** Implement just enough of Site Call Audit (#22) and cached-call tracking inside M3 — specifically the `CachedCallTelemetry` message, the operational-tracking SQLite table at sites, the `SiteCalls` table + repo + `SiteCallAuditActor` skeleton at central. This makes M3 the biggest milestone but ships a coherent slice (cached calls audited end-to-end).
2. **M0 prerequisite milestone:** Implement #22 and cached-call tracking as a separate slice before M3 starts. Cleanest dependency story; slowest to first-audit-row.
3. **Ship Audit Log sync-only first, retrofit cached path later:** M1, M2, M4 (sync-only emissions), M5, M6 (no cached features), M7, M8 ship as-is; cached audit is a separate follow-up. Lowest first-shippable scope but leaves cached calls unaudited until much later.

**Default choice in this roadmap: (1).** M3 absorbs the minimum #22 + cached-call tracking surface needed to make combined telemetry work; the rest of #22 (full reconciliation, KPIs, Retry/Discard relay) can be a follow-up.

---
## Milestone index

| M | Title | Ships | Touches | Depends on |
|---|---|---|---|---|
| **M1** | Foundation: schema, types, DB roles, partitioning | Migration deployed; Commons types exist; no observable behavior yet. | Commons, ConfigurationDatabase, ConfigurationDatabase.Tests | — |
| **M2** | Site pipeline (sync-only path) | One emission path end-to-end (ESG sync `Call()` audited from script to central row). | Commons, AuditLog (new), Communication (proto), Host, ExternalSystemGateway, all Tests projects, IntegrationTests | M1 |
| **M3** | Cached operations + dual-write transaction | Cached external calls and DB writes audited; SiteCalls table populated alongside; combined telemetry packet contract live. | Commons, AuditLog, SiteCallAudit (new), ConfigurationDatabase, ExternalSystemGateway, StoreAndForward, Host | M2; #22 + cached-call tracking inlined here per the prerequisite section |
| **M4** | Remaining boundary emission | All four channels emitting: sync DB writes/reads, Notify dispatcher attempt/terminal, Inbound API middleware. | ExternalSystemGateway, InboundAPI, NotificationOutbox, SiteRuntime (Database surface) | M2; M3 (NotificationOutbox terminal/attempt uses ICentralAuditWriter pattern) |
| **M5** | Payload + redaction policy | Header redaction, body redactor regex, SQL parameter redaction, safety net, configuration binding. | AuditLog, ExternalSystemGateway, InboundAPI, all emitter projects | M2 |
| **M6** | Reconciliation, purge, partition maintenance, health metrics | Self-healing telemetry, monthly partition switch, the five new health metrics + their dashboard tiles. | AuditLog, ConfigurationDatabase (partition maintenance), HealthMonitoring | M2, M3 |
| **M7** | Central UI — new Audit Log page + drill-ins + KPI tiles | User-visible Audit Log surface; existing `AuditLog.razor` renamed to ConfigurationAuditLog. | CentralUI, CentralUI.Tests, CentralUI.PlaywrightTests | M2, M4, M6 |
| **M8** | CLI — `scadalink audit query / export / verify-chain` | Operator surface for query/export; `verify-chain` is a no-op stub until v1.x hash chain ships. | CLI, ManagementService (HTTP endpoint), CLI.Tests, IntegrationTests | M2 |

**Every milestone ends in a shippable slice:** each one leaves the system in a working, testable, deployable state (no half-built actors mid-pipeline). M1 ships no user-visible behaviour but produces a clean foundation; from M2 onward each ships an observable audit capability.

**Critical path:** M1 → M2 → (M3 ∥ M4 ∥ M5) → M6 → (M7 ∥ M8). M3, M4, M5 can overlap once M2 is solid. M7 and M8 can overlap once M6 lands.

---
## M1 — Foundation: schema, types, DB roles, partitioning

**Goal:** Land the new `AuditLog` table (partitioned) and DB roles in MS SQL, plus the Commons types every later milestone needs. After M1 the database is ready and types compile; nothing else changes.

**Affected projects:**

- `src/ScadaLink.Commons/` — entity, enums, interfaces, message DTOs.
- `src/ScadaLink.ConfigurationDatabase/` — EF mapping, DbContext registration, migration, DB role script, partition function/scheme, retention options.
- `tests/ScadaLink.Commons.Tests/` — enum + record tests.
- `tests/ScadaLink.ConfigurationDatabase.Tests/` — migration tests, repo tests.

**Acceptance criteria:**

- `dotnet build` of the solution succeeds.
- `dotnet ef database update` against a dev MS SQL applies the migration; `AuditLog` table exists, partitioned monthly on `OccurredAtUtc`, with PK on `EventId` and the five expected indexes.
- `scadalink_audit_writer` and `scadalink_audit_purger` SQL roles exist with the documented grants; a smoke test confirms `UPDATE AuditLog` from the writer role fails.
- `AuditEvent` record, `AuditChannel`/`AuditKind`/`AuditStatus` enums, `IAuditWriter`/`ICentralAuditWriter` interfaces, `AuditTelemetryEnvelope`/`PullAuditEvents` message DTOs all exist in Commons in the right folders.
- `IAuditLogRepository` interface (Commons) and EF implementation (ConfigurationDatabase) exist; the implementation only exposes `InsertIfNotExistsAsync`, paged read, and `SwitchOutPartitionAsync` — no update or row-delete.
- All new tests pass; no existing tests regress.
### M1 — Tasks (TDD-detail)

#### M1-T1: Add audit enums to Commons

**Files:**

- Create: `src/ScadaLink.Commons/Types/Enums/AuditChannel.cs`, `AuditKind.cs`, `AuditStatus.cs`.
- Create: `tests/ScadaLink.Commons.Tests/Types/Enums/AuditEnumTests.cs`.

**Steps:**

1. Write failing test verifying `AuditChannel` has exactly `ApiOutbound | DbOutbound | Notification | ApiInbound` (asserting `Enum.GetValues` length and members).
2. Same for `AuditKind` (10 members per `Component-AuditLog.md`).
3. Same for `AuditStatus` (8 members).
4. Run: tests fail (enums don't exist). Implement the three enums.
5. Run tests: pass.
6. Commit: `feat(commons): add Audit{Channel,Kind,Status} enums for #23`.
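
Steps 1 and 4 might look like the minimal sketch below for `AuditChannel`. The four member names come from step 1; the namespaces, the test method name, and the choice of `Enum.GetNames` over `Enum.GetValues` are assumptions made for illustration. Both types are shown in one listing only for brevity; per the file list they would live in separate files.

```csharp
using System;
using Xunit;

namespace ScadaLink.Commons.Types.Enums // assumed namespace, mirroring the folder path
{
    // The four audit channels listed in step 1.
    public enum AuditChannel
    {
        ApiOutbound,
        DbOutbound,
        Notification,
        ApiInbound
    }
}

namespace ScadaLink.Commons.Tests.Types.Enums // assumed namespace
{
    using ScadaLink.Commons.Types.Enums;

    public class AuditEnumTests
    {
        [Fact]
        public void AuditChannel_has_exactly_the_documented_members()
        {
            string[] expected = { "ApiOutbound", "DbOutbound", "Notification", "ApiInbound" };

            // Comparing the full name array checks both the member count and
            // the member names, so an added, removed, or renamed member fails.
            Assert.Equal(expected, Enum.GetNames<AuditChannel>());
        }
    }
}
```

The same shape would repeat for `AuditKind` and `AuditStatus`, with the member lists taken from `Component-AuditLog.md`.
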
#### M1-T2: Add AuditEvent record + ForwardState enum

**Files:**

- Create: `src/ScadaLink.Commons/Entities/Audit/AuditEvent.cs` — public record carrying all 20 central columns (per `alog.md` §4) plus a nullable `ForwardState?` for the site-local variant.
- Create: `src/ScadaLink.Commons/Types/Enums/AuditForwardState.cs` — `Pending | Forwarded | Reconciled`.
- Create: `tests/ScadaLink.Commons.Tests/Entities/Audit/AuditEventTests.cs`.

**Steps:**

1. Write failing test that constructs an `AuditEvent`, sets every property, and round-trips via `with` expressions — asserts immutability and required-property behaviour.
2. Run: fail (type doesn't exist). Implement the record.
3. Run: pass.
4. Commit: `feat(commons): add AuditEvent record + ForwardState enum`.
#### M1-T3: Add IAuditWriter and ICentralAuditWriter

**Files:**

- Create: `src/ScadaLink.Commons/Interfaces/Services/IAuditWriter.cs`, `ICentralAuditWriter.cs`.
- Create: `tests/ScadaLink.Commons.Tests/Interfaces/Services/AuditWriterContractTests.cs` (smoke — only that the interfaces exist and have the documented signatures).

**Steps:**

1. Write failing reflection-based test asserting both interfaces expose `Task WriteAsync(AuditEvent, CancellationToken)`.
2. Run: fail. Implement both interfaces; document each with XML doc comments naming Audit Log #23 as the owner.
3. Run: pass.
4. Commit: `feat(commons): add IAuditWriter and ICentralAuditWriter`.
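
A possible shape for the two interfaces and the reflection-based contract test is sketched below. The `WriteAsync` signature is the one named in step 1; the namespace, parameter names, the empty `AuditEvent` placeholder (standing in for the M1-T2 record), and the test method name are assumptions.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Xunit;

namespace ScadaLink.Commons.Interfaces.Services // assumed namespace
{
    // Placeholder so the sketch compiles on its own; the real record is M1-T2.
    public sealed record AuditEvent;

    /// <summary>Site-side audit append. Owned by Audit Log (#23).</summary>
    public interface IAuditWriter
    {
        Task WriteAsync(AuditEvent auditEvent, CancellationToken cancellationToken);
    }

    /// <summary>Central direct-write audit append. Owned by Audit Log (#23).</summary>
    public interface ICentralAuditWriter
    {
        Task WriteAsync(AuditEvent auditEvent, CancellationToken cancellationToken);
    }

    public class AuditWriterContractTests
    {
        [Theory]
        [InlineData(typeof(IAuditWriter))]
        [InlineData(typeof(ICentralAuditWriter))]
        public void Exposes_WriteAsync_with_the_documented_signature(Type contract)
        {
            // Look the method up by name and exact parameter types.
            var method = contract.GetMethod(
                "WriteAsync",
                new[] { typeof(AuditEvent), typeof(CancellationToken) });

            Assert.NotNull(method);
            Assert.Equal(typeof(Task), method!.ReturnType);
        }
    }
}
```
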
#### M1-T4: Add audit telemetry + pull message DTOs

**Files:**

- Create: `src/ScadaLink.Commons/Messages/Integration/AuditTelemetryEnvelope.cs`, `PullAuditEventsRequest.cs`, `PullAuditEventsResponse.cs`.
- Create: `tests/ScadaLink.Commons.Tests/Messages/Integration/AuditTelemetryMessagesTests.cs`.

**Steps:**

1. Failing test: construct envelope with a batch of 3 events, assert immutability + batch enumerability.
2. Failing test: pull request carries `SinceUtc` + `BatchSize`; response carries events + `MoreAvailable`.
3. Implement.
4. Run: pass.
5. Commit: `feat(commons): add audit telemetry + pull message DTOs`.
#### M1-T5: Extend ScadaLinkDbContext with AuditLogs DbSet + entity config

**Files:**

- Modify: `src/ScadaLink.ConfigurationDatabase/ScadaLinkDbContext.cs` — add `public DbSet<AuditEvent> AuditLogs => Set<AuditEvent>();` at the appropriate position (after `Notifications`).
- Create: `src/ScadaLink.ConfigurationDatabase/Entities/AuditLogEntityTypeConfiguration.cs` — `IEntityTypeConfiguration<AuditEvent>` mapping the columns, types, length constraints, and indexes per `alog.md` §4. Note: this is an EF mapping only; the partition function and scheme are created in the SQL migration (next task) since EF Core doesn't model them natively.
- Modify: `OnModelCreating` — apply the new configuration.
- Create: `tests/ScadaLink.ConfigurationDatabase.Tests/Entities/AuditLogEntityTypeConfigurationTests.cs` — use `ModelBuilder` directly to verify the entity is mapped to `AuditLog` table, PK is `EventId`, and the expected columns + indexes are declared.

**Steps:**

1. Failing test asserts mapped table name, PK column, and column count.
2. Implement entity configuration; apply in `OnModelCreating`.
3. Failing test asserts the five expected indexes exist on the model.
4. Add `HasIndex` declarations.
5. Run: pass.
6. Commit: `feat(configdb): map AuditEvent to AuditLog table with PK and indexes`.
#### M1-T6: Generate and customize EF migration for AuditLog

**Files:**

- Create: `src/ScadaLink.ConfigurationDatabase/Migrations/<timestamp>_AddAuditLogTable.cs` via `dotnet ef migrations add AddAuditLogTable --project ScadaLink.ConfigurationDatabase`.
- Modify: the generated `Up()` / `Down()` to:
  - Create the partition function `pf_AuditLog_Month` and partition scheme `ps_AuditLog_Month` (raw SQL via `migrationBuilder.Sql(...)`), tied to a dedicated filegroup (or PRIMARY in dev — configurable via a migration setting).
  - Alter the `CreateTable` call (or follow up with `Sql`) to align the table to `ps_AuditLog_Month(OccurredAtUtc)`.
  - Add the five indexes generated by EF; ensure each is also partition-aligned where appropriate.
- Create: `tests/ScadaLink.ConfigurationDatabase.Tests/Migrations/AddAuditLogTableMigrationTests.cs` — applies the migration to an isolated MS SQL LocalDB instance (existing IntegrationTests harness), asserts table + partition function + scheme + indexes are present.

**Steps:**

1. Run `dotnet ef migrations add AddAuditLogTable`.
2. Failing integration test: apply migration, query `sys.partition_functions` and `sys.partition_schemes` for the expected names.
3. Edit migration to add the partition function + scheme + alignment.
4. Re-run test: pass.
5. Failing test: query `sys.indexes` for the five expected named indexes.
6. Adjust migration if any index name drifts.
7. Run: pass.
8. Commit: `feat(configdb): add AuditLog migration with monthly partitioning`.
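
The partition function and scheme from step 3 could be added to the migration roughly as follows. This is a sketch under stated assumptions: the `datetime2(7)` column type, the three boundary dates, and the dev-style `ALL TO ([PRIMARY])` mapping are placeholders, not values taken from `alog.md`. Only the object names `pf_AuditLog_Month` and `ps_AuditLog_Month` come from this roadmap.

```csharp
using Microsoft.EntityFrameworkCore.Migrations;

public partial class AddAuditLogTable : Migration
{
    protected override void Up(MigrationBuilder migrationBuilder)
    {
        // RANGE RIGHT: each boundary value is the first instant of a new
        // monthly partition. Type and dates are illustrative placeholders.
        migrationBuilder.Sql(@"
CREATE PARTITION FUNCTION pf_AuditLog_Month (datetime2(7))
AS RANGE RIGHT FOR VALUES ('2026-05-01', '2026-06-01', '2026-07-01');");

        // Dev-style mapping: every partition lands on the PRIMARY filegroup.
        migrationBuilder.Sql(@"
CREATE PARTITION SCHEME ps_AuditLog_Month
AS PARTITION pf_AuditLog_Month ALL TO ([PRIMARY]);");

        // The generated CreateTable / CreateIndex calls, aligned to
        // ps_AuditLog_Month(OccurredAtUtc), would follow here.
    }

    protected override void Down(MigrationBuilder migrationBuilder)
    {
        // The table must be dropped first; then the scheme, then the function.
        migrationBuilder.Sql("DROP PARTITION SCHEME ps_AuditLog_Month;");
        migrationBuilder.Sql("DROP PARTITION FUNCTION pf_AuditLog_Month;");
    }
}
```
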
#### M1-T7: Add DB roles in migration

**Files:**

- Modify: the M1-T6 migration `Up()` to also create the `scadalink_audit_writer` (INSERT + SELECT only) and `scadalink_audit_purger` (ALTER PARTITION FUNCTION + ALTER TABLE … SWITCH PARTITION + SELECT) roles via raw SQL. Make role creation idempotent (`IF NOT EXISTS`).
- Modify: `Down()` — drop the roles.
- Create: `tests/ScadaLink.ConfigurationDatabase.Tests/Migrations/AuditLogRoleGrantsTests.cs` — applies migration, then runs `SELECT` on `sys.database_role_members` / `sys.database_permissions` to assert the role grants. Plus a smoke test: connect as a user mapped to `scadalink_audit_writer`, attempt `UPDATE AuditLog SET Status = 'X'` and expect a permission error.

**Steps:**

1. Failing test asserts both roles exist with documented grants.
2. Add `migrationBuilder.Sql(...)` blocks.
3. Run: pass.
4. Failing test: `UPDATE AuditLog` as audit writer → expect SqlException with permission error.
5. Verify the role's permissions deny UPDATE (they should by default since only INSERT + SELECT granted).
6. Run: pass.
7. Commit: `feat(configdb): add scadalink_audit_writer and scadalink_audit_purger roles`.
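
An idempotent writer-role block for step 2 might look like the sketch below. The helper class and method names are hypothetical, and the `dbo` schema is an assumption. Only the writer role is shown; the purger role would follow the same pattern with the grants documented in the spec.

```csharp
using Microsoft.EntityFrameworkCore.Migrations;

// Hypothetical helper; the statements could equally sit inline in Up()/Down().
public static class AuditLogRoleSql
{
    public static void CreateWriterRole(MigrationBuilder migrationBuilder)
    {
        // type = 'R' restricts the existence check to database roles.
        // Because UPDATE and DELETE are never granted, they stay denied
        // unless the member gains them through some other role.
        migrationBuilder.Sql(@"
IF NOT EXISTS (SELECT 1 FROM sys.database_principals
               WHERE name = N'scadalink_audit_writer' AND type = 'R')
    CREATE ROLE scadalink_audit_writer;
GRANT INSERT, SELECT ON dbo.AuditLog TO scadalink_audit_writer;");
    }

    public static void DropWriterRole(MigrationBuilder migrationBuilder)
    {
        // DROP ROLE fails while the role still has members, so a real
        // Down() may need to remove members first.
        migrationBuilder.Sql("DROP ROLE IF EXISTS scadalink_audit_writer;");
    }
}
```
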
#### M1-T8: Add IAuditLogRepository + EF implementation

**Files:**

- Create: `src/ScadaLink.Commons/Interfaces/Repositories/IAuditLogRepository.cs` — `InsertIfNotExistsAsync(AuditEvent, CancellationToken)`, `QueryAsync(filter, paging, CancellationToken)`, `SwitchOutPartitionAsync(monthBoundary, CancellationToken)`. **Deliberately no `UpdateAsync` or row-level `DeleteAsync`.**
- Create: `src/ScadaLink.ConfigurationDatabase/Repositories/AuditLogRepository.cs` — implementation using the DbContext; `InsertIfNotExistsAsync` uses `MERGE` or raw `INSERT … WHERE NOT EXISTS` to satisfy idempotency without throwing on dupes.
- Modify: `ServiceCollectionExtensions.cs` — register `IAuditLogRepository` → `AuditLogRepository` in DI.
- Create: `tests/ScadaLink.ConfigurationDatabase.Tests/Repositories/AuditLogRepositoryTests.cs`.

**Steps:**

1. Failing test: `InsertIfNotExistsAsync` for a fresh `EventId` writes one row; calling again with the same `EventId` is a no-op (no exception, no second row).
2. Implement; use a `MERGE` or `INSERT … WHERE NOT EXISTS` strategy that does NOT rely on EF change tracking.
3. Run: pass.
4. Failing test: paged `QueryAsync` returns rows in `(OccurredAtUtc desc, EventId desc)` order, respecting filter predicates (channel, kind, status, site, target, actor, correlation, time range).
5. Implement filter projection + keyset paging.
6. Run: pass.
7. Failing test: `SwitchOutPartitionAsync` for the oldest partition removes its rows from the live table.
8. Implement via `migrationBuilder`-style `Sql("ALTER TABLE ... SWITCH PARTITION ... TO ...")` (against a staging table the implementation creates and drops within the same transaction).
9. Run: pass.
10. Commit: `feat(configdb): IAuditLogRepository + EF implementation (append-only, partition-switch purge)`.
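
The `INSERT … WHERE NOT EXISTS` strategy from step 2 can be sketched with EF Core's raw-SQL API, bypassing change tracking. This is an illustration only: the helper name, the three-column subset, the `dbo` schema, and the `bool` return value are assumptions, and the real method takes an `AuditEvent` carrying all central columns.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

// Hypothetical helper showing only the idempotent insert statement.
public static class AuditLogInsertSketch
{
    // Returns true when a row was written, false when EventId already existed.
    public static async Task<bool> InsertIfNotExistsAsync(
        DbContext db,
        Guid eventId,
        DateTime occurredAtUtc,
        string status,
        CancellationToken cancellationToken)
    {
        // Interpolated values are sent as SQL parameters, not inlined text.
        int rows = await db.Database.ExecuteSqlInterpolatedAsync($@"
INSERT INTO dbo.AuditLog (EventId, OccurredAtUtc, Status)
SELECT {eventId}, {occurredAtUtc}, {status}
WHERE NOT EXISTS (SELECT 1 FROM dbo.AuditLog WHERE EventId = {eventId});",
            cancellationToken);

        return rows == 1;
    }
}
```

One caveat with this pattern: two concurrent inserts of the same `EventId` can both pass the `NOT EXISTS` check, so a real implementation may also need a locking hint or to treat a primary-key violation as a successful no-op.
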
#### M1-T9: Add AuditLogOptions configuration class + binding

**Files:**

- Create: `src/ScadaLink.AuditLog/Configuration/AuditLogOptions.cs` (new project — see M1-T10) — owns `DefaultCapBytes`, `ErrorCapBytes`, `HeaderRedactList`, `GlobalBodyRedactors`, `PerTargetOverrides`, `RetentionDays`, validation attributes.
- Add: validation on startup (`IValidateOptions<AuditLogOptions>`).
- Test: ensure `appsettings.json` bind round-trips and validation rejects out-of-range `RetentionDays`.

**Steps:**

1. Failing test: bind a valid section → values present.
2. Implement options class + binding.
3. Failing test: bind invalid `RetentionDays` → validator rejects.
4. Implement validator.
5. Run: pass.
6. Commit: `feat(auditlog): add AuditLogOptions config binding`.
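
Steps 2 and 4 could be sketched as below. The property names come from the file list above and the 8 KB / 64 KB caps from the M5 goal; the default and the allowed range for `RetentionDays` are hypothetical placeholders, and the redaction properties are omitted for brevity.

```csharp
using Microsoft.Extensions.Options;

public sealed class AuditLogOptions
{
    public int DefaultCapBytes { get; set; } = 8 * 1024;  // 8 KB, per the M5 goal
    public int ErrorCapBytes { get; set; } = 64 * 1024;   // 64 KB on error, per the M5 goal
    public int RetentionDays { get; set; } = 365;         // placeholder default
}

public sealed class AuditLogOptionsValidator : IValidateOptions<AuditLogOptions>
{
    // Hypothetical bounds; the real range belongs to the spec.
    private const int MinRetentionDays = 1;
    private const int MaxRetentionDays = 3650;

    public ValidateOptionsResult Validate(string? name, AuditLogOptions options)
    {
        if (options.RetentionDays < MinRetentionDays ||
            options.RetentionDays > MaxRetentionDays)
        {
            return ValidateOptionsResult.Fail(
                $"RetentionDays must be between {MinRetentionDays} and {MaxRetentionDays}.");
        }

        return ValidateOptionsResult.Success;
    }
}
```

In a test, the validator can be exercised directly by constructing an `AuditLogOptions` with an out-of-range `RetentionDays` and asserting that the result's `Failed` property is true.
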
#### M1-T10: Add ScadaLink.AuditLog project skeleton

**Files:**

- Create: `src/ScadaLink.AuditLog/ScadaLink.AuditLog.csproj` — TargetFramework matches the rest of the solution; ProjectReferences to `ScadaLink.Commons` and `ScadaLink.ConfigurationDatabase`.
- Create: `src/ScadaLink.AuditLog/ServiceCollectionExtensions.cs` — `AddAuditLog(this IServiceCollection, IConfiguration)` that registers `AuditLogOptions`, `IAuditLogRepository`, plus placeholders that later milestones will fill (writer impls, actors).
- Create: `tests/ScadaLink.AuditLog.Tests/ScadaLink.AuditLog.Tests.csproj` with one smoke test.
- Modify: `ScadaLink.slnx` — add both projects to the solution.
- Modify: `Directory.Packages.props` if any new package versions are needed.

**Steps:**

1. Create projects via `dotnet new classlib` / `dotnet new xunit`; add references; add to slnx.
2. Failing test: smoke-test `AddAuditLog()` populates DI with `IAuditLogRepository` and `IOptions<AuditLogOptions>`.
3. Implement `ServiceCollectionExtensions.AddAuditLog`.
4. Run: pass.
5. Commit: `feat(auditlog): scaffold ScadaLink.AuditLog project`.
#### M1-T11: Update Component-Host.md responsibilities + README component table

**Files:**

- Modify: `docs/requirements/Component-Host.md` — list `ScadaLink.AuditLog` in the central role's registration set.
- Modify: `README.md` — confirm row #23 link reflects the new project (no functional change; this is a paper-trail update).

**Steps:**

1. Edit, verify cross-refs, commit: `docs(audit): register ScadaLink.AuditLog project in Host role`.

---
## M2 — Site pipeline (sync-only path)

**Goal:** First end-to-end audit emission: a script-initiated `ExternalSystem.Call()` produces an audit row in the central `AuditLog` table. No cached paths yet, no notifications, no inbound API, no UI. Just one channel + kind: `ApiOutbound.SyncCall`.

**Affected projects:** `Commons`, `AuditLog` (new), `Communication`, `Host`, `ExternalSystemGateway`, all matching `*.Tests/`, `tests/ScadaLink.IntegrationTests/`.

**Acceptance criteria:**

- Site-local `IAuditWriter` writes to a per-site SQLite `auditlog.db` on the hot path with `ForwardState = 'Pending'`; the durable append completes in under a millisecond; failures fall back to a bounded in-memory ring and surface a metric.
- `SiteAuditTelemetryActor` drains pending rows in batches via a new `IngestAuditEvents` RPC on the existing `SiteStream` gRPC service; on success flips `ForwardState = 'Forwarded'`.
- `AuditLogIngestActor` (central singleton) receives the batch, performs `InsertIfNotExistsAsync` per event, returns ack.
- `ExternalSystem.Call()` emits one `ApiOutbound.SyncCall` row via `IAuditWriter` on every call completion; audit-write failure does NOT abort the script.
- Integration test in `tests/ScadaLink.IntegrationTests/` boots a site + central pair, executes a sync script that calls an external system, and asserts a corresponding row appears in the central `AuditLog` within N seconds.
- No regressions in existing ExternalSystemGateway or Communication tests.

**Task headlines** (each expanded to TDD detail in its own writing-plans pass before execution):

1. Site-local `SqliteAuditWriter` implementing `IAuditWriter` — schema bootstrap, hot-path INSERT, write lock, ring-buffer fallback. Pattern from `SiteEventLogger.cs:28–98`.
2. Bounded in-memory `RingBufferFallback` that drains into the SQLite writer when health returns.
3. `SiteAuditTelemetryActor` — periodic drain loop (5s busy / 30s idle), batch INSERT-IF-NOT-EXISTS via gRPC, `ForwardState` transitions.
4. Extend `sitestream.proto`: add `IngestAuditEvents(stream AuditEventBatch) returns (IngestAck)`. Regenerate. Update `SiteStreamGrpcServer.cs` to handle the new RPC.
5. `AuditLogIngestActor` (central singleton) — handles ingest message, calls `IAuditLogRepository.InsertIfNotExistsAsync` per event in a single transaction.
6. Host wiring: register `SiteAuditTelemetryActor` as a site singleton on a **dedicated dispatcher** (per `alog.md` §6.2); register `AuditLogIngestActor` as a central singleton. Reference pattern at `AkkaHostedService.cs:272–280`.
7. ESG sync `Call()` emission hook — add `IAuditWriter` injection; emit `AuditEvent` (channel=ApiOutbound, kind=SyncCall) before returning. Audit-write failures never throw to the script.
8. End-to-end integration test in `IntegrationTests/AuditLog/SyncCallEmissionTests.cs` — site + central wired, script invokes ESG `Call()`, central row appears.
9. Health metric `SiteAuditWriteFailures` (this milestone defines it; M6 surfaces the tile).
10. Update `docker/deploy.sh` / `infra/reseed.sh` if needed so dev clusters can verify locally.

**Risk callouts:**

- Site SQLite write throughput under load — bench against existing SiteEventLogger numbers.
- gRPC additive evolution: the existing proto uses a `oneof`. Adding a new top-level RPC is safe; embedding new oneof variants is also safe. Confirm message-ordering guarantees aren't violated.
- Don't accidentally bind `SiteAuditTelemetryActor` to the same dispatcher used by script blocking I/O; that's a real perf issue (per spec).
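
The bounded in-memory fallback from task headline 2 might be as small as the sketch below. The class name comes from the headline; the generic shape, the drop-oldest overflow policy, and the `DroppedCount` counter (a candidate feed for the `SiteAuditWriteFailures` metric) are assumptions, since this roadmap does not say which entries are discarded when the ring is full.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical bounded buffer: when full, the oldest entry is discarded.
public sealed class RingBufferFallback<T>
{
    private readonly Queue<T> _items = new();
    private readonly object _gate = new();
    private readonly int _capacity;
    private int _droppedCount;

    public RingBufferFallback(int capacity)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException(nameof(capacity));
        _capacity = capacity;
    }

    // Number of entries discarded because the buffer was full.
    public int DroppedCount
    {
        get { lock (_gate) { return _droppedCount; } }
    }

    public void Add(T item)
    {
        lock (_gate)
        {
            if (_items.Count == _capacity)
            {
                _items.Dequeue();   // make room by dropping the oldest entry
                _droppedCount++;
            }
            _items.Enqueue(item);
        }
    }

    // Removes and returns everything buffered, oldest first, so the caller
    // can replay the entries into the SQLite writer once it is healthy again.
    public IReadOnlyList<T> Drain()
    {
        lock (_gate)
        {
            T[] drained = _items.ToArray();
            _items.Clear();
            return drained;
        }
    }
}
```
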
---

## M3 — Cached operations + dual-write transaction + (inlined) Site Call Audit foundations

**Goal:** Cached external calls (`ExternalSystem.CachedCall`) and cached DB writes (`Database.CachedWrite`) produce three audit rows per operation (`CachedEnqueued`, `CachedAttempt × N`, `CachedTerminal`) AND populate the operational `SiteCalls` table at central — in one transaction at central, from a single combined telemetry packet.

**Affected projects:** `Commons`, `AuditLog`, `SiteCallAudit` (new — minimum-viable surface), `ConfigurationDatabase` (new `SiteCalls` table migration), `ExternalSystemGateway`, `StoreAndForward`, `Host`. Tests across all of them + IntegrationTests.

**Prerequisite call-out:** This milestone implements the minimum-viable Site Call Audit (#22) surface and cached-call tracking pieces — `TrackedOperationId`, site-local operation tracking SQLite, `SiteCalls` table at central, and the `CachedCallTelemetry` message (specified in the docs but absent from code, so it must be created from scratch). Full reconciliation, KPIs, and Retry/Discard relay for #22 are deferred — they're not on the critical path for the audit log's combined telemetry.

**Acceptance criteria:**

- New `SiteCalls` MS SQL table + repo (no partitioning needed; this is operational state, not audit).
- New `CachedCallTelemetry` message in Commons carrying BOTH the cached-call operational fields AND an `AuditEvent` payload.
- Site path: `CachedCall` writes the audit row to site SQLite (`Kind = CachedEnqueued`), creates the site operation-tracking row, and sends a combined telemetry packet.
- Central path: `AuditLogIngestActor` (extended) receives the combined packet, performs **one transaction containing both** the `AuditLog` insert and the `SiteCalls` upsert.
- Retry attempt → `Kind = CachedAttempt` audit row + `SiteCalls` status transition. Terminal → `Kind = CachedTerminal` audit row + `SiteCalls` terminal status.
- Integration test asserts: triggering a `CachedCall` that fails transient-then-succeeds produces 3 AuditLog rows + 1 SiteCalls row with `Status = Delivered`, all sharing the same `TrackedOperationId` correlation key.

**Task headlines:**

1. `TrackedOperationId` GUID newtype in Commons.
2. Site-local SQLite operation-tracking table + repo (matches `alog.md` cached-call tracking design).
3. `CachedCallTelemetry` Commons message carrying both operational fields and `AuditEvent` payload.
4. `SiteCalls` MS SQL table + EF mapping + migration + `ISiteCallAuditRepository` + repo impl.
5. `SiteCallAuditActor` skeleton (singleton, central) — receives telemetry, owns `SiteCalls` upsert via repo.
6. Extend `AuditLogIngestActor` to detect combined telemetry and execute both writes (`AuditLog` insert + `SiteCalls` upsert) in a single `DbContext` transaction.
7. ESG `CachedCall()` emission — produce combined telemetry on every lifecycle transition (enqueue, attempt, terminal).
8. Extend gRPC proto with the combined-telemetry RPC if it's distinct from `IngestAuditEvents`, or fold it into the existing one with a discriminator field (decision in milestone brainstorm).
9. Integration test in `IntegrationTests/AuditLog/CachedCallCombinedTelemetryTests.cs`.

**Risk callouts:**

- Combined telemetry packet evolution: design the packet so future cached audit-kind additions are non-breaking (oneof or open-field map).
- Single transaction at central spans two tables; ensure connection retry behaviour is correct.
- Idempotency: AuditLog dedups on `EventId`; SiteCalls dedups on `TrackedOperationId`. If telemetry retries and AuditLog already has the row, ensure SiteCalls upsert still runs (no short-circuit).
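
The dual-write transaction from task headline 6, together with the retry and idempotency risks above, could be approached as in the sketch below. The helper name and the two delegate parameters are hypothetical stand-ins for the repository calls. The sketch assumes a retrying execution strategy may be configured, in which case EF Core requires a manually started transaction to run inside the strategy so the whole unit is retried together.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

// Hypothetical helper; the two delegates stand in for the repository calls.
public static class CombinedTelemetrySketch
{
    public static Task IngestAsync(
        DbContext db,
        Func<CancellationToken, Task> insertAuditRow,  // idempotent on EventId
        Func<CancellationToken, Task> upsertSiteCall,  // idempotent on TrackedOperationId
        CancellationToken cancellationToken)
    {
        // Running inside the execution strategy lets a transient failure
        // retry both writes as one unit rather than half of them.
        var strategy = db.Database.CreateExecutionStrategy();

        return strategy.ExecuteAsync(async () =>
        {
            await using var tx =
                await db.Database.BeginTransactionAsync(cancellationToken);

            await insertAuditRow(cancellationToken);

            // Always runs, even if the audit insert was a no-op because the
            // EventId already existed (no short-circuit on duplicates).
            await upsertSiteCall(cancellationToken);

            // If an exception is thrown before this line, disposing the
            // transaction rolls both writes back.
            await tx.CommitAsync(cancellationToken);
        });
    }
}
```
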
---
|
||||
|
||||
## M4 — Remaining boundary emission
|
||||
|
||||
**Goal:** Every channel × kind from `Component-AuditLog.md` produces a row when its boundary call fires.
|
||||
|
||||
**Affected projects:** `ExternalSystemGateway` (sync DB writes/reads, cached DB writes), `SiteRuntime` (Database surface exposing them), `NotificationOutbox` (central direct-write of `Attempt`/`Terminal`), `InboundAPI` (middleware). Tests across all.
|
||||
|
||||
**Acceptance criteria:**
|
||||
- Sync `Database.Connection().Execute()` → `DbOutbound.SyncWrite` row; `ExecuteReader` → `DbOutbound.SyncRead`. Parameter values captured by default; per-connection redaction opt-in supported.
|
||||
- `Database.CachedWrite` → three lifecycle rows via the combined telemetry built in M3.
|
||||
- Notification Outbox dispatcher: every delivery attempt writes `Notification.Attempt`; terminal writes `Notification.Terminal`. Site-emitted `Notification.Enqueued` flows through the standard site→central audit path. Audit-write failure never affects delivery.
|
||||
- Inbound API middleware writes one `ApiInbound.Completed` row per request, before `await next()` returns. API key NAME captured (never material). Audit-write failure does NOT change the HTTP response.
|
||||
|
||||
**Task headlines:**
|
||||
1. ESG `Database.Connection()` execute hook — wrap `Execute*` / `ExecuteScalar` / `ExecuteReader` to emit before/after audit events.
|
||||
2. `Database.CachedWrite` combined-telemetry emission (mirror M3's ESG cached path).
|
||||
3. NotificationOutboxActor extension — inject `ICentralAuditWriter`; write `Notification.Attempt` per dispatcher attempt; write `Notification.Terminal` on terminal transitions; never abort on failure.
|
||||
4. Site-emitted `Notification.Enqueued` — when a script calls `Notify.To().Send()` (site-side via Store-and-Forward), emit a site audit row (`Notification.Enqueued`); telemetry forwards as usual.
|
||||
5. Inbound API middleware: new `AuditWriteMiddleware` in `src/ScadaLink.InboundAPI/Middleware/` writing `ApiInbound.Completed` before response flush; register in the ASP.NET pipeline.
|
||||
6. Tests: emission unit tests per call mode, plus 4 integration tests (one per channel).
|
||||
|
||||
**Risk callouts:**
|
||||
- Inbound API: correlation-id generation needs to be consistent with any upstream tracing headers (W3C `traceparent` if present).
|
||||
- Notification dispatcher: confirm `ICentralAuditWriter` errors are logged but don't block the dispatch loop.
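
One way to keep correlation ids consistent with upstream tracing is to reuse the trace id from a valid W3C `traceparent` header and fall back to a fresh id otherwise. The sketch below is in Python for brevity rather than the middleware's own language. The function name and the choice to reuse the trace id verbatim are assumptions to be settled in the milestone brainstorm. It only accepts the four-field `version-traceid-parentid-flags` layout and assumes header names are already lower-cased.

```python
import re
import uuid

# version (2 hex) - trace-id (32 hex) - parent-id (16 hex) - flags (2 hex)
_TRACEPARENT = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def correlation_id_from_headers(headers):
    """Return a 32-hex-character correlation id for an inbound request.

    `headers` is a dict with lower-cased header names (an assumption of this sketch).
    """
    match = _TRACEPARENT.match(headers.get("traceparent", "").strip())
    if match:
        version, trace_id, parent_id, _flags = match.groups()
        # Version "ff" and all-zero trace/parent ids are invalid per the W3C spec.
        if version != "ff" and trace_id != "0" * 32 and parent_id != "0" * 16:
            return trace_id
    # No usable upstream trace context: mint a new id of the same shape.
    return uuid.uuid4().hex
```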
|
||||
|
||||
---
|
||||
|
||||
## M5 — Payload + redaction policy
|
||||
|
||||
**Goal:** Payload capture is bounded (8 KB / 64 KB on error), headers are redacted by default, SQL parameter values are captured by default with per-connection opt-out, body redactor regexes are configurable per target, and the safety net over-redacts on misconfiguration.
|
||||
|
||||
**Affected projects:** `AuditLog` (policy engine + options), `ExternalSystemGateway` (HTTP header redactors, SQL param redaction hook), `InboundAPI` (header redactors, body capture), `NotificationOutbox` (subject/body capture follows existing rules). Tests.
|
||||
|
||||
**Acceptance criteria:**
|
||||
- An `IAuditPayloadFilter` service is invoked between event construction and write. Truncates to default cap; raises to error cap on non-`Success` rows; applies header redactors; applies body regex redactors; applies SQL parameter redactors (per-connection); over-redacts on regex error and increments `AuditRedactionFailure`.
|
||||
- Configuration test: changing `appsettings.json` redactors changes runtime behaviour (no rebuild needed for regex changes).
|
||||
- Bench: 95th-percentile audit emission latency on the hot path stays under N µs at default cap (target to be set during M5 brainstorm).
|
||||
|
||||
**Task headlines:**
|
||||
1. `IAuditPayloadFilter` + default implementation (header redaction, body regex, SQL parameter redaction, safety net).
|
||||
2. Wire the filter into the emission paths (M2, M3, M4 emitters all call through the filter before handing the `AuditEvent` to the writer).
|
||||
3. `appsettings.json` schema for the filter (already prepared in M1-T9; M5 plugs the runtime in).
|
||||
4. Tests: redaction unit tests with known-bad payloads (passwords in JSON, `Authorization` headers, SQL params named `@apikey`).
|
||||
5. Performance test in `tests/ScadaLink.PerformanceTests/` for the hot-path latency budget.
|
||||
|
||||
**Risk callouts:**
|
||||
- Regex performance — pre-compile and cache patterns; reject patterns that take too long to compile.
|
||||
- Ordering: redact before truncating. Redacting after truncation can miss a redaction target that the truncation cut in half.
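
The ordering and safety-net behaviour can be sketched as follows. This is a Python illustration, not the `IAuditPayloadFilter` implementation; `filter_body`, the `[REDACTED]` marker, and the `on_failure` callback (standing in for the `AuditRedactionFailure` counter) are hypothetical. Patterns are compiled on each call here only to keep the sketch short; a real filter would pre-compile and cache them as the first callout says.

```python
import re

REDACTED = "[REDACTED]"


def filter_body(body, patterns, cap, on_failure=None):
    """Redact `body` with each regex in `patterns`, then truncate to `cap` characters.

    If any pattern is invalid, over-redact: return only the marker and report the failure.
    """
    try:
        compiled = [re.compile(p) for p in patterns]
        for rx in compiled:
            body = rx.sub(REDACTED, body)
    except re.error:
        if on_failure is not None:
            on_failure()  # e.g. increment a redaction-failure metric
        return REDACTED
    # Truncate last, so a secret can never be cut in half before the redactor sees it.
    return body[:cap]
```

With the body `{"user":"a","password":"hunter2-very-long-secret"}` and a 30-character cap, truncating first would leave `"password":"hunter` with no closing quote, so a pattern anchored on the closing quote would no longer match and the partial secret would be stored.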
|
||||
|
||||
---
|
||||
|
||||
## M6 — Reconciliation, purge, partition maintenance, health metrics
|
||||
|
||||
**Goal:** Self-healing telemetry, monthly partition rollover, daily purge, all five new health metrics live and feeding the existing health-report pipeline.
|
||||
|
||||
**Affected projects:** `AuditLog` (3 new actors: `SiteAuditReconciliationActor`, `AuditLogPurgeActor`, partition-maintenance worker), `Communication` (the `PullAuditEvents` RPC), `HealthMonitoring` (5 new metrics), `ConfigurationDatabase` (partition-roll-forward SQL helper).
|
||||
|
||||
**Acceptance criteria:**
|
||||
- `SiteAuditReconciliationActor` runs every 5 minutes per site; pulls events the site reports as `Pending`; central performs `InsertIfNotExistsAsync` then signals the site to flip those rows to `Reconciled`.
|
||||
- `AuditLogPurgeActor` runs daily; for each partition older than `RetentionDays`, switches it out to a staging table and drops the staging table. Emits an `AuditLog:Purged` event with rowcount + duration.
|
||||
- Partition-maintenance job runs at month boundary to add the next month's partition function range and ensure the scheme has a destination filegroup.
|
||||
- 5 new health metrics published: per site, `SiteAuditBacklog` (count + oldest + bytes), `SiteAuditWriteFailures`, and `SiteAuditTelemetryStalled`; per central node, `CentralAuditWriteFailures` and `AuditRedactionFailure`.
|
||||
- Integration test: simulated 5-minute central outage → telemetry catches up after recovery via reconciliation, no rows lost; site backlog metric reflects the queue depth and drops as it drains.
|
||||
|
||||
**Task headlines:**
|
||||
1. `PullAuditEvents` RPC on the existing `SiteStream` gRPC server.
|
||||
2. `SiteAuditReconciliationActor` with timer + per-site `LastReconciledAt` cursor.
|
||||
3. `AuditLogPurgeActor` with daily schedule and partition-switch logic via `IAuditLogRepository.SwitchOutPartitionAsync`.
|
||||
4. Partition-roll-forward helper (raw SQL `migrationBuilder.Sql` equivalent at runtime — likely a `HostedService` that runs once at startup and once per month).
|
||||
5. Health metric publishing per emitter; integrate with the existing `SiteHealthState` / `CentralHealthAggregator` plumbing.
|
||||
6. Integration tests for outage/recovery + purge.
|
||||
|
||||
**Risk callouts:**
|
||||
- Partition switch on an active table — ensure online schema operations don't block ingest; document the window if a brief lock is unavoidable.
|
||||
- Reconciliation can produce duplicate `Forwarded` ↔ `Reconciled` state flips; ensure idempotency at site SQLite layer.
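
A guarded update is one way to make the site-side state flip idempotent. The sketch below uses Python and SQLite for illustration only; the `site_audit` table, its columns, and `mark_reconciled` are hypothetical names, and the real site schema may differ. Replaying the same acknowledgement changes no rows the second time.

```python
import sqlite3


def make_site_db():
    """Create a hypothetical stand-in for the site-local audit table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE site_audit (event_id TEXT PRIMARY KEY, state TEXT NOT NULL)")
    return conn


def mark_reconciled(conn, event_ids):
    """Flip the given rows to 'Reconciled'; return how many rows actually changed.

    Rows already in 'Reconciled' and unknown ids are left untouched,
    so a duplicate signal from central is a no-op.
    """
    changed = 0
    with conn:  # one transaction for the whole acknowledgement batch
        for event_id in event_ids:
            cur = conn.execute(
                "UPDATE site_audit SET state = 'Reconciled' "
                "WHERE event_id = ? AND state <> 'Reconciled'",
                (event_id,),
            )
            changed += cur.rowcount
    return changed
```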
|
||||
|
||||
---
|
||||
|
||||
## M7 — Central UI: new Audit Log page + drill-ins + KPI tiles
|
||||
|
||||
**Goal:** User-visible Audit Log: filter bar, results grid (custom Blazor + Bootstrap, no third-party grid), drilldown drawer with cURL / "show all events" / redaction indicators / pretty-printed payloads. 6 drill-in entry points from existing pages. 3 KPI tiles on Health dashboard.
|
||||
|
||||
**Affected projects:** `CentralUI`, `CentralUI.Tests`, `CentralUI.PlaywrightTests`.
|
||||
|
||||
**Acceptance criteria:**
|
||||
- New `Components/Pages/Audit/AuditLogPage.razor` exists; new "Audit" nav group sibling to Notifications.
|
||||
- All 10 filter elements, 10 grid columns, keyset pagination + default page 100, drilldown drawer per `Component-AuditLog.md` §10.
|
||||
- Existing `Components/Pages/Monitoring/AuditLog.razor` (the IAuditService config-change viewer) **renamed in code** to `ConfigurationAuditLog.razor`, with URL `/audit/configuration` to match the earlier documentation rename. Drill-ins from existing pages (Notifications, Site Calls, External Systems, Inbound API Keys, Sites, Instances) added.
|
||||
- 3 KPI tiles added to the Health dashboard; data sourced from `HealthMonitoring`.
|
||||
- Playwright tests cover: filter narrowing, drilldown drawer, "Copy as cURL" on `ApiInbound` rows, drill-in from Notifications to filtered Audit Log.
|
||||
- Page access is gated on the `OperationalAudit` read permission; the Export button additionally requires `AuditExport`.
|
||||
|
||||
**Task headlines:**
|
||||
1. New `Components/Pages/Audit/AuditLogPage.razor` + matching `.razor.cs` code-behind + `.razor.css`.
|
||||
2. Custom Blazor `<AuditFilterBar>` component (multi-select chips for Channel/Kind/Status, autocomplete for Instance/Script).
|
||||
3. Custom Blazor `<AuditResultsGrid>` component — keyset paging via `QueryAsync` repository method (M1-T8).
|
||||
4. `<AuditDrilldownDrawer>` component — JSON pretty-print, SQL syntax highlight, "Copy as cURL", "Show all events" CorrelationId filter.
|
||||
5. Rename existing `AuditLog.razor` → `ConfigurationAuditLog.razor` + update routes + update internal links.
|
||||
6. Drill-in additions to 6 existing pages.
|
||||
7. 3 KPI tile components on Health dashboard.
|
||||
8. Server-side CSV export (streaming) with `AuditExport` permission check.
|
||||
9. Playwright E2E tests.
|
||||
|
||||
**Risk callouts:**
|
||||
- Permission check at the page level needs to align with the existing role/permission infrastructure (Security #10).
|
||||
- Keyset paging across partitioned table needs the right index; M1's `IX_AuditLog_OccurredAtUtc` is the supporting index.
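
Keyset paging over the audit table might look like the sketch below, shown in Python with SQLite rather than against the real `QueryAsync` repository method. The column names and the use of `event_id` as a tiebreaker are assumptions: rows that share a timestamp need some unique second sort key, otherwise a page boundary that falls inside a group of equal timestamps would skip or repeat rows.

```python
import sqlite3

_COLUMNS = "occurred_at_utc, event_id"
_ORDER = "ORDER BY occurred_at_utc DESC, event_id DESC LIMIT ?"


def query_page(conn, page_size=100, after=None):
    """Return (rows, next_cursor), newest first.

    `after` is the (occurred_at_utc, event_id) pair of the last row on the
    previous page, or None for the first page. `next_cursor` is None once
    a short page signals that the result set is exhausted.
    """
    if after is None:
        sql = f"SELECT {_COLUMNS} FROM audit_log {_ORDER}"
        params = (page_size,)
    else:
        ts, event_id = after
        # Strictly "older than the cursor" in (timestamp, id) order.
        sql = (
            f"SELECT {_COLUMNS} FROM audit_log "
            "WHERE occurred_at_utc < ? OR (occurred_at_utc = ? AND event_id < ?) "
            f"{_ORDER}"
        )
        params = (ts, ts, event_id, page_size)
    rows = conn.execute(sql, params).fetchall()
    next_cursor = rows[-1] if len(rows) == page_size else None
    return rows, next_cursor
```

Unlike offset paging, each page is a range seek from the cursor, so the cost of fetching page N does not grow with N.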
|
||||
|
||||
---
|
||||
|
||||
## M8 — CLI: `scadalink audit query | export | verify-chain`
|
||||
|
||||
**Goal:** Operator surface for the centralized Audit Log.
|
||||
|
||||
**Affected projects:** `CLI`, `CLI.Tests`, `ManagementService` (new HTTP endpoint), `IntegrationTests`.
|
||||
|
||||
**Acceptance criteria:**
|
||||
- `scadalink audit query` mirrors the UI filter set; results stream as JSON (default) or table.
|
||||
- `scadalink audit export` streams server-side to CSV / JSONL / Parquet; requires `AuditExport` permission.
|
||||
- `scadalink audit verify-chain --month YYYY-MM` is a no-op stub returning a "hash-chain not yet enabled in this release" message and exit code 0 (per v1.x deferral).
|
||||
- Existing `audit-log query` (IAuditService config-change viewer) **renamed** in code to `audit-config query` to disambiguate; old name kept as a deprecated alias for one minor version.
|
||||
- Permissions: `audit query` and `audit verify-chain` require `OperationalAudit`; `audit export` additionally requires `AuditExport`.
|
||||
|
||||
**Task headlines:**
|
||||
1. New `AuditCommands.cs` (separate file from `AuditLogCommands.cs` — the latter stays for the renamed config audit).
|
||||
2. Build the three subcommands with their flag sets (per CLI doc & `alog.md` §15.1, post-Bundle-D fix).
|
||||
3. ManagementService HTTP endpoints backing each subcommand.
|
||||
4. Output formatters (JSON, table) reused from existing CLI patterns.
|
||||
5. CLI integration tests in `tests/ScadaLink.CLI.Tests/` + `tests/ScadaLink.IntegrationTests/`.
|
||||
6. Update CLI README + help text.
|
||||
|
||||
**Risk callouts:**
|
||||
- The CLI rename (`audit-log query` → `audit-config query`) breaks any operator scripts; provide a deprecation alias and document the migration.
|
||||
|
||||
---
|
||||
|
||||
## Cross-cutting concerns (apply at every milestone)
|
||||
|
||||
- **Branching:** every milestone gets its own `feature/audit-log-mN-<slice>` branch; merged with `--no-ff` to `main` on milestone completion. No pushes without explicit user authorization.
|
||||
- **Tests:** Every task adds tests first (failing test → impl → passing test). Existing tests must keep passing.
|
||||
- **Commits:** small and frequent. Bite-sized per writing-plans skill.
|
||||
- **Reviews:** per the bundling cadence in user memory — group small adjacent tasks into a single implementer dispatch, run one combined spec+quality review per bundle, then a final cross-bundle review at end of milestone.
|
||||
- **Docs:** if implementation reveals a design gap, fix the design doc FIRST (in `docs/requirements/Component-AuditLog.md` and/or `alog.md`), commit, then implement. Don't let the code and docs drift.
|
||||
- **Infra:** the 3 `infra/*` working-tree modifications still uncommitted on `main` are unrelated and stay that way unless the user explicitly addresses them. Use explicit `git add <path>` throughout, never `git commit -am`.
|
||||
|
||||
---
|
||||
|
||||
## Per-milestone execution flow (template)
|
||||
|
||||
When a milestone is about to start, run this sequence:
|
||||
|
||||
1. **Brainstorm**: short skill invocation to nail any code-level decisions not fixed in the spec (test fixture placement, migration helper choice, etc.).
|
||||
2. **Writing-plans**: produce a milestone-specific plan with TDD detail per task — saved to `docs/plans/2026-XX-XX-auditlog-mN-<slice>.md` + peer `.tasks.json`.
|
||||
3. **Subagent-driven execution**: bundle small tasks per cadence preference; per-bundle implementer + combined reviewer; cross-bundle review at end of milestone; merge to `main` with `--no-ff`.
|
||||
|
||||
The roadmap is the contract for what each milestone ships; the per-milestone plan is the contract for how it gets built.
|
||||