scadalink-design/docs/plans/2026-05-20-audit-log-code-roadmap.md
Joseph Doherty 5492c94e2f docs(audit): roadmap closeout — all 8 milestones complete (#23)
Audit Log #23 implementation complete. M1-M8 merged to main. Full
solution 2,993 tests green, 0 failures. Records final state, the
v1.x deferrals (hash chain, Parquet, per-channel retention), and the
follow-ups noted during implementation (real gRPC push client, mapper
consolidation, Site Calls UI, multi-value filters, grid drag UX).
2026-05-20 22:16:53 -04:00

# Audit Log (#23) Code Implementation Roadmap
> **STATUS: COMPLETE. All 8 milestones (M1–M8) implemented and merged to `main` on 2026-05-20.**
> Final state: full solution `dotnet build ScadaLink.slnx` clean; `dotnet test ScadaLink.slnx`
> 24 test projects, 2,993 tests, 0 failures, 0 skipped. Each milestone shipped on its own
> `feature/audit-log-mN-*` branch, merged `--no-ff`, with a per-milestone implementation plan
> under `docs/plans/2026-05-20-auditlog-mN-*.md` and downstream roadmap corrections committed
> after each merge. `infra/*` was never touched on any milestone branch. `alog.md` +
> `Component-AuditLog.md` were patched exactly once (M1 vocabulary reconciliation, commit
> `3592e74`, committed before the dependent M1 code merge per the spec-before-code ordering rule).
>
> **Deferred to v1.x (out of scope, intentionally not implemented):** hash-chain tamper
> evidence (`audit verify-chain` ships as a no-op stub), Parquet export (`format=parquet`
> returns HTTP 501), per-channel retention overrides. **Deferred follow-ups noted during
> implementation:** the real site→central gRPC push client (M6 wired the pull RPC + a mockable
> push seam; `NoOpSiteStreamAuditClient` remains the production binding); consolidation of the
> 4 DTO mapper copies; Site Calls UI page + its Audit drill-in; multi-value filter dimensions
> (`AuditLogQueryFilter` is single-value per dimension, so UI chips / CLI flags collapse to the
> first value); audit-results-grid drag resize/reorder UX.
>
> **For Claude:** REQUIRED SUB-SKILL FLOW per milestone: `brainstorming` → `writing-plans` → `subagent-driven-development`. Use `docs/requirements/Component-AuditLog.md` + `alog.md` as the spec; this document is the roadmap that sequences milestones and locks acceptance criteria for each. **M1 carries full TDD-level task detail; M2–M8 are milestone-shape detail and will be expanded into bite-sized plans by their own writing-plans pass when their turn comes.**
**Goal:** Implement central component #23 Audit Log — append-only forensic + operational record across every script-trust-boundary action — into the existing ScadaLink codebase.
**Architecture:** Layered alongside (not replacing) the future Notifications/SiteCalls operational stores. Site-local SQLite hot-path append + gRPC telemetry batches + reconciliation pulls; central direct-write for Inbound API and Notification Outbox dispatch; monthly-partitioned MS SQL with single global retention; strict append-only enforced via DB roles. See `alog.md` for the locked design decisions and `Component-AuditLog.md` for the component spec.
**Tech Stack:** Akka.NET (clustering, singletons, ClusterClient), EF Core (MS SQL provider, code-first migrations), Microsoft.Data.SqlClient, Microsoft.Data.Sqlite, gRPC (HTTP/2 server-streaming on the existing `SiteStream` channel), ASP.NET Core (Inbound API middleware), Blazor Server + Bootstrap (Central UI), System.CommandLine (CLI), xUnit + Akka.TestKit.Xunit2 + NSubstitute (tests).
**Spec:** `/Users/dohertj2/Desktop/scadalink-design/alog.md` (validated, immutable; commit `fec0bb1`). Component design at `/Users/dohertj2/Desktop/scadalink-design/docs/requirements/Component-AuditLog.md`.
---
## Codebase Reality Check (what already exists)
- **All 22 prior components have source + tests.** Audit Log slots in as a new `src/ScadaLink.AuditLog/` project plus changes to: Commons, ConfigurationDatabase, Communication (proto), Host (DI + actor registration), ExternalSystemGateway, InboundAPI, NotificationOutbox, HealthMonitoring, CentralUI, CLI, SiteRuntime (audit hook surface).
- **Existing patterns to copy from:**
  - Singleton wiring: `src/ScadaLink.Host/Actors/AkkaHostedService.cs:272–280` (NotificationOutboxActor) uses `ClusterSingletonManager.Props` + a manager/proxy pair.
  - EF migration: `src/ScadaLink.ConfigurationDatabase/Migrations/20260519050659_AddNotificationsTable.cs` shows table create + indexes; **no partitioning yet, so Audit Log will be the first.**
  - Site SQLite hot-path: `src/ScadaLink.SiteEventLogging/SiteEventLogger.cs:28–98` uses a single connection, a write lock, and a Channel-based background writer.
  - Site-buffer + forwarder: in `src/ScadaLink.StoreAndForward/`, `StoreAndForwardStorage` + `NotificationForwarder` show the Pending → Forwarded transition we'll mirror.
  - Actor + repo + test trio: `src/ScadaLink.NotificationOutbox/NotificationOutboxActor.cs` and `tests/ScadaLink.NotificationOutbox.Tests/NotificationOutboxActorIngestTests.cs:20` (TestKit base class, NSubstitute repo, `Sys.ActorOf`, `ExpectMsg<T>`).
  - gRPC additive: `src/ScadaLink.Communication/Protos/sitestream.proto` currently carries only `AttributeValueUpdate` and `AlarmStateUpdate` in a `oneof`; we extend it.
  - CLI command shape: `src/ScadaLink.CLI/Commands/AuditLogCommands.cs:153` follows the System.CommandLine pattern; the new group will live alongside it (the file's existing commands are for the IAuditService config audit and stay).
  - Blazor listing page: `src/ScadaLink.CentralUI/Components/Pages/Notifications/NotificationReport.razor` shows the filter bar + keyset paging + status badges idiom.
- **`AuditLog.razor` and `AuditLogCommands.cs` already exist** but they're the **IAuditService config-change viewer**. Per the design pass we renamed them in docs to "Configuration Audit Log Viewer"; in code they'll be renamed (file + URL + command name) so the new operational Audit Log can take the unqualified name.
- **Test framework:** xUnit + Akka.TestKit.Xunit2 + NSubstitute. Integration tests under `tests/ScadaLink.IntegrationTests/`. Playwright UI tests under `tests/ScadaLink.CentralUI.PlaywrightTests/`. A `tests/ScadaLink.PerformanceTests/` exists for load tests.
---
## Prerequisite: Site Call Audit (#22) + cached-call tracking are NOT implemented in code
The design for both is merged on `main` (`alog.md` cached-call tracking section; `Component-SiteCallAudit.md`), but `grep` finds zero references to `TrackedOperationId` or `CachedCallTelemetry` in `src/`. This matters because **M3 (cached operations + dual-write transaction) cannot be built without them**.
**Three ways to handle this — pick before M3:**
1. **Inline into M3 (Recommended):** Implement just enough of Site Call Audit (#22) and cached-call tracking inside M3 — specifically the `CachedCallTelemetry` message, the operational-tracking SQLite table at sites, the `SiteCalls` table + repo + `SiteCallAuditActor` skeleton at central. This makes M3 the biggest milestone but ships a coherent slice (cached calls audited end-to-end).
2. **M0 prerequisite milestone:** Implement #22 and cached-call tracking as a separate slice before M3 starts. Cleanest dependency story; slowest to first-audit-row.
3. **Ship Audit Log sync-only first, retrofit cached path later:** M1, M2, M4 (sync-only emissions), M5, M6 (no cached features), M7, M8 ship as-is; cached audit is a separate follow-up. Lowest first-shippable scope but leaves cached calls unaudited until much later.
**Default choice in this roadmap: (1).** M3 absorbs the minimum #22 + cached-call tracking surface needed to make combined telemetry work; the rest of #22 (full reconciliation, KPIs, Retry/Discard relay) can be a follow-up.
---
## Milestone index
| M | Title | Ships | Touches | Depends on |
|---|---|---|---|---|
| **M1** | Foundation: schema, types, DB roles, partitioning | Migration deployed; Commons types exist; no observable behavior yet. | Commons, ConfigurationDatabase, ConfigurationDatabase.Tests | — |
| **M2** | Site pipeline (sync-only path) | One emission path end-to-end (ESG sync `Call()` audited from script to central row). | Commons, AuditLog (new), Communication (proto), Host, ExternalSystemGateway, all Tests projects, IntegrationTests | M1 |
| **M3** | Cached operations + dual-write transaction | Cached external calls and DB writes audited; SiteCalls table populated alongside; combined telemetry packet contract live. | Commons, AuditLog, SiteCallAudit (new), ConfigurationDatabase, ExternalSystemGateway, StoreAndForward, Host | M2; #22 + cached-call tracking inlined here per the prerequisite section |
| **M4** | Remaining boundary emission | All four channels emitting: sync DB writes/reads, Notify dispatcher attempt/terminal, Inbound API middleware. | ExternalSystemGateway, InboundAPI, NotificationOutbox, SiteRuntime (Database surface) | M2; M3 (NotificationOutbox terminal/attempt uses ICentralAuditWriter pattern) |
| **M5** | Payload + redaction policy | Header redaction, body redactor regex, SQL parameter redaction, safety net, configuration binding. | AuditLog, ExternalSystemGateway, InboundAPI, all emitter projects | M2 |
| **M6** | Reconciliation, purge, partition maintenance, health metrics | Self-healing telemetry, monthly partition switch, the five new health metrics + their dashboard tiles. | AuditLog, ConfigurationDatabase (partition maintenance), HealthMonitoring | M2, M3 |
| **M7** | Central UI — new Audit Log page + drill-ins + KPI tiles | User-visible Audit Log surface; existing `AuditLog.razor` renamed to ConfigurationAuditLog. | CentralUI, CentralUI.Tests, CentralUI.PlaywrightTests | M2, M4, M6 |
| **M8** | CLI — `scadalink audit query / export / verify-chain` | Operator surface for query/export; `verify-chain` is a no-op stub until v1.x hash chain ships. | CLI, ManagementService (HTTP endpoint), CLI.Tests, IntegrationTests | M2 |
**Ship-state at end of each milestone is the shippable slice** — each milestone leaves the system in a working, testable, deployable state (no half-built actors mid-pipeline). M1 ships no user-visible behaviour but produces a clean foundation; from M2 onward each ships an observable audit capability.
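**Critical path:** M1 → M2 → (M3 ∥ M4 ∥ M5) → M6 → (M7 ∥ M8). M3, M4, M5 can overlap once M2 is solid, with one exception from the index above: M4's NotificationOutbox attempt/terminal emission waits on M3's `ICentralAuditWriter` pattern. M7 and M8 can overlap once M6 lands.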
---
## M1 — Foundation: schema, types, DB roles, partitioning
**Goal:** Land the new `AuditLog` table (partitioned) and DB roles in MS SQL, plus the Commons types every later milestone needs. After M1 the database is ready and types compile; nothing else changes.
**Affected projects:**
- `src/ScadaLink.Commons/` — entity, enums, interfaces, message DTOs.
- `src/ScadaLink.ConfigurationDatabase/` — EF mapping, DbContext registration, migration, DB role script, partition function/scheme, retention options.
- `tests/ScadaLink.Commons.Tests/` — enum + record tests.
- `tests/ScadaLink.ConfigurationDatabase.Tests/` — migration tests, repo tests.
**Acceptance criteria:**
- `dotnet build` of the solution succeeds.
- `dotnet ef database update` against a dev MS SQL applies the migration; `AuditLog` table exists, partitioned monthly on `OccurredAtUtc`, with PK on `EventId` and the five expected indexes.
- `scadalink_audit_writer` and `scadalink_audit_purger` SQL roles exist with the documented grants; a smoke test confirms `UPDATE AuditLog` from the writer role fails.
- `AuditEvent` record, `AuditChannel`/`AuditKind`/`AuditStatus` enums, `IAuditWriter`/`ICentralAuditWriter` interfaces, `AuditTelemetryEnvelope`/`PullAuditEvents` message DTOs all exist in Commons in the right folders.
- `IAuditLogRepository` interface (Commons) and EF implementation (ConfigurationDatabase) exist; the implementation only exposes `InsertIfNotExistsAsync`, paged read, and `SwitchOutPartitionAsync` — no update or row-delete.
- All new tests pass; no existing tests regress.
### M1 — Tasks (TDD-detail)
#### M1-T1: Add audit enums to Commons
**Files:**
- Create: `src/ScadaLink.Commons/Types/Enums/AuditChannel.cs`, `AuditKind.cs`, `AuditStatus.cs`.
- Create: `tests/ScadaLink.Commons.Tests/Types/Enums/AuditEnumTests.cs`.
**Steps:**
1. Write failing test verifying `AuditChannel` has exactly `ApiOutbound | DbOutbound | Notification | ApiInbound` (asserting `Enum.GetValues` length and members).
2. Same for `AuditKind` (10 members per `Component-AuditLog.md`).
3. Same for `AuditStatus` (8 members).
4. Run: tests fail (enums don't exist). Implement the three enums.
5. Run tests: pass.
6. Commit: `feat(commons): add Audit{Channel,Kind,Status} enums for #23`.
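
As a rough sketch of what steps 1 and 4 amount to for `AuditChannel`: the member names below come from step 1, while the check is written with plain BCL calls rather than the xUnit test the task calls for, so that it runs standalone. The numeric values are simply whatever the compiler assigns, which is an assumption and not something the spec states here.

```csharp
using System;
using System.Linq;

// Sketch of the step-1 check, using BCL calls instead of xUnit assertions.
string[] expected = { "ApiOutbound", "DbOutbound", "Notification", "ApiInbound" };
string[] actual = Enum.GetNames<AuditChannel>();

// Exactly four members, and exactly the documented names (order-insensitive).
bool ok = actual.Length == expected.Length
          && expected.OrderBy(n => n).SequenceEqual(actual.OrderBy(n => n));
Console.WriteLine(ok ? "AuditChannel matches the documented members" : "AuditChannel drifted");

// The four channels named in M1-T1 step 1.
public enum AuditChannel
{
    ApiOutbound,
    DbOutbound,
    Notification,
    ApiInbound,
}
```

The same shape would apply to `AuditKind` and `AuditStatus`, whose member lists live in `Component-AuditLog.md` and are not repeated here.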
#### M1-T2: Add AuditEvent record + ForwardState enum
**Files:**
- Create: `src/ScadaLink.Commons/Entities/Audit/AuditEvent.cs` — public record carrying all 20 central columns (per `alog.md` §4) plus a nullable `ForwardState?` for the site-local variant.
- Create: `src/ScadaLink.Commons/Types/Enums/AuditForwardState.cs` with members `Pending | Forwarded | Reconciled`.
- Create: `tests/ScadaLink.Commons.Tests/Entities/Audit/AuditEventTests.cs`.
**Steps:**
1. Write failing test that constructs an `AuditEvent`, sets every property, and round-trips via `with` expressions — asserts immutability and required-property behaviour.
2. Run: fail (type doesn't exist). Implement the record.
3. Run: pass.
4. Commit: `feat(commons): add AuditEvent record + ForwardState enum`.
#### M1-T3: Add IAuditWriter and ICentralAuditWriter
**Files:**
- Create: `src/ScadaLink.Commons/Interfaces/Services/IAuditWriter.cs`, `ICentralAuditWriter.cs`.
- Create: `tests/ScadaLink.Commons.Tests/Interfaces/Services/AuditWriterContractTests.cs` (smoke — only that the interfaces exist and have the documented signatures).
**Steps:**
1. Write failing reflection-based test asserting both interfaces expose `Task WriteAsync(AuditEvent, CancellationToken)`.
2. Run: fail. Implement both interfaces; document each with XML doc comments naming Audit Log #23 as the owner.
3. Run: pass.
4. Commit: `feat(commons): add IAuditWriter and ICentralAuditWriter`.
#### M1-T4: Add audit telemetry + pull message DTOs
**Files:**
- Create: `src/ScadaLink.Commons/Messages/Integration/AuditTelemetryEnvelope.cs`, `PullAuditEventsRequest.cs`, `PullAuditEventsResponse.cs`.
- Create: `tests/ScadaLink.Commons.Tests/Messages/Integration/AuditTelemetryMessagesTests.cs`.
**Steps:**
1. Failing test: construct envelope with a batch of 3 events, assert immutability + batch enumerability.
2. Failing test: pull request carries `SinceUtc` + `BatchSize`; response carries events + `MoreAvailable`.
3. Implement.
4. Run: pass.
5. Commit: `feat(commons): add audit telemetry + pull message DTOs`.
#### M1-T5: Extend ScadaLinkDbContext with AuditLogs DbSet + entity config
**Files:**
- Modify: `src/ScadaLink.ConfigurationDatabase/ScadaLinkDbContext.cs` — add `public DbSet<AuditEvent> AuditLogs => Set<AuditEvent>();` at the appropriate position (after `Notifications`).
- Create: `src/ScadaLink.ConfigurationDatabase/Entities/AuditLogEntityTypeConfiguration.cs`, an `IEntityTypeConfiguration<AuditEvent>` mapping the columns, types, length constraints, and indexes per `alog.md` §4. Note: this is an EF mapping only; the partition function and scheme are created in the SQL migration (next task) since EF Core doesn't model them natively.
- Modify: `OnModelCreating` — apply the new configuration.
- Create: `tests/ScadaLink.ConfigurationDatabase.Tests/Entities/AuditLogEntityTypeConfigurationTests.cs` — use `ModelBuilder` directly to verify the entity is mapped to `AuditLog` table, PK is `EventId`, and the expected columns + indexes are declared.
**Steps:**
1. Failing test asserts mapped table name, PK column, and column count.
2. Implement entity configuration; apply in `OnModelCreating`.
3. Failing test asserts the five expected indexes exist on the model.
4. Add `HasIndex` declarations.
5. Run: pass.
6. Commit: `feat(configdb): map AuditEvent to AuditLog table with PK and indexes`.
#### M1-T6: Generate and customize EF migration for AuditLog
**Files:**
- Create: `src/ScadaLink.ConfigurationDatabase/Migrations/<timestamp>_AddAuditLogTable.cs` via `dotnet ef migrations add AddAuditLogTable --project ScadaLink.ConfigurationDatabase`.
- Modify: the generated `Up()` / `Down()` to:
- Create the partition function `pf_AuditLog_Month` and partition scheme `ps_AuditLog_Month` (raw SQL via `migrationBuilder.Sql(...)`), tied to a dedicated filegroup (or PRIMARY in dev — configurable via a migration setting).
- Alter the `CreateTable` call (or follow up with `Sql`) to align the table to `ps_AuditLog_Month(OccurredAtUtc)`.
- Add the five indexes generated by EF; ensure each is also partition-aligned where appropriate.
- Create: `tests/ScadaLink.ConfigurationDatabase.Tests/Migrations/AddAuditLogTableMigrationTests.cs` — applies the migration to an isolated MS SQL LocalDB instance (existing IntegrationTests harness), asserts table + partition function + scheme + indexes are present.
**Steps:**
1. Run `dotnet ef migrations add AddAuditLogTable`.
2. Failing integration test: apply migration, query `sys.partition_functions` and `sys.partition_schemes` for the expected names.
3. Edit migration to add the partition function + scheme + alignment.
4. Re-run test: pass.
5. Failing test: query `sys.indexes` for the five expected named indexes.
6. Adjust migration if any index name drifts.
7. Run: pass.
8. Commit: `feat(configdb): add AuditLog migration with monthly partitioning`.
#### M1-T7: Add DB roles in migration
**Files:**
- Modify: the M1-T6 migration `Up()` to also create the `scadalink_audit_writer` (INSERT + SELECT only) and `scadalink_audit_purger` (ALTER PARTITION FUNCTION + ALTER TABLE … SWITCH PARTITION + SELECT) roles via raw SQL. Make role creation idempotent (`IF NOT EXISTS`).
- Modify: `Down()` — drop the roles.
- Create: `tests/ScadaLink.ConfigurationDatabase.Tests/Migrations/AuditLogRoleGrantsTests.cs` — applies migration, then runs `SELECT` on `sys.database_role_members` / `sys.database_permissions` to assert the role grants. Plus a smoke test: connect as a user mapped to `scadalink_audit_writer`, attempt `UPDATE AuditLog SET Status = 'X'` and expect a permission error.
**Steps:**
1. Failing test asserts both roles exist with documented grants.
2. Add `migrationBuilder.Sql(...)` blocks.
3. Run: pass.
4. Failing test: `UPDATE AuditLog` as audit writer → expect SqlException with permission error.
5. Verify the role's permissions deny UPDATE (they should by default since only INSERT + SELECT granted).
6. Run: pass.
7. Commit: `feat(configdb): add scadalink_audit_writer and scadalink_audit_purger roles`.
#### M1-T8: Add IAuditLogRepository + EF implementation
**Files:**
- Create: `src/ScadaLink.Commons/Interfaces/Repositories/IAuditLogRepository.cs` exposing `InsertIfNotExistsAsync(AuditEvent, CancellationToken)`, `QueryAsync(filter, paging, CancellationToken)`, `SwitchOutPartitionAsync(monthBoundary, CancellationToken)`. **Deliberately no `UpdateAsync` or row-level `DeleteAsync`.**
- Create: `src/ScadaLink.ConfigurationDatabase/Repositories/AuditLogRepository.cs` — implementation using the DbContext; `InsertIfNotExistsAsync` uses `MERGE` or raw `INSERT … WHERE NOT EXISTS` to satisfy idempotency without throwing on dupes.
- Modify: `ServiceCollectionExtensions.cs` to register `IAuditLogRepository` → `AuditLogRepository` in DI.
- Create: `tests/ScadaLink.ConfigurationDatabase.Tests/Repositories/AuditLogRepositoryTests.cs`.
**Steps:**
1. Failing test: `InsertIfNotExistsAsync` for a fresh `EventId` writes one row; calling again with the same `EventId` is a no-op (no exception, no second row).
2. Implement; use a `MERGE` or `INSERT … WHERE NOT EXISTS` strategy that does NOT rely on EF change tracking.
3. Run: pass.
4. Failing test: paged `QueryAsync` returns rows in `(OccurredAtUtc desc, EventId desc)` order, respecting filter predicates (channel, kind, status, site, target, actor, correlation, time range).
5. Implement filter projection + keyset paging.
6. Run: pass.
7. Failing test: `SwitchOutPartitionAsync` for the oldest partition removes its rows from the live table.
8. Implement via `migrationBuilder`-style `Sql("ALTER TABLE ... SWITCH PARTITION ... TO ...")` (against a staging table the implementation creates and drops within the same transaction).
9. Run: pass.
10. Commit: `feat(configdb): IAuditLogRepository + EF implementation (append-only, partition-switch purge)`.
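
The ordering and keyset cursor from steps 4 and 5 can be sketched in memory as follows. This is only an illustration under stated assumptions: `Row` and `Paging.NextPage` are hypothetical names, LINQ-to-objects stands in for the EF/SQL query, and filters are left out entirely.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var t = new DateTime(2026, 5, 20, 12, 0, 0, DateTimeKind.Utc);
var rows = new List<Row>
{
    new(Guid.Parse("00000000-0000-0000-0000-000000000001"), t),
    new(Guid.Parse("00000000-0000-0000-0000-000000000002"), t), // same timestamp: tiebreaker applies
    new(Guid.Parse("00000000-0000-0000-0000-000000000003"), t.AddSeconds(-1)),
    new(Guid.Parse("00000000-0000-0000-0000-000000000004"), t.AddSeconds(1)),
};

var page1 = Paging.NextPage(rows, cursor: null, pageSize: 2);
var page2 = Paging.NextPage(rows, cursor: page1[^1], pageSize: 2);

// Prints the last hex digit of each EventId in page order: 4, 2, 1, 3
Console.WriteLine(string.Join(", ", page1.Concat(page2).Select(r => r.EventId.ToString()[^1])));

// Hypothetical stand-in for the two AuditEvent columns that define the sort key.
public sealed record Row(Guid EventId, DateTime OccurredAtUtc);

public static class Paging
{
    // Returns the next page in (OccurredAtUtc desc, EventId desc) order,
    // starting strictly after the cursor row (the last row of the previous page).
    public static List<Row> NextPage(IEnumerable<Row> source, Row? cursor, int pageSize) =>
        source
            .Where(r => cursor is null
                || r.OccurredAtUtc < cursor.OccurredAtUtc
                || (r.OccurredAtUtc == cursor.OccurredAtUtc
                    && r.EventId.CompareTo(cursor.EventId) < 0))
            .OrderByDescending(r => r.OccurredAtUtc)
            .ThenByDescending(r => r.EventId)
            .Take(pageSize)
            .ToList();
}
```

One caveat worth keeping in mind: `Guid.CompareTo` and SQL Server's `uniqueidentifier` comparison do not order values the same way, so a tie broken in memory may come out differently from one broken in the database. Whichever side evaluates the cursor predicate should also be the side that sorts.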
#### M1-T9: Add AuditLogOptions configuration class + binding
**Files:**
- Create: `src/ScadaLink.AuditLog/Configuration/AuditLogOptions.cs` (new project; see M1-T10), which owns `DefaultCapBytes`, `ErrorCapBytes`, `HeaderRedactList`, `GlobalBodyRedactors`, `PerTargetOverrides`, `RetentionDays`, validation attributes.
- Add: validation on startup (`IValidateOptions<AuditLogOptions>`).
- Test: ensure `appsettings.json` bind round-trips and validation rejects out-of-range `RetentionDays`.
**Steps:**
1. Failing test: bind a valid section → values present.
2. Implement options class + binding.
3. Failing test: bind invalid `RetentionDays` → validator rejects.
4. Implement validator.
5. Run: pass.
6. Commit: `feat(auditlog): add AuditLogOptions config binding`.
#### M1-T10: Add ScadaLink.AuditLog project skeleton
**Files:**
- Create: `src/ScadaLink.AuditLog/ScadaLink.AuditLog.csproj` — TargetFramework matches the rest of the solution; ProjectReferences to `ScadaLink.Commons` and `ScadaLink.ConfigurationDatabase`.
- Create: `src/ScadaLink.AuditLog/ServiceCollectionExtensions.cs` with `AddAuditLog(this IServiceCollection, IConfiguration)`, which registers `AuditLogOptions`, `IAuditLogRepository`, plus placeholders that later milestones will fill (writer impls, actors).
- Create: `tests/ScadaLink.AuditLog.Tests/ScadaLink.AuditLog.Tests.csproj` with one smoke test.
- Modify: `ScadaLink.slnx` — add both projects to the solution.
- Modify: `Directory.Packages.props` if any new package versions are needed.
**Steps:**
1. Create projects via `dotnet new classlib` / `dotnet new xunit`; add references; add to slnx.
2. Failing test: smoke-test `AddAuditLog()` populates DI with `IAuditLogRepository` and `IOptions<AuditLogOptions>`.
3. Implement `ServiceCollectionExtensions.AddAuditLog`.
4. Run: pass.
5. Commit: `feat(auditlog): scaffold ScadaLink.AuditLog project`.
#### M1-T11: Update Component-Host.md responsibilities + README component table
**Files:**
- Modify: `docs/requirements/Component-Host.md` — list `ScadaLink.AuditLog` in the central role's registration set.
- Modify: `README.md` — confirm row #23 link reflects the new project (no functional change; this is a paper-trail update).
**Steps:**
1. Edit, verify cross-refs, commit: `docs(audit): register ScadaLink.AuditLog project in Host role`.
---
## M2 — Site pipeline (sync-only path)
**Goal:** First end-to-end audit emission: a script-initiated `ExternalSystem.Call()` produces an audit row in the central `AuditLog` table. No cached paths yet, no notifications, no inbound API, no UI. Just one channel + kind: `ApiOutbound` / `ApiCall`.
**Affected projects:** `Commons`, `AuditLog` (new), `Communication`, `Host`, `ExternalSystemGateway`, all matching `*.Tests/`, `tests/ScadaLink.IntegrationTests/`.
> **M1 realities to honor:**
> - **Vocabulary**: M1 enums use `AuditKind.ApiCall` (sync) and `AuditStatus.Delivered|Failed`. The original spec's `SyncCall` / `Success` names were superseded; alog.md + Component-AuditLog.md were reconciled in the M1 merge.
> - **Idempotent insert race**: M1's `AuditLogRepository.InsertIfNotExistsAsync` uses non-locking `IF NOT EXISTS … INSERT`. M2 is the first concurrent writer (`AuditLogIngestActor` will receive batches from multiple sites). **Harden the repo before relying on it** — either add `WITH (UPDLOCK, HOLDLOCK)` to the existence check, or catch SqlException numbers 2601/2627 (duplicate key on `UX_AuditLog_EventId`) and swallow. Add a new task at the head of M2 for this fix and its concurrency test.
> - **Keyset tiebreaker test gap**: M1's `QueryAsync_Keyset_NextPageStartsAfterCursor` test uses five rows with distinct `OccurredAtUtc`, so the `Guid.CompareTo` tiebreaker branch is never exercised. Add a same-OccurredAt test in M2 (Bundle D reviewer's deferred recommendation).
> - **Reusable MSSQL fixture**: `tests/ScadaLink.ConfigurationDatabase.Tests/Migrations/MsSqlMigrationFixture.cs` + `[SkippableFact]` + `Skip.IfNot(_fixture.Available, _fixture.SkipReason)` is the established pattern. Consider promoting it to a `[CollectionDefinition]`-shared fixture when M2+ adds more MSSQL-dependent test classes.
> - **Project layout**: `src/ScadaLink.AuditLog/` is wired into the solution with `Configuration/AuditLogOptions.cs` + validator + `ServiceCollectionExtensions.AddAuditLog()`. M2's `Site/` and `Central/` subfolders attach to this project; the DI extension is the registration point.
**Acceptance criteria:**
- Site-local `IAuditWriter` writes to a per-site SQLite `auditlog.db` on the hot path with `ForwardState = 'Pending'`; durability is sub-millisecond; failures fall back to a bounded in-memory ring and surface a metric.
- `SiteAuditTelemetryActor` drains pending rows in batches via a new `IngestAuditEvents` RPC on the existing `SiteStream` gRPC service; on success flips `ForwardState = 'Forwarded'`.
- `AuditLogIngestActor` (central singleton) receives the batch, performs `InsertIfNotExistsAsync` per event, returns ack.
- `ExternalSystem.Call()` emits one `ApiOutbound` / `ApiCall` row via `IAuditWriter` on every call completion; audit-write failure does NOT abort the script.
- Integration test in `tests/ScadaLink.IntegrationTests/` boots a site + central pair, executes a sync script that calls an external system, and asserts a corresponding row appears in the central `AuditLog` within N seconds.
- No regressions in existing ExternalSystemGateway or Communication tests.
### M2 — Tasks (TDD-detail)
#### M2-T1: `SqliteAuditWriter` — schema + connection bootstrap
**Files:**
- Create: `src/ScadaLink.AuditLog/Site/SqliteAuditWriter.cs`, which implements `IAuditWriter`. Constructor takes a `SqliteOptions` (path); single `SqliteConnection` per instance gated by `SemaphoreSlim(1,1)`. Calls `InitializeSchema()` on first use. Pattern from `src/ScadaLink.SiteEventLogging/SiteEventLogger.cs:28–98`.
- Create: `tests/ScadaLink.AuditLog.Tests/Site/SqliteAuditWriterSchemaTests.cs`.
**Steps:**
1. Failing test: opening a writer against a `:memory:` SQLite produces an `AuditLog` table with the documented columns (the 20 central columns minus `IngestedAtUtc`, plus `ForwardState`).
2. Run: fail (class doesn't exist).
3. Implement `InitializeSchema()` with `CREATE TABLE IF NOT EXISTS AuditLog (...)`. Use SQLite column types matching the EF mapping where reasonable (`TEXT` for IDs, `INTEGER` for status enums, `BLOB` not used).
4. Run: pass.
5. Commit: `feat(auditlog): SqliteAuditWriter schema bootstrap`.
#### M2-T2: `SqliteAuditWriter` — hot-path `WriteAsync`
**Files:**
- Modify: `src/ScadaLink.AuditLog/Site/SqliteAuditWriter.cs`.
- Create: `tests/ScadaLink.AuditLog.Tests/Site/SqliteAuditWriterWriteTests.cs`.
**Steps:**
1. Failing test: `WriteAsync(event)` inserts one row with `ForwardState = Pending`.
2. Failing test: 1,000 concurrent `WriteAsync` calls all complete without exception and produce exactly 1,000 rows (write-lock correctness).
3. Run: fail.
4. Implement using a parameterized `INSERT` under `SemaphoreSlim` lock.
5. Run: pass.
6. Commit: `feat(auditlog): SqliteAuditWriter hot-path INSERT with write lock`.
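
The write-lock behaviour that step 2 tests can be sketched without SQLite at all. In the snippet below, `LockedWriter` is a hypothetical stand-in for `SqliteAuditWriter`, and a `List<string>` plays the role of the single connection; the point is only the `SemaphoreSlim(1,1)` gate around a resource that is not safe for concurrent use.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

var writer = new LockedWriter();

// Mirror of step 2: 1,000 concurrent writers, none may be lost or throw.
await Task.WhenAll(Enumerable.Range(0, 1000)
    .Select(i => Task.Run(() => writer.WriteAsync($"event-{i}"))));

Console.WriteLine(writer.Count); // 1000

// Hypothetical stand-in for SqliteAuditWriter: the List<string> represents the
// single connection, which (like List<T>) must not be used concurrently.
public sealed class LockedWriter
{
    private readonly SemaphoreSlim _gate = new(1, 1);
    private readonly List<string> _rows = new();

    public int Count => _rows.Count;

    public async Task WriteAsync(string evt, CancellationToken ct = default)
    {
        await _gate.WaitAsync(ct);   // one writer at a time
        try
        {
            _rows.Add(evt);          // the parameterized INSERT would run here
        }
        finally
        {
            _gate.Release();         // always release, even if the write throws
        }
    }
}
```

Without the gate, concurrent `List<T>.Add` calls can lose items or throw, which is the failure mode the 1,000-writer test is meant to catch.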
#### M2-T3: `RingBufferFallback` — in-memory fallback
**Files:**
- Create: `src/ScadaLink.AuditLog/Site/RingBufferFallback.cs`, a `Channel<AuditEvent>` with `BoundedChannelFullMode.DropOldest`, default capacity 1024.
- Create: `tests/ScadaLink.AuditLog.Tests/Site/RingBufferFallbackTests.cs`.
**Steps:**
1. Failing test: enqueueing 1,025 events into a 1,024-cap ring drops the oldest and emits a `RingBufferOverflow` notification (incrementing a passed-in counter).
2. Failing test: `DrainTo(writer)` writes all buffered events in FIFO order and clears the ring.
3. Implement.
4. Run: pass.
5. Commit: `feat(auditlog): RingBufferFallback with drop-oldest overflow`.
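
A minimal sketch of the drop-oldest behaviour from step 1, assuming .NET 6 or later (where `Channel.CreateBounded` accepts an item-dropped callback). Plain `int` items stand in for `AuditEvent`, and the local `overflowCount` stands in for the passed-in counter; how the real class surfaces `RingBufferOverflow` is not specified here.

```csharp
using System;
using System.Threading.Channels;

long overflowCount = 0;

// Bounded channel that discards the oldest item when full; the callback
// fires once per dropped item.
var ring = Channel.CreateBounded<int>(
    new BoundedChannelOptions(capacity: 1024) { FullMode = BoundedChannelFullMode.DropOldest },
    itemDropped: _ => overflowCount++);

for (int i = 0; i < 1025; i++)
{
    ring.Writer.TryWrite(i);   // succeeds even when full: the oldest item is evicted
}

ring.Reader.TryRead(out int oldestSurvivor);

// Item 0 was evicted by the 1,025th write, so item 1 is now the head.
Console.WriteLine($"dropped={overflowCount}, oldest surviving item={oldestSurvivor}");
// prints: dropped=1, oldest surviving item=1
```

Reading the channel until `TryRead` returns `false` yields the remaining items oldest-first, which is the FIFO order step 2 expects from `DrainTo(writer)`.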
#### M2-T4: `FallbackAuditWriter` — compose primary + ring behind `IAuditWriter`
**Files:**
- Create: `src/ScadaLink.AuditLog/Site/FallbackAuditWriter.cs` — primary writer is `SqliteAuditWriter`; on transient exception, enqueues into `RingBufferFallback` and increments `SiteAuditWriteFailures` (M2-T11). On the next successful primary write, drains the ring back through the primary.
- Create: `tests/ScadaLink.AuditLog.Tests/Site/FallbackAuditWriterTests.cs`.
**Steps:**
1. Failing test: when the primary throws, the event lands in the ring and the call returns successfully.
2. Failing test: when primary writes succeed again, the ring drains in FIFO order.
3. Implement.
4. Run: pass.
5. Commit: `feat(auditlog): FallbackAuditWriter composing SQLite + ring`.
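
The composition described above might look roughly like this. Everything here is a simplified assumption: `IWriter`, `FlakyWriter`, and `FallbackWriter` are hypothetical names, events are plain strings, the buffer is an unbounded `Queue<string>` rather than the capped ring, and no locking is shown.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

var primary = new FlakyWriter { Healthy = false };
var writer = new FallbackWriter(primary);

await writer.WriteAsync("a");   // primary down: buffered, no exception
await writer.WriteAsync("b");   // still down: buffered
primary.Healthy = true;
await writer.WriteAsync("c");   // primary back: "c" is written, then the buffer drains

Console.WriteLine($"failures={writer.Failures}, rows={string.Join(",", primary.Rows)}");
// prints: failures=2, rows=c,a,b

public interface IWriter { Task WriteAsync(string evt); }

// Test double standing in for SqliteAuditWriter.
public sealed class FlakyWriter : IWriter
{
    public bool Healthy { get; set; }
    public List<string> Rows { get; } = new();

    public Task WriteAsync(string evt)
    {
        if (!Healthy) throw new InvalidOperationException("primary unavailable");
        Rows.Add(evt);
        return Task.CompletedTask;
    }
}

// Not thread-safe as written; a real implementation would need synchronization.
public sealed class FallbackWriter : IWriter
{
    private readonly IWriter _primary;
    private readonly Queue<string> _buffer = new();   // unbounded here; the real ring is capped
    public int Failures { get; private set; }         // plays the role of SiteAuditWriteFailures

    public FallbackWriter(IWriter primary) => _primary = primary;

    public async Task WriteAsync(string evt)
    {
        try
        {
            await _primary.WriteAsync(evt);
        }
        catch (Exception)
        {
            Failures++;
            _buffer.Enqueue(evt);   // swallow: the caller must not see the failure
            return;
        }

        try
        {
            // Primary is healthy again: replay buffered events in FIFO order.
            while (_buffer.Count > 0)
            {
                await _primary.WriteAsync(_buffer.Peek());
                _buffer.Dequeue();  // remove only after the replay succeeded
            }
        }
        catch (Exception)
        {
            Failures++;             // leave the rest buffered for the next successful write
        }
    }
}
```

Note the resulting row order (`c,a,b`): draining after the triggering write means buffered events land behind it. Since each audit row carries its own `OccurredAtUtc`, insertion order need not match event order, but that is a design choice the sketch makes visible rather than settles.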
#### M2-T5: Extend `sitestream.proto` with `IngestAuditEvents` RPC
**Files:**
- Modify: `src/ScadaLink.Communication/Protos/sitestream.proto` — add `message AuditEventDto { string event_id = 1; google.protobuf.Timestamp occurred_at_utc = 2; ... }` (all 20 central fields), `message AuditEventBatch { repeated AuditEventDto events = 1; }`, `message IngestAck { repeated string accepted_event_ids = 1; }`, and `rpc IngestAuditEvents(AuditEventBatch) returns (IngestAck);` on `SiteStreamService`.
- Build: `dotnet build src/ScadaLink.Communication/` regenerates the C# stubs.
- Create: `tests/ScadaLink.Communication.Tests/Protos/AuditEventProtoTests.cs`.
**Steps:**
1. Failing test: round-trip serialize/deserialize a populated `AuditEventDto`; assert all fields survive.
2. Edit proto; rebuild.
3. Run: pass.
4. Commit: `feat(comms): add IngestAuditEvents RPC + AuditEvent proto messages`.
#### M2-T6: `AuditEvent` ↔ `AuditEventDto` mapper
**Files:**
- Create: `src/ScadaLink.AuditLog/Telemetry/AuditEventMapper.cs` — static `ToDto(AuditEvent)` and `FromDto(AuditEventDto)`.
- Create: `tests/ScadaLink.AuditLog.Tests/Telemetry/AuditEventMapperTests.cs`.
**Steps:**
1. Failing test: round-trip a populated `AuditEvent` through `ToDto` → `FromDto`; assert equality on all 20 columns.
2. Implement.
3. Run: pass.
4. Commit: `feat(auditlog): AuditEvent ↔ proto Dto mapper`.
#### M2-T7: `SiteAuditTelemetryActor` — drain loop
**Files:**
- Create: `src/ScadaLink.AuditLog/Site/Telemetry/SiteAuditTelemetryActor.cs`, a `ReceiveActor` with a `Drain` self-tick. On `Drain`: read up to `BatchSize` `Pending` rows from SQLite; send via gRPC; mark accepted rows `Forwarded`.
- Create: `src/ScadaLink.AuditLog/Site/Telemetry/SiteAuditTelemetryOptions.cs` with defaults `BatchSize = 256`, `BusyIntervalSeconds = 5`, `IdleIntervalSeconds = 30`.
- Create: `tests/ScadaLink.AuditLog.Tests/Site/Telemetry/SiteAuditTelemetryActorTests.cs` using `TestKit` + NSubstitute for the gRPC client.
**Steps:**
1. Failing test: when SQLite has 50 pending rows, a `Drain` tick sends one batch via the mocked gRPC client.
2. Failing test: on ack, the corresponding rows flip to `Forwarded` in SQLite.
3. Failing test: when gRPC throws, rows stay `Pending` and the next tick retries.
4. Failing test: cadence is 5s after a tick that drained ≥1 row, 30s after a tick that drained 0.
5. Implement.
6. Run: pass.
7. Commit: `feat(auditlog): SiteAuditTelemetryActor drain loop`.
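
The cadence rule in step 4 reduces to a small pure function, which may be easier to test than timer behaviour inside the actor. `Cadence.NextDelay` is a hypothetical helper; the defaults mirror `BusyIntervalSeconds` and `IdleIntervalSeconds` from the options class above.

```csharp
using System;

Console.WriteLine(Cadence.NextDelay(drainedRows: 50)); // 00:00:05
Console.WriteLine(Cadence.NextDelay(drainedRows: 0));  // 00:00:30

public static class Cadence
{
    // Busy interval after a tick that forwarded at least one row, idle interval otherwise.
    public static TimeSpan NextDelay(int drainedRows, int busySeconds = 5, int idleSeconds = 30) =>
        TimeSpan.FromSeconds(drainedRows >= 1 ? busySeconds : idleSeconds);
}
```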
#### M2-T8: `AuditLogIngestActor` + gRPC server handler
**Files:**
- Create: `src/ScadaLink.AuditLog/Central/AuditLogIngestActor.cs`, a `ReceiveActor` accepting `IngestAuditEventsCommand(batch)`; calls `IAuditLogRepository.InsertIfNotExistsAsync` for each event inside a single `DbContext` transaction; replies with `IngestAck(acceptedEventIds)`.
- Modify: `src/ScadaLink.Communication/SiteStreamGrpc/SiteStreamGrpcServer.cs` — implement the new `IngestAuditEvents` method as a thin gRPC↔Akka adapter (`Ask` against the central singleton's proxy, mapped to the gRPC reply).
- Create: `tests/ScadaLink.AuditLog.Tests/Central/AuditLogIngestActorTests.cs`.
**Steps:**
1. Failing test: actor receives a batch of 5 events; repo is called 5 times; reply lists all 5 EventIds as accepted.
2. Failing test: when 2 of 5 events already exist (repo returns `Inserted = false`), the reply still lists all 5 as accepted (idempotent semantics).
3. Failing test: gRPC handler routes to actor and returns its reply.
4. Implement.
5. Run: pass.
6. Commit: `feat(auditlog): AuditLogIngestActor + gRPC server handler`.
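The idempotent ack semantics in step 2 can be illustrated without Akka or EF. In this sketch an in-memory set stands in for `IAuditLogRepository`, and `AuditIngest.Ingest` is a hypothetical name for the logic the actor would run per batch.

```csharp
using System;
using System.Collections.Generic;

// In-memory stand-in for the repository; only insert-if-not-exists is modelled.
public sealed class InMemoryAuditLogRepository
{
    private readonly HashSet<Guid> _eventIds = new();

    public int Count => _eventIds.Count;

    // Returns true when the row was inserted, false when the EventId already existed.
    public bool InsertIfNotExists(Guid eventId) => _eventIds.Add(eventId);
}

public static class AuditIngest
{
    // Every event that is durably present after the call is reported as accepted,
    // whether this call inserted it or an earlier delivery already had.
    public static IReadOnlyList<Guid> Ingest(
        InMemoryAuditLogRepository repo, IEnumerable<Guid> batch)
    {
        var accepted = new List<Guid>();
        foreach (var eventId in batch)
        {
            repo.InsertIfNotExists(eventId); // the Inserted flag is deliberately ignored
            accepted.Add(eventId);
        }
        return accepted;
    }
}
```

Acking duplicates matters on the site side: a row that was stored centrally but whose ack was lost would otherwise stay `Pending` and be resent indefinitely.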
#### M2-T9: Host registration with dedicated dispatcher
**Files:**
- Modify: `src/ScadaLink.Host/Actors/AkkaHostedService.cs` — alongside the existing wiring at `:272-280`, register `AuditLogIngestActor` as central singleton and `SiteAuditTelemetryActor` as site singleton bound to `audit-telemetry-dispatcher`. Manager + proxy pair for both.
- Modify: Host HOCON (likely `src/ScadaLink.Host/Configuration/akka.conf` or similar) — add `audit-telemetry-dispatcher { type = ForkJoinDispatcher; parallelism-min = 1; parallelism-max = 2; }`.
- Modify: `src/ScadaLink.AuditLog/ServiceCollectionExtensions.cs` — register actor `Props` factories so Host can resolve them.
- Create: `tests/ScadaLink.Host.Tests/AkkaHostedServiceAuditWiringTests.cs`.
**Steps:**
1. Failing test: starting the host with the audit module loaded produces healthy `IActorRef` proxies for both singletons.
2. Failing test: `SiteAuditTelemetryActor` is bound to `audit-telemetry-dispatcher` (assert via Akka actor cell inspection or via a known-good dispatcher-tagged behaviour).
3. Implement.
4. Run: pass.
5. Commit: `feat(host): register AuditLog singletons with dedicated dispatcher`.
#### M2-T10: ESG `ExternalSystemClient.CallAsync` audit emission
**Files:**
- Modify: `src/ScadaLink.ExternalSystemGateway/ExternalSystemClient.cs` (sync `CallAsync` around lines 45-70) — inject `IAuditWriter` via constructor. After the call completes (success OR exception), build an `AuditEvent` (channel=`ApiOutbound`, kind=`SyncCall`, status from outcome, `DurationMs`, `HttpStatus`, target = system+method, provenance from `ScriptExecutionContext`). Call `_auditWriter.WriteAsync(evt)` inside a `try`/`catch` that swallows + logs + increments `SiteAuditWriteFailures`.
- Modify: `src/ScadaLink.ExternalSystemGateway/ServiceCollectionExtensions.cs` — accept `IAuditWriter` from DI.
- Create: `tests/ScadaLink.ExternalSystemGateway.Tests/ExternalSystemClientAuditEmissionTests.cs`.
**Steps:**
1. Failing test: sync `CallAsync` success → exactly one event with `Status=Success`, `Channel=ApiOutbound`, `Kind=SyncCall`.
2. Failing test: sync `CallAsync` HTTP 500 → `Status=TransientFailure`, `HttpStatus=500`.
3. Failing test: sync `CallAsync` HTTP 400 → `Status=PermanentFailure`, `HttpStatus=400`.
4. Failing test: when `IAuditWriter.WriteAsync` throws, the script call still completes normally and the script sees the original (non-audit) result.
5. Implement.
6. Run: pass.
7. Commit: `feat(esg): emit ApiOutbound.SyncCall audit event on every sync call`.
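Steps 1 to 4 combine two ideas: mapping the outcome to a status, and making the audit write best-effort. The sketch below shows both under stated assumptions. `SyncCallAudit` and its members are hypothetical names, the status names are the ones used in this task's text, and the rule for codes other than 500 and 400 (all 5xx transient, every other non-2xx permanent) is an assumption, since the task only pins those two codes. The exception path of the call itself and logging are omitted for brevity.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public enum AuditStatus { Success, TransientFailure, PermanentFailure }

public static class SyncCallAudit
{
    private static int _writeFailures;

    // Stand-in for the SiteAuditWriteFailures health counter.
    public static int SiteAuditWriteFailures => Volatile.Read(ref _writeFailures);

    // Assumption: 2xx is success, 5xx is transient, anything else is permanent.
    public static AuditStatus StatusFor(int httpStatus) => httpStatus switch
    {
        >= 200 and < 300 => AuditStatus.Success,
        >= 500 => AuditStatus.TransientFailure,
        _ => AuditStatus.PermanentFailure,
    };

    // Runs the call, then attempts the audit write. A failing audit write is
    // swallowed and counted, so the caller always sees the original result.
    public static async Task<T> CallWithAuditAsync<T>(
        Func<Task<T>> call, Func<T, Task> writeAudit)
    {
        var result = await call();
        try
        {
            await writeAudit(result);
        }
        catch (Exception)
        {
            Interlocked.Increment(ref _writeFailures);
        }
        return result;
    }
}
```

The important property is the ordering: the result is captured before the audit write is attempted, so nothing the writer does can replace or mask it.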
#### M2-T11: `SiteAuditWriteFailures` health metric
**Files:**
- Modify: `src/ScadaLink.HealthMonitoring/SiteHealthState.cs` — add a `SiteAuditWriteFailures` counter; expose it in the site health report payload.
- Modify: `src/ScadaLink.AuditLog/Site/FallbackAuditWriter.cs` (M2-T4) — accept `IHealthMetrics` (or the project's existing health counter abstraction) and increment per failed primary write.
- Create: `tests/ScadaLink.AuditLog.Tests/Site/SiteAuditWriteFailuresMetricTests.cs`.
**Steps:**
1. Failing test: 3 simulated SQLite failures → counter reports 3 in the next snapshot.
2. Implement.
3. Run: pass.
4. Commit: `feat(health): SiteAuditWriteFailures metric`.
#### M2-T12: End-to-end integration test
**Files:**
- Create: `tests/ScadaLink.IntegrationTests/AuditLog/SyncCallEmissionTests.cs` — boots a site + central pair via the existing IntegrationTests harness; deploys a tiny script that calls a stub external system; asserts the central `AuditLog` table has exactly one row with the expected channel/kind/status within 10s.
- Possibly modify: `infra/reseed.sh` if integration tests need a fresh AuditLog table per run.
**Steps:**
1. Sketch the test using existing IntegrationTests fixtures.
2. Run: fail somewhere (gaps in earlier tasks surface here).
3. Iterate fixes back through M2-T1..M2-T11 until end-to-end passes.
4. Commit: `test(auditlog): end-to-end sync call emission integration test`.
### M2 — Risk callouts
- **SiteStream proto evolution:** adding a new top-level RPC is wire-compatible; confirm generated `Sitestream.cs` rebuilds cleanly and existing tests still pass.
- **Dedicated dispatcher misconfiguration:** if `SiteAuditTelemetryActor` lands on the script blocking-I/O dispatcher, scripts will starve during telemetry bursts. Add a runtime assertion in `M2-T9` that the actor's dispatcher matches expectation.
- **Script execution context plumbing:** ESG emission (M2-T10) needs `SourceInstanceId` / `SourceScript`; confirm these are reachable via the existing `ScriptExecutionContext` (or equivalent in SiteRuntime) before starting M2-T10.
- **Integration-test DB isolation:** target an isolated MS SQL database (or a dedicated schema) so the test doesn't clash with other integration tests.
---
## M3 — Cached operations + dual-write transaction + (inlined) Site Call Audit foundations
**Goal:** Cached external calls (`ExternalSystem.CachedCall`) and cached DB writes (`Database.CachedWrite`) produce four audit rows per operation (`Kind=CachedSubmit Status=Submitted`, `Kind=ApiCallCached/DbWriteCached Status=Forwarded`, `Kind=ApiCallCached/DbWriteCached Status=Attempted` × N, `Kind=CachedResolve Status=Delivered|Failed|Parked|Discarded`) AND populate the operational `SiteCalls` table at central — in one transaction at central, from a single combined telemetry packet.
> **M2 realities to honor:**
> - **Vocabulary**: use the M1-aligned enums. M3 will be the first code to populate `AuditKind.ApiCallCached`, `DbWriteCached`, `CachedSubmit`, `CachedResolve`. The locked spec (alog.md + Component-AuditLog.md) was reconciled in the M1 merge.
> - **Site→central gRPC client deferred to M6**: M2 ships `NoOpSiteStreamAuditClient` as the production default. Site SQLite rows accumulate as `Pending` forever in production until M6. M3 component tests should use Bundle H's `DirectActorSiteStreamAuditClient` pattern (see `tests/ScadaLink.AuditLog.Tests/Integration/SyncCallEmissionEndToEndTests.cs:277-340`). Extract that helper into `tests/ScadaLink.AuditLog.Tests/Integration/Infrastructure/` so M3 cached-call E2E tests can reuse it without re-defining.
> - **Mapper duplication**: `SiteStreamGrpcServer.IngestAuditEvents` inlines DTO→entity decoding (intentional, to avoid the AuditLog→Communication project-ref cycle). The mapper lives at `src/ScadaLink.AuditLog/Telemetry/AuditEventMapper.cs`. M3 should add a comment in both spots tying them together, OR move the mapper into `src/ScadaLink.Commons/` (project-ref clean) so both consumers can share it.
> - **`AuditIngestAskTimeout = 30s` is hardcoded** in `SiteStreamGrpcServer.cs:37`. M3 may want to expose this via `CommunicationOptions` or `AuditLogOptions` as central reconciliation/dual-write traffic grows.
> - **CachedCallTelemetry message**: per CLAUDE.md, the existing `CachedCallTelemetry` message **does not yet exist in code**. M3 must create it from scratch (additively, per Commons REQ-COM-5a — DO NOT rename it `CachedOperationTelemetry`). It carries BOTH the AuditLog rows (4+) AND the SiteCalls upsert in one packet.
> - **Dual-write transaction**: central writes `AuditLog` + `SiteCalls` in one MS SQL transaction. The repository's `InsertIfNotExistsAsync` swallows duplicates (M2 Bundle A fix); the SiteCalls upsert uses `MERGE` (or insert-if-not-exists then upsert-on-newer-status per CLAUDE.md). M3 must ensure the same Bundle A swallow pattern applies if duplicate `CachedCallId` arrives.
> - **AuditEvent ForwardState semantics in M3**: cached-operation telemetry rows are site-emitted just like sync M2 rows, so the same site SQLite hot-path + `Pending→Forwarded` lifecycle applies. The four lifecycle rows share a CorrelationId (the TrackedOperationId), but each is its own AuditEvent with a distinct EventId.
**Affected projects:** `Commons`, `AuditLog`, `SiteCallAudit` (new — minimum-viable surface), `ConfigurationDatabase` (new `SiteCalls` table migration), `ExternalSystemGateway`, `StoreAndForward`, `Host`. Tests across all of them + IntegrationTests.
**Prerequisite call-out:** This milestone implements the minimum-viable Site Call Audit (#22) surface and the cached-call tracking pieces: `TrackedOperationId`, the site-local operation-tracking SQLite store, the `SiteCalls` table at central, and the `CachedCallTelemetry` message (described in the docs as existing, but it does not exist in code and must be created from scratch). Full reconciliation, KPIs, and Retry/Discard relay for #22 are deferred; they are not on the critical path for the audit log's combined telemetry.
**Acceptance criteria:**
- New `SiteCalls` MS SQL table + repo (no partitioning needed; this is operational state, not audit).
- New `CachedCallTelemetry` message in Commons carrying BOTH the cached-call operational fields AND an `AuditEvent` payload.
- Site path: `CachedCall` writes the audit row to site SQLite (`Kind = CachedEnqueued`), creates the site operation-tracking row, and sends a combined telemetry packet.
- Central path: `AuditLogIngestActor` (extended) receives the combined packet, performs **one transaction containing both** the `AuditLog` insert and the `SiteCalls` upsert.
- Retry attempt → `Kind = CachedAttempt` audit row + `SiteCalls` status transition. Terminal → `Kind = CachedTerminal` audit row + `SiteCalls` terminal status.
- Integration test asserts: triggering a `CachedCall` that fails transient-then-succeeds produces 3 AuditLog rows + 1 SiteCalls row with `Status = Delivered`, all sharing the same `TrackedOperationId` correlation key.
### M3 — Tasks (TDD-detail)
#### M3-T1: `TrackedOperationId` strong-typed ID
**Files:**
- Create: `src/ScadaLink.Commons/Types/TrackedOperationId.cs` — readonly record struct wrapping `Guid`; `New()` / `Parse(string)` / `ToString()`.
- Create: `tests/ScadaLink.Commons.Tests/Types/TrackedOperationIdTests.cs`.
**Steps:**
1. Failing test: round-trip via `ToString()` / `Parse()` and equality semantics.
2. Implement.
3. Run: pass.
4. Commit: `feat(commons): TrackedOperationId strong type`.
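A minimal sketch of the type described above, assuming a `Value` property name and the hyphenated `"D"` GUID format for the string form (neither is specified by the task):

```csharp
using System;

// Strongly typed wrapper so a tracked-operation id cannot be confused with
// any other Guid in a method signature.
public readonly record struct TrackedOperationId(Guid Value)
{
    public static TrackedOperationId New() => new(Guid.NewGuid());

    // Throws FormatException for text that is not a valid Guid.
    public static TrackedOperationId Parse(string text) => new(Guid.Parse(text));

    // Overrides the record-generated ToString so the id round-trips through Parse.
    public override string ToString() => Value.ToString("D");
}
```

A `readonly record struct` gives value equality for free, which is what the equality half of the step 1 test would exercise.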
#### M3-T2: Site-local operation-tracking SQLite table + repo
**Files:**
- Create: `src/ScadaLink.SiteRuntime/Tracking/OperationTrackingStore.cs` — SQLite-backed store with columns: `TrackedOperationId`, `Kind`, `TargetSummary`, `Status`, `RetryCount`, `LastError`, `CreatedAtUtc`, `UpdatedAtUtc`, `TerminalAtUtc`, source provenance. Schema bootstrap on first use; uses the same write-lock pattern as `SqliteAuditWriter`. Implements `IOperationTrackingStore` (interface in Commons).
- Create: `src/ScadaLink.Commons/Interfaces/IOperationTrackingStore.cs` — `RecordEnqueueAsync`, `RecordAttemptAsync`, `RecordTerminalAsync`, `GetStatusAsync(TrackedOperationId)`, `PurgeTerminalAsync(olderThanUtc)`.
- Create: `tests/ScadaLink.SiteRuntime.Tests/Tracking/OperationTrackingStoreTests.cs`.
**Steps:**
1. Failing test: schema bootstrap creates the table.
2. Failing test: `RecordEnqueueAsync` inserts a `Pending` row; `RecordAttemptAsync` updates `Status`/`RetryCount`/`LastError`; `RecordTerminalAsync` finalises.
3. Failing test: `GetStatusAsync` returns the latest snapshot (answers `Tracking.Status(id)` site-locally).
4. Failing test: `PurgeTerminalAsync` removes terminal rows older than threshold; non-terminal rows are kept regardless of age.
5. Implement.
6. Run: pass.
7. Commit: `feat(siteruntime): site-local operation tracking SQLite store`.
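The purge rule in step 4 (terminal rows age out, non-terminal rows never do) is the part most likely to be implemented wrongly, so it is worth a small model. The following is an in-memory sketch only: the real store is SQLite-backed and asynchronous, the row shape is reduced to three fields, plain `Guid` keys are used to keep the example self-contained, and returning `null` for an unknown id is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Reduced row: a null TerminalAtUtc means the operation is still in flight.
public sealed record TrackedOperation(Guid Id, string Status, DateTime? TerminalAtUtc);

public sealed class InMemoryOperationTrackingStore
{
    private readonly Dictionary<Guid, TrackedOperation> _rows = new();

    public void RecordEnqueue(Guid id) =>
        _rows[id] = new TrackedOperation(id, "Pending", null);

    public void RecordTerminal(Guid id, string status, DateTime terminalAtUtc) =>
        _rows[id] = _rows[id] with { Status = status, TerminalAtUtc = terminalAtUtc };

    public TrackedOperation? GetStatus(Guid id) =>
        _rows.TryGetValue(id, out var row) ? row : null;

    // Removes only rows that reached a terminal state before the threshold.
    // Rows without a terminal timestamp are kept regardless of age.
    public int PurgeTerminal(DateTime olderThanUtc)
    {
        var expired = _rows.Values
            .Where(r => r.TerminalAtUtc is { } terminalAt && terminalAt < olderThanUtc)
            .Select(r => r.Id)
            .ToList();
        foreach (var id in expired) _rows.Remove(id);
        return expired.Count;
    }
}
```

Keying the purge on `TerminalAtUtc` rather than `CreatedAtUtc` is what keeps a long-running retry from being purged mid-flight.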
#### M3-T3: `Tracking.Status(id)` API surface in SiteRuntime
**Files:**
- Modify: `src/ScadaLink.SiteRuntime/Scripting/TrackingApi.cs` (new or existing — confirm via repo) — public `Status(TrackedOperationId)` method routed through `IOperationTrackingStore`.
- Modify: script trust-model allow-list to include the new `Tracking.*` surface (confirm via grep).
- Create: `tests/ScadaLink.SiteRuntime.Tests/Scripting/TrackingApiTests.cs`.
**Steps:**
1. Failing test: `Tracking.Status(unknownId)` returns a documented "not found" sentinel.
2. Failing test: `Tracking.Status(knownId)` returns the latest snapshot.
3. Implement.
4. Run: pass.
5. Commit: `feat(siteruntime): Tracking.Status(id) script API`.
#### M3-T4: `CachedCallTelemetry` Commons message — carries both operational + audit content
**Files:**
- Create: `src/ScadaLink.Commons/Messages/Integration/CachedCallTelemetry.cs` — fields: `TrackedOperationId`, `Kind` (`CachedEnqueued`/`CachedAttempt`/`CachedTerminal` audit kind), operational status, retry count, last error, timestamps, and a nested `AuditEvent` carrying the audit row content. Documented as additive-only per Commons REQ-COM-5a.
- Create: `tests/ScadaLink.Commons.Tests/Messages/Integration/CachedCallTelemetryTests.cs`.
**Steps:**
1. Failing test: construct a telemetry packet for each of the three lifecycle kinds; verify the nested AuditEvent's channel/kind alignment (e.g., a `CachedAttempt` packet must carry an `AuditEvent` with `Kind = CachedAttempt`).
2. Failing test: serialization round-trip preserves both layers.
3. Implement.
4. Run: pass.
5. Commit: `feat(commons): CachedCallTelemetry carrying combined operational + audit content`.
#### M3-T5: `SiteCalls` MS SQL table — EF mapping
**Files:**
- Create: `src/ScadaLink.Commons/Entities/Audit/SiteCall.cs` — POCO record per Component-SiteCallAudit.md.
- Create: `src/ScadaLink.ConfigurationDatabase/Entities/SiteCallEntityTypeConfiguration.cs` — `IEntityTypeConfiguration<SiteCall>` with PK on `TrackedOperationId`, indexes on `(SourceSite, CreatedAtUtc)` and `(Status, UpdatedAtUtc)`.
- Modify: `ScadaLinkDbContext.cs` — `public DbSet<SiteCall> SiteCalls => Set<SiteCall>();`.
- Create: `tests/ScadaLink.ConfigurationDatabase.Tests/Entities/SiteCallEntityTypeConfigurationTests.cs`.
**Steps:**
1. Failing test: model exposes `SiteCalls` table with documented columns and indexes.
2. Implement.
3. Run: pass.
4. Commit: `feat(configdb): map SiteCall to SiteCalls table`.
#### M3-T6: `SiteCalls` migration
**Files:**
- Create: `src/ScadaLink.ConfigurationDatabase/Migrations/<ts>_AddSiteCallsTable.cs` via `dotnet ef migrations add AddSiteCallsTable`.
- Create: `tests/ScadaLink.ConfigurationDatabase.Tests/Migrations/AddSiteCallsTableMigrationTests.cs`.
**Steps:**
1. Failing test: applying the migration creates the `SiteCalls` table with PK + indexes.
2. Generate + adjust migration.
3. Run: pass.
4. Commit: `feat(configdb): add SiteCalls migration`.
#### M3-T7: `ISiteCallAuditRepository` + EF impl
**Files:**
- Create: `src/ScadaLink.Commons/Interfaces/Repositories/ISiteCallAuditRepository.cs` — `UpsertAsync(SiteCall)` (insert-if-not-exists by `TrackedOperationId`, otherwise update-on-newer-status using monotonic status progression), `GetAsync(TrackedOperationId)`, `QueryAsync(filter, paging)`, `PurgeTerminalAsync(olderThanUtc)`.
- Create: `src/ScadaLink.ConfigurationDatabase/Repositories/SiteCallAuditRepository.cs`.
- Modify: `ServiceCollectionExtensions.cs` — register.
- Create: `tests/ScadaLink.ConfigurationDatabase.Tests/Repositories/SiteCallAuditRepositoryTests.cs`.
**Steps:**
1. Failing test: first `UpsertAsync` inserts; second `UpsertAsync` with an advanced status updates; an `UpsertAsync` with an older status is a no-op (monotonic progression).
2. Failing test: paged query supports the documented filter set.
3. Implement.
4. Run: pass.
5. Commit: `feat(configdb): ISiteCallAuditRepository + EF impl`.
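The monotonic progression in step 1 can be modelled with a rank function over statuses. The sketch below is in-memory and synchronous, and the ranking itself is an assumption: the roadmap names the statuses but does not define their order, so `Pending < Attempted < any terminal status` is chosen here purely for illustration.

```csharp
using System;
using System.Collections.Generic;

public enum SiteCallStatus { Pending, Attempted, Delivered, Failed, Parked, Discarded }

public sealed record SiteCall(Guid TrackedOperationId, SiteCallStatus Status);

public sealed class InMemorySiteCallRepository
{
    private readonly Dictionary<Guid, SiteCall> _rows = new();

    // Assumed ordering: all terminal statuses share the highest rank.
    private static int Rank(SiteCallStatus status) => status switch
    {
        SiteCallStatus.Pending => 0,
        SiteCallStatus.Attempted => 1,
        _ => 2,
    };

    // Returns true when the row was inserted or advanced, false when the incoming
    // status is not newer than the stored one (late or duplicate packet).
    public bool Upsert(SiteCall incoming)
    {
        if (_rows.TryGetValue(incoming.TrackedOperationId, out var current)
            && Rank(incoming.Status) <= Rank(current.Status))
        {
            return false;
        }
        _rows[incoming.TrackedOperationId] = incoming;
        return true;
    }

    public SiteCall? Get(Guid id) => _rows.TryGetValue(id, out var row) ? row : null;
}
```

With `<=` in the comparison, an equal-status packet is also a no-op, which lines up with the full-idempotency case tested in M3-T10.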
#### M3-T8: `SiteCallAuditActor` skeleton (central singleton)
**Files:**
- Create: `src/ScadaLink.SiteCallAudit/` (new project) — `SiteCallAuditActor.cs` + `ScadaLink.SiteCallAudit.csproj` + `ServiceCollectionExtensions.cs`. Actor handles `UpsertSiteCallCommand` messages by calling `ISiteCallAuditRepository.UpsertAsync`. Note: full reconciliation, KPIs, and Retry/Discard relay are explicitly deferred — this is the minimum-viable surface for M3.
- Modify: `ScadaLink.slnx` to include the new project.
- Create: `tests/ScadaLink.SiteCallAudit.Tests/SiteCallAuditActorTests.cs`.
**Steps:**
1. Failing test: actor receives `UpsertSiteCallCommand`, calls repo, replies with ack.
2. Failing test: actor swallows transient DB errors and surfaces them as health metrics (does NOT crash the central singleton).
3. Implement.
4. Run: pass.
5. Commit: `feat(scaudit): SiteCallAuditActor minimum viable surface`.
#### M3-T9: Extend `sitestream.proto` with `IngestCachedTelemetry` RPC OR extend `IngestAuditEvents`
**Files:**
- Modify: `src/ScadaLink.Communication/Protos/sitestream.proto` — preferred approach: add a new top-level RPC `rpc IngestCachedTelemetry(CachedTelemetryBatch) returns (IngestAck);` and a `message CachedTelemetryPacket { AuditEventDto audit_event = 1; SiteCallOperationalDto operational = 2; }` plus `message CachedTelemetryBatch { repeated CachedTelemetryPacket packets = 1; }`. Decision should be confirmed during M3's brainstorm.
- Build to regenerate.
- Create: `tests/ScadaLink.Communication.Tests/Protos/CachedTelemetryProtoTests.cs`.
**Steps:**
1. Failing test: round-trip a populated `CachedTelemetryPacket`.
2. Add proto + rebuild.
3. Run: pass.
4. Commit: `feat(comms): IngestCachedTelemetry RPC + combined telemetry messages`.
#### M3-T10: Extend `AuditLogIngestActor` for combined telemetry — dual-write transaction
**Files:**
- Modify: `src/ScadaLink.AuditLog/Central/AuditLogIngestActor.cs` — add a handler for the cached telemetry message. Inside a **single `DbContext` transaction**: (a) call `IAuditLogRepository.InsertIfNotExistsAsync(auditEvent)`, then (b) call `ISiteCallAuditRepository.UpsertAsync(operationalState)`. Both must succeed or both must roll back.
- Modify: `src/ScadaLink.Communication/SiteStreamGrpc/SiteStreamGrpcServer.cs` — route the new RPC to the central actor.
- Create: `tests/ScadaLink.AuditLog.Tests/Central/AuditLogIngestActorCombinedTelemetryTests.cs`.
**Steps:**
1. Failing test: a single combined packet produces one AuditLog row AND one SiteCalls row (or upsert).
2. Failing test: when the SiteCalls upsert throws, the AuditLog insert is rolled back (no orphan rows).
3. Failing test: when the AuditLog insert is a no-op (duplicate `EventId`), the SiteCalls upsert still runs.
4. Failing test: when both rows already exist with monotonic-equal statuses, the operation is a no-op overall (full idempotency).
5. Implement.
6. Run: pass.
7. Commit: `feat(auditlog): combined telemetry dual-write transaction`.
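The four test cases above describe all-or-nothing behaviour across two tables. The sketch below models that contract in memory, assuming a hypothetical `InMemoryCentralDb` in which "commit" means swapping in modified copies of both collections. It is a model of the expected semantics only, not of how a `DbContext` transaction is implemented, and the `failUpsert` flag exists purely to simulate a failing `SiteCalls` write.

```csharp
using System;
using System.Collections.Generic;

public sealed class InMemoryCentralDb
{
    private HashSet<Guid> _auditLog = new();
    private Dictionary<Guid, string> _siteCalls = new();

    public int AuditLogCount => _auditLog.Count;
    public int SiteCallCount => _siteCalls.Count;

    public void IngestCombined(
        Guid eventId, Guid trackedOperationId, string status, bool failUpsert = false)
    {
        // Work on copies so that a failure leaves committed state untouched (rollback).
        var auditLog = new HashSet<Guid>(_auditLog);
        var siteCalls = new Dictionary<Guid, string>(_siteCalls);

        // (a) Insert-if-not-exists: a duplicate EventId is a silent no-op.
        auditLog.Add(eventId);

        // (b) The upsert runs even when (a) was a no-op.
        if (failUpsert) throw new InvalidOperationException("simulated SiteCalls failure");
        siteCalls[trackedOperationId] = status;

        // Commit: both tables become visible together.
        _auditLog = auditLog;
        _siteCalls = siteCalls;
    }
}
```

For brevity the upsert here overwrites unconditionally; the real repository would also apply the monotonic-status rule from M3-T7.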
#### M3-T11: ESG `CachedCallAsync` — emit `CachedEnqueued` on enqueue
**Files:**
- Modify: `src/ScadaLink.ExternalSystemGateway/ExternalSystemClient.cs:75-136` (cached call) — at the moment of buffering into S&F: build an `AuditEvent` (channel=`ApiOutbound`, kind=`CachedEnqueued`) AND a `SiteCallOperationalDto` (status=`Pending`); package as a `CachedTelemetryPacket`; hand to the combined-telemetry forwarder.
- Create: `src/ScadaLink.ExternalSystemGateway/Cached/CachedCallTelemetryForwarder.cs` — accumulates packets and posts to `SiteAuditTelemetryActor` (or a sibling actor — decision in milestone brainstorm).
- Create: `tests/ScadaLink.ExternalSystemGateway.Tests/CachedCallEnqueueEmissionTests.cs`.
**Steps:**
1. Failing test: an enqueued cached call produces exactly one packet with `kind=CachedEnqueued`.
2. Implement.
3. Run: pass.
4. Commit: `feat(esg): CachedCall emits CachedEnqueued combined telemetry on buffering`.
#### M3-T12: ESG `CachedCallAsync` — emit `CachedAttempt` per retry
**Files:**
- Modify: `src/ScadaLink.StoreAndForward/` retry loop (locate the per-attempt callback site) to emit a `CachedAttempt` packet on each attempt (success OR transient failure).
- Create: `tests/ScadaLink.StoreAndForward.Tests/CachedCallAttemptEmissionTests.cs`.
**Steps:**
1. Failing test: an attempt that returns HTTP 500 produces a packet with `kind=CachedAttempt`, `status=TransientFailure`, `HttpStatus=500`.
2. Failing test: a successful attempt produces a packet with `kind=CachedAttempt`, `status=Success`, `HttpStatus=200`.
3. Implement.
4. Run: pass.
5. Commit: `feat(snf): CachedCall emits CachedAttempt per retry`.
#### M3-T13: ESG `CachedCallAsync` — emit `CachedTerminal` on terminal state
**Files:**
- Modify: same retry-loop terminal-transition site — on `Delivered` / `Failed` / `Parked` / `Discarded`, emit one final `CachedTerminal` packet.
- Create: `tests/ScadaLink.StoreAndForward.Tests/CachedCallTerminalEmissionTests.cs`.
**Steps:**
1. Failing test: a cached call that succeeds on attempt 3 produces (in order): 1 `CachedEnqueued`, 3 `CachedAttempt`, 1 `CachedTerminal` (with `status=Delivered`).
2. Failing test: a cached call that exhausts retries produces a final `CachedTerminal` with `status=Parked`.
3. Implement.
4. Run: pass.
5. Commit: `feat(snf): CachedCall emits CachedTerminal on lifecycle terminal`.
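The emission order asserted in steps 1 and 2 can be written down as a pure function from attempt outcomes to emitted kinds. `CachedCallLifecycle.Kinds` is a hypothetical helper for reasoning about expected test output, not part of the planned code. It uses the kind names from the task text (`CachedEnqueued` / `CachedAttempt` / `CachedTerminal`) and assumes that exhausting the supplied attempts ends in `Parked`, as in step 2.

```csharp
using System.Collections.Generic;

public static class CachedCallLifecycle
{
    // attemptSucceeded[i] is the outcome of attempt i + 1.
    // Emission stops at the first successful attempt.
    public static IReadOnlyList<string> Kinds(IReadOnlyList<bool> attemptSucceeded)
    {
        var kinds = new List<string> { "CachedEnqueued" };
        var delivered = false;

        foreach (var succeeded in attemptSucceeded)
        {
            kinds.Add("CachedAttempt");
            if (succeeded)
            {
                delivered = true;
                break;
            }
        }

        // Exactly one terminal packet closes the lifecycle.
        kinds.Add(delivered ? "CachedTerminal(Delivered)" : "CachedTerminal(Parked)");
        return kinds;
    }
}
```

For a call that fails twice and then succeeds, this yields five entries: one enqueue, three attempts, one terminal, which is the count the M3-T16 integration test expects in `AuditLog`.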
#### M3-T14: `Database.CachedWrite` — mirror the three-lifecycle emission for DB cached writes
**Files:**
- Modify: `src/ScadaLink.ExternalSystemGateway/DatabaseGateway.cs` (or equivalent — confirm via repo) — same three-event emission pattern as ESG cached calls, but `channel=DbOutbound`.
- Create: `tests/ScadaLink.ExternalSystemGateway.Tests/CachedWriteLifecycleEmissionTests.cs`.
**Steps:**
1. Failing test: a `CachedWrite` that succeeds first try produces `CachedEnqueued` + `CachedAttempt(Success)` + `CachedTerminal(Delivered)`.
2. Failing test: a `CachedWrite` with transient retry mirrors the ESG pattern.
3. Implement.
4. Run: pass.
5. Commit: `feat(esg): Database.CachedWrite emits three-lifecycle combined telemetry`.
#### M3-T15: Host registration — `SiteCallAuditActor` central singleton
**Files:**
- Modify: `src/ScadaLink.Host/Actors/AkkaHostedService.cs` — register `SiteCallAuditActor` central singleton + proxy alongside `AuditLogIngestActor`.
- Modify: `src/ScadaLink.SiteCallAudit/ServiceCollectionExtensions.cs` — register actor props.
- Modify: `tests/ScadaLink.Host.Tests/AkkaHostedServiceAuditWiringTests.cs` — extend to assert `SiteCallAuditActor` proxy resolves.
**Steps:**
1. Failing test: starting host produces the new singleton's proxy.
2. Implement.
3. Run: pass.
4. Commit: `feat(host): register SiteCallAuditActor central singleton`.
#### M3-T16: Integration test — cached external call audit (end-to-end)
**Files:**
- Create: `tests/ScadaLink.IntegrationTests/AuditLog/CachedCallCombinedTelemetryTests.cs` — site + central; stub external system returns 500 twice then 200; script invokes `ExternalSystem.CachedCall("System","Method", args)`; assert AuditLog has 5 rows (Enqueued + 3 Attempts + Terminal) AND SiteCalls has 1 row with `Status=Delivered` AND `Tracking.Status(id)` reports the same.
**Steps:**
1. Sketch test against IntegrationTests harness.
2. Run: fail (likely surfacing earlier-task gaps).
3. Iterate fixes until pass.
4. Commit: `test(auditlog): cached call combined telemetry end-to-end`.
#### M3-T17: Integration test — cached DB write audit (end-to-end)
**Files:**
- Create: `tests/ScadaLink.IntegrationTests/AuditLog/CachedWriteCombinedTelemetryTests.cs` — mirror M3-T16 against the DB cached path.
**Steps:**
1. Sketch.
2. Iterate.
3. Commit: `test(auditlog): cached DB write combined telemetry end-to-end`.
#### M3-T18: Idempotency test — duplicate telemetry doesn't double-insert / double-upsert
**Files:**
- Create: `tests/ScadaLink.IntegrationTests/AuditLog/CombinedTelemetryIdempotencyTests.cs` — force the same packet to arrive twice (simulated telemetry retry); assert AuditLog still has exactly one row and SiteCalls upsert is monotonic.
**Steps:**
1. Sketch.
2. Pass.
3. Commit: `test(auditlog): combined telemetry idempotency on retried packets`.
### M3 — Risk callouts
- **Combined telemetry packet evolution:** design the proto so future cached audit-kind additions are non-breaking (avoid `oneof` for fields you'll extend; use sparse field numbers).
- **Dual-write transaction failure modes:** the single `DbContext` transaction at central spans two tables; ensure retry behaviour on transient connection errors works as expected (existing `IDbExecutionStrategy` patterns may apply).
- **Idempotency cross-table:** AuditLog dedups on `EventId`, SiteCalls dedups on `TrackedOperationId` with status-monotonic update. A retried packet whose AuditLog row exists must still upsert SiteCalls (no short-circuit).
- **Scope discipline:** M3 inlines the *minimum* surface for #22 and cached-call tracking. Full #22 reconciliation, KPIs, and Retry/Discard relay are deferred. Note in the milestone brainstorm whether any extra #22 surface is genuinely needed for M3 acceptance criteria — if not, defer aggressively.
- **`Tracking.Status` semantics:** confirmed authoritative site-locally per design; no central round-trip. Ensure the test in M3-T3 reflects this.
---
## M4 — Remaining boundary emission
**Goal:** Every channel × kind from `Component-AuditLog.md` produces a row when its boundary call fires.
**Affected projects:** `ExternalSystemGateway` (sync DB writes/reads, cached DB writes), `SiteRuntime` (Database surface exposing them), `NotificationOutbox` (central direct-write of `Attempt`/`Terminal`), `InboundAPI` (middleware). Tests across all.
> **M3 realities to honor:**
> - **Vocabulary**: use the M1-aligned enums. The roadmap's old `SyncWrite/SyncRead/Notification.Attempt/Notification.Terminal/Notification.Enqueued/ApiInbound.Completed/PermanentFailure` strings are pre-M1 spec wording — DO NOT use those names in code. Translation:
> - sync DB write/read → `AuditKind.DbWrite` (Channel=DbOutbound); distinguish read vs write via `Extra` (e.g., `{"op": "read", "rowsReturned": 42}`).
> - notification delivery attempt → `AuditKind.NotifyDeliver` with `AuditStatus.Attempted`.
> - notification delivery terminal → `AuditKind.NotifyDeliver` with `AuditStatus.Delivered|Failed|Parked|Discarded`.
> - notification submit (site-emit) → `AuditKind.NotifySend` with `AuditStatus.Submitted`.
> - inbound API success → `AuditKind.InboundRequest` with `AuditStatus.Delivered`.
> - inbound API auth failure → `AuditKind.InboundAuthFailure` with `AuditStatus.Failed`.
> - "permanent failure" → `AuditStatus.Failed`. "Transient failure" never lands a terminal row.
> - **Mapper consolidation**: M3 surfaced 4 DTO mappers (AuditEventMapper, SiteStreamGrpcServer inline, SiteCall DTO mapper, DirectActorSiteStreamAuditClient test stub). M4 should extract a single `IntegrationMappers` helper in `src/ScadaLink.Commons/Messages/Integration/` or similar to consolidate before adding more channels. The project-ref cycle that motivated the inline duplication can be broken by moving the mapper into Commons (proto types are auto-generated in Communication; the mapper just needs the proto types reachable from Commons via a transitive ref).
> - **`OnCachedTelemetryWithoutDualWriteAsync` test-mode fallback**: in `AuditLogIngestActor` for the single-repo ctor. M4 may deprecate the single-repo constructor entirely and migrate tests to the IServiceProvider+harness pattern.
> - **Site SQLite drain for OperationTrackingStore**: M3 wrote the tracking half site-locally but no drain pipeline pushes it to central — central reads SiteCalls operational state via the dual-write transaction only. If M4 needs central visibility into in-flight (non-terminal) tracking entries, plan a drain.
> - **`SiteCallAuditActor`**: wired in M3 as a cluster singleton + proxy but not on the M3 hot path. M4 (or M6 reconciliation) is the natural first direct caller — wire one production code path through it.
> - **Vocabulary correction** in the body of M4 below: every M4-T*1-N step that still says `Status=PermanentFailure`, `Kind=SyncWrite/SyncRead/Completed/Attempt/Terminal/Enqueued` is stale; apply the translation above when implementing.
**Acceptance criteria:**
- Sync `Database.Connection().Execute()` → `DbOutbound.DbWrite` row (with `Extra.op = "write"` and `rowsAffected`); `ExecuteReader` → `DbOutbound.DbWrite` row (with `Extra.op = "read"` and `rowsReturned`). Parameter values captured by default; per-connection redaction opt-in supported.
- `Database.CachedWrite` → three lifecycle rows via the combined telemetry built in M3.
- Notification Outbox dispatcher: every delivery attempt writes `NotifyDeliver` with `Status=Attempted`; terminal writes `NotifyDeliver` with `Status={Delivered|Failed|Parked|Discarded}`. Site-emitted `NotifySend` (`Status=Submitted`) flows through the standard site→central audit path. Audit-write failure never affects delivery.
- Inbound API middleware writes one `ApiInbound.InboundRequest` row per request, before `await next()` returns. API key NAME captured (never material). Audit-write failure does NOT change the HTTP response. Auth failures emit `ApiInbound.InboundAuthFailure` with `Status=Failed`.
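Since the M4 task bodies still use pre-M1 names, a lookup of the translation listed in the M3 realities note may help while implementing. The sketch below is illustrative only: `M1Vocabulary` is a hypothetical helper, kinds and statuses are plain strings rather than the real `AuditKind` / `AuditStatus` enums, and the `Extra` payload follows the `{"op": "read", "rowsReturned": 42}` example from that note.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class M1Vocabulary
{
    // Pre-M1 "Channel.Kind" wording -> M1-aligned AuditKind name.
    public static string KindFor(string legacyName) => legacyName switch
    {
        "DbOutbound.SyncWrite" or "DbOutbound.SyncRead" => "DbWrite",
        "Notification.Attempt" or "Notification.Terminal" => "NotifyDeliver",
        "Notification.Enqueued" => "NotifySend",
        "ApiInbound.Completed" => "InboundRequest",
        _ => throw new ArgumentOutOfRangeException(
            nameof(legacyName), legacyName, "No M1 translation listed."),
    };

    // Reads and writes share one kind, so the distinction moves into Extra.
    public static string DbExtra(bool isRead, int rows) =>
        JsonSerializer.Serialize(new Dictionary<string, object>
        {
            ["op"] = isRead ? "read" : "write",
            [isRead ? "rowsReturned" : "rowsAffected"] = rows,
        });
}
```

The legacy `PermanentFailure` status is not a kind and so is not in the lookup; per the same note it translates to `AuditStatus.Failed`.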
### M4 — Tasks (TDD-detail)
#### M4-T1: ESG `Database.Connection().ExecuteAsync` audit emission — `DbOutbound.SyncWrite`
**Files:**
- Modify: `src/ScadaLink.ExternalSystemGateway/DatabaseGateway.cs` (or wherever the script-facing `Execute*` lives — confirm via repo) — wrap the call site to emit an `AuditEvent` (channel=`DbOutbound`, kind=`SyncWrite`) on every `Execute`/`ExecuteScalar`. Capture statement text, parameter values (default; redaction in M5), `DurationMs`, `rowsAffected` in `Extra`.
- Create: `tests/ScadaLink.ExternalSystemGateway.Tests/DatabaseSyncWriteEmissionTests.cs`.
**Steps:**
1. Failing test: `Execute("INSERT INTO ...", new {...})` emits one event with `Channel=DbOutbound`, `Kind=SyncWrite`, statement text + parameter values captured.
2. Failing test: `ExecuteScalar` emits the same kind.
3. Failing test: execute that throws → emission with `Status=PermanentFailure`, `ErrorMessage` populated.
4. Failing test: audit-write failure does NOT abort the SQL call (script sees the original outcome).
5. Implement.
6. Run: pass.
7. Commit: `feat(esg): emit DbOutbound.SyncWrite on script-initiated Execute*`.
#### M4-T2: ESG `Database.Connection().ExecuteReaderAsync` audit emission — `DbOutbound.SyncRead`
**Files:**
- Modify: `src/ScadaLink.ExternalSystemGateway/DatabaseGateway.cs` — wrap `ExecuteReader` to emit `DbOutbound.SyncRead`. Capture statement, parameter values, `DurationMs`, `rowsReturned` in `Extra`. Response body capture defaults to NOT including rows; opt-in via per-connection config (M5).
- Create: `tests/ScadaLink.ExternalSystemGateway.Tests/DatabaseSyncReadEmissionTests.cs`.
**Steps:**
1. Failing test: `Query<T>("SELECT ...")` emits one event with `Channel=DbOutbound`, `Kind=SyncRead`.
2. Failing test: `rowsReturned` appears in `Extra`.
3. Implement.
4. Run: pass.
5. Commit: `feat(esg): emit DbOutbound.SyncRead on script-initiated reads`.
#### M4-T3: NotificationOutboxActor — inject `ICentralAuditWriter`
**Files:**
- Modify: `src/ScadaLink.NotificationOutbox/NotificationOutboxActor.cs:22-68` — constructor accepts `ICentralAuditWriter`. Wire into DI in `ServiceCollectionExtensions.cs`.
- Create: `tests/ScadaLink.NotificationOutbox.Tests/NotificationOutboxActorAuditInjectionTests.cs`.
**Steps:**
1. Failing test: actor's `Props` factory accepts an `ICentralAuditWriter`; constructor stores it.
2. Implement.
3. Run: pass.
4. Commit: `feat(notif): NotificationOutboxActor accepts ICentralAuditWriter`.
#### M4-T4: NotificationOutboxActor — emit `Notification.Attempt` per dispatcher attempt
**Files:**
- Modify: `src/ScadaLink.NotificationOutbox/NotificationOutboxActor.cs` dispatcher attempt branch (after each delivery attempt resolves) — emit `Notification.Attempt` row with `Status` mapped from attempt result (`Success`, `TransientFailure`, `PermanentFailure`).
- Create: `tests/ScadaLink.NotificationOutbox.Tests/NotificationOutboxActorAttemptEmissionTests.cs`.
**Steps:**
1. Failing test: a successful attempt → exactly one event with `Kind=Attempt`, `Status=Success`.
2. Failing test: a transient-failure attempt → `Status=TransientFailure`, `ErrorMessage` populated.
3. Failing test: when `ICentralAuditWriter.WriteAsync` throws, the dispatcher's per-attempt `Notifications` row update STILL succeeds (audit must never block delivery).
4. Implement.
5. Run: pass.
6. Commit: `feat(notif): emit Notification.Attempt per dispatcher attempt`.
#### M4-T5: NotificationOutboxActor — emit `Notification.Terminal` on terminal transition
**Files:**
- Modify: `src/ScadaLink.NotificationOutbox/NotificationOutboxActor.cs` terminal branches (`Delivered` / `Parked` / `Discarded` transitions) — emit `Notification.Terminal` row.
- Create: `tests/ScadaLink.NotificationOutbox.Tests/NotificationOutboxActorTerminalEmissionTests.cs`.
**Steps:**
1. Failing test: a notification that succeeds emits one `Terminal` event with `Status=Delivered`.
2. Failing test: a Parked transition emits `Status=Parked`.
3. Failing test: an operator Discard emits `Status=Discarded`.
4. Implement.
5. Run: pass.
6. Commit: `feat(notif): emit Notification.Terminal on terminal transitions`.
#### M4-T6: Site-emitted `Notification.Enqueued`
**Files:**
- Modify: `src/ScadaLink.NotificationService/` (or wherever the site-side `Notify.To().Send()` runs — confirm via repo) — at the moment of buffering into the site S&F: emit a site-side `AuditEvent` (channel=`Notification`, kind=`Enqueued`) via `IAuditWriter`. Telemetry forwards as usual.
- Create: `tests/ScadaLink.NotificationService.Tests/NotifyEnqueueAuditEmissionTests.cs`.
**Steps:**
1. Failing test: `Notify.To("list").Send("subject", "body")` emits one event with `Channel=Notification`, `Kind=Enqueued`, target=list name, body captured (subject too).
2. Failing test: audit-write failure does not abort `Send()`.
3. Implement.
4. Run: pass.
5. Commit: `feat(notif): emit Notification.Enqueued from site-side Notify.Send`.
#### M4-T7: Inbound API — `AuditWriteMiddleware`
**Files:**
- Create: `src/ScadaLink.InboundAPI/Middleware/AuditWriteMiddleware.cs` — ASP.NET Core middleware. After `await next()` (so the response is fully resolved but BEFORE flush — using `HttpResponse.OnStarting` or buffered body), build an `AuditEvent` (channel=`ApiInbound`, kind=`Completed`, `Actor`=API key NAME from request context, `Target`=method name, `HttpStatus`, `DurationMs`, `RequestSummary`/`ResponseSummary`). Call `ICentralAuditWriter.WriteAsync` inside `try`/`catch` — failures never affect the response.
- Modify: `src/ScadaLink.InboundAPI/Startup.cs` (or wherever the pipeline is configured) — register middleware.
- Create: `tests/ScadaLink.InboundAPI.Tests/Middleware/AuditWriteMiddlewareTests.cs`.
**Steps:**
1. Failing test: a successful POST to `/api/{method}` produces one `ApiInbound.Completed` event with `HttpStatus=200`.
2. Failing test: a 400/401/500 response produces an event with the matching `HttpStatus` and `Status` mapped (`PermanentFailure` for 4xx, `TransientFailure` for 5xx).
3. Failing test: `Actor` carries the API key NAME (never the key material).
4. Failing test: when `ICentralAuditWriter.WriteAsync` throws, the HTTP response is unchanged (success stays success).
5. Failing test: request remote IP and User-Agent appear in `Extra`.
6. Implement.
7. Run: pass.
8. Commit: `feat(inbound): AuditWriteMiddleware emitting ApiInbound.Completed per request`.
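
The status mapping (step 2) and the never-throw write (step 4) can be sketched without any ASP.NET Core types. This is a minimal sketch under stated assumptions: `InboundAuditHelpers`, `MapStatus`, and `TryWriteAsync` are hypothetical names, not taken from the codebase, and treating every status below 400 as `Success` is an assumption the task does not spell out.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper for illustration; not the shipped AuditWriteMiddleware.
public static class InboundAuditHelpers
{
    // Maps an HTTP status code to the audit Status values named in this task:
    // 5xx -> TransientFailure, 4xx -> PermanentFailure, anything lower -> Success (assumption).
    public static string MapStatus(int httpStatus) =>
        httpStatus >= 500 ? "TransientFailure"
        : httpStatus >= 400 ? "PermanentFailure"
        : "Success";

    // Runs the audit write and swallows any failure so the HTTP response is unaffected.
    // Returns false on failure so the caller can increment a failure counter.
    public static async Task<bool> TryWriteAsync(Func<Task> write)
    {
        try
        {
            await write().ConfigureAwait(false);
            return true;
        }
        catch (Exception)
        {
            return false;
        }
    }
}
```

In a middleware, the delegate passed to `TryWriteAsync` would wrap the `ICentralAuditWriter.WriteAsync` call. The boolean return gives M6-T8 a natural place to count central write failures.
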
#### M4-T8: Register middleware in the ASP.NET pipeline
**Files:**
- Modify: `src/ScadaLink.InboundAPI/Startup.cs` / `Program.cs`: `app.UseMiddleware<AuditWriteMiddleware>()` placed AFTER auth (so `Actor` resolves) and BEFORE the script-execution handler.
- Create: `tests/ScadaLink.InboundAPI.Tests/Middleware/MiddlewareOrderTests.cs`.
**Steps:**
1. Failing test: pipeline ordering puts AuditWrite after auth, before script execution.
2. Implement.
3. Run: pass.
4. Commit: `feat(inbound): register AuditWriteMiddleware in pipeline`.
#### M4-T9: Integration test — DB sync emission
**Files:**
- Create: `tests/ScadaLink.IntegrationTests/AuditLog/DatabaseSyncEmissionTests.cs` — script invokes `Database.Connection().Execute("INSERT ...")` and `Query<T>("SELECT ...")`; assert central AuditLog has one `DbOutbound.SyncWrite` row and one `DbOutbound.SyncRead` row.
**Steps:**
1. Sketch, iterate, commit: `test(auditlog): DB sync emission integration test`.
#### M4-T10: Integration test — Notify dispatcher audit trail
**Files:**
- Create: `tests/ScadaLink.IntegrationTests/AuditLog/NotifyDispatcherAuditTrailTests.cs` — script calls `Notify.To(list).Send(...)`; stub SMTP returns transient then success; assert AuditLog has `Enqueued` + 2 `Attempt` (one transient, one success) + 1 `Terminal(Delivered)`.
**Steps:**
1. Sketch, iterate, commit: `test(auditlog): Notify dispatcher audit trail end-to-end`.
#### M4-T11: Integration test — Inbound API request audit
**Files:**
- Create: `tests/ScadaLink.IntegrationTests/AuditLog/InboundApiAuditTests.cs` — POST to `/api/{method}` with a valid API key; assert one `ApiInbound.Completed` row with the expected `Actor` (key name), `HttpStatus=200`, request/response bodies captured.
- Also test: POST with a bad API key → row with `Actor=NULL` (or `"<unknown>"`), `HttpStatus=401`, `Extra` carries `remoteIp`.
**Steps:**
1. Sketch, iterate, commit: `test(auditlog): Inbound API request audit end-to-end`.
#### M4-T12: Integration test — audit-write failure never aborts the action
**Files:**
- Create: `tests/ScadaLink.IntegrationTests/AuditLog/AuditWriteFailureSafetyTests.cs` — inject a broken `ICentralAuditWriter` (always throws) for one test; assert that ESG sync calls, ESG cached calls, DB writes, Inbound API calls, and Notification dispatch all still complete successfully and the script/caller sees the normal outcome.
**Steps:**
1. Sketch test with broken-writer DI override per scenario.
2. Run, fix any spots where audit-write exceptions leak.
3. Commit: `test(auditlog): audit failures never abort user-facing actions`.
### M4 — Risk callouts
- **Inbound API correlation IDs:** if upstream tracing headers (W3C `traceparent`) are present, prefer them as `CorrelationId`; otherwise generate. Confirm whether existing middleware sets a request ID we can reuse.
- **`AuditWriteMiddleware` placement:** must run AFTER authentication so the API key NAME is in `HttpContext.User`. Verify with the middleware-order test in M4-T8.
- **Notification dispatcher loop hot-path:** audit emission must NOT extend per-attempt latency materially. Bench in M4-T10 if there's any concern.
- **DB parameter capture:** parameter values are captured verbatim by default (per design); redaction is opt-in (M5). For M4, just capture — don't try to second-guess what's sensitive.
---
## M5 — Payload + redaction policy
> **M4 realities to honor:**
> - **Decorator surfaces to filter**: `AuditingDbConnection`/`AuditingDbCommand`/`AuditingDbDataReader` (Bundle A) emit `RequestSummary` as raw SQL + parameters today. M5's `IAuditPayloadFilter` runs between event construction and writer call; the AuditingDb decorators must call into the filter before `WriteAsync`.
> - **CentralAuditWriter wraps `IAuditLogRepository.InsertIfNotExistsAsync`** (Bundle B). M5 should plug the filter into BOTH the site-side `FallbackAuditWriter` and the central-side `CentralAuditWriter` so direct-write paths (NotificationOutboxActor, AuditWriteMiddleware) are also filtered. Plugin location: in each writer's `WriteAsync` BEFORE the storage call.
> - **InboundAPI middleware `RequestSummary` already populates, `ResponseSummary = null`** (Bundle D punted response-body capture). M5 should add response-body buffering OR document that ResponseSummary stays null for v1 (acceptable per the spec — captures are best-effort).
> - **`AuditWriteMiddleware` path-scoped via `UseWhen(/api/)`** — M5 may want to introduce per-target redaction overrides; that path-scoped setup gives a natural hook for per-route redaction (e.g., `/api/secrets/*` has stricter caps).
> - **Error-row vocabulary**: cap raised to 64 KB on rows with `Status NOT IN ('Delivered', 'Submitted', 'Forwarded')`. The new vocabulary (`Failed/Parked/Discarded/Attempted/Skipped`) is what triggers the elevated cap. NOT "non-Success" wording from the original spec.
> - **InternalsVisibleTo precedent**: AuditLog.Tests can reach internals of SiteRuntime + NotificationOutbox + (newly) AuditLog. M5 redaction tests can exercise internal helpers similarly.
**Goal:** Payload capture is bounded (8 KB / 64 KB on error), headers are redacted by default, SQL parameter values are captured by default with per-connection opt-out, body redactor regexes are configurable per target, and the safety net over-redacts on misconfiguration.
**Affected projects:** `AuditLog` (policy engine + options), `ExternalSystemGateway` (HTTP header redactors, SQL param redaction hook), `InboundAPI` (header redactors, body capture), `NotificationOutbox` (subject/body capture follows existing rules). Tests.
**Acceptance criteria:**
- An `IAuditPayloadFilter` service is invoked between event construction and write. It truncates to the default cap; raises to the error cap on non-`Success` rows (error rows as redefined in the M4 realities note above); applies header redactors; applies body regex redactors; applies SQL parameter redactors (per-connection); and over-redacts on regex error, incrementing `AuditRedactionFailure`.
- Configuration test: changing `appsettings.json` redactors changes runtime behaviour (no rebuild needed for regex changes).
- Bench: 95th-percentile audit emission latency on the hot path stays under N µs at default cap (target to be set during M5 brainstorm).
### M5 — Tasks (TDD-detail)
#### M5-T1: `IAuditPayloadFilter` interface
**Files:**
- Create: `src/ScadaLink.AuditLog/Payload/IAuditPayloadFilter.cs` — single method `AuditEvent Apply(AuditEvent rawEvent)` that returns a filtered copy (truncation + redaction applied).
- Create: `tests/ScadaLink.AuditLog.Tests/Payload/PayloadFilterContractTests.cs`.
**Steps:**
1. Failing test: interface exists, method signature matches.
2. Implement.
3. Run: pass.
4. Commit: `feat(auditlog): IAuditPayloadFilter contract`.
#### M5-T2: `DefaultAuditPayloadFilter` — truncation (default + error cap)
**Files:**
- Create: `src/ScadaLink.AuditLog/Payload/DefaultAuditPayloadFilter.cs` — composes `TruncationStage` + redactors (M5-T3/T4/T5). Truncation rule: default cap = `AuditLogOptions.DefaultCapBytes` (8 KB); error cap = `ErrorCapBytes` (64 KB) applied when `Status` is NOT in {`Success`, `Delivered`, `Enqueued`} (draft vocabulary; the M4 realities note above redefines error rows as `Status NOT IN ('Delivered', 'Submitted', 'Forwarded')`). Truncate on a UTF-8 character boundary (no mid-character cuts) and set `PayloadTruncated = true` whenever truncation is applied.
- Create: `tests/ScadaLink.AuditLog.Tests/Payload/TruncationTests.cs`.
**Steps:**
1. Failing test: 10 KB success body → truncated to 8 KB; `PayloadTruncated = true`.
2. Failing test: 10 KB body on `Status=TransientFailure` → not truncated (under 64 KB cap); `PayloadTruncated = false`.
3. Failing test: 70 KB body on `Status=PermanentFailure` → truncated to 64 KB; `PayloadTruncated = true`.
4. Failing test: multi-byte UTF-8 character that would straddle the cap is not split mid-character.
5. Implement.
6. Run: pass.
7. Commit: `feat(auditlog): DefaultAuditPayloadFilter truncation with UTF-8 boundary safety`.
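
The cap selection and the UTF-8 boundary rule can be sketched as below. This is an illustrative sketch, not the shipped `TruncationStage`: `PayloadTruncation`, `CapFor`, and `Truncate` are hypothetical names, and the set of non-error statuses is a parameter because the status vocabulary changed after M4. It assumes `capBytes` is non-negative.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for illustration; not the shipped TruncationStage.
public static class PayloadTruncation
{
    // Chooses the byte cap: statuses outside nonErrorStatuses get the larger error cap.
    public static int CapFor(string status, ISet<string> nonErrorStatuses,
        int defaultCapBytes = 8 * 1024, int errorCapBytes = 64 * 1024) =>
        nonErrorStatuses.Contains(status) ? defaultCapBytes : errorCapBytes;

    // Truncates to at most capBytes of UTF-8 without splitting a multi-byte character.
    public static (string Text, bool Truncated) Truncate(string text, int capBytes)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        if (bytes.Length <= capBytes) return (text, false);

        int cut = capBytes;
        // UTF-8 continuation bytes have the form 10xxxxxx. If the byte at the cut
        // is one, the cut lands inside a character, so back up to its lead byte.
        while (cut > 0 && (bytes[cut] & 0xC0) == 0x80) cut--;
        return (Encoding.UTF8.GetString(bytes, 0, cut), true);
    }
}
```

The `Truncated` flag maps directly onto `PayloadTruncated`. Because the cut can move backwards, the result may be up to three bytes shorter than the cap.
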
#### M5-T3: HTTP header redaction
**Files:**
- Modify: `src/ScadaLink.AuditLog/Payload/DefaultAuditPayloadFilter.cs` — add header-redaction stage. Strips header values for names in `AuditLogOptions.HeaderRedactList` (default: `Authorization`, `Cookie`, `Set-Cookie`, `X-API-Key`) and any matching configured regex. Replacement: `<redacted>`.
- Headers travel in `RequestSummary` / `ResponseSummary` (JSON of headers + body) OR in `Extra` — confirm format during M5 brainstorm and document.
- Create: `tests/ScadaLink.AuditLog.Tests/Payload/HeaderRedactionTests.cs`.
**Steps:**
1. Failing test: `Authorization: Bearer xyz` in `RequestSummary` becomes `Authorization: <redacted>`.
2. Failing test: case-insensitive match (`authorization` redacted too).
3. Failing test: custom redact-list extension works (operator adds `X-Custom-Token`).
4. Implement.
5. Run: pass.
6. Commit: `feat(auditlog): HTTP header redaction`.
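
A minimal sketch of the case-insensitive name match follows. It assumes headers are available as name/value pairs, which is still an open format question for this milestone. `HeaderRedaction` and `Redact` are hypothetical names, and the configured-regex matching mentioned above is left out.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; header transport format is still undecided.
public static class HeaderRedaction
{
    // Default list from this task.
    public static readonly string[] DefaultRedactList =
        { "Authorization", "Cookie", "Set-Cookie", "X-API-Key" };

    // Returns a copy of the headers with listed names replaced by "<redacted>".
    // Name matching ignores case, so "authorization" is redacted too.
    public static Dictionary<string, string> Redact(
        IEnumerable<KeyValuePair<string, string>> headers,
        IEnumerable<string> redactList)
    {
        var names = new HashSet<string>(redactList, StringComparer.OrdinalIgnoreCase);
        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (var header in headers)
        {
            result[header.Key] = names.Contains(header.Key) ? "<redacted>" : header.Value;
        }
        return result;
    }
}
```

An operator extension such as `X-Custom-Token` would simply be appended to the list passed as `redactList`.
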
#### M5-T4: Body regex redaction with safety net
**Files:**
- Modify: `src/ScadaLink.AuditLog/Payload/DefaultAuditPayloadFilter.cs` — add body-regex stage. Global redactors apply to all bodies; per-target redactors apply to matching `Target`. Patterns precompiled at startup; rejected if compile takes >100ms.
- Safety net: if a regex throws at runtime, replace the body with `<redacted: redactor error>` and increment `AuditRedactionFailure` (M5-T7).
- Create: `tests/ScadaLink.AuditLog.Tests/Payload/BodyRegexRedactionTests.cs`.
**Steps:**
1. Failing test: `"password":"hunter2"` in a JSON body → `"password":"<redacted>"` when the default global redactor pattern matches.
2. Failing test: per-target redactor only applies to matching `Target`.
3. Failing test: a redactor that throws → body becomes `<redacted: redactor error>` AND the counter increments.
4. Failing test: catastrophic backtracking regex rejected at startup.
5. Implement.
6. Run: pass.
7. Commit: `feat(auditlog): body regex redaction with over-redaction safety net`.
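
The over-redaction safety net can be sketched as a chain of redactor stages wrapped in one `try`/`catch`. This is a sketch under assumptions: `BodyRedactor` and `FromPattern` are hypothetical names, and per-target selection is omitted. The `Regex` constructor overload that takes a match timeout is standard .NET. It bounds match time at runtime, which complements the startup validation described above rather than replacing it.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Threading;

// Hypothetical helper for illustration; not the shipped DefaultAuditPayloadFilter.
public sealed class BodyRedactor
{
    public const string ErrorMarker = "<redacted: redactor error>";

    private readonly IReadOnlyList<Func<string, string>> _stages;
    private int _failures;

    public BodyRedactor(IReadOnlyList<Func<string, string>> stages) => _stages = stages;

    // Number of times a stage threw; would feed the AuditRedactionFailure counter.
    public int FailureCount => Volatile.Read(ref _failures);

    // Builds a stage that replaces every match with "<redacted>".
    // A match that runs past matchTimeout throws RegexMatchTimeoutException.
    public static Func<string, string> FromPattern(string pattern, TimeSpan matchTimeout)
    {
        var regex = new Regex(pattern, RegexOptions.CultureInvariant, matchTimeout);
        return body => regex.Replace(body, "<redacted>");
    }

    // Applies every stage; on any exception the whole body is replaced (over-redaction).
    public string Redact(string body)
    {
        try
        {
            foreach (var stage in _stages) body = stage(body);
            return body;
        }
        catch (Exception)
        {
            Interlocked.Increment(ref _failures);
            return ErrorMarker;
        }
    }
}
```

With a lookbehind pattern such as `(?<="password"\s*:\s*")[^"]*`, only the value is replaced and the key stays readable. Taking stages as delegates lets a test inject a stage that throws, without relying on a regex that happens to time out.
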
#### M5-T5: SQL parameter redaction (per-connection opt-in)
**Files:**
- Modify: `src/ScadaLink.AuditLog/Payload/DefaultAuditPayloadFilter.cs` — for `Channel=DbOutbound` events, parse `Extra.params` and redact parameter VALUES whose NAME matches the connection's configured regex (from `AuditLogOptions.PerTargetOverrides[<connection name>].RedactSqlParamsMatching`).
- Create: `tests/ScadaLink.AuditLog.Tests/Payload/SqlParamRedactionTests.cs`.
**Steps:**
1. Failing test: no opt-in config → params captured verbatim (default behaviour).
2. Failing test: opt-in regex `@apikey|@token` redacts those param VALUES but keeps OTHER param values intact.
3. Failing test: regex applies to parameter NAMES (not values) and is case-insensitive.
4. Implement.
5. Run: pass.
6. Commit: `feat(auditlog): per-connection SQL parameter redaction (opt-in)`.
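
The name-based matching in steps 1 to 3 can be sketched as follows. `SqlParamRedaction` and `Redact` are hypothetical names. The sketch assumes parameters arrive as name/value pairs and that a `null` regex means the connection has not opted in.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; the real Extra.params shape may differ.
public static class SqlParamRedaction
{
    // Redacts VALUES of parameters whose NAME matches nameRegex (case-insensitive).
    // A null nameRegex means no opt-in: every value is copied verbatim.
    public static Dictionary<string, object> Redact(
        IEnumerable<KeyValuePair<string, object>> parameters,
        string nameRegex)
    {
        Regex regex = nameRegex == null
            ? null
            : new Regex(nameRegex, RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

        var copy = new Dictionary<string, object>();
        foreach (var parameter in parameters)
        {
            bool redact = regex != null && regex.IsMatch(parameter.Key);
            copy[parameter.Key] = redact ? "<redacted>" : parameter.Value;
        }
        return copy;
    }
}
```

An unanchored pattern such as `@apikey|@token` also matches longer names that contain those fragments, for example `@tokenCount`. Operators who need exact matches would anchor the pattern with `^` and `$`.
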
#### M5-T6: Wire filter into emission paths
**Files:**
- Modify: ESG (M2-T10, M3-T11/12/13, M4-T1/T2), InboundAPI middleware (M4-T7), NotificationOutbox (M4-T4/T5), NotificationService site path (M4-T6) — every emission site receives `IAuditPayloadFilter` from DI and calls `filter.Apply(rawEvent)` before handing to the writer.
- Modify: `src/ScadaLink.AuditLog/ServiceCollectionExtensions.cs` — register `DefaultAuditPayloadFilter` as `IAuditPayloadFilter` singleton.
- Create: `tests/ScadaLink.AuditLog.Tests/Payload/FilterIntegrationTests.cs` — assert each emitter calls through the filter before the writer.
**Steps:**
1. Failing test: ESG emission writes the filter-applied event (not the raw one).
2. Failing test: same for each other emitter.
3. Implement by injecting the filter into each emitter and routing through it.
4. Run: pass.
5. Commit: `feat(auditlog): wire payload filter into all emission paths`.
#### M5-T7: `AuditRedactionFailure` health metric
**Files:**
- Modify: `src/ScadaLink.HealthMonitoring/CentralHealthAggregator.cs` (or equivalent) — add `AuditRedactionFailure` counter.
- Modify: `src/ScadaLink.AuditLog/Payload/DefaultAuditPayloadFilter.cs` — increment on every redactor exception.
- Create: `tests/ScadaLink.AuditLog.Tests/Payload/AuditRedactionFailureMetricTests.cs`.
**Steps:**
1. Failing test: 5 redactor exceptions → counter shows 5.
2. Implement.
3. Run: pass.
4. Commit: `feat(health): AuditRedactionFailure metric`.
#### M5-T8: Configuration test — `appsettings.json` round-trip
**Files:**
- Create: `tests/ScadaLink.AuditLog.Tests/Configuration/AuditLogOptionsBindingTests.cs` — bind a realistic `appsettings.json` block (with header-redact list, body redactors, per-target overrides, retention) and assert values appear in `IOptions<AuditLogOptions>`. Re-bind with a hot-reload simulation and assert filter behaviour changes accordingly.
**Steps:**
1. Failing test: bind + read → matches.
2. Failing test: change config → filter behaviour updates without restart (`IOptionsMonitor` pattern).
3. Implement (likely needs adjusting M1-T9 from `IOptions` to `IOptionsMonitor`).
4. Run: pass.
5. Commit: `feat(auditlog): hot-reloadable AuditLogOptions`.
#### M5-T9: Performance test — hot-path latency budget
**Files:**
- Create: `tests/ScadaLink.PerformanceTests/AuditLog/HotPathLatencyTests.cs` — bench `filter.Apply(event)` for a 4 KB JSON body with the default redactor set; target P95 < 50 µs (number set during M5 brainstorm based on baseline measurements). Also bench `SqliteAuditWriter.WriteAsync` end-to-end, with a target of P95 < 500 µs.
**Steps:**
1. Sketch test using BenchmarkDotNet or the existing performance test harness.
2. Run baseline; if over budget, profile + optimise.
3. Commit: `test(auditlog): hot-path latency budget`.
#### M5-T10: Safety-net test — bad regex over-redacts
**Files:**
- Create: `tests/ScadaLink.AuditLog.Tests/Payload/RedactionSafetyNetTests.cs` — register a deliberately bad regex that throws; assert the body is over-redacted (`<redacted: redactor error>`) rather than under-redacted (passing through unmodified).
**Steps:**
1. Failing test.
2. Verify the safety net from M5-T4 covers this.
3. Commit: `test(auditlog): redaction safety net over-redacts on regex failure`.
### M5 — Risk callouts
- **Regex catastrophic backtracking:** validate patterns at startup with a short-running compile test; reject patterns that exceed a timeout. Document the rejection behaviour.
- **Order of stages matters:** truncation BEFORE redaction means a redaction target halfway through the cap could get cut. Confirm the chosen order during M5 brainstorm; current draft applies redaction FIRST, then truncation — that way the redacted-replacement text is what gets truncated, not a half-secret.
- **Body capture format:** decide whether headers travel in `RequestSummary`/`ResponseSummary` or `Extra`. Affects M5-T3's redaction strategy. Lock during the M5 brainstorm.
- **Hot-reload semantics:** `IOptionsMonitor` snapshots — ensure pre-compiled regex cache invalidates when config changes.
---
## M6 — Reconciliation, purge, partition maintenance, health metrics
> **M5 realities to honor:**
> - **IOptionsMonitor pattern** works for hot reload (M5 verified via tests). M6's retention/partition cadence options can use the same pattern.
> - **AuditRedactionFailure counter** is wired SITE-ONLY (M5 Bundle C deferred central wiring here). M6 ships the central wiring as part of "all five new health metrics live".
> - **Filter pattern** is integrated at the THREE writer entry points (FallbackAuditWriter, CentralAuditWriter, AuditLogIngestActor). M6's AuditLogPurgeActor does NOT emit events; it only reads + partitions, so no filter integration required.
> - **`SwitchOutPartitionAsync` blocked by UX_AuditLog_EventId**: per the M1 reality note, M6-T4 must replace the M1 NotSupportedException stub with the drop-and-rebuild dance around the non-aligned unique index (already in the roadmap M6-T4 section).
> - **Partition function pre-seeded with 24 monthly boundaries** (Jan 2026 through Dec 2027). M6-T5's partition maintenance must SPLIT a new boundary for the upcoming month.
> - **Site→central gRPC client still NoOpSiteStreamAuditClient**: M6's `SiteAuditReconciliationActor` is naturally the first central component that needs to pull from sites; alternatively, the production push path can ship here. EITHER (a) M6 implements the real `ISiteStreamAuditClient` to enable push telemetry (and reconciliation pulls leverage it bidirectionally), OR (b) M6 implements ONLY reconciliation pull (push stays NoOp until a later milestone). Recommended (a) — push is more general and reconciliation is the catch-up.
**Goal:** Self-healing telemetry, monthly partition rollover, daily purge, all five new health metrics live and feeding the existing health-report pipeline.
**Affected projects:** `AuditLog` (3 new actors: `SiteAuditReconciliationActor`, `AuditLogPurgeActor`, partition-maintenance worker), `Communication` (the `PullAuditEvents` RPC), `HealthMonitoring` (5 new metrics), `ConfigurationDatabase` (partition-roll-forward SQL helper).
**Acceptance criteria:**
- `SiteAuditReconciliationActor` runs every 5 minutes per site; pulls events the site reports as `Pending`; central performs `InsertIfNotExistsAsync` then signals the site to flip those rows to `Reconciled`.
- `AuditLogPurgeActor` runs daily; for each partition older than `RetentionDays`, switches it out to a staging table and drops the staging table. Emits an `AuditLogPurged` event with row count + duration.
- Partition-maintenance job runs at month boundary to add the next month's partition function range and ensure the scheme has a destination filegroup.
- 5 new health metrics published: per site, `SiteAuditBacklog` (count + oldest + bytes), `SiteAuditWriteFailures`, and `SiteAuditTelemetryStalled`; per central node, `CentralAuditWriteFailures` and `AuditRedactionFailure`.
- Integration test: simulated 5-minute central outage → telemetry catches up after recovery via reconciliation, no rows lost; site backlog metric reflects the queue depth and drops as it drains.
### M6 — Tasks (TDD-detail)
#### M6-T1: Extend `sitestream.proto` with `PullAuditEvents` RPC
**Files:**
- Modify: `src/ScadaLink.Communication/Protos/sitestream.proto` — add `rpc PullAuditEvents(PullAuditEventsRequest) returns (PullAuditEventsResponse);` and the corresponding request/response messages (`sinceUtc`, `batchSize`, `events`, `more_available`).
- Build: regenerate stubs.
- Create: `tests/ScadaLink.Communication.Tests/Protos/PullAuditEventsProtoTests.cs`.
**Steps:**
1. Failing test: round-trip request and response messages.
2. Add proto + rebuild.
3. Run: pass.
4. Commit: `feat(comms): PullAuditEvents RPC for audit reconciliation`.
#### M6-T2: Site-side handler for `PullAuditEvents`
**Files:**
- Modify: `src/ScadaLink.Communication/SiteStreamGrpc/SiteStreamGrpcServer.cs` (the site-side server inside each site cluster) — handle `PullAuditEvents` by reading `Pending` rows older than `SinceUtc` from `SqliteAuditWriter` (read-only path) and streaming them back. After ack, mark them `Reconciled`.
- Create: `tests/ScadaLink.Communication.Tests/SiteStreamPullAuditEventsTests.cs`.
**Steps:**
1. Failing test: a pull request with N pending rows returns those rows; rows flip to `Reconciled` after the response is acked.
2. Implement.
3. Run: pass.
4. Commit: `feat(comms): site-side PullAuditEvents handler`.
#### M6-T3: `SiteAuditReconciliationActor` — central, timer-driven
**Files:**
- Create: `src/ScadaLink.AuditLog/Central/SiteAuditReconciliationActor.cs` — central singleton; on a 5-minute timer (configurable), for each known site, ask: "what's your oldest `Pending` row?" If the site reports a non-draining backlog (compared with the previous tick), issue a `PullAuditEvents` and ingest the returned rows via `IAuditLogRepository.InsertIfNotExistsAsync`. Keeps a per-site `LastReconciledAt` cursor.
- Create: `tests/ScadaLink.AuditLog.Tests/Central/SiteAuditReconciliationActorTests.cs`.
**Steps:**
1. Failing test: actor's timer fires every 5 minutes (test via `TestKit` virtual scheduler).
2. Failing test: when site reports non-draining backlog over two consecutive ticks, the actor issues a pull and ingests results.
3. Failing test (idempotency): re-running the pull doesn't double-insert (relies on `InsertIfNotExistsAsync` and the `UX_AuditLog_EventId` unique index).
4. Implement.
5. Run: pass.
6. Commit: `feat(auditlog): SiteAuditReconciliationActor`.
#### M6-T4: `AuditLogPurgeActor` — daily partition-switch purge
> **M1 reality**: `IAuditLogRepository.SwitchOutPartitionAsync` ships in M1 as a `NotSupportedException` stub because the non-aligned `UX_AuditLog_EventId` unique index (necessary for first-write-wins idempotency without including `OccurredAtUtc` in the unique key) blocks `ALTER TABLE … SWITCH PARTITION`. **M6 must replace the stub with the drop-and-rebuild dance**: (1) `DROP INDEX UX_AuditLog_EventId ON dbo.AuditLog;` (2) create the staging table on `[PRIMARY]` with identical schema; (3) `ALTER TABLE dbo.AuditLog SWITCH PARTITION <n> TO dbo.<staging>;` (4) `DROP TABLE dbo.<staging>;` (5) `CREATE UNIQUE NONCLUSTERED INDEX UX_AuditLog_EventId ON dbo.AuditLog(EventId) ON [PRIMARY];`. The small unique-index outage window during the switch is acceptable — partition switches are O(seconds) and `InsertIfNotExistsAsync` callers will see a transient retry surface; document this in the actor.
**Files:**
- Create: `src/ScadaLink.AuditLog/Central/AuditLogPurgeActor.cs` — central singleton; daily timer. For each partition whose latest `OccurredAtUtc` is older than the retention cutoff (current UTC time minus `AuditLogOptions.RetentionDays` days), call `IAuditLogRepository.SwitchOutPartitionAsync(partitionBoundary)`. Emit an `AuditLogPurged` event (logged and recorded as a metric) with partition range, row count, and duration.
- Modify: `src/ScadaLink.ConfigurationDatabase/Repositories/AuditLogRepository.cs` — replace the M1 `NotSupportedException` stub with the drop-and-rebuild dance described above. Wrap in a transaction. Add a regression test asserting the unique index is rebuilt and the data left behind matches the un-switched partitions.
- Create: `tests/ScadaLink.AuditLog.Tests/Central/AuditLogPurgeActorTests.cs`.
**Steps:**
1. Failing test: with retention = 30 days, partitions older than 30 days are switched out; newer partitions are kept.
2. Failing test: purge emits the `AuditLogPurged` event with correct row count.
3. Failing test: partition switch under the `scadalink_audit_purger` role completes successfully (requires the role to ALSO be granted permission to DROP/CREATE the `UX_AuditLog_EventId` index — extend the role grants in this milestone if not in M1's role definition; M1 granted `ALTER ON SCHEMA::dbo` which should cover this).
4. Failing test: post-switch, `InsertIfNotExistsAsync` continues to enforce first-write-wins (unique index successfully rebuilt).
5. Implement.
6. Run: pass.
7. Commit: `feat(auditlog): AuditLogPurgeActor with partition-switch purge (drop-and-rebuild around UX_AuditLog_EventId)`.
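
The retention check in step 1 can be sketched from partition boundaries alone. This is a hedged sketch, not the shipped `GetPartitionBoundariesOlderThanAsync`: `PartitionPurge` and `PurgeableBoundaries` are hypothetical names. It assumes each boundary is the inclusive lower bound of its partition (RANGE RIGHT), and it uses the next boundary as an upper bound on the partition's latest `OccurredAtUtc` instead of querying the data.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; boundary semantics (RANGE RIGHT) are an assumption.
public static class PartitionPurge
{
    // Returns the lower boundaries of partitions that lie entirely before the retention cutoff.
    // sortedBoundaries must be ascending. The last boundary's partition is open-ended,
    // so it is never reported as purgeable.
    public static IReadOnlyList<DateTime> PurgeableBoundaries(
        IReadOnlyList<DateTime> sortedBoundaries, DateTime utcNow, int retentionDays)
    {
        DateTime cutoff = utcNow.AddDays(-retentionDays);
        var purgeable = new List<DateTime>();
        for (int i = 0; i + 1 < sortedBoundaries.Count; i++)
        {
            // Partition i covers [boundary i, boundary i+1); every row in it is
            // older than the cutoff only when its upper bound is at or before the cutoff.
            if (sortedBoundaries[i + 1] <= cutoff) purgeable.Add(sortedBoundaries[i]);
        }
        return purgeable;
    }
}
```

With boundaries for Jan, Feb, Mar, and Apr 2026, a clock of 20 Mar 2026, and 30-day retention, only the January partition qualifies. This is the shape of the M6-T11 scenario: one partition older than retention, the rest kept.
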
#### M6-T5: `AuditLogPartitionMaintenanceService` — monthly roll-forward
> **M1 reality**: the partition function `pf_AuditLog_Month` ships with 24 explicit monthly boundaries (Jan 2026 through Dec 2027) on filegroup `[PRIMARY]`. M6's hosted service must keep this rolling — split a new boundary for the upcoming month and (if a separate hot/cold filegroup strategy is adopted later) drop oldest boundaries via MERGE after purge.
**Files:**
- Create: `src/ScadaLink.AuditLog/Central/AuditLogPartitionMaintenanceService.cs`: an `IHostedService` that runs on startup AND every month. It ensures the next month's partition range exists on `pf_AuditLog_Month` and the partition scheme has a destination filegroup. Implemented via raw SQL (`ALTER PARTITION FUNCTION pf_AuditLog_Month() SPLIT RANGE (<next-month-boundary>)`); ensure the scheme stays `ALL TO ([PRIMARY])` unless production deployment overrides per-filegroup.
- Create: `tests/ScadaLink.AuditLog.Tests/Central/PartitionMaintenanceServiceTests.cs` (integration via `MsSqlMigrationFixture`; runs against a temp DB).
**Steps:**
1. Failing test: against a DB seeded with the M1 migration (covering through Dec 2027), running the service in Dec 2027 splits a Jan 2028 boundary so the function has a range for "current month + at least the next month".
2. Failing test: subsequent monthly runs add successive future boundaries (idempotent: already-split boundaries are no-ops, not errors).
3. Implement.
4. Run: pass.
5. Commit: `feat(auditlog): partition maintenance HostedService (SPLIT RANGE roll-forward)`.
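
The idempotent roll-forward can be sketched as a pure calculation over the existing boundaries, with the SQL generated separately. `PartitionRollForward`, `MissingBoundaries`, and `ToSplitSql` are hypothetical names. The date-literal format is an assumption about the partition function's column type, and the `NEXT USED` step on the partition scheme is omitted because the scheme name does not appear in this roadmap.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical helper for illustration; not the shipped maintenance service.
public static class PartitionRollForward
{
    // First instant of the month containing utc.
    public static DateTime MonthStart(DateTime utc) =>
        new DateTime(utc.Year, utc.Month, 1, 0, 0, 0, DateTimeKind.Utc);

    // Boundaries to split so the function covers the current month and the next one.
    // Returns an empty list when coverage already exists, which makes reruns no-ops.
    public static List<DateTime> MissingBoundaries(IEnumerable<DateTime> existing, DateTime utcNow)
    {
        var known = existing.ToList();
        DateTime target = MonthStart(utcNow).AddMonths(1);
        DateTime next = known.Count == 0
            ? MonthStart(utcNow)
            : MonthStart(known.Max()).AddMonths(1);

        var missing = new List<DateTime>();
        for (; next <= target; next = next.AddMonths(1)) missing.Add(next);
        return missing;
    }

    // T-SQL for one boundary; the 'yyyy-MM-dd' literal format is an assumption.
    public static string ToSplitSql(DateTime boundary)
    {
        string day = boundary.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);
        return "ALTER PARTITION FUNCTION pf_AuditLog_Month() SPLIT RANGE ('" + day + "');";
    }
}
```

Seeded with the 24 M1 boundaries and a clock in Dec 2027, the calculation yields a single Jan 2028 boundary. A second run after that split returns nothing. A service that was down for several months would get every skipped month in one pass.
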
#### M6-T6: Health metric `SiteAuditBacklog`
**Files:**
- Modify: `src/ScadaLink.AuditLog/Site/SqliteAuditWriter.cs` — expose `GetBacklogStatsAsync()` returning `(pendingCount, oldestPendingUtc, onDiskBytes)`.
- Modify: `src/ScadaLink.HealthMonitoring/SiteHealthState.cs` — add `SiteAuditBacklog` metric (3-tuple), populated per site-health-report tick.
- Create: `tests/ScadaLink.HealthMonitoring.Tests/SiteAuditBacklogMetricTests.cs`.
**Steps:**
1. Failing test: with 100 pending rows in SQLite, the metric reports `pendingCount=100`.
2. Failing test: oldest pending age is reported in seconds since `OccurredAtUtc`.
3. Failing test: on-disk bytes ≈ SQLite file size.
4. Implement.
5. Run: pass.
6. Commit: `feat(health): SiteAuditBacklog metric (count + age + bytes)`.
#### M6-T7: Health metric `SiteAuditTelemetryStalled`
**Files:**
- Modify: `src/ScadaLink.HealthMonitoring/SiteHealthState.cs` — add boolean `SiteAuditTelemetryStalled`.
- Modify: `src/ScadaLink.AuditLog/Central/SiteAuditReconciliationActor.cs` — set the flag when reconciliation detects a non-draining backlog over two consecutive cycles.
- Create: `tests/ScadaLink.HealthMonitoring.Tests/SiteAuditTelemetryStalledTests.cs`.
**Steps:**
1. Failing test: two consecutive non-draining cycles → flag set.
2. Failing test: a subsequent draining cycle → flag cleared.
3. Implement.
4. Run: pass.
5. Commit: `feat(health): SiteAuditTelemetryStalled flag`.
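
The two-consecutive-cycles rule can be sketched as a small per-site state machine. `BacklogStallDetector` and `Observe` are hypothetical names. Defining "non-draining" as a pending count that is above zero and has not decreased since the previous tick is an assumption; the roadmap does not define the term.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not the shipped SiteAuditReconciliationActor logic.
public sealed class BacklogStallDetector
{
    // Per-site memory: last pending count and the current run of non-draining cycles.
    private readonly Dictionary<string, (long Pending, int NonDrainingCycles)> _state =
        new Dictionary<string, (long Pending, int NonDrainingCycles)>();

    // Records one reconciliation tick and returns true while the site counts as stalled,
    // that is, after two or more consecutive non-draining cycles.
    public bool Observe(string siteId, long pendingCount)
    {
        bool hasPrevious = _state.TryGetValue(siteId, out var previous);

        // Assumption: non-draining means the backlog is non-empty and did not shrink.
        bool nonDraining = hasPrevious && pendingCount > 0 && pendingCount >= previous.Pending;
        int cycles = nonDraining ? previous.NonDrainingCycles + 1 : 0;

        _state[siteId] = (pendingCount, cycles);
        return cycles >= 2;
    }
}
```

The first tick for a site only establishes a baseline. Any tick where the backlog shrinks resets the run, which clears the flag as step 2 requires.
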
#### M6-T8: Health metric `CentralAuditWriteFailures`
**Files:**
- Modify: `src/ScadaLink.HealthMonitoring/CentralHealthAggregator.cs` — add `CentralAuditWriteFailures` counter.
- Modify: every `ICentralAuditWriter` call site (Inbound API middleware M4-T7, NotificationOutboxActor M4-T4/T5) — increment on caught exceptions.
- Create: `tests/ScadaLink.HealthMonitoring.Tests/CentralAuditWriteFailuresTests.cs`.
**Steps:**
1. Failing test: 3 forced central direct-write failures → counter reports 3.
2. Implement.
3. Run: pass.
4. Commit: `feat(health): CentralAuditWriteFailures metric`.
#### M6-T9: Surface `AuditRedactionFailure` in central health
**Files:**
- Modify: `src/ScadaLink.HealthMonitoring/CentralHealthAggregator.cs` — register the counter created in M5-T7 so it appears in the central health report payload.
- Create: `tests/ScadaLink.HealthMonitoring.Tests/AuditRedactionFailureSurfacingTests.cs`.
**Steps:**
1. Failing test: incrementing the counter is visible in the next central health snapshot.
2. Implement.
3. Run: pass.
4. Commit: `feat(health): surface AuditRedactionFailure in central health`.
#### M6-T10: Integration test — central outage + reconciliation recovery
**Files:**
- Create: `tests/ScadaLink.IntegrationTests/AuditLog/OutageReconciliationTests.cs` — site + central; simulate a 5-minute central gRPC outage; during outage, site emits 200 events; restore central; assert reconciliation pulls catch up within one cycle and all 200 events land in central AuditLog with no duplicates.
**Steps:**
1. Sketch, iterate, commit: `test(auditlog): outage + reconciliation recovery end-to-end`.
#### M6-T11: Integration test — partition switch purge
**Files:**
- Create: `tests/ScadaLink.IntegrationTests/AuditLog/PartitionPurgeTests.cs` — pre-populate AuditLog with rows in three monthly partitions (one older than retention, two newer); trigger `AuditLogPurgeActor`; assert the oldest partition's rows are gone and newer partitions are untouched.
**Steps:**
1. Sketch, iterate, commit: `test(auditlog): partition-switch purge end-to-end`.
#### M6-T12: Integration test — partition maintenance roll-forward
**Files:**
- Create: `tests/ScadaLink.IntegrationTests/AuditLog/PartitionMaintenanceTests.cs` — assert that after `AuditLogPartitionMaintenanceService` runs, the partition function covers the next month's range.
**Steps:**
1. Sketch, iterate, commit: `test(auditlog): partition maintenance roll-forward end-to-end`.
### M6 — Risk callouts
- **Partition switch on a live table:** SQL Server `ALTER TABLE ... SWITCH PARTITION` is metadata-only when source and target match in structure and filegroup; verify with a load test that ingest isn't paused during purge.
- **Pull cadence vs ingest rate:** a site producing >`BatchSize`/5s sustained may never let telemetry catch up — reconciliation must close the gap. The non-draining detection in M6-T3 is the safety net.
- **Site SQLite `ForwardState` flip after reconciliation:** must be atomic with the central ack; otherwise a site crash mid-flip can re-send rows (idempotent at central, harmless but worth noting).
- **HostedService scheduling:** ensure the partition maintenance service runs on the ACTIVE central node only (not both — would cause SQL errors trying to add the same range twice).
---
## M7 — Central UI: new Audit Log page + drill-ins + KPI tiles
> **M6 realities to honor:**
> - **`IAuditCentralHealthSnapshot`** (M6 Bundle E) exists as the central aggregator for `CentralAuditWriteFailures`, `AuditRedactionFailure`, and per-site `SiteAuditTelemetryStalled`. M7's Health dashboard tiles should read this snapshot.
> - **`SiteHealthReport.SiteAuditBacklog`** (`SiteAuditBacklogSnapshot` — count + age + bytes) is on the existing per-site report. M7's per-site tiles can surface this without new wiring.
> - **`IAuditLogRepository.QueryAsync`** (M1 Bundle D) is the data source for the Audit Log page; uses keyset paging on (OccurredAtUtc desc, EventId desc).
> - **`IAuditLogRepository.GetPartitionBoundariesOlderThanAsync`** (M6 Bundle C) — surfaces existing partitions; M7 Export feature could leverage but isn't required.
> - **Pre-existing `Components/Pages/Monitoring/AuditLog.razor`** (the IAuditService config-change viewer from before M1) must be renamed in code to `ConfigurationAuditLog.razor` with URL `/audit/configuration` — the doc-renaming was completed pre-M1 but the .razor file rename hasn't been verified.
> - **Permissions**: `OperationalAudit` (read) and `AuditExport` (export) permission strings need to exist in the security model — verify before M7.
> - **Real gRPC push client still deferred from M6** (the pull RPC is wired; `NoOpSiteStreamAuditClient` remains the production binding). M7 doesn't depend on it.
**Goal:** User-visible Audit Log: filter bar, results grid (custom Blazor + Bootstrap, no third-party grid), drilldown drawer with cURL / "show all events" / redaction indicators / pretty-printed payloads. 6 drill-in entry points from existing pages. 3 KPI tiles on Health dashboard.
**Affected projects:** `CentralUI`, `CentralUI.Tests`, `CentralUI.PlaywrightTests`.
**Acceptance criteria:**
- New `Components/Pages/Audit/AuditLogPage.razor` exists; new "Audit" nav group sibling to Notifications.
- All 10 filter elements, 10 grid columns, keyset pagination + default page 100, drilldown drawer per `Component-AuditLog.md` §10.
- Existing `Components/Pages/Monitoring/AuditLog.razor` (the IAuditService config-change viewer) **renamed in code** to `ConfigurationAuditLog.razor`, with URL `/audit/configuration`, matching the earlier documentation rename. Drill-ins from existing pages (Notifications, Site Calls, External Systems, Inbound API Keys, Sites, Instances) added.
- 3 KPI tiles added to the Health dashboard; data sourced from `HealthMonitoring`.
- Playwright tests cover: filter narrowing, drilldown drawer, "Copy as cURL" on `ApiInbound` rows, drill-in from Notifications to filtered Audit Log.
- `OperationalAudit` read permission gating + `AuditExport` for the Export button.
### M7 — Tasks (TDD-detail)
#### M7-T1: New `AuditLogPage.razor` scaffold + route + Audit nav group
**Files:**
- Create: `src/ScadaLink.CentralUI/Components/Pages/Audit/AuditLogPage.razor` + `.razor.cs` + `.razor.css`. Route `/audit/log`. Empty body for now beyond `<h1>Audit Log</h1>`.
- Modify: `src/ScadaLink.CentralUI/Components/Layout/NavMenu.razor` (or equivalent) — add a new top-level **Audit** nav group sibling to Notifications, containing this page.
- Create: `tests/ScadaLink.CentralUI.Tests/Pages/AuditLogPageScaffoldTests.cs` — Blazor component test (bUnit if it's used in the codebase; else Playwright).
**Steps:**
1. Failing test: navigating to `/audit/log` renders the page (heading present).
2. Failing test: nav menu shows the Audit group.
3. Implement.
4. Run: pass.
5. Commit: `feat(ui): scaffold Audit Log page + Audit nav group`.
#### M7-T2: `<AuditFilterBar>` component
**Files:**
- Create: `src/ScadaLink.CentralUI/Components/Audit/AuditFilterBar.razor` + `.razor.cs` — 10 filter elements per `Component-AuditLog.md` §10. Multi-select chips for Channel/Kind/Status/Site (Bootstrap custom; NO third-party UI library). Time-range relative dropdown + custom date picker. Text search for Instance/Script/Target/Actor/CorrelationId. "Errors only" toggle.
- Create: `tests/ScadaLink.CentralUI.Tests/Components/Audit/AuditFilterBarTests.cs`.
**Steps:**
1. Failing test: rendering shows all 10 elements.
2. Failing test: selecting filters and clicking "Apply" raises a `FilterChanged` event with the right `AuditQuery` payload.
3. Failing test: Kind options narrow when Channels are selected.
4. Implement.
5. Run: pass.
6. Commit: `feat(ui): AuditFilterBar component`.
#### M7-T3: `<AuditResultsGrid>` component with keyset paging
**Files:**
- Create: `src/ScadaLink.CentralUI/Components/Audit/AuditResultsGrid.razor` + `.razor.cs` — custom Bootstrap table (no third-party grid). 10 columns per `Component-AuditLog.md`. Resizable + reorderable + persistable-per-user (persistence via existing user-settings store).
- Keyset paging via `(OccurredAtUtc desc, EventId desc)` cursor; default page 100.
- Data source: server-side via `IAuditLogRepository.QueryAsync` (M1-T8). Wire through an `IAuditLogQueryService` (new) that the page injects.
- Create: `tests/ScadaLink.CentralUI.Tests/Components/Audit/AuditResultsGridTests.cs`.
**Steps:**
1. Failing test: grid renders rows from a stub query service; columns match the documented set.
2. Failing test: clicking "next page" calls the service with the keyset cursor of the last row.
3. Failing test: column reordering persists across navigations (user-settings).
4. Failing test: row click emits a `RowSelected` event with the selected `AuditEvent`.
5. Implement.
6. Run: pass.
7. Commit: `feat(ui): AuditResultsGrid with keyset paging`.
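
The "next page" test in step 2 depends on the cursor semantics of `(OccurredAtUtc desc, EventId desc)`. A minimal in-memory sketch of keyset paging, in Python as a language-neutral illustration; `Row` and `query_page` are hypothetical, and the real query would run in SQL behind `IAuditLogRepository.QueryAsync`:

```python
from datetime import datetime
from typing import NamedTuple, Optional


class Row(NamedTuple):
    occurred_at_utc: datetime
    event_id: str


def query_page(rows, cursor: Optional[Row], page_size: int = 100):
    """Return (page, next_cursor), ordered by (occurred_at_utc desc, event_id desc)."""
    ordered = sorted(rows, key=lambda r: (r.occurred_at_utc, r.event_id), reverse=True)
    if cursor is not None:
        # Keep only rows that sort strictly after the cursor in descending order.
        ordered = [r for r in ordered if (r.occurred_at_utc, r.event_id) < tuple(cursor)]
    page = ordered[:page_size]
    # The last row of a full page becomes the cursor for the next request.
    next_cursor = page[-1] if len(page) == page_size else None
    return page, next_cursor
```

Including `EventId` as a tiebreaker matters: two events with the same timestamp would otherwise be skipped or repeated at a page boundary.
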
#### M7-T4: `<AuditDrilldownDrawer>` — JSON pretty-print
**Files:**
- Create: `src/ScadaLink.CentralUI/Components/Audit/AuditDrilldownDrawer.razor` + `.razor.cs` — slide-in drawer triggered by `RowSelected`. Renders all fields of the selected `AuditEvent`. JSON detection: if `RequestSummary` or `ResponseSummary` is valid JSON, pretty-print with indentation.
- Create: `tests/ScadaLink.CentralUI.Tests/Components/Audit/AuditDrilldownDrawerJsonTests.cs`.
**Steps:**
1. Failing test: opening drawer with an event whose `RequestSummary` is valid JSON renders an indented version.
2. Failing test: non-JSON body renders verbatim.
3. Implement.
4. Run: pass.
5. Commit: `feat(ui): drilldown drawer JSON pretty-print`.
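
The two tests above reduce to "parse, and fall back to verbatim on failure". A minimal sketch of that rule in Python (the helper name `pretty_print_if_json` is hypothetical; the component itself would do the equivalent in C#):

```python
import json


def pretty_print_if_json(summary: str) -> str:
    """Return an indented rendering when summary is valid JSON, else the input unchanged."""
    try:
        parsed = json.loads(summary)
    except (TypeError, ValueError):
        # json.JSONDecodeError is a ValueError subclass; non-JSON bodies render verbatim.
        return summary
    return json.dumps(parsed, indent=2, ensure_ascii=False)
```
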
#### M7-T5: Drilldown — SQL syntax highlighting
**Files:**
- Modify: `src/ScadaLink.CentralUI/Components/Audit/AuditDrilldownDrawer.razor.cs` — for `Channel=DbOutbound` events, treat `RequestSummary` as SQL; apply syntax highlighting via a lightweight client-side library (Prism.js or Highlight.js if already in the project; else a small custom highlighter — confirm during M7 brainstorm).
- Modify: `src/ScadaLink.CentralUI/wwwroot/` — add the highlighter assets if needed.
**Steps:**
1. Failing test: a `DbOutbound` event's `RequestSummary` is rendered inside a `<code class="language-sql">` block.
2. Implement.
3. Run: pass.
4. Commit: `feat(ui): drilldown SQL syntax highlighting`.
#### M7-T6: Drilldown — "Copy as cURL" for ApiOutbound / ApiInbound
**Files:**
- Modify: `src/ScadaLink.CentralUI/Components/Audit/AuditDrilldownDrawer.razor.cs` — for `Channel ∈ {ApiOutbound, ApiInbound}` events, render a "Copy as cURL" button. Clicking generates a cURL command from the event's URL/headers/body and copies to clipboard via `IJSRuntime`.
**Steps:**
1. Failing test: button appears only for HTTP-bearing events.
2. Failing test: clicking generates the correct cURL string (verified against a known event fixture).
3. Implement.
4. Run: pass.
5. Commit: `feat(ui): drilldown Copy as cURL action`.
#### M7-T7: Drilldown — "Show all events for this operation"
**Files:**
- Modify: `src/ScadaLink.CentralUI/Components/Audit/AuditDrilldownDrawer.razor.cs` — when the event has a non-null `CorrelationId`, render a link "Show all events for this operation" that re-applies the page's filter set with `CorrelationId = <value>` (other filters cleared).
**Steps:**
1. Failing test: link appears only when CorrelationId is non-null.
2. Failing test: clicking re-navigates to the Audit Log page with the filter applied.
3. Implement.
4. Run: pass.
5. Commit: `feat(ui): drilldown "Show all events" by CorrelationId`.
#### M7-T8: Drilldown — redaction indicators
**Files:**
- Modify: `src/ScadaLink.CentralUI/Components/Audit/AuditDrilldownDrawer.razor` — wherever a payload contains the string `<redacted>` or `<redacted: redactor error>`, render a small badge indicating the field was redacted. Show a tooltip linking to "Payload Capture Policy" in the Component-AuditLog docs.
**Steps:**
1. Failing test: a payload with `<redacted>` shows the badge.
2. Implement.
3. Run: pass.
4. Commit: `feat(ui): drilldown redaction indicators`.
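
Detection of the two marker strings can be a plain substring check. A small sketch, with hypothetical badge labels:

```python
REDACTED = "<redacted>"
REDACTOR_ERROR = "<redacted: redactor error>"


def redaction_badges(payload: str) -> list[str]:
    """Return the badges to show for a payload, based on the two documented markers."""
    badges = []
    if REDACTED in payload:
        badges.append("redacted")
    if REDACTOR_ERROR in payload:
        badges.append("redactor-error")
    return badges
```

The two markers never match each other as substrings, so a payload containing only the error marker yields only the error badge.
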
#### M7-T9: Rename `AuditLog.razor` → `ConfigurationAuditLog.razor`
**Files:**
- Rename: `src/ScadaLink.CentralUI/Components/Pages/Monitoring/AuditLog.razor` → `Components/Pages/Audit/ConfigurationAuditLog.razor`.
- Update: the file's `@page` directive to `/audit/configuration`.
- Update: all `<NavLink>` and any other inbound references to the old path.
- Update: tests referencing the old name.
- Modify: nav menu — sit `ConfigurationAuditLog` under the Audit group as a sibling to the new Audit Log page.
**Steps:**
1. Failing test: navigating to `/audit/configuration` renders the (renamed) page.
2. Failing test: the old `/monitoring/auditlog` returns 404 (or a redirect — choose during M7 brainstorm; redirect is safer for any external bookmarks).
3. Implement rename + path updates.
4. Run: pass.
5. Commit: `refactor(ui): rename Audit Log Viewer to Configuration Audit Log Viewer`.
#### M7-T10: Drill-in from Notifications page
**Files:**
- Modify: `src/ScadaLink.CentralUI/Components/Pages/Notifications/NotificationReport.razor` (or row-action panel) — add "View audit history" action to each row. Navigates to `/audit/log?correlationId={NotificationId}`.
**Steps:**
1. Failing test: row action exists.
2. Failing test: click navigates with the right query string.
3. Implement.
4. Run: pass.
5. Commit: `feat(ui): drill-in from Notifications to Audit Log`.
#### M7-T11: Drill-in from Site Calls page
**Files:**
- Modify: the Site Calls listing page, if it exists. If it does not, defer the drill-in to a follow-up rather than creating the page here; Site Call Audit #22 UI work is mostly out of scope. For M7 acceptance, drill-ins are only required from pages that exist.
- If the page exists, mirror M7-T10's pattern with `?correlationId={TrackedOperationId}`.
**Steps:**
1. Conditional on page existence — confirm during M7 brainstorm.
2. Implement.
3. Commit: `feat(ui): drill-in from Site Calls to Audit Log`.
#### M7-T12: Drill-in from External Systems / Inbound API Keys / Sites / Instances detail pages
**Files:**
- Modify (per page): External Systems detail, Inbound API Keys detail, Sites detail, Instances detail. Each gets a "Recent activity" / "Recent calls" / "Audit feed" link or tab navigating to `/audit/log` with the appropriate pre-filter (`target=<system>` / `actor=<key name> AND channel=ApiInbound` / `site=<site>` / `instance=<instance>`).
- Tests: one per drill-in.
**Steps:**
1. Failing tests per page.
2. Implement.
3. Run: pass.
4. Commit: `feat(ui): drill-ins from detail pages to Audit Log`.
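
Each drill-in is just the Audit Log route plus a URL-encoded pre-filter. A sketch of a shared link builder, assuming the query-string parameter names match the filter names listed above (the helper name is hypothetical):

```python
from urllib.parse import urlencode


def audit_log_url(**filters) -> str:
    """Build an /audit/log link with the given pre-filters, skipping unset ones."""
    query = urlencode({k: v for k, v in filters.items() if v is not None})
    return "/audit/log" + ("?" + query if query else "")
```

For example, the Inbound API Keys drill-in would combine two filters: `audit_log_url(actor="key 1", channel="ApiInbound")`.
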
#### M7-T13: 3 KPI tiles on the Health dashboard
**Files:**
- Modify: `src/ScadaLink.CentralUI/Components/Pages/Health/HealthDashboard.razor` (or equivalent) — add three tiles under a new "Audit" group: Audit volume, Audit error rate, Audit backlog. Data fed from the metrics defined in M5-T7 and M6-T6/T7/T8/T9.
- Create: `tests/ScadaLink.CentralUI.Tests/Pages/Health/AuditKpiTilesTests.cs`.
**Steps:**
1. Failing test: tiles render with stub data; clicking each navigates to the relevant Audit Log filtered view (or to a per-site breakdown for the backlog tile).
2. Implement.
3. Run: pass.
4. Commit: `feat(ui): Audit KPI tiles on Health dashboard`.
#### M7-T14: Server-side CSV export streaming
**Files:**
- Create: `src/ScadaLink.CentralUI/Services/AuditLogExportService.cs` — accepts the current filter, streams server-side CSV via `IAuditLogRepository.QueryAsync` paged enumeration; writes to the HTTP response without buffering the whole result in memory.
- Modify: `AuditLogPage.razor` — Export button calls the service. Requires `AuditExport` permission (M7-T15).
- Create: `tests/ScadaLink.CentralUI.Tests/Services/AuditLogExportServiceTests.cs`.
**Steps:**
1. Failing test: exporting 10,000 rows streams as CSV; memory usage stays bounded.
2. Failing test: default cap of 100k rows enforced; larger requests get a "use the CLI" error.
3. Implement.
4. Run: pass.
5. Commit: `feat(ui): server-side streaming CSV export of Audit Log`.
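
The bounded-memory requirement amounts to emitting one chunk per repository page instead of building the whole file. A generator-based sketch in Python; `stream_csv`, `ExportTooLargeError`, and the dict-shaped rows are assumptions for illustration:

```python
import csv
import io


class ExportTooLargeError(Exception):
    """Raised when the export exceeds the row cap (hypothetical type)."""


def stream_csv(pages, columns, max_rows=100_000):
    """Yield CSV text one chunk per page so only a single page is held in memory."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)

    def flush() -> str:
        chunk = buffer.getvalue()
        buffer.seek(0)
        buffer.truncate(0)
        return chunk

    writer.writerow(columns)
    yield flush()
    written = 0
    for page in pages:
        for row in page:
            if written >= max_rows:
                raise ExportTooLargeError("row cap exceeded; use the CLI export")
            writer.writerow([row[c] for c in columns])
            written += 1
        yield flush()
```

In this sketch the cap is detected mid-stream, after some output has already been sent. A real endpoint may prefer to check the row count before writing response headers so the "use the CLI" error can be returned cleanly.
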
#### M7-T15: `OperationalAudit` + `AuditExport` permission gating
**Files:**
- Modify: `src/ScadaLink.Security/` (or wherever the role/permission model lives) — add `OperationalAudit` and `AuditExport` permissions; map them to the Audit role (existing) by default.
- Modify: `AuditLogPage.razor` — gate page access on `OperationalAudit`; gate the Export button on `AuditExport`.
- Create: `tests/ScadaLink.CentralUI.Tests/Pages/AuditLogPagePermissionTests.cs`.
**Steps:**
1. Failing test: a user without `OperationalAudit` gets a 403 / hidden page.
2. Failing test: a user with `OperationalAudit` but no `AuditExport` can read, but the Export button is hidden.
3. Implement permission checks.
4. Run: pass.
5. Commit: `feat(security): OperationalAudit + AuditExport permissions for the Audit Log surface`.
#### M7-T16: Playwright E2E tests
**Files:**
- Create: `tests/ScadaLink.CentralUI.PlaywrightTests/Audit/AuditLogPageTests.cs` — covers: filter narrowing, drilldown drawer JSON pretty-print, "Copy as cURL" on ApiInbound, drill-in from Notifications to filtered Audit Log, CSV export end-to-end, permission gating.
**Steps:**
1. Sketch tests using the existing Playwright harness.
2. Iterate until all green.
3. Commit: `test(ui): Audit Log Playwright E2E coverage`.
### M7 — Risk callouts
- **Custom data grid scope:** keyset paging + reorderable columns + per-user persistence is non-trivial. Bench the existing `NotificationReport.razor` grid to see whether it can be generalised vs forking it. Decision during M7 brainstorm.
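- **SignalR + large drawer payloads:** the drilldown payload (up to 64 KB on errors) is rendered server-side and pushed over the Blazor SignalR circuit. Confirm the hub message-size limit (`HubOptions.MaximumReceiveMessageSize`, which bounds client-to-server messages and defaults to 32 KB) is large enough for anything the drawer sends back; bump if needed.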
- **Permission infrastructure assumptions:** confirm during M7 brainstorm that the codebase already supports per-permission gates at the page level, not just role-level. If only role-level, fall back to gating via the existing Audit role with a feature flag for the export.
- **The rename to `ConfigurationAuditLog.razor`** breaks any external bookmarks. Decide redirect vs 404 explicitly during M7 brainstorm.
---
## M8 — CLI: `scadalink audit query | export | verify-chain`
**Goal:** Operator surface for the centralized Audit Log.
**Affected projects:** `CLI`, `CLI.Tests`, `ManagementService` (new HTTP endpoint), `IntegrationTests`.
**Acceptance criteria:**
- `scadalink audit query` mirrors the UI filter set; results stream as JSON (default) or table.
- `scadalink audit export` streams server-side to CSV / JSONL / Parquet; requires `AuditExport` permission.
- `scadalink audit verify-chain --month YYYY-MM` is a no-op stub returning a "hash-chain not yet enabled in this release" message and exit code 0 (per v1.x deferral).
- Existing `audit-log query` (IAuditService config-change viewer) **renamed** in code to `audit-config query` to disambiguate; old name kept as a deprecated alias for one minor version.
- Permissions: `audit query` and `audit verify-chain` require `OperationalAudit`; `audit export` additionally requires `AuditExport`.
### M8 — Tasks (TDD-detail)
#### M8-T1: Create `AuditCommands.cs` (separate from existing `AuditLogCommands.cs`)
**Files:**
- Create: `src/ScadaLink.CLI/Commands/AuditCommands.cs` containing `static class AuditCommands { public static Command Build() }`, following the System.CommandLine pattern from `AuditLogCommands.cs:153`. Sets up the `audit` parent command with three subcommands (T2/T3/T4).
- Modify: `src/ScadaLink.CLI/Program.cs` — register `AuditCommands.Build()` alongside the existing command groups.
- Create: `tests/ScadaLink.CLI.Tests/Commands/AuditCommandsScaffoldTests.cs`.
**Steps:**
1. Failing test: `scadalink audit --help` lists three subcommands (query, export, verify-chain).
2. Implement.
3. Run: pass.
4. Commit: `feat(cli): scaffold scadalink audit command group`.
#### M8-T2: `audit query` subcommand
**Files:**
- Modify: `src/ScadaLink.CLI/Commands/AuditCommands.cs` — add `query` subcommand with the flag set matching the Central UI Audit Log filter set (post-Bundle-D fix): `--since`, `--until`, `--channel`, `--kind`, `--status`, `--site`, `--instance`, `--target`, `--actor`, `--correlation-id`, `--errors-only`, `--page`, `--page-size`. Output JSON by default; `--format table` opt-in.
- Create: `src/ScadaLink.Commons/Messages/Cli/QueryAuditLogCommand.cs` (or wherever the CLI↔Management messages live — confirm via repo).
- Create: `tests/ScadaLink.CLI.Tests/Commands/AuditQueryCommandTests.cs`.
**Steps:**
1. Failing test: parsing the documented flag set produces a `QueryAuditLogCommand` with the expected fields.
2. Failing test: `--format table` switches the output formatter.
3. Failing test: unknown flag returns non-zero exit code with a helpful error.
4. Implement.
5. Run: pass.
6. Commit: `feat(cli): scadalink audit query subcommand`.
#### M8-T3: `audit export` subcommand
**Files:**
- Modify: `src/ScadaLink.CLI/Commands/AuditCommands.cs` — add `export` subcommand with flags `--since` (required), `--until` (required), `--format csv|jsonl|parquet` (required), `--output <path>` (required), `--channel`, `--kind`, `--status`, `--site`, `--target`, `--actor`.
- Create: `src/ScadaLink.Commons/Messages/Cli/ExportAuditLogCommand.cs`.
- Create: `tests/ScadaLink.CLI.Tests/Commands/AuditExportCommandTests.cs`.
**Steps:**
1. Failing test: missing required flag returns helpful error.
2. Failing test: valid invocation creates an `ExportAuditLogCommand` with all fields.
3. Failing test: streams results to `--output` without buffering the entire export in memory (test with 100k+ rows).
4. Implement.
5. Run: pass.
6. Commit: `feat(cli): scadalink audit export subcommand (csv|jsonl|parquet)`.
#### M8-T4: `audit verify-chain` subcommand (no-op stub)
**Files:**
- Modify: `src/ScadaLink.CLI/Commands/AuditCommands.cs` — add `verify-chain --month <YYYY-MM>` subcommand. In v1, returns a documented "hash chain not yet enabled in this release; see Component-AuditLog.md Security & Tamper-Evidence for the v1.x roadmap" message with exit code 0.
- Create: `tests/ScadaLink.CLI.Tests/Commands/AuditVerifyChainCommandTests.cs`.
**Steps:**
1. Failing test: `scadalink audit verify-chain --month 2026-05` exits 0 with the documented message.
2. Failing test: malformed month string (e.g., `2026-13`) exits non-zero with a parse error.
3. Implement.
4. Run: pass.
5. Commit: `feat(cli): scadalink audit verify-chain subcommand (v1 no-op)`.
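
The stub still has to validate `--month` so that `2026-13` fails. A sketch of the parse-then-no-op behaviour; the function names and the specific non-zero exit code (1) are assumptions, while the message text follows the task description above:

```python
import re

NOT_ENABLED_MESSAGE = (
    "hash chain not yet enabled in this release; see Component-AuditLog.md "
    "Security & Tamper-Evidence for the v1.x roadmap"
)


def parse_month(text: str) -> tuple[int, int]:
    """Parse a strict YYYY-MM string; raise ValueError on bad shape or month."""
    match = re.fullmatch(r"(\d{4})-(\d{2})", text)
    if match is None:
        raise ValueError(f"invalid month '{text}': expected YYYY-MM")
    year, month = int(match.group(1)), int(match.group(2))
    if not 1 <= month <= 12:
        raise ValueError(f"invalid month '{text}': month must be 01-12")
    return year, month


def verify_chain(month_text: str) -> tuple[int, str]:
    """Return (exit_code, message) for the v1 no-op stub."""
    try:
        parse_month(month_text)
    except ValueError as error:
        return 1, f"error: {error}"
    return 0, NOT_ENABLED_MESSAGE
```
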
#### M8-T5: ManagementService HTTP endpoints
**Files:**
- Create: `src/ScadaLink.ManagementService/Controllers/AuditController.cs` with REST endpoints `GET /api/audit/query` (paged) and `GET /api/audit/export` (streaming). Both require `OperationalAudit`; `/export` additionally requires `AuditExport` (matching the UI's permission split from M7-T15).
- Create: `tests/ScadaLink.ManagementService.Tests/Controllers/AuditControllerTests.cs`.
**Steps:**
1. Failing test: `GET /api/audit/query` with valid params returns JSON page of audit events.
2. Failing test: `GET /api/audit/export` streams CSV/JSONL/Parquet without buffering.
3. Failing test: a request without `OperationalAudit` returns 403.
4. Failing test: `/export` without `AuditExport` returns 403.
5. Implement.
6. Run: pass.
7. Commit: `feat(mgmt): /api/audit/{query,export} endpoints with permission gates`.
#### M8-T6: Output formatters (JSON + table)
**Files:**
- Modify: `src/ScadaLink.CLI/Output/` — add an `AuditEventTableFormatter` that renders results as an aligned table with sensible defaults (truncate long fields with `…`).
- The JSON formatter follows existing CLI patterns (one event per line for streaming, or array for paged results — confirm during M8 brainstorm).
- Create: `tests/ScadaLink.CLI.Tests/Output/AuditEventFormatterTests.cs`.
**Steps:**
1. Failing test: table format includes columns: OccurredAtUtc, Channel, Kind, Status, Target, Actor, DurationMs.
2. Failing test: JSON format is one event per line.
3. Implement.
4. Run: pass.
5. Commit: `feat(cli): JSON + table formatters for audit events`.
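
The aligned-table behaviour (pad to the widest cell, truncate long fields with `…`) can be sketched as follows. The function names, the dict-shaped rows, and the default width of 20 are illustrative assumptions, not the formatter's actual defaults:

```python
def truncate(value, width: int) -> str:
    """Shorten text to at most width characters, marking the cut with an ellipsis."""
    text = str(value)
    return text if len(text) <= width else text[: width - 1] + "…"


def format_table(rows, columns, max_width: int = 20) -> str:
    """Render rows (dicts) as a left-aligned table with a header line."""
    cells = [[truncate(r.get(c, ""), max_width) for c in columns] for r in rows]
    widths = [
        max([len(c)] + [len(row[i]) for row in cells])
        for i, c in enumerate(columns)
    ]
    lines = ["  ".join(c.ljust(w) for c, w in zip(columns, widths))]
    for row in cells:
        lines.append("  ".join(v.ljust(w) for v, w in zip(row, widths)))
    # Strip trailing padding so output diffs cleanly in tests.
    return "\n".join(line.rstrip() for line in lines)
```
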
#### M8-T7: Rename existing `audit-log query` → `audit-config query` with deprecation alias
**Files:**
- Modify: `src/ScadaLink.CLI/Commands/AuditLogCommands.cs` — rename the top-level command from `audit-log` to `audit-config` (clearer disambiguation from the new `audit` group). Add an alias `audit-log` that prints a deprecation warning and forwards to `audit-config` for one minor version.
- Modify: `src/ScadaLink.CLI/README.md` and CLI help text to document the rename and the deprecation timeline.
- Create: `tests/ScadaLink.CLI.Tests/Commands/AuditConfigDeprecationTests.cs`.
**Steps:**
1. Failing test: `scadalink audit-config query --user alice` works.
2. Failing test: `scadalink audit-log query --user alice` works but emits a deprecation warning to stderr.
3. Failing test: `scadalink audit query --since ...` (the NEW operational command) and `scadalink audit-config query --user ...` (the renamed config command) are clearly different surfaces and do not conflict.
4. Implement.
5. Run: pass.
6. Commit: `refactor(cli): rename audit-log → audit-config with deprecation alias`.
#### M8-T8: CLI README + help text updates
**Files:**
- Modify: `src/ScadaLink.CLI/README.md` — document the new `audit` group, the renamed `audit-config` group, the permission requirements, the `verify-chain` no-op note, and the CLI ↔ UI filter parity.
- Modify: each subcommand's `--help` description for clarity.
**Steps:**
1. Inline doc edits.
2. Verify `scadalink audit --help` and `scadalink audit-config --help` produce the documented output.
3. Commit: `docs(cli): document new scadalink audit group and audit-config rename`.
#### M8-T9: CLI integration test — end-to-end query + export
**Files:**
- Create: `tests/ScadaLink.IntegrationTests/Cli/AuditCliEndToEndTests.cs` — boots central with a populated AuditLog table; invokes `scadalink audit query --since ...` against the running ManagementService; asserts results match the database. Same for export.
**Steps:**
1. Sketch test using existing IntegrationTests harness.
2. Iterate until all flag combinations work end-to-end.
3. Commit: `test(cli): scadalink audit end-to-end against running ManagementService`.
### M8 — Risk callouts
- **Operator script breakage from the `audit-log` rename:** the deprecation alias is the safety net but only for one minor version; document the deprecation timeline clearly in the CLI README. Coordinate with anyone running `audit-log` in CI/cron.
- **Parquet output:** requires a Parquet writer library. If one isn't already in `Directory.Packages.props`, add the smallest viable dependency (`ParquetSharp` or `Parquet.Net`). Decide during M8 brainstorm.
- **Streaming export from CLI:** the CLI invokes the ManagementService HTTP endpoint, which itself streams. Confirm `HttpClient.SendAsync` with `HttpCompletionOption.ResponseHeadersRead` is used so the CLI doesn't buffer the whole response.
- **Permission model parity:** ensure the CLI's permission errors mirror the UI's (HTTP 403 → CLI exit code 2 with a clear message).
---
## Cross-cutting concerns (apply at every milestone)
- **Branching:** every milestone gets its own `feature/audit-log-mN-<slice>` branch; merged with `--no-ff` to `main` on milestone completion. No pushes without explicit user authorization.
- **Tests:** Every task adds tests first (failing test → impl → passing test). Existing tests must keep passing.
- **Commits:** small and frequent. Bite-sized per writing-plans skill.
- **Reviews:** per the bundling cadence in user memory — group small adjacent tasks into a single implementer dispatch, run one combined spec+quality review per bundle, then a final cross-bundle review at end of milestone.
- **Docs:** if implementation reveals a design gap, fix the design doc FIRST (in `docs/requirements/Component-AuditLog.md` and/or `alog.md`), commit, then implement. Don't let the code and docs drift.
- **Infra:** the 3 `infra/*` working-tree modifications still uncommitted on `main` are unrelated and stay that way unless the user explicitly addresses them. Use explicit `git add <path>` throughout, never `git commit -am`.
---
## Per-milestone execution flow (template)
When a milestone is about to start, run this sequence:
1. **Brainstorm**: short skill invocation to nail any code-level decisions not fixed in the spec (test fixture placement, migration helper choice, etc.).
2. **Writing-plans**: produce a milestone-specific plan with TDD detail per task — saved to `docs/plans/2026-XX-XX-auditlog-mN-<slice>.md` + peer `.tasks.json`.
3. **Subagent-driven execution**: bundle small tasks per cadence preference; per-bundle implementer + combined reviewer; cross-milestone review at end; merge to `main` with `--no-ff`.
The roadmap is the contract for what each milestone ships; the per-milestone plan is the contract for how it gets built.