# Audit Log (#23) Code Implementation Roadmap

> **For Claude:** REQUIRED SUB-SKILL FLOW per milestone: `brainstorming` → `writing-plans` → `subagent-driven-development`. Use `docs/requirements/Component-AuditLog.md` + `alog.md` as the spec; this document is the roadmap that sequences milestones and locks acceptance criteria for each. **M1 carries full TDD-level task detail; M2–M8 are milestone-shape detail and will be expanded into bite-sized plans by their own writing-plans pass when their turn comes.**

**Goal:** Implement central component #23 Audit Log — append-only forensic + operational record across every script-trust-boundary action — into the existing ScadaLink codebase.

**Architecture:** Layered alongside (not replacing) the future Notifications/SiteCalls operational stores. Site-local SQLite hot-path append + gRPC telemetry batches + reconciliation pulls; central direct-write for Inbound API and Notification Outbox dispatch; monthly-partitioned MS SQL with single global retention; strict append-only enforced via DB roles. See `alog.md` for the locked design decisions and `Component-AuditLog.md` for the component spec.

**Tech Stack:** Akka.NET (clustering, singletons, ClusterClient), EF Core (MS SQL provider, code-first migrations), Microsoft.Data.SqlClient, Microsoft.Data.Sqlite, gRPC (HTTP/2 server-streaming on the existing `SiteStream` channel), ASP.NET Core (Inbound API middleware), Blazor Server + Bootstrap (Central UI), System.CommandLine (CLI), xUnit + Akka.TestKit.Xunit2 + NSubstitute (tests).

**Spec:** `/Users/dohertj2/Desktop/scadalink-design/alog.md` (validated, immutable; commit `fec0bb1`). Component design at `/Users/dohertj2/Desktop/scadalink-design/docs/requirements/Component-AuditLog.md`.

---

## Codebase Reality Check (what already exists)

- **All 22 prior components have source + tests.** Audit Log slots in as a new `src/ScadaLink.AuditLog/` project plus changes to: Commons, ConfigurationDatabase, Communication (proto), Host (DI + actor registration), ExternalSystemGateway, InboundAPI, NotificationOutbox, HealthMonitoring, CentralUI, CLI, SiteRuntime (audit hook surface).
- **Existing patterns to copy from:**
  - Singleton wiring: `src/ScadaLink.Host/Actors/AkkaHostedService.cs:272–280` (NotificationOutboxActor) — `ClusterSingletonManager.Props` + manager/proxy pair.
  - EF migration: `src/ScadaLink.ConfigurationDatabase/Migrations/20260519050659_AddNotificationsTable.cs` — table create + indexes; **no partitioning yet — Audit Log will be the first.**
  - Site SQLite hot-path: `src/ScadaLink.SiteEventLogging/SiteEventLogger.cs:28–98` — single connection, write lock, Channel-based background writer.
  - Site-buffer + forwarder: `src/ScadaLink.StoreAndForward/` — `StoreAndForwardStorage` + `NotificationForwarder` show the Pending → Forwarded transition we'll mirror.
  - Actor + repo + test trio: `src/ScadaLink.NotificationOutbox/NotificationOutboxActor.cs` and `tests/ScadaLink.NotificationOutbox.Tests/NotificationOutboxActorIngestTests.cs:20` — TestKit base class, NSubstitute repo, `Sys.ActorOf`, `ExpectMsg<T>`.
  - gRPC additive: `src/ScadaLink.Communication/Protos/sitestream.proto` — currently carries only `AttributeValueUpdate` and `AlarmStateUpdate` in a `oneof`; we extend it.
  - CLI command shape: `src/ScadaLink.CLI/Commands/AuditLogCommands.cs:1–53` — System.CommandLine pattern; new group will live alongside it (the file's existing commands are for the IAuditService config audit and stay).
  - Blazor listing page: `src/ScadaLink.CentralUI/Components/Pages/Notifications/NotificationReport.razor` — filter bar + keyset paging + status badges idiom.
- **`AuditLog.razor` and `AuditLogCommands.cs` already exist** but they're the **IAuditService config-change viewer**. Per the design pass we renamed them in docs to "Configuration Audit Log Viewer"; in code they'll be renamed (file + URL + command name) so the new operational Audit Log can take the unqualified name.
- **Test framework:** xUnit + Akka.TestKit.Xunit2 + NSubstitute. Integration tests under `tests/ScadaLink.IntegrationTests/`. Playwright UI tests under `tests/ScadaLink.CentralUI.PlaywrightTests/`. A `tests/ScadaLink.PerformanceTests/` exists for load tests.

---

## Prerequisite: Site Call Audit (#22) + cached-call tracking are NOT implemented in code

The design for both is merged on `main` (`alog.md` cached-call tracking section; `Component-SiteCallAudit.md`), but `grep` finds zero references to `TrackedOperationId` or `CachedCallTelemetry` in `src/`. This matters because **M3 (cached operations + dual-write transaction) cannot be built without them**.

**Three ways to handle this — pick before M3:**

1. **Inline into M3 (Recommended):** Implement just enough of Site Call Audit (#22) and cached-call tracking inside M3 — specifically the `CachedCallTelemetry` message, the operational-tracking SQLite table at sites, the `SiteCalls` table + repo + `SiteCallAuditActor` skeleton at central. This makes M3 the biggest milestone but ships a coherent slice (cached calls audited end-to-end).
2. **M0 prerequisite milestone:** Implement #22 and cached-call tracking as a separate slice before M3 starts. Cleanest dependency story; slowest to first-audit-row.
3. **Ship Audit Log sync-only first, retrofit cached path later:** M1, M2, M4 (sync-only emissions), M5, M6 (no cached features), M7, M8 ship as-is; cached audit is a separate follow-up. Lowest first-shippable scope but leaves cached calls unaudited until much later.

**Default choice in this roadmap: (1).** M3 absorbs the minimum #22 + cached-call tracking surface needed to make combined telemetry work; the rest of #22 (full reconciliation, KPIs, Retry/Discard relay) can be a follow-up.

---

## Milestone index

| M | Title | Ships | Touches | Depends on |
|---|---|---|---|---|
| **M1** | Foundation: schema, types, DB roles, partitioning | Migration deployed; Commons types exist; no observable behavior yet. | Commons, ConfigurationDatabase, ConfigurationDatabase.Tests | — |
| **M2** | Site pipeline (sync-only path) | One emission path end-to-end (ESG sync `Call()` audited from script to central row). | Commons, AuditLog (new), Communication (proto), Host, ExternalSystemGateway, all Tests projects, IntegrationTests | M1 |
| **M3** | Cached operations + dual-write transaction | Cached external calls and DB writes audited; SiteCalls table populated alongside; combined telemetry packet contract live. | Commons, AuditLog, SiteCallAudit (new), ConfigurationDatabase, ExternalSystemGateway, StoreAndForward, Host | M2; #22 + cached-call tracking inlined here per the prerequisite section |
| **M4** | Remaining boundary emission | All four channels emitting: sync DB writes/reads, Notify dispatcher attempt/terminal, Inbound API middleware. | ExternalSystemGateway, InboundAPI, NotificationOutbox, SiteRuntime (Database surface) | M2; M3 (NotificationOutbox terminal/attempt uses ICentralAuditWriter pattern) |
| **M5** | Payload + redaction policy | Header redaction, body redactor regex, SQL parameter redaction, safety net, configuration binding. | AuditLog, ExternalSystemGateway, InboundAPI, all emitter projects | M2 |
| **M6** | Reconciliation, purge, partition maintenance, health metrics | Self-healing telemetry, monthly partition switch, the five new health metrics + their dashboard tiles. | AuditLog, ConfigurationDatabase (partition maintenance), HealthMonitoring | M2, M3 |
| **M7** | Central UI — new Audit Log page + drill-ins + KPI tiles | User-visible Audit Log surface; existing `AuditLog.razor` renamed to ConfigurationAuditLog. | CentralUI, CentralUI.Tests, CentralUI.PlaywrightTests | M2, M4, M6 |
| **M8** | CLI — `scadalink audit query / export / verify-chain` | Operator surface for query/export; `verify-chain` is a no-op stub until v1.x hash chain ships. | CLI, ManagementService (HTTP endpoint), CLI.Tests, IntegrationTests | M2 |

**Ship-state at end of each milestone is the shippable slice** — each milestone leaves the system in a working, testable, deployable state (no half-built actors mid-pipeline). M1 ships no user-visible behaviour but produces a clean foundation; from M2 onward each ships an observable audit capability.

**Critical path:** M1 → M2 → (M3 ∥ M4 ∥ M5) → M6 → (M7 ∥ M8). M3, M4, M5 can overlap once M2 is solid. M7 and M8 can overlap once M6 lands.

---

## M1 — Foundation: schema, types, DB roles, partitioning

**Goal:** Land the new `AuditLog` table (partitioned) and DB roles in MS SQL, plus the Commons types every later milestone needs. After M1 the database is ready and types compile; nothing else changes.

**Affected projects:**
- `src/ScadaLink.Commons/` — entity, enums, interfaces, message DTOs.
- `src/ScadaLink.ConfigurationDatabase/` — EF mapping, DbContext registration, migration, DB role script, partition function/scheme, retention options.
- `tests/ScadaLink.Commons.Tests/` — enum + record tests.
- `tests/ScadaLink.ConfigurationDatabase.Tests/` — migration tests, repo tests.

**Acceptance criteria:**
- `dotnet build` of the solution succeeds.
- `dotnet ef database update` against a dev MS SQL applies the migration; `AuditLog` table exists, partitioned monthly on `OccurredAtUtc`, with PK on `EventId` and the five expected indexes.
- `scadalink_audit_writer` and `scadalink_audit_purger` SQL roles exist with the documented grants; a smoke test confirms `UPDATE AuditLog` from the writer role fails.
- `AuditEvent` record, `AuditChannel`/`AuditKind`/`AuditStatus` enums, `IAuditWriter`/`ICentralAuditWriter` interfaces, `AuditTelemetryEnvelope`/`PullAuditEvents` message DTOs all exist in Commons in the right folders.
- `IAuditLogRepository` interface (Commons) and EF implementation (ConfigurationDatabase) exist; the implementation only exposes `InsertIfNotExistsAsync`, paged read, and `SwitchOutPartitionAsync` — no update or row-delete.
- All new tests pass; no existing tests regress.

### M1 — Tasks (TDD-detail)

#### M1-T1: Add audit enums to Commons

**Files:**
- Create: `src/ScadaLink.Commons/Types/Enums/AuditChannel.cs`, `AuditKind.cs`, `AuditStatus.cs`.
- Create: `tests/ScadaLink.Commons.Tests/Types/Enums/AuditEnumTests.cs`.

**Steps:**
1. Write failing test verifying `AuditChannel` has exactly `ApiOutbound | DbOutbound | Notification | ApiInbound` (asserting `Enum.GetValues` length and members).
2. Same for `AuditKind` (10 members per `Component-AuditLog.md`).
3. Same for `AuditStatus` (8 members).
4. Run: tests fail (enums don't exist). Implement the three enums.
5. Run tests: pass.
6. Commit: `feat(commons): add Audit{Channel,Kind,Status} enums for #23`.
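
The exact-membership assertion from step 1 can be sketched as below. This is a minimal illustration that assumes only the four `AuditChannel` members named above; `AuditEnumChecks.HasExactly` is a hypothetical helper, and the real test would more likely use xUnit assertions directly.

```csharp
using System;
using System.Linq;

// Members taken from step 1; numeric values are left to the compiler (assumption).
public enum AuditChannel { ApiOutbound, DbOutbound, Notification, ApiInbound }

// Hypothetical helper, not part of the ScadaLink codebase.
public static class AuditEnumChecks
{
    // True only when the enum declares exactly the expected member names,
    // so adding, removing, or renaming a member fails the check.
    public static bool HasExactly<TEnum>(params string[] expected) where TEnum : struct, Enum
    {
        var actual = Enum.GetNames<TEnum>();
        return actual.Length == expected.Length
            && actual.OrderBy(n => n, StringComparer.Ordinal)
                     .SequenceEqual(expected.OrderBy(n => n, StringComparer.Ordinal));
    }
}
```

Comparing names as well as the count catches a rename that leaves the member count unchanged, which a length-only assertion would miss.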

#### M1-T2: Add AuditEvent record + ForwardState enum

**Files:**
- Create: `src/ScadaLink.Commons/Entities/Audit/AuditEvent.cs` — public record carrying all 20 central columns (per `alog.md` §4) plus a nullable `ForwardState?` for the site-local variant.
- Create: `src/ScadaLink.Commons/Types/Enums/AuditForwardState.cs` — `Pending | Forwarded | Reconciled`.
- Create: `tests/ScadaLink.Commons.Tests/Entities/Audit/AuditEventTests.cs`.

**Steps:**
1. Write failing test that constructs an `AuditEvent`, sets every property, and round-trips via `with` expressions — asserts immutability and required-property behaviour.
2. Run: fail (type doesn't exist). Implement the record.
3. Run: pass.
4. Commit: `feat(commons): add AuditEvent record + ForwardState enum`.
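
The `with` round-trip in step 1 might look roughly like this. The record here is a hypothetical, trimmed stand-in (`AuditEventSketch`) with four properties; the real `AuditEvent` carries all 20 central columns, and the string-typed `Channel` and `ForwardState` are simplifications for the sketch.

```csharp
using System;

// Hypothetical, trimmed shape: not the real AuditEvent.
public sealed record AuditEventSketch(
    Guid EventId,
    DateTime OccurredAtUtc,
    string Channel,
    string? ForwardState = null);   // null on central rows, set on the site-local variant

public static class AuditEventSketchDemo
{
    public static (AuditEventSketch Original, AuditEventSketch Copy) MarkPending()
    {
        var original = new AuditEventSketch(Guid.NewGuid(), DateTime.UtcNow, "ApiOutbound");
        // `with` copies the record and changes one property; `original` is untouched.
        var copy = original with { ForwardState = "Pending" };
        return (original, copy);
    }
}
```

Because records compare by value, reverting the changed property with another `with` expression yields an instance equal to the original, which is a convenient immutability assertion.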

#### M1-T3: Add IAuditWriter and ICentralAuditWriter

**Files:**
- Create: `src/ScadaLink.Commons/Interfaces/Services/IAuditWriter.cs`, `ICentralAuditWriter.cs`.
- Create: `tests/ScadaLink.Commons.Tests/Interfaces/Services/AuditWriterContractTests.cs` (smoke — only that the interfaces exist and have the documented signatures).

**Steps:**
1. Write failing reflection-based test asserting both interfaces expose `Task WriteAsync(AuditEvent, CancellationToken)`.
2. Run: fail. Implement both interfaces; document each with XML doc comments naming Audit Log #23 as the owner.
3. Run: pass.
4. Commit: `feat(commons): add IAuditWriter and ICentralAuditWriter`.

#### M1-T4: Add audit telemetry + pull message DTOs

**Files:**
- Create: `src/ScadaLink.Commons/Messages/Integration/AuditTelemetryEnvelope.cs`, `PullAuditEventsRequest.cs`, `PullAuditEventsResponse.cs`.
- Create: `tests/ScadaLink.Commons.Tests/Messages/Integration/AuditTelemetryMessagesTests.cs`.

**Steps:**
1. Failing test: construct envelope with a batch of 3 events, assert immutability + batch enumerability.
2. Failing test: pull request carries `SinceUtc` + `BatchSize`; response carries events + `MoreAvailable`.
3. Implement.
4. Run: pass.
5. Commit: `feat(commons): add audit telemetry + pull message DTOs`.

#### M1-T5: Extend ScadaLinkDbContext with AuditLogs DbSet + entity config

**Files:**
- Modify: `src/ScadaLink.ConfigurationDatabase/ScadaLinkDbContext.cs` — add `public DbSet<AuditEvent> AuditLogs => Set<AuditEvent>();` at the appropriate position (after `Notifications`).
- Create: `src/ScadaLink.ConfigurationDatabase/Entities/AuditLogEntityTypeConfiguration.cs` — `IEntityTypeConfiguration<AuditEvent>` mapping the columns, types, length constraints, and indexes per `alog.md` §4. Note: this is an EF mapping only; the partition function and scheme are created in the SQL migration (next task) since EF Core doesn't model them natively.
- Modify: `OnModelCreating` — apply the new configuration.
- Create: `tests/ScadaLink.ConfigurationDatabase.Tests/Entities/AuditLogEntityTypeConfigurationTests.cs` — use `ModelBuilder` directly to verify the entity is mapped to `AuditLog` table, PK is `EventId`, and the expected columns + indexes are declared.

**Steps:**
1. Failing test asserts mapped table name, PK column, and column count.
2. Implement entity configuration; apply in `OnModelCreating`.
3. Failing test asserts the five expected indexes exist on the model.
4. Add `HasIndex` declarations.
5. Run: pass.
6. Commit: `feat(configdb): map AuditEvent to AuditLog table with PK and indexes`.

#### M1-T6: Generate and customize EF migration for AuditLog

**Files:**
- Create: `src/ScadaLink.ConfigurationDatabase/Migrations/<timestamp>_AddAuditLogTable.cs` via `dotnet ef migrations add AddAuditLogTable --project ScadaLink.ConfigurationDatabase`.
- Modify: the generated `Up()` / `Down()` to:
  - Create the partition function `pf_AuditLog_Month` and partition scheme `ps_AuditLog_Month` (raw SQL via `migrationBuilder.Sql(...)`), tied to a dedicated filegroup (or PRIMARY in dev — configurable via a migration setting).
  - Alter the `CreateTable` call (or follow up with `Sql`) to align the table to `ps_AuditLog_Month(OccurredAtUtc)`.
  - Add the five indexes generated by EF; ensure each is also partition-aligned where appropriate.
- Create: `tests/ScadaLink.ConfigurationDatabase.Tests/Migrations/AddAuditLogTableMigrationTests.cs` — applies the migration to an isolated MS SQL LocalDB instance (existing IntegrationTests harness), asserts table + partition function + scheme + indexes are present.

**Steps:**
1. Run `dotnet ef migrations add AddAuditLogTable`.
2. Failing integration test: apply migration, query `sys.partition_functions` and `sys.partition_schemes` for the expected names.
3. Edit migration to add the partition function + scheme + alignment.
4. Re-run test: pass.
5. Failing test: query `sys.indexes` for the five expected named indexes.
6. Adjust migration if any index name drifts.
7. Run: pass.
8. Commit: `feat(configdb): add AuditLog migration with monthly partitioning`.

#### M1-T7: Add DB roles in migration

**Files:**
- Modify: the M1-T6 migration `Up()` to also create the `scadalink_audit_writer` (INSERT + SELECT only) and `scadalink_audit_purger` (ALTER PARTITION FUNCTION + ALTER TABLE … SWITCH PARTITION + SELECT) roles via raw SQL. Make role creation idempotent (`IF NOT EXISTS`).
- Modify: `Down()` — drop the roles.
- Create: `tests/ScadaLink.ConfigurationDatabase.Tests/Migrations/AuditLogRoleGrantsTests.cs` — applies migration, then runs `SELECT` on `sys.database_role_members` / `sys.database_permissions` to assert the role grants. Plus a smoke test: connect as a user mapped to `scadalink_audit_writer`, attempt `UPDATE AuditLog SET Status = 'X'` and expect a permission error.

**Steps:**
1. Failing test asserts both roles exist with documented grants.
2. Add `migrationBuilder.Sql(...)` blocks.
3. Run: pass.
4. Failing test: `UPDATE AuditLog` as audit writer → expect SqlException with permission error.
5. Verify the role's permissions deny UPDATE (they should by default since only INSERT + SELECT granted).
6. Run: pass.
7. Commit: `feat(configdb): add scadalink_audit_writer and scadalink_audit_purger roles`.

#### M1-T8: Add IAuditLogRepository + EF implementation

**Files:**
- Create: `src/ScadaLink.Commons/Interfaces/Repositories/IAuditLogRepository.cs` — `InsertIfNotExistsAsync(AuditEvent, CancellationToken)`, `QueryAsync(filter, paging, CancellationToken)`, `SwitchOutPartitionAsync(monthBoundary, CancellationToken)`. **Deliberately no `UpdateAsync` or row-level `DeleteAsync`.**
- Create: `src/ScadaLink.ConfigurationDatabase/Repositories/AuditLogRepository.cs` — implementation using the DbContext; `InsertIfNotExistsAsync` uses `MERGE` or raw `INSERT … WHERE NOT EXISTS` to satisfy idempotency without throwing on dupes.
- Modify: `ServiceCollectionExtensions.cs` — register `IAuditLogRepository` → `AuditLogRepository` in DI.
- Create: `tests/ScadaLink.ConfigurationDatabase.Tests/Repositories/AuditLogRepositoryTests.cs`.

**Steps:**
1. Failing test: `InsertIfNotExistsAsync` for a fresh `EventId` writes one row; calling again with the same `EventId` is a no-op (no exception, no second row).
2. Implement; use a `MERGE` or `INSERT … WHERE NOT EXISTS` strategy that does NOT rely on EF change tracking.
3. Run: pass.
4. Failing test: paged `QueryAsync` returns rows in `(OccurredAtUtc desc, EventId desc)` order, respecting filter predicates (channel, kind, status, site, target, actor, correlation, time range).
5. Implement filter projection + keyset paging.
6. Run: pass.
7. Failing test: `SwitchOutPartitionAsync` for the oldest partition removes its rows from the live table.
8. Implement via `migrationBuilder`-style `Sql("ALTER TABLE ... SWITCH PARTITION ... TO ...")` (against a staging table the implementation creates and drops within the same transaction).
9. Run: pass.
10. Commit: `feat(configdb): IAuditLogRepository + EF implementation (append-only, partition-switch purge)`.
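
The keyset paging from steps 4 and 5 can be illustrated in memory. This is a sketch under stated assumptions, not the repository implementation: `AuditRow`, `AuditCursor`, and `KeysetPaging.NextPage` are hypothetical names, and LINQ-to-objects stands in for the SQL query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, trimmed row: only the two sort-key columns.
public sealed record AuditRow(Guid EventId, DateTime OccurredAtUtc);

// Cursor = sort key of the last row on the previous page.
public sealed record AuditCursor(DateTime OccurredAtUtc, Guid EventId);

public static class KeysetPaging
{
    public static IReadOnlyList<AuditRow> NextPage(
        IEnumerable<AuditRow> rows, AuditCursor? after, int pageSize)
    {
        IEnumerable<AuditRow> ordered = rows
            .OrderByDescending(r => r.OccurredAtUtc)
            .ThenByDescending(r => r.EventId);

        if (after is not null)
        {
            // Strictly after the cursor in descending order: an older timestamp,
            // or the same timestamp with a smaller EventId (the tiebreaker branch).
            ordered = ordered.Where(r =>
                r.OccurredAtUtc < after.OccurredAtUtc
                || (r.OccurredAtUtc == after.OccurredAtUtc
                    && r.EventId.CompareTo(after.EventId) < 0));
        }

        return ordered.Take(pageSize).ToList();
    }
}
```

A caller builds the next cursor from the last row of the page it just received. One caveat when moving the predicate into SQL: SQL Server orders `uniqueidentifier` values differently from `Guid.CompareTo`, so the database-side comparison should use the same ordering as the query's `ORDER BY`.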

#### M1-T9: Add AuditLogOptions configuration class + binding

**Files:**
- Create: `src/ScadaLink.AuditLog/Configuration/AuditLogOptions.cs` (new project — see M1-T10) — owns `DefaultCapBytes`, `ErrorCapBytes`, `HeaderRedactList`, `GlobalBodyRedactors`, `PerTargetOverrides`, `RetentionDays`, validation attributes.
- Add: validation on startup (`IValidateOptions<AuditLogOptions>`).
- Test: ensure `appsettings.json` bind round-trips and validation rejects out-of-range `RetentionDays`.

**Steps:**
1. Failing test: bind a valid section → values present.
2. Implement options class + binding.
3. Failing test: bind invalid `RetentionDays` → validator rejects.
4. Implement validator.
5. Run: pass.
6. Commit: `feat(auditlog): add AuditLogOptions config binding`.

#### M1-T10: Add ScadaLink.AuditLog project skeleton

**Files:**
- Create: `src/ScadaLink.AuditLog/ScadaLink.AuditLog.csproj` — TargetFramework matches the rest of the solution; ProjectReferences to `ScadaLink.Commons` and `ScadaLink.ConfigurationDatabase`.
- Create: `src/ScadaLink.AuditLog/ServiceCollectionExtensions.cs` — `AddAuditLog(this IServiceCollection, IConfiguration)` that registers `AuditLogOptions`, `IAuditLogRepository`, plus placeholders that later milestones will fill (writer impls, actors).
- Create: `tests/ScadaLink.AuditLog.Tests/ScadaLink.AuditLog.Tests.csproj` with one smoke test.
- Modify: `ScadaLink.slnx` — add both projects to the solution.
- Modify: `Directory.Packages.props` if any new package versions are needed.

**Steps:**
1. Create projects via `dotnet new classlib` / `dotnet new xunit`; add references; add to slnx.
2. Failing test: smoke-test `AddAuditLog()` populates DI with `IAuditLogRepository` and `IOptions<AuditLogOptions>`.
3. Implement `ServiceCollectionExtensions.AddAuditLog`.
4. Run: pass.
5. Commit: `feat(auditlog): scaffold ScadaLink.AuditLog project`.

#### M1-T11: Update Component-Host.md responsibilities + README component table

**Files:**
- Modify: `docs/requirements/Component-Host.md` — list `ScadaLink.AuditLog` in the central role's registration set.
- Modify: `README.md` — confirm row #23 link reflects the new project (no functional change; this is a paper-trail update).

**Steps:**
1. Edit, verify cross-refs, commit: `docs(audit): register ScadaLink.AuditLog project in Host role`.

---

## M2 — Site pipeline (sync-only path)

**Goal:** First end-to-end audit emission: a script-initiated `ExternalSystem.Call()` produces an audit row in the central `AuditLog` table. No cached paths yet, no notifications, no inbound API, no UI. Just one channel + kind: `ApiOutbound` / `ApiCall`.

**Affected projects:** `Commons`, `AuditLog` (new), `Communication`, `Host`, `ExternalSystemGateway`, all matching `*.Tests/`, `tests/ScadaLink.IntegrationTests/`.

> **M1 realities to honor:**
> - **Vocabulary**: M1 enums use `AuditKind.ApiCall` (sync) and `AuditStatus.Delivered|Failed`. The original spec's `SyncCall` / `Success` names were superseded; alog.md + Component-AuditLog.md were reconciled in the M1 merge.
> - **Idempotent insert race**: M1's `AuditLogRepository.InsertIfNotExistsAsync` uses non-locking `IF NOT EXISTS … INSERT`. M2 is the first concurrent writer (`AuditLogIngestActor` will receive batches from multiple sites). **Harden the repo before relying on it** — either add `WITH (UPDLOCK, HOLDLOCK)` to the existence check, or catch SqlException numbers 2601/2627 (duplicate key on `UX_AuditLog_EventId`) and swallow. Add a new task at the head of M2 for this fix and its concurrency test.
> - **Keyset tiebreaker test gap**: M1's `QueryAsync_Keyset_NextPageStartsAfterCursor` test uses five rows with distinct `OccurredAtUtc`, so the `Guid.CompareTo` tiebreaker branch is never exercised. Add a same-OccurredAt test in M2 (Bundle D reviewer's deferred recommendation).
> - **Reusable MSSQL fixture**: `tests/ScadaLink.ConfigurationDatabase.Tests/Migrations/MsSqlMigrationFixture.cs` + `[SkippableFact]` + `Skip.IfNot(_fixture.Available, _fixture.SkipReason)` is the established pattern. Consider promoting it to a `[CollectionDefinition]`-shared fixture when M2+ adds more MSSQL-dependent test classes.
> - **Project layout**: `src/ScadaLink.AuditLog/` is wired into the solution with `Configuration/AuditLogOptions.cs` + validator + `ServiceCollectionExtensions.AddAuditLog()`. M2's `Site/` and `Central/` subfolders attach to this project; the DI extension is the registration point.

**Acceptance criteria:**
- Site-local `IAuditWriter` writes to a per-site SQLite `auditlog.db` on the hot path with `ForwardState = 'Pending'`; the write is durable in under a millisecond; failures fall back to a bounded in-memory ring and surface a metric.
- `SiteAuditTelemetryActor` drains pending rows in batches via a new `IngestAuditEvents` RPC on the existing `SiteStream` gRPC service; on success flips `ForwardState = 'Forwarded'`.
- `AuditLogIngestActor` (central singleton) receives the batch, performs `InsertIfNotExistsAsync` per event, returns ack.
- `ExternalSystem.Call()` emits one `ApiOutbound` / `ApiCall` row via `IAuditWriter` on every call completion; audit-write failure does NOT abort the script.
- Integration test in `tests/ScadaLink.IntegrationTests/` boots a site + central pair, executes a sync script that calls an external system, and asserts a corresponding row appears in the central `AuditLog` within N seconds.
- No regressions in existing ExternalSystemGateway or Communication tests.

### M2 — Tasks (TDD-detail)

#### M2-T1: `SqliteAuditWriter` — schema + connection bootstrap

**Files:**
- Create: `src/ScadaLink.AuditLog/Site/SqliteAuditWriter.cs` — implements `IAuditWriter`. Constructor takes a `SqliteOptions` (path); single `SqliteConnection` per instance gated by `SemaphoreSlim(1,1)`. Calls `InitializeSchema()` on first use. Pattern from `src/ScadaLink.SiteEventLogging/SiteEventLogger.cs:28–98`.
- Create: `tests/ScadaLink.AuditLog.Tests/Site/SqliteAuditWriterSchemaTests.cs`.

**Steps:**
1. Failing test: opening a writer against a `:memory:` SQLite produces an `AuditLog` table with the documented columns (the 20 central columns minus `IngestedAtUtc`, plus `ForwardState`).
2. Run: fail (class doesn't exist).
3. Implement `InitializeSchema()` with `CREATE TABLE IF NOT EXISTS AuditLog (...)`. Use SQLite column types matching the EF mapping where reasonable (`TEXT` for IDs, `INTEGER` for status enums, `BLOB` not used).
4. Run: pass.
5. Commit: `feat(auditlog): SqliteAuditWriter schema bootstrap`.

#### M2-T2: `SqliteAuditWriter` — hot-path `WriteAsync`

**Files:**
- Modify: `src/ScadaLink.AuditLog/Site/SqliteAuditWriter.cs`.
- Create: `tests/ScadaLink.AuditLog.Tests/Site/SqliteAuditWriterWriteTests.cs`.

**Steps:**
1. Failing test: `WriteAsync(event)` inserts one row with `ForwardState = Pending`.
2. Failing test: 1,000 concurrent `WriteAsync` calls all complete without exception and produce exactly 1,000 rows (write-lock correctness).
3. Run: fail.
4. Implement using a parameterized `INSERT` under `SemaphoreSlim` lock.
5. Run: pass.
6. Commit: `feat(auditlog): SqliteAuditWriter hot-path INSERT with write lock`.
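
The write-lock idea in steps 2 and 4 can be shown without SQLite. In this sketch a `List<T>` (which is not thread-safe) plays the role of the single shared `SqliteConnection`; `SerializedAppender<T>` and its demo are hypothetical names, and the real writer would run its parameterized `INSERT` where the list append is.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the SQLite-backed writer.
public sealed class SerializedAppender<T>
{
    private readonly SemaphoreSlim _gate = new(1, 1);   // one writer at a time
    private readonly List<T> _rows = new();

    public int Count => _rows.Count;

    public async Task WriteAsync(T item, CancellationToken ct = default)
    {
        await _gate.WaitAsync(ct).ConfigureAwait(false);
        try
        {
            _rows.Add(item);   // the real writer would execute its INSERT here
        }
        finally
        {
            _gate.Release();   // always release, even if the write throws
        }
    }
}

public static class SerializedAppenderDemo
{
    // Fires `writers` concurrent writes and returns the resulting row count.
    public static async Task<int> WriteConcurrentlyAsync(int writers)
    {
        var appender = new SerializedAppender<int>();
        var tasks = new List<Task>();
        for (var i = 0; i < writers; i++)
        {
            var value = i;
            tasks.Add(Task.Run(() => appender.WriteAsync(value)));
        }
        await Task.WhenAll(tasks);
        return appender.Count;
    }
}
```

`SemaphoreSlim` is used rather than `lock` because the guarded section sits in an `async` method, and `WaitAsync` lets callers queue without blocking a thread.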

#### M2-T3: `RingBufferFallback` — in-memory fallback

**Files:**
- Create: `src/ScadaLink.AuditLog/Site/RingBufferFallback.cs` — `Channel<AuditEvent>` with `BoundedChannelFullMode.DropOldest`, default capacity 1024.
- Create: `tests/ScadaLink.AuditLog.Tests/Site/RingBufferFallbackTests.cs`.

**Steps:**
1. Failing test: enqueueing 1,025 events into a 1,024-cap ring drops the oldest and emits a `RingBufferOverflow` notification (incrementing a passed-in counter).
2. Failing test: `DrainTo(writer)` writes all buffered events in FIFO order and clears the ring.
3. Implement.
4. Run: pass.
5. Commit: `feat(auditlog): RingBufferFallback with drop-oldest overflow`.
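
A drop-oldest ring on top of `System.Threading.Channels` might be sketched as follows. `RingBufferSketch<T>` is a hypothetical name, and the internal drop counter is an assumption standing in for the passed-in counter and the `RingBufferOverflow` notification; the sketch relies on the `itemDropped` callback overload of `Channel.CreateBounded`, which is available from .NET 6.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Channels;

// Hypothetical sketch, not the RingBufferFallback implementation.
public sealed class RingBufferSketch<T>
{
    private readonly Channel<T> _channel;
    private long _dropped;

    public RingBufferSketch(int capacity = 1024)
    {
        var options = new BoundedChannelOptions(capacity)
        {
            FullMode = BoundedChannelFullMode.DropOldest
        };
        // The callback runs once for each item evicted to make room.
        _channel = Channel.CreateBounded<T>(options, _ => Interlocked.Increment(ref _dropped));
    }

    public long Dropped => Interlocked.Read(ref _dropped);

    // In DropOldest mode TryWrite succeeds while the channel is open.
    public bool TryEnqueue(T item) => _channel.Writer.TryWrite(item);

    // Removes and returns everything currently buffered, oldest first.
    public List<T> DrainAll()
    {
        var drained = new List<T>();
        while (_channel.Reader.TryRead(out var item)) drained.Add(item);
        return drained;
    }
}
```

Counting drops through the callback keeps the overflow signal next to the eviction itself, so the writer side does not have to compare counts before and after each enqueue.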

#### M2-T4: `FallbackAuditWriter` — compose primary + ring behind `IAuditWriter`

**Files:**
- Create: `src/ScadaLink.AuditLog/Site/FallbackAuditWriter.cs` — primary writer is `SqliteAuditWriter`; on transient exception, enqueues into `RingBufferFallback` and increments `SiteAuditWriteFailures` (M2-T11). On the next successful primary write, drains the ring back through the primary.
- Create: `tests/ScadaLink.AuditLog.Tests/Site/FallbackAuditWriterTests.cs`.

**Steps:**
1. Failing test: when the primary throws, the event lands in the ring and the call returns successfully.
2. Failing test: when primary writes succeed again, the ring drains in FIFO order.
3. Implement.
4. Run: pass.
5. Commit: `feat(auditlog): FallbackAuditWriter composing SQLite + ring`.
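
The composition described above could be sketched like this. Every type here is a hypothetical stand-in (`AuditEventStub`, `IAuditWriterSketch`, `DelegateWriter`, `FallbackWriterSketch`); a plain `Queue<T>` replaces `RingBufferFallback`, a simple counter replaces the `SiteAuditWriteFailures` metric, and every primary exception is treated as transient, which the real writer may not do.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public sealed record AuditEventStub(Guid EventId);

public interface IAuditWriterSketch
{
    Task WriteAsync(AuditEventStub evt, CancellationToken ct);
}

// Test double: wraps a delegate so a test can make the primary fail on demand.
public sealed class DelegateWriter : IAuditWriterSketch
{
    private readonly Func<AuditEventStub, Task> _write;
    public DelegateWriter(Func<AuditEventStub, Task> write) => _write = write;
    public Task WriteAsync(AuditEventStub evt, CancellationToken ct) => _write(evt);
}

public sealed class FallbackWriterSketch : IAuditWriterSketch
{
    private readonly IAuditWriterSketch _primary;
    private readonly Queue<AuditEventStub> _ring = new();     // stand-in for RingBufferFallback
    private readonly object _sync = new();
    private readonly SemaphoreSlim _drainGate = new(1, 1);    // at most one drainer at a time
    private int _failures;

    public FallbackWriterSketch(IAuditWriterSketch primary) => _primary = primary;

    public int Failures => Volatile.Read(ref _failures);
    public int Buffered { get { lock (_sync) return _ring.Count; } }

    public async Task WriteAsync(AuditEventStub evt, CancellationToken ct)
    {
        try
        {
            await _primary.WriteAsync(evt, ct).ConfigureAwait(false);
        }
        catch (Exception) when (!ct.IsCancellationRequested)
        {
            lock (_sync) _ring.Enqueue(evt);
            Interlocked.Increment(ref _failures);
            return;   // audit failures never surface to the caller
        }

        // Primary is healthy again: replay buffered events oldest-first.
        if (!_drainGate.Wait(0)) return;   // another caller is already draining
        try
        {
            while (true)
            {
                AuditEventStub next;
                lock (_sync)
                {
                    if (_ring.Count == 0) break;
                    next = _ring.Peek();
                }
                try { await _primary.WriteAsync(next, ct).ConfigureAwait(false); }
                catch (Exception) when (!ct.IsCancellationRequested) { break; }   // keep it buffered
                lock (_sync) _ring.Dequeue();
            }
        }
        finally { _drainGate.Release(); }
    }
}
```

The event is only dequeued after the primary accepts it, so a failure mid-drain leaves the remaining events buffered in their original order for the next successful write.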

#### M2-T5: Extend `sitestream.proto` with `IngestAuditEvents` RPC

**Files:**
- Modify: `src/ScadaLink.Communication/Protos/sitestream.proto` — add `message AuditEventDto { string event_id = 1; google.protobuf.Timestamp occurred_at_utc = 2; ... }` (all 20 central fields), `message AuditEventBatch { repeated AuditEventDto events = 1; }`, `message IngestAck { repeated string accepted_event_ids = 1; }`, and `rpc IngestAuditEvents(AuditEventBatch) returns (IngestAck);` on `SiteStreamService`.
- Build: `dotnet build src/ScadaLink.Communication/` regenerates the C# stubs.
- Create: `tests/ScadaLink.Communication.Tests/Protos/AuditEventProtoTests.cs`.

**Steps:**
1. Failing test: round-trip serialize/deserialize a populated `AuditEventDto`; assert all fields survive.
2. Edit proto; rebuild.
3. Run: pass.
4. Commit: `feat(comms): add IngestAuditEvents RPC + AuditEvent proto messages`.

#### M2-T6: `AuditEvent` ↔ `AuditEventDto` mapper

**Files:**
- Create: `src/ScadaLink.AuditLog/Telemetry/AuditEventMapper.cs` — static `ToDto(AuditEvent)` and `FromDto(AuditEventDto)`.
- Create: `tests/ScadaLink.AuditLog.Tests/Telemetry/AuditEventMapperTests.cs`.

**Steps:**
1. Failing test: round-trip a populated `AuditEvent` through `ToDto` → `FromDto`; assert equality on all 20 columns.
2. Implement.
3. Run: pass.
4. Commit: `feat(auditlog): AuditEvent ↔ proto Dto mapper`.

#### M2-T7: `SiteAuditTelemetryActor` — drain loop

**Files:**
- Create: `src/ScadaLink.AuditLog/Site/Telemetry/SiteAuditTelemetryActor.cs` — `ReceiveActor` with a `Drain` self-tick. On `Drain`: read up to `BatchSize` `Pending` rows from SQLite; send via gRPC; mark accepted rows `Forwarded`.
- Create: `src/ScadaLink.AuditLog/Site/Telemetry/SiteAuditTelemetryOptions.cs` — `BatchSize = 256`, `BusyIntervalSeconds = 5`, `IdleIntervalSeconds = 30`.
- Create: `tests/ScadaLink.AuditLog.Tests/Site/Telemetry/SiteAuditTelemetryActorTests.cs` using `TestKit` + NSubstitute for the gRPC client.

**Steps:**
1. Failing test: when SQLite has 50 pending rows, a `Drain` tick sends one batch via the mocked gRPC client.
2. Failing test: on ack, the corresponding rows flip to `Forwarded` in SQLite.
3. Failing test: when gRPC throws, rows stay `Pending` and the next tick retries.
4. Failing test: cadence is 5s after a tick that drained ≥1 row, 30s after a tick that drained 0.
5. Implement.
6. Run: pass.
7. Commit: `feat(auditlog): SiteAuditTelemetryActor drain loop`.
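
The cadence rule in step 4 reduces to a small pure function, which keeps it testable without an actor system. `TelemetryCadence` is a hypothetical helper; its defaults mirror the `BusyIntervalSeconds` and `IdleIntervalSeconds` values listed above.

```csharp
using System;

// Hypothetical helper: decides how long to wait before the next Drain tick.
public sealed record TelemetryCadence(int BusyIntervalSeconds = 5, int IdleIntervalSeconds = 30)
{
    // Busy interval after a tick that drained at least one row, idle interval otherwise.
    public TimeSpan NextDelay(int drainedCount) =>
        TimeSpan.FromSeconds(drainedCount > 0 ? BusyIntervalSeconds : IdleIntervalSeconds);
}
```

Inside the actor, the returned delay would presumably feed whatever mechanism schedules the next `Drain` self-message, for example a one-shot scheduler call.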

#### M2-T8: `AuditLogIngestActor` + gRPC server handler

**Files:**
- Create: `src/ScadaLink.AuditLog/Central/AuditLogIngestActor.cs` — `ReceiveActor` accepting `IngestAuditEventsCommand(batch)`; calls `IAuditLogRepository.InsertIfNotExistsAsync` for each event inside a single `DbContext` transaction; replies with `IngestAck(acceptedEventIds)`.
- Modify: `src/ScadaLink.Communication/SiteStreamGrpc/SiteStreamGrpcServer.cs` — implement the new `IngestAuditEvents` method as a thin gRPC↔Akka adapter (`Ask` against the central singleton's proxy, mapped to the gRPC reply).
- Create: `tests/ScadaLink.AuditLog.Tests/Central/AuditLogIngestActorTests.cs`.

**Steps:**
1. Failing test: actor receives a batch of 5 events; repo is called 5 times; reply lists all 5 EventIds as accepted.
2. Failing test: when 2 of 5 events already exist (repo returns `Inserted = false`), the reply still lists all 5 as accepted (idempotent semantics).
3. Failing test: gRPC handler routes to actor and returns its reply.
4. Implement.
5. Run: pass.
6. Commit: `feat(auditlog): AuditLogIngestActor + gRPC server handler`.
|
||
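
The idempotent-ack behaviour in step 2 can be illustrated without Akka or EF Core. In this sketch, `InMemoryAuditLog` is a hypothetical stand-in for `IAuditLogRepository`, and the reduced `AuditEvent` record is illustrative only; the point is that a duplicate `EventId` is acknowledged exactly like a fresh insert, so the site can mark the row `Forwarded` either way.

```csharp
using System;
using System.Collections.Generic;

public sealed record AuditEvent(Guid EventId, string Kind);

// Hypothetical in-memory stand-in for IAuditLogRepository.
public sealed class InMemoryAuditLog
{
    private readonly Dictionary<Guid, AuditEvent> _rows = new();

    public int Count => _rows.Count;

    // True when the row was inserted, false when the EventId already existed.
    public bool InsertIfNotExists(AuditEvent e) => _rows.TryAdd(e.EventId, e);
}

public static class IngestSketch
{
    // Returns the EventIds to acknowledge back to the site.
    public static IReadOnlyList<Guid> Ingest(InMemoryAuditLog repo, IEnumerable<AuditEvent> batch)
    {
        var accepted = new List<Guid>();
        foreach (var e in batch)
        {
            repo.InsertIfNotExists(e);  // a duplicate is not an error
            accepted.Add(e.EventId);    // ack regardless of Inserted = true/false
        }
        return accepted;
    }
}
```
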
|
||
#### M2-T9: Host registration with dedicated dispatcher
|
||
|
||
**Files:**
|
||
- Modify: `src/ScadaLink.Host/Actors/AkkaHostedService.cs` — alongside the existing wiring at `:272–280`, register `AuditLogIngestActor` as central singleton and `SiteAuditTelemetryActor` as site singleton bound to `audit-telemetry-dispatcher`. Manager + proxy pair for both.
|
||
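- Modify: Host HOCON (likely `src/ScadaLink.Host/Configuration/akka.conf` or similar). Add `audit-telemetry-dispatcher { type = ForkJoinDispatcher, dedicated-thread-pool { thread-count = 2 } }`. Akka.NET's `ForkJoinDispatcher` takes its thread count from a `dedicated-thread-pool` block rather than from `parallelism-min`/`parallelism-max`.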
|
||
- Modify: `src/ScadaLink.AuditLog/ServiceCollectionExtensions.cs` — register actor `Props` factories so Host can resolve them.
|
||
- Create: `tests/ScadaLink.Host.Tests/AkkaHostedServiceAuditWiringTests.cs`.
|
||
|
||
**Steps:**
|
||
1. Failing test: starting the host with the audit module loaded produces healthy `IActorRef` proxies for both singletons.
|
||
2. Failing test: `SiteAuditTelemetryActor` is bound to `audit-telemetry-dispatcher` (assert via Akka actor cell inspection or via a known-good dispatcher-tagged behaviour).
|
||
3. Implement.
|
||
4. Run: pass.
|
||
5. Commit: `feat(host): register AuditLog singletons with dedicated dispatcher`.
|
||
|
||
#### M2-T10: ESG `ExternalSystemClient.CallAsync` audit emission
|
||
|
||
**Files:**
|
||
- Modify: `src/ScadaLink.ExternalSystemGateway/ExternalSystemClient.cs` (sync `CallAsync` around line 45–70) — inject `IAuditWriter` via constructor. After the call completes (success OR exception), build an `AuditEvent` (channel=`ApiOutbound`, kind=`SyncCall`, status from outcome, `DurationMs`, `HttpStatus`, target = system+method, provenance from `ScriptExecutionContext`). Call `_auditWriter.WriteAsync(evt)` inside a `try`/`catch` that swallows + logs + increments `SiteAuditWriteFailures`.
|
||
- Modify: `src/ScadaLink.ExternalSystemGateway/ServiceCollectionExtensions.cs` — accept `IAuditWriter` from DI.
|
||
- Create: `tests/ScadaLink.ExternalSystemGateway.Tests/ExternalSystemClientAuditEmissionTests.cs`.
|
||
|
||
**Steps:**
|
||
1. Failing test: sync `CallAsync` success → exactly one event with `Status=Success`, `Channel=ApiOutbound`, `Kind=SyncCall`.
|
||
2. Failing test: sync `CallAsync` HTTP 500 → `Status=TransientFailure`, `HttpStatus=500`.
|
||
3. Failing test: sync `CallAsync` HTTP 400 → `Status=PermanentFailure`, `HttpStatus=400`.
|
||
4. Failing test: when `IAuditWriter.WriteAsync` throws, the script call still completes normally and the script sees the original (non-audit) result.
|
||
5. Implement.
|
||
6. Run: pass.
|
||
7. Commit: `feat(esg): emit ApiOutbound.SyncCall audit event on every sync call`.
|
||
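
The two behaviours these tests pin down, status mapping and failure isolation, might look like the sketch below. It is written under stated assumptions: the tests only fix HTTP 500 as `TransientFailure` and HTTP 400 as `PermanentFailure`, so generalising to "any 5xx" and "any other non-2xx" is a guess; the audit writer is modelled as a plain delegate rather than the real `IAuditWriter`; and the static counter stands in for the `SiteAuditWriteFailures` health metric.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class SyncCallAudit
{
    private static long _writeFailures;

    // Stand-in for the SiteAuditWriteFailures health counter.
    public static long SiteAuditWriteFailures => Interlocked.Read(ref _writeFailures);

    // Assumption: 2xx is Success, 5xx is TransientFailure, everything else is PermanentFailure.
    public static string StatusFor(int httpStatus)
    {
        if (httpStatus >= 200 && httpStatus < 300) return "Success";
        return httpStatus >= 500 ? "TransientFailure" : "PermanentFailure";
    }

    // Emits the audit row but never lets an audit failure reach the caller.
    public static async Task EmitSafelyAsync(
        Func<string, int?, Task> writeAsync, int httpStatus, Action<Exception> log)
    {
        try
        {
            await writeAsync(StatusFor(httpStatus), httpStatus);
        }
        catch (Exception ex)
        {
            Interlocked.Increment(ref _writeFailures);
            log(ex); // swallow: the script must still see the original call result
        }
    }
}
```

The real call site would invoke the emission from a `finally`-style path so that it runs for both successful and throwing external calls, then return or rethrow the original outcome untouched.
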
|
||
#### M2-T11: `SiteAuditWriteFailures` health metric
|
||
|
||
**Files:**
|
||
- Modify: `src/ScadaLink.HealthMonitoring/SiteHealthState.cs` — add a `SiteAuditWriteFailures` counter; expose it in the site health report payload.
|
||
- Modify: `src/ScadaLink.AuditLog/Site/FallbackAuditWriter.cs` (M2-T4) — accept `IHealthMetrics` (or the project's existing health counter abstraction) and increment per failed primary write.
|
||
- Create: `tests/ScadaLink.AuditLog.Tests/Site/SiteAuditWriteFailuresMetricTests.cs`.
|
||
|
||
**Steps:**
|
||
1. Failing test: 3 simulated SQLite failures → counter reports 3 in the next snapshot.
|
||
2. Implement.
|
||
3. Run: pass.
|
||
4. Commit: `feat(health): SiteAuditWriteFailures metric`.
|
||
|
||
#### M2-T12: End-to-end integration test
|
||
|
||
**Files:**
|
||
- Create: `tests/ScadaLink.IntegrationTests/AuditLog/SyncCallEmissionTests.cs` — boots a site + central pair via the existing IntegrationTests harness; deploys a tiny script that calls a stub external system; asserts the central `AuditLog` table has exactly one row with the expected channel/kind/status within 10s.
|
||
- Possibly modify: `infra/reseed.sh` if integration tests need a fresh AuditLog table per run.
|
||
|
||
**Steps:**
|
||
1. Sketch the test using existing IntegrationTests fixtures.
|
||
2. Run: fail somewhere (gaps in earlier tasks surface here).
|
||
3. Iterate fixes back through M2-T1..M2-T11 until end-to-end passes.
|
||
4. Commit: `test(auditlog): end-to-end sync call emission integration test`.
|
||
|
||
### M2 — Risk callouts
|
||
|
||
- **SiteStream proto evolution:** adding a new top-level RPC is wire-compatible; confirm generated `Sitestream.cs` rebuilds cleanly and existing tests still pass.
|
||
- **Dedicated dispatcher misconfiguration:** if `SiteAuditTelemetryActor` lands on the script blocking-I/O dispatcher, scripts will starve during telemetry bursts. Add a runtime assertion in `M2-T9` that the actor's dispatcher matches expectation.
|
||
- **Script execution context plumbing:** ESG emission (M2-T10) needs `SourceInstanceId` / `SourceScript`; confirm these are reachable via the existing `ScriptExecutionContext` (or equivalent in SiteRuntime) before starting M2-T10.
|
||
- **Integration-test DB isolation:** target an isolated MS SQL database (or a dedicated schema) so the test doesn't clash with other integration tests.
|
||
|
||
---
|
||
|
||
## M3 — Cached operations + dual-write transaction + (inlined) Site Call Audit foundations
|
||
|
||
**Goal:** Cached external calls (`ExternalSystem.CachedCall`) and cached DB writes (`Database.CachedWrite`) produce four kinds of audit row per operation (`Kind=CachedSubmit Status=Submitted`, `Kind=ApiCallCached/DbWriteCached Status=Forwarded`, `Kind=ApiCallCached/DbWriteCached Status=Attempted` × N, `Kind=CachedResolve Status=Delivered|Failed|Parked|Discarded`) AND populate the operational `SiteCalls` table at central. Central applies both writes in one transaction, from a single combined telemetry packet.
|
||
|
||
> **M2 realities to honor:**
|
||
> - **Vocabulary**: use the M1-aligned enums. M3 will be the first code to populate `AuditKind.ApiCallCached`, `DbWriteCached`, `CachedSubmit`, `CachedResolve`. The locked spec (alog.md + Component-AuditLog.md) was reconciled in the M1 merge.
|
||
> - **Site→central gRPC client deferred to M6**: M2 ships `NoOpSiteStreamAuditClient` as the production default, so site SQLite rows stay `Pending` in production until M6 lands. M3 component tests should use Bundle H's `DirectActorSiteStreamAuditClient` pattern (see `tests/ScadaLink.AuditLog.Tests/Integration/SyncCallEmissionEndToEndTests.cs:277-340`). Extract that helper into `tests/ScadaLink.AuditLog.Tests/Integration/Infrastructure/` so M3 cached-call E2E tests can reuse it without redefining it.
|
||
> - **Mapper duplication**: `SiteStreamGrpcServer.IngestAuditEvents` inlines DTO→entity decoding (intentional, to avoid the AuditLog→Communication project-ref cycle). The mapper lives at `src/ScadaLink.AuditLog/Telemetry/AuditEventMapper.cs`. M3 should add a comment in both spots tying them together, OR move the mapper into `src/ScadaLink.Commons/` (project-ref clean) so both consumers can share it.
|
||
> - **`AuditIngestAskTimeout = 30s` is hardcoded** in `SiteStreamGrpcServer.cs:37`. M3 may want to expose this via `CommunicationOptions` or `AuditLogOptions` as central reconciliation/dual-write traffic grows.
|
||
> - **CachedCallTelemetry message**: per CLAUDE.md, the `CachedCallTelemetry` message is specified in the docs but **does not yet exist in code**. M3 must create it from scratch (additively, per Commons REQ-COM-5a; DO NOT rename it `CachedOperationTelemetry`). It carries BOTH the AuditLog rows (4+) AND the SiteCalls upsert in one packet.
|
||
> - **Dual-write transaction**: central writes `AuditLog` + `SiteCalls` in one MS SQL transaction. The repository's `InsertIfNotExistsAsync` swallows duplicates (M2 Bundle A fix); the SiteCalls upsert uses `MERGE` (or insert-if-not-exists then upsert-on-newer-status per CLAUDE.md). M3 must ensure the same Bundle A swallow pattern applies when a duplicate `CachedCallId` arrives.
|
||
> - **AuditEvent ForwardState semantics in M3**: cached-operation telemetry rows are site-emitted just like sync M2 rows, so the same site SQLite hot-path + `Pending→Forwarded` lifecycle applies. The four lifecycle rows share a CorrelationId (the TrackedOperationId), but each is its own AuditEvent with a distinct EventId.
|
||
|
||
**Affected projects:** `Commons`, `AuditLog`, `SiteCallAudit` (new — minimum-viable surface), `ConfigurationDatabase` (new `SiteCalls` table migration), `ExternalSystemGateway`, `StoreAndForward`, `Host`. Tests across all of them + IntegrationTests.
|
||
|
||
**Prerequisite call-out:** This milestone implements the minimum-viable Site Call Audit (#22) surface and the cached-call tracking pieces: `TrackedOperationId`, the site-local operation-tracking SQLite store, the `SiteCalls` table at central, and the `CachedCallTelemetry` message (documented as existing, but it must be created from scratch because it is not in the code). Full reconciliation, KPIs, and Retry/Discard relay for #22 are deferred because they are not on the critical path for the audit log's combined telemetry.
|
||
|
||
**Acceptance criteria:**
|
||
- New `SiteCalls` MS SQL table + repo (no partitioning needed; this is operational state, not audit).
|
||
- New `CachedCallTelemetry` message in Commons carrying BOTH the cached-call operational fields AND an `AuditEvent` payload.
|
||
- Site path: `CachedCall` writes the audit row to site SQLite (`Kind = CachedEnqueued`), creates the site operation-tracking row, and sends a combined telemetry packet.
|
||
- Central path: `AuditLogIngestActor` (extended) receives the combined packet, performs **one transaction containing both** the `AuditLog` insert and the `SiteCalls` upsert.
|
||
- Retry attempt → `Kind = CachedAttempt` audit row + `SiteCalls` status transition. Terminal → `Kind = CachedTerminal` audit row + `SiteCalls` terminal status.
|
||
- Integration test asserts: triggering a `CachedCall` that fails transiently once and then succeeds produces 4 AuditLog rows (Enqueued + 2 Attempts + Terminal) + 1 SiteCalls row with `Status = Delivered`, all sharing the same `TrackedOperationId` correlation key.
|
||
|
||
### M3 — Tasks (TDD-detail)
|
||
|
||
#### M3-T1: `TrackedOperationId` strong-typed ID
|
||
|
||
**Files:**
|
||
- Create: `src/ScadaLink.Commons/Types/TrackedOperationId.cs` — readonly record struct wrapping `Guid`; `New()` / `Parse(string)` / `ToString()`.
|
||
- Create: `tests/ScadaLink.Commons.Tests/Types/TrackedOperationIdTests.cs`.
|
||
|
||
**Steps:**
|
||
1. Failing test: round-trip via `ToString()` / `Parse()` and equality semantics.
|
||
2. Implement.
|
||
3. Run: pass.
|
||
4. Commit: `feat(commons): TrackedOperationId strong type`.
|
||
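
A minimal sketch of the type described above, assuming C# 10 or later for `readonly record struct`. The `"D"` string format (hyphenated hex) is an assumption about the canonical textual form; the spec only requires that `ToString()` and `Parse(string)` round-trip.

```csharp
using System;

// Strongly typed wrapper so a tracked-operation id cannot be confused with other Guids.
public readonly record struct TrackedOperationId(Guid Value)
{
    public static TrackedOperationId New() => new(Guid.NewGuid());

    // Throws FormatException on malformed input, mirroring Guid.Parse.
    public static TrackedOperationId Parse(string text) => new(Guid.Parse(text));

    // Assumed canonical form: 32 hex digits in five hyphen-separated groups.
    public override string ToString() => Value.ToString("D");
}
```

A record struct already provides value equality and `GetHashCode`, so the equality half of the step 1 test should pass without further code.
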
|
||
#### M3-T2: Site-local operation-tracking SQLite table + repo
|
||
|
||
**Files:**
|
||
- Create: `src/ScadaLink.SiteRuntime/Tracking/OperationTrackingStore.cs` — SQLite-backed store with columns: `TrackedOperationId`, `Kind`, `TargetSummary`, `Status`, `RetryCount`, `LastError`, `CreatedAtUtc`, `UpdatedAtUtc`, `TerminalAtUtc`, source provenance. Schema bootstrap on first use; uses the same write-lock pattern as `SqliteAuditWriter`. Implements `IOperationTrackingStore` (interface in Commons).
|
||
- Create: `src/ScadaLink.Commons/Interfaces/IOperationTrackingStore.cs` — `RecordEnqueueAsync`, `RecordAttemptAsync`, `RecordTerminalAsync`, `GetStatusAsync(TrackedOperationId)`, `PurgeTerminalAsync(olderThanUtc)`.
|
||
- Create: `tests/ScadaLink.SiteRuntime.Tests/Tracking/OperationTrackingStoreTests.cs`.
|
||
|
||
**Steps:**
|
||
1. Failing test: schema bootstrap creates the table.
|
||
2. Failing test: `RecordEnqueueAsync` inserts a `Pending` row; `RecordAttemptAsync` updates `Status`/`RetryCount`/`LastError`; `RecordTerminalAsync` finalises.
|
||
3. Failing test: `GetStatusAsync` returns the latest snapshot (answers `Tracking.Status(id)` site-locally).
|
||
4. Failing test: `PurgeTerminalAsync` removes terminal rows older than threshold; non-terminal rows are kept regardless of age.
|
||
5. Implement.
|
||
6. Run: pass.
|
||
7. Commit: `feat(siteruntime): site-local operation tracking SQLite store`.
|
||
|
||
#### M3-T3: `Tracking.Status(id)` API surface in SiteRuntime
|
||
|
||
**Files:**
|
||
- Create or modify: `src/ScadaLink.SiteRuntime/Scripting/TrackingApi.cs` (confirm via repo whether it already exists). Expose a public `Status(TrackedOperationId)` method routed through `IOperationTrackingStore`.
|
||
- Modify: script trust-model allow-list to include the new `Tracking.*` surface (confirm via grep).
|
||
- Create: `tests/ScadaLink.SiteRuntime.Tests/Scripting/TrackingApiTests.cs`.
|
||
|
||
**Steps:**
|
||
1. Failing test: `Tracking.Status(unknownId)` returns a documented "not found" sentinel.
|
||
2. Failing test: `Tracking.Status(knownId)` returns the latest snapshot.
|
||
3. Implement.
|
||
4. Run: pass.
|
||
5. Commit: `feat(siteruntime): Tracking.Status(id) script API`.
|
||
|
||
#### M3-T4: `CachedCallTelemetry` Commons message — carries both operational + audit content
|
||
|
||
**Files:**
|
||
- Create: `src/ScadaLink.Commons/Messages/Integration/CachedCallTelemetry.cs` — fields: `TrackedOperationId`, `Kind` (`CachedEnqueued`/`CachedAttempt`/`CachedTerminal` audit kind), operational status, retry count, last error, timestamps, and a nested `AuditEvent` carrying the audit row content. Documented as additive-only per Commons REQ-COM-5a.
|
||
- Create: `tests/ScadaLink.Commons.Tests/Messages/Integration/CachedCallTelemetryTests.cs`.
|
||
|
||
**Steps:**
|
||
1. Failing test: construct a telemetry packet for each of the three lifecycle kinds; verify the nested AuditEvent's channel/kind alignment (e.g., a `CachedAttempt` packet must carry an `AuditEvent` with `Kind = CachedAttempt`).
|
||
2. Failing test: serialization round-trip preserves both layers.
|
||
3. Implement.
|
||
4. Run: pass.
|
||
5. Commit: `feat(commons): CachedCallTelemetry carrying combined operational + audit content`.
|
||
|
||
#### M3-T5: `SiteCalls` MS SQL table — EF mapping
|
||
|
||
**Files:**
|
||
- Create: `src/ScadaLink.Commons/Entities/Audit/SiteCall.cs` — POCO record per Component-SiteCallAudit.md.
|
||
- Create: `src/ScadaLink.ConfigurationDatabase/Entities/SiteCallEntityTypeConfiguration.cs` — `IEntityTypeConfiguration<SiteCall>` with PK on `TrackedOperationId`, indexes on `(SourceSite, CreatedAtUtc)` and `(Status, UpdatedAtUtc)`.
|
||
- Modify: `ScadaLinkDbContext.cs` — `public DbSet<SiteCall> SiteCalls => Set<SiteCall>();`.
|
||
- Create: `tests/ScadaLink.ConfigurationDatabase.Tests/Entities/SiteCallEntityTypeConfigurationTests.cs`.
|
||
|
||
**Steps:**
|
||
1. Failing test: model exposes `SiteCalls` table with documented columns and indexes.
|
||
2. Implement.
|
||
3. Run: pass.
|
||
4. Commit: `feat(configdb): map SiteCall to SiteCalls table`.
|
||
|
||
#### M3-T6: `SiteCalls` migration
|
||
|
||
**Files:**
|
||
- Create: `src/ScadaLink.ConfigurationDatabase/Migrations/<ts>_AddSiteCallsTable.cs` via `dotnet ef migrations add AddSiteCallsTable`.
|
||
- Create: `tests/ScadaLink.ConfigurationDatabase.Tests/Migrations/AddSiteCallsTableMigrationTests.cs`.
|
||
|
||
**Steps:**
|
||
1. Failing test: applying the migration creates the `SiteCalls` table with PK + indexes.
|
||
2. Generate + adjust migration.
|
||
3. Run: pass.
|
||
4. Commit: `feat(configdb): add SiteCalls migration`.
|
||
|
||
#### M3-T7: `ISiteCallAuditRepository` + EF impl
|
||
|
||
**Files:**
|
||
- Create: `src/ScadaLink.Commons/Interfaces/Repositories/ISiteCallAuditRepository.cs` — `UpsertAsync(SiteCall)` (insert-if-not-exists by `TrackedOperationId`, otherwise update-on-newer-status using monotonic status progression), `GetAsync(TrackedOperationId)`, `QueryAsync(filter, paging)`, `PurgeTerminalAsync(olderThanUtc)`.
|
||
- Create: `src/ScadaLink.ConfigurationDatabase/Repositories/SiteCallAuditRepository.cs`.
|
||
- Modify: `ServiceCollectionExtensions.cs` — register.
|
||
- Create: `tests/ScadaLink.ConfigurationDatabase.Tests/Repositories/SiteCallAuditRepositoryTests.cs`.
|
||
|
||
**Steps:**
|
||
1. Failing test: first `UpsertAsync` inserts; second `UpsertAsync` with an advanced status updates; an `UpsertAsync` with an older status is a no-op (monotonic progression).
|
||
2. Failing test: paged query supports the documented filter set.
|
||
3. Implement.
|
||
4. Run: pass.
|
||
5. Commit: `feat(configdb): ISiteCallAuditRepository + EF impl`.
|
||
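
The monotonic progression rule in step 1 can be modelled in memory before committing to `MERGE` or EF logic. Everything below is a sketch under assumptions: the status ranking (`Pending` < `Attempted` < any terminal status) is a guess at the intended order, `Attempted` as the intermediate operational status name is hypothetical, and equal-rank upserts are treated as no-ops even though a real implementation might still want to refresh `RetryCount` or `LastError`.

```csharp
using System;
using System.Collections.Generic;

public sealed record SiteCall(Guid TrackedOperationId, string Status, DateTime UpdatedAtUtc);

// Hypothetical in-memory model of ISiteCallAuditRepository.UpsertAsync.
public sealed class InMemorySiteCalls
{
    private readonly Dictionary<Guid, SiteCall> _rows = new();

    // Assumed ordering; terminal statuses all share the highest rank.
    private static int Rank(string status) => status switch
    {
        "Pending" => 0,
        "Attempted" => 1,
        "Delivered" or "Failed" or "Parked" or "Discarded" => 2,
        _ => throw new ArgumentOutOfRangeException(nameof(status), status, "unknown status"),
    };

    // True when a row was inserted or advanced, false when the upsert was a no-op.
    public bool Upsert(SiteCall incoming)
    {
        if (_rows.TryGetValue(incoming.TrackedOperationId, out var current)
            && Rank(incoming.Status) <= Rank(current.Status))
        {
            return false; // older or equal status never overwrites
        }
        _rows[incoming.TrackedOperationId] = incoming;
        return true;
    }

    public SiteCall? Get(Guid id) => _rows.TryGetValue(id, out var row) ? row : null;
}
```

One design question this surfaces: once Retry/Discard relay lands, a `Parked` row may legitimately move to `Delivered`, which a pure rank comparison would reject. That is outside M3 scope but worth noting in the brainstorm.
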
|
||
#### M3-T8: `SiteCallAuditActor` skeleton (central singleton)
|
||
|
||
**Files:**
|
||
- Create: `src/ScadaLink.SiteCallAudit/` (new project) — `SiteCallAuditActor.cs` + `ScadaLink.SiteCallAudit.csproj` + `ServiceCollectionExtensions.cs`. Actor handles `UpsertSiteCallCommand` messages by calling `ISiteCallAuditRepository.UpsertAsync`. Note: full reconciliation, KPIs, and Retry/Discard relay are explicitly deferred — this is the minimum-viable surface for M3.
|
||
- Modify: `ScadaLink.slnx` to include the new project.
|
||
- Create: `tests/ScadaLink.SiteCallAudit.Tests/SiteCallAuditActorTests.cs`.
|
||
|
||
**Steps:**
|
||
1. Failing test: actor receives `UpsertSiteCallCommand`, calls repo, replies with ack.
|
||
2. Failing test: actor swallows transient DB errors and surfaces them as health metrics (does NOT crash the central singleton).
|
||
3. Implement.
|
||
4. Run: pass.
|
||
5. Commit: `feat(scaudit): SiteCallAuditActor minimum viable surface`.
|
||
|
||
#### M3-T9: Extend `sitestream.proto` with `IngestCachedTelemetry` RPC OR extend `IngestAuditEvents`
|
||
|
||
**Files:**
|
||
- Modify: `src/ScadaLink.Communication/Protos/sitestream.proto` — preferred approach: add a new top-level RPC `rpc IngestCachedTelemetry(CachedTelemetryBatch) returns (IngestAck);` and a `message CachedTelemetryPacket { AuditEventDto audit_event = 1; SiteCallOperationalDto operational = 2; }` plus `message CachedTelemetryBatch { repeated CachedTelemetryPacket packets = 1; }`. Decision should be confirmed during M3's brainstorm.
|
||
- Build to regenerate.
|
||
- Create: `tests/ScadaLink.Communication.Tests/Protos/CachedTelemetryProtoTests.cs`.
|
||
|
||
**Steps:**
|
||
1. Failing test: round-trip a populated `CachedTelemetryPacket`.
|
||
2. Add proto + rebuild.
|
||
3. Run: pass.
|
||
4. Commit: `feat(comms): IngestCachedTelemetry RPC + combined telemetry messages`.
|
||
|
||
#### M3-T10: Extend `AuditLogIngestActor` for combined telemetry — dual-write transaction
|
||
|
||
**Files:**
|
||
- Modify: `src/ScadaLink.AuditLog/Central/AuditLogIngestActor.cs` — add a handler for the cached telemetry message. Inside a **single `DbContext` transaction**: (a) call `IAuditLogRepository.InsertIfNotExistsAsync(auditEvent)`, then (b) call `ISiteCallAuditRepository.UpsertAsync(operationalState)`. Both must succeed or both must roll back.
|
||
- Modify: `src/ScadaLink.Communication/SiteStreamGrpc/SiteStreamGrpcServer.cs` — route the new RPC to the central actor.
|
||
- Create: `tests/ScadaLink.AuditLog.Tests/Central/AuditLogIngestActorCombinedTelemetryTests.cs`.
|
||
|
||
**Steps:**
|
||
1. Failing test: a single combined packet produces one AuditLog row AND one SiteCalls row (or upsert).
|
||
2. Failing test: when the SiteCalls upsert throws, the AuditLog insert is rolled back (no orphan rows).
|
||
3. Failing test: when the AuditLog insert is a no-op (duplicate `EventId`), the SiteCalls upsert still runs.
|
||
4. Failing test: when both rows already exist with monotonic-equal statuses, the operation is a no-op overall (full idempotency).
|
||
5. Implement.
|
||
6. Run: pass.
|
||
7. Commit: `feat(auditlog): combined telemetry dual-write transaction`.
|
||
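
The four test cases above describe all-or-nothing semantics across two tables. The sketch below models only those semantics with in-memory collections and a staged-copy "commit"; it is not how the real actor would talk to MS SQL, where an EF Core transaction on the shared `DbContext` plays this role. `InMemoryCentralDb` and the `upsertSiteCall` delegate are hypothetical names introduced so a test can inject a failing upsert.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of the central dual-write: both tables change together or not at all.
public sealed class InMemoryCentralDb
{
    private HashSet<Guid> _auditLog = new();
    private Dictionary<Guid, string> _siteCalls = new();

    public int AuditRows => _auditLog.Count;
    public int SiteCallRows => _siteCalls.Count;

    public void IngestCombined(
        Guid eventId, Guid trackedOperationId, string status,
        Action<Dictionary<Guid, string>, Guid, string> upsertSiteCall)
    {
        // Work on copies so a failure leaves the committed state untouched ("rollback").
        var stagedAudit = new HashSet<Guid>(_auditLog);
        var stagedCalls = new Dictionary<Guid, string>(_siteCalls);

        // (a) Insert-if-not-exists. A duplicate EventId is a no-op and must NOT
        //     short-circuit step (b).
        stagedAudit.Add(eventId);

        // (b) Upsert the operational row; this may throw.
        upsertSiteCall(stagedCalls, trackedOperationId, status);

        // "Commit": publish both tables together.
        _auditLog = stagedAudit;
        _siteCalls = stagedCalls;
    }
}
```

Replaying the same packet leaves both counts unchanged, which is the full-idempotency case in step 4.
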
|
||
#### M3-T11: ESG `CachedCallAsync` — emit `CachedEnqueued` on enqueue
|
||
|
||
**Files:**
|
||
- Modify: `src/ScadaLink.ExternalSystemGateway/ExternalSystemClient.cs:75–136` (cached call) — at the moment of buffering into S&F: build an `AuditEvent` (channel=`ApiOutbound`, kind=`CachedEnqueued`) AND a `SiteCallOperationalDto` (status=`Pending`); package as a `CachedTelemetryPacket`; hand to the combined-telemetry forwarder.
|
||
- Create: `src/ScadaLink.ExternalSystemGateway/Cached/CachedCallTelemetryForwarder.cs`, which accumulates packets and posts them to `SiteAuditTelemetryActor` (or a sibling actor; decide in the milestone brainstorm).
|
||
- Create: `tests/ScadaLink.ExternalSystemGateway.Tests/CachedCallEnqueueEmissionTests.cs`.
|
||
|
||
**Steps:**
|
||
1. Failing test: an enqueued cached call produces exactly one packet with `kind=CachedEnqueued`.
|
||
2. Implement.
|
||
3. Run: pass.
|
||
4. Commit: `feat(esg): CachedCall emits CachedEnqueued combined telemetry on buffering`.
|
||
|
||
#### M3-T12: ESG `CachedCallAsync` — emit `CachedAttempt` per retry
|
||
|
||
**Files:**
|
||
- Modify: `src/ScadaLink.StoreAndForward/` retry loop (locate the per-attempt callback site) to emit a `CachedAttempt` packet on each attempt (success OR transient failure).
|
||
- Create: `tests/ScadaLink.StoreAndForward.Tests/CachedCallAttemptEmissionTests.cs`.
|
||
|
||
**Steps:**
|
||
1. Failing test: an attempt that returns HTTP 500 produces a packet with `kind=CachedAttempt`, `status=TransientFailure`, `HttpStatus=500`.
|
||
2. Failing test: a successful attempt produces a packet with `kind=CachedAttempt`, `status=Success`, `HttpStatus=200`.
|
||
3. Implement.
|
||
4. Run: pass.
|
||
5. Commit: `feat(snf): CachedCall emits CachedAttempt per retry`.
|
||
|
||
#### M3-T13: ESG `CachedCallAsync` — emit `CachedTerminal` on terminal state
|
||
|
||
**Files:**
|
||
- Modify: same retry-loop terminal-transition site — on `Delivered` / `Failed` / `Parked` / `Discarded`, emit one final `CachedTerminal` packet.
|
||
- Create: `tests/ScadaLink.StoreAndForward.Tests/CachedCallTerminalEmissionTests.cs`.
|
||
|
||
**Steps:**
|
||
1. Failing test: a cached call that succeeds on attempt 3 produces (in order): 1 `CachedEnqueued`, 3 `CachedAttempt`, 1 `CachedTerminal` (with `status=Delivered`).
|
||
2. Failing test: a cached call that exhausts retries produces a final `CachedTerminal` with `status=Parked`.
|
||
3. Implement.
|
||
4. Run: pass.
|
||
5. Commit: `feat(snf): CachedCall emits CachedTerminal on lifecycle terminal`.
|
||
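
The packet counts asserted in M3-T13 and M3-T16 follow a simple rule: one enqueue, one packet per attempt, one terminal. A small helper like the hypothetical `CachedLifecycle.ExpectedKinds` below can keep the test fixtures consistent with that rule. It uses the kind names as written in these task steps, not the M1-aligned enum names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical test helper: expected packet kinds, in order, for a cached
// operation that reached a terminal state after `attempts` delivery attempts.
public static class CachedLifecycle
{
    public static IReadOnlyList<string> ExpectedKinds(int attempts)
    {
        if (attempts < 0)
            throw new ArgumentOutOfRangeException(nameof(attempts), "attempt count cannot be negative");

        var kinds = new List<string> { "CachedEnqueued" };
        for (var i = 0; i < attempts; i++)
            kinds.Add("CachedAttempt"); // one per attempt, success or failure
        kinds.Add("CachedTerminal");
        return kinds;
    }
}
```

For a call that succeeds on attempt 3, this yields five entries, matching the "Enqueued + 3 Attempts + Terminal" expectation in the end-to-end test.
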
|
||
#### M3-T14: `Database.CachedWrite` — mirror the three-lifecycle emission for DB cached writes
|
||
|
||
**Files:**
|
||
- Modify: `src/ScadaLink.ExternalSystemGateway/DatabaseGateway.cs` (or equivalent — confirm via repo) — same three-event emission pattern as ESG cached calls, but `channel=DbOutbound`.
|
||
- Create: `tests/ScadaLink.ExternalSystemGateway.Tests/CachedWriteLifecycleEmissionTests.cs`.
|
||
|
||
**Steps:**
|
||
1. Failing test: a `CachedWrite` that succeeds first try produces `CachedEnqueued` + `CachedAttempt(Success)` + `CachedTerminal(Delivered)`.
|
||
2. Failing test: a `CachedWrite` with transient retry mirrors the ESG pattern.
|
||
3. Implement.
|
||
4. Run: pass.
|
||
5. Commit: `feat(esg): Database.CachedWrite emits three-lifecycle combined telemetry`.
|
||
|
||
#### M3-T15: Host registration — `SiteCallAuditActor` central singleton
|
||
|
||
**Files:**
|
||
- Modify: `src/ScadaLink.Host/Actors/AkkaHostedService.cs` — register `SiteCallAuditActor` central singleton + proxy alongside `AuditLogIngestActor`.
|
||
- Modify: `src/ScadaLink.SiteCallAudit/ServiceCollectionExtensions.cs` — register actor props.
|
||
- Modify: `tests/ScadaLink.Host.Tests/AkkaHostedServiceAuditWiringTests.cs` — extend to assert `SiteCallAuditActor` proxy resolves.
|
||
|
||
**Steps:**
|
||
1. Failing test: starting host produces the new singleton's proxy.
|
||
2. Implement.
|
||
3. Run: pass.
|
||
4. Commit: `feat(host): register SiteCallAuditActor central singleton`.
|
||
|
||
#### M3-T16: Integration test — cached external call audit (end-to-end)
|
||
|
||
**Files:**
|
||
- Create: `tests/ScadaLink.IntegrationTests/AuditLog/CachedCallCombinedTelemetryTests.cs` — site + central; stub external system returns 500 twice then 200; script invokes `ExternalSystem.CachedCall("System","Method", args)`; assert AuditLog has 5 rows (Enqueued + 3 Attempts + Terminal) AND SiteCalls has 1 row with `Status=Delivered` AND `Tracking.Status(id)` reports the same.
|
||
|
||
**Steps:**
|
||
1. Sketch test against IntegrationTests harness.
|
||
2. Run: fail (likely surfacing earlier-task gaps).
|
||
3. Iterate fixes until pass.
|
||
4. Commit: `test(auditlog): cached call combined telemetry end-to-end`.
|
||
|
||
#### M3-T17: Integration test — cached DB write audit (end-to-end)
|
||
|
||
**Files:**
|
||
- Create: `tests/ScadaLink.IntegrationTests/AuditLog/CachedWriteCombinedTelemetryTests.cs` — mirror M3-T16 against the DB cached path.
|
||
|
||
**Steps:**
|
||
1. Sketch.
|
||
2. Iterate.
|
||
3. Commit: `test(auditlog): cached DB write combined telemetry end-to-end`.
|
||
|
||
#### M3-T18: Idempotency test — duplicate telemetry doesn't double-insert / double-upsert
|
||
|
||
**Files:**
|
||
- Create: `tests/ScadaLink.IntegrationTests/AuditLog/CombinedTelemetryIdempotencyTests.cs` — force the same packet to arrive twice (simulated telemetry retry); assert AuditLog still has exactly one row and SiteCalls upsert is monotonic.
|
||
|
||
**Steps:**
|
||
1. Sketch.
|
||
2. Pass.
|
||
3. Commit: `test(auditlog): combined telemetry idempotency on retried packets`.
|
||
|
||
### M3 — Risk callouts
|
||
|
||
- **Combined telemetry packet evolution:** design the proto so future cached audit-kind additions are non-breaking (avoid `oneof` for fields you'll extend; use sparse field numbers).
|
||
- **Dual-write transaction failure modes:** the single `DbContext` transaction at central spans two tables; ensure retry behaviour on transient connection errors works as expected (existing EF Core `IExecutionStrategy` patterns may apply; with a retrying strategy, the whole transaction has to run inside the strategy's execute delegate).
|
||
- **Idempotency cross-table:** AuditLog dedups on `EventId`, SiteCalls dedups on `TrackedOperationId` with status-monotonic update. A retried packet whose AuditLog row exists must still upsert SiteCalls (no short-circuit).
|
||
- **Scope discipline:** M3 inlines the *minimum* surface for #22 and cached-call tracking. Full #22 reconciliation, KPIs, and Retry/Discard relay are deferred. Note in the milestone brainstorm whether any extra #22 surface is genuinely needed for M3 acceptance criteria — if not, defer aggressively.
|
||
- **`Tracking.Status` semantics:** confirmed authoritative site-locally per design; no central round-trip. Ensure the test in M3-T3 reflects this.
|
||
|
||
---
|
||
|
||
## M4 — Remaining boundary emission
|
||
|
||
**Goal:** Every channel × kind from `Component-AuditLog.md` produces a row when its boundary call fires.
|
||
|
||
**Affected projects:** `ExternalSystemGateway` (sync DB writes/reads, cached DB writes), `SiteRuntime` (Database surface exposing them), `NotificationOutbox` (central direct-write of `Attempt`/`Terminal`), `InboundAPI` (middleware). Tests across all.
|
||
|
||
> **M3 realities to honor:**
|
||
> - **Vocabulary**: use the M1-aligned enums. The roadmap's old `SyncWrite/SyncRead/Notification.Attempt/Notification.Terminal/Notification.Enqueued/ApiInbound.Completed/PermanentFailure` strings are pre-M1 spec wording — DO NOT use those names in code. Translation:
|
||
> - sync DB write/read → `AuditKind.DbWrite` (Channel=DbOutbound); distinguish read vs write via `Extra` (e.g., `{"op": "read", "rowsReturned": 42}`).
|
||
> - notification delivery attempt → `AuditKind.NotifyDeliver` with `AuditStatus.Attempted`.
|
||
> - notification delivery terminal → `AuditKind.NotifyDeliver` with `AuditStatus.Delivered|Failed|Parked|Discarded`.
|
||
> - notification submit (site-emit) → `AuditKind.NotifySend` with `AuditStatus.Submitted`.
|
||
> - inbound API success → `AuditKind.InboundRequest` with `AuditStatus.Delivered`.
|
||
> - inbound API auth failure → `AuditKind.InboundAuthFailure` with `AuditStatus.Failed`.
|
||
> - "permanent failure" → `AuditStatus.Failed`. "Transient failure" never lands a terminal row.
|
||
> - **Mapper consolidation**: M3 surfaced 4 DTO mappers (AuditEventMapper, SiteStreamGrpcServer inline, SiteCall DTO mapper, DirectActorSiteStreamAuditClient test stub). M4 should extract a single `IntegrationMappers` helper in `src/ScadaLink.Commons/Messages/Integration/` or similar to consolidate before adding more channels. The project-ref cycle that motivated the inline duplication can be broken by moving the mapper into Commons (proto types are auto-generated in Communication; the mapper just needs the proto types reachable from Commons via a transitive ref).
|
||
> - **`OnCachedTelemetryWithoutDualWriteAsync` test-mode fallback**: this path exists in `AuditLogIngestActor` for the single-repo constructor. M4 may deprecate the single-repo constructor entirely and migrate tests to the IServiceProvider+harness pattern.
|
||
> - **Site SQLite drain for OperationTrackingStore**: M3 wrote the tracking half site-locally but no drain pipeline pushes it to central — central reads SiteCalls operational state via the dual-write transaction only. If M4 needs central visibility into in-flight (non-terminal) tracking entries, plan a drain.
|
||
> - **`SiteCallAuditActor`**: wired in M3 as a cluster singleton + proxy but not on the M3 hot path. M4 (or M6 reconciliation) is the natural first direct caller — wire one production code path through it.
|
||
> - **Vocabulary correction** in the body of M4 below: every M4 task step (M4-T1 through M4-TN) that still says `Status=PermanentFailure` or `Kind=SyncWrite/SyncRead/Completed/Attempt/Terminal/Enqueued` is stale; apply the translation above when implementing.
|
||
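
When working through the stale task text, the translation in the callout above can be kept at hand as a lookup. The sketch below encodes only the mappings that callout states; the `M1Vocabulary` class is a hypothetical reading aid, not something the roadmap asks to ship, and strings stand in for the real `AuditKind` / `AuditStatus` enums.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical lookup from pre-M1 roadmap wording to M1-aligned names.
public static class M1Vocabulary
{
    private static readonly Dictionary<string, string> Kinds =
        new Dictionary<string, string>(StringComparer.Ordinal)
        {
            ["SyncWrite"] = "DbWrite",                 // read vs write is carried in Extra.op
            ["SyncRead"] = "DbWrite",
            ["Notification.Attempt"] = "NotifyDeliver",   // with Status=Attempted
            ["Notification.Terminal"] = "NotifyDeliver",  // with Delivered|Failed|Parked|Discarded
            ["Notification.Enqueued"] = "NotifySend",     // with Status=Submitted
            ["ApiInbound.Completed"] = "InboundRequest",  // with Status=Delivered
        };

    public static string TranslateKind(string staleKind) =>
        Kinds.TryGetValue(staleKind, out var kind)
            ? kind
            : throw new ArgumentException($"No M1 translation recorded for '{staleKind}'.", nameof(staleKind));

    // Only PermanentFailure has a stated status translation; other values pass through.
    public static string TranslateStatus(string staleStatus) =>
        staleStatus == "PermanentFailure" ? "Failed" : staleStatus;
}
```
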
|
||
**Acceptance criteria:**
|
||
- Sync `Database.Connection().Execute()` → `DbOutbound.DbWrite` row (with `Extra.op = "write"` and `rowsAffected`); `ExecuteReader` → `DbOutbound.DbWrite` row (with `Extra.op = "read"` and `rowsReturned`). Parameter values captured by default; per-connection redaction opt-in supported.
|
||
- `Database.CachedWrite` → three lifecycle rows via the combined telemetry built in M3.
|
||
- Notification Outbox dispatcher: every delivery attempt writes `NotifyDeliver` with `Status=Attempted`; terminal writes `NotifyDeliver` with `Status={Delivered|Failed|Parked|Discarded}`. Site-emitted `NotifySend` (`Status=Submitted`) flows through the standard site→central audit path. Audit-write failure never affects delivery.
|
||
- Inbound API middleware writes one `ApiInbound.InboundRequest` row per request, before `await next()` returns. API key NAME captured (never material). Audit-write failure does NOT change the HTTP response. Auth failures emit `ApiInbound.InboundAuthFailure` with `Status=Failed`.
|
||
|
||
### M4 — Tasks (TDD-detail)

#### M4-T1: ESG `Database.Connection().ExecuteAsync` audit emission — `DbOutbound.SyncWrite`

**Files:**
- Modify: `src/ScadaLink.ExternalSystemGateway/DatabaseGateway.cs` (or wherever the script-facing `Execute*` lives — confirm via repo) — wrap the call site to emit an `AuditEvent` (channel=`DbOutbound`, kind=`SyncWrite`) on every `Execute`/`ExecuteScalar`. Capture statement text, parameter values (default; redaction in M5), `DurationMs`, `rowsAffected` in `Extra`.
- Create: `tests/ScadaLink.ExternalSystemGateway.Tests/DatabaseSyncWriteEmissionTests.cs`.

**Steps:**
1. Failing test: `Execute("INSERT INTO ...", new {...})` emits one event with `Channel=DbOutbound`, `Kind=SyncWrite`, statement text + parameter values captured.
2. Failing test: `ExecuteScalar` emits the same kind.
3. Failing test: execute that throws → emission with `Status=PermanentFailure`, `ErrorMessage` populated.
4. Failing test: audit-write failure does NOT abort the SQL call (script sees the original outcome).
5. Implement.
6. Run: pass.
7. Commit: `feat(esg): emit DbOutbound.SyncWrite on script-initiated Execute*`.

#### M4-T2: ESG `Database.Connection().ExecuteReaderAsync` audit emission — `DbOutbound.SyncRead`

**Files:**
- Modify: `src/ScadaLink.ExternalSystemGateway/DatabaseGateway.cs` — wrap `ExecuteReader` to emit `DbOutbound.SyncRead`. Capture statement, parameter values, `DurationMs`, `rowsReturned` in `Extra`. Response body capture defaults to NOT including rows; opt-in via per-connection config (M5).
- Create: `tests/ScadaLink.ExternalSystemGateway.Tests/DatabaseSyncReadEmissionTests.cs`.

**Steps:**
1. Failing test: `Query<T>("SELECT ...")` emits one event with `Channel=DbOutbound`, `Kind=SyncRead`.
2. Failing test: `rowsReturned` appears in `Extra`.
3. Implement.
4. Run: pass.
5. Commit: `feat(esg): emit DbOutbound.SyncRead on script-initiated reads`.

#### M4-T3: NotificationOutboxActor — inject `ICentralAuditWriter`

**Files:**
- Modify: `src/ScadaLink.NotificationOutbox/NotificationOutboxActor.cs:22–68` — constructor accepts `ICentralAuditWriter`. Wire into DI in `ServiceCollectionExtensions.cs`.
- Create: `tests/ScadaLink.NotificationOutbox.Tests/NotificationOutboxActorAuditInjectionTests.cs`.

**Steps:**
1. Failing test: actor's `Props` factory accepts an `ICentralAuditWriter`; constructor stores it.
2. Implement.
3. Run: pass.
4. Commit: `feat(notif): NotificationOutboxActor accepts ICentralAuditWriter`.

#### M4-T4: NotificationOutboxActor — emit `Notification.Attempt` per dispatcher attempt

**Files:**
- Modify: `src/ScadaLink.NotificationOutbox/NotificationOutboxActor.cs` dispatcher attempt branch (after each delivery attempt resolves) — emit `Notification.Attempt` row with `Status` mapped from attempt result (`Success`, `TransientFailure`, `PermanentFailure`).
- Create: `tests/ScadaLink.NotificationOutbox.Tests/NotificationOutboxActorAttemptEmissionTests.cs`.

**Steps:**
1. Failing test: a successful attempt → exactly one event with `Kind=Attempt`, `Status=Success`.
2. Failing test: a transient-failure attempt → `Status=TransientFailure`, `ErrorMessage` populated.
3. Failing test: when `ICentralAuditWriter.WriteAsync` throws, the dispatcher's per-attempt `Notifications` row update STILL succeeds (audit must never block delivery).
4. Implement.
5. Run: pass.
6. Commit: `feat(notif): emit Notification.Attempt per dispatcher attempt`.

#### M4-T5: NotificationOutboxActor — emit `Notification.Terminal` on terminal transition

**Files:**
- Modify: `src/ScadaLink.NotificationOutbox/NotificationOutboxActor.cs` terminal branches (`Delivered` / `Parked` / `Discarded` transitions) — emit `Notification.Terminal` row.
- Create: `tests/ScadaLink.NotificationOutbox.Tests/NotificationOutboxActorTerminalEmissionTests.cs`.

**Steps:**
1. Failing test: a notification that succeeds emits one `Terminal` event with `Status=Delivered`.
2. Failing test: a Parked transition emits `Status=Parked`.
3. Failing test: an operator Discard emits `Status=Discarded`.
4. Implement.
5. Run: pass.
6. Commit: `feat(notif): emit Notification.Terminal on terminal transitions`.

#### M4-T6: Site-emitted `Notification.Enqueued`

**Files:**
- Modify: `src/ScadaLink.NotificationService/` (or wherever the site-side `Notify.To().Send()` runs — confirm via repo) — at the moment of buffering into the site S&F: emit a site-side `AuditEvent` (channel=`Notification`, kind=`Enqueued`) via `IAuditWriter`. Telemetry forwards as usual.
- Create: `tests/ScadaLink.NotificationService.Tests/NotifyEnqueueAuditEmissionTests.cs`.

**Steps:**
1. Failing test: `Notify.To("list").Send("subject", "body")` emits one event with `Channel=Notification`, `Kind=Enqueued`, target=list name, body captured (subject too).
2. Failing test: audit-write failure does not abort `Send()`.
3. Implement.
4. Run: pass.
5. Commit: `feat(notif): emit Notification.Enqueued from site-side Notify.Send`.

#### M4-T7: Inbound API — `AuditWriteMiddleware`

**Files:**
- Create: `src/ScadaLink.InboundAPI/Middleware/AuditWriteMiddleware.cs` — ASP.NET Core middleware. After `await next()` (so the response is fully resolved but BEFORE flush — using `HttpResponse.OnStarting` or buffered body), build an `AuditEvent` (channel=`ApiInbound`, kind=`Completed`, `Actor`=API key NAME from request context, `Target`=method name, `HttpStatus`, `DurationMs`, `RequestSummary`/`ResponseSummary`). Call `ICentralAuditWriter.WriteAsync` inside `try`/`catch` — failures never affect the response.
- Modify: `src/ScadaLink.InboundAPI/Startup.cs` (or wherever the pipeline is configured) — register middleware.
- Create: `tests/ScadaLink.InboundAPI.Tests/Middleware/AuditWriteMiddlewareTests.cs`.

**Steps:**
1. Failing test: a successful POST to `/api/{method}` produces one `ApiInbound.Completed` event with `HttpStatus=200`.
2. Failing test: a 400/401/500 response produces an event with the matching `HttpStatus` and `Status` mapped (`PermanentFailure` for 4xx, `TransientFailure` for 5xx).
3. Failing test: `Actor` carries the API key NAME (never the key material).
4. Failing test: when `ICentralAuditWriter.WriteAsync` throws, the HTTP response is unchanged (success stays success).
5. Failing test: request remote IP and User-Agent appear in `Extra`.
6. Implement.
7. Run: pass.
8. Commit: `feat(inbound): AuditWriteMiddleware emitting ApiInbound.Completed per request`.

#### M4-T8: Register middleware in the ASP.NET pipeline

**Files:**
- Modify: `src/ScadaLink.InboundAPI/Startup.cs` / `Program.cs` — `app.UseMiddleware<AuditWriteMiddleware>()` placed AFTER auth (so `Actor` resolves) and BEFORE the script-execution handler.
- Create: `tests/ScadaLink.InboundAPI.Tests/Middleware/MiddlewareOrderTests.cs`.

**Steps:**
1. Failing test: pipeline ordering puts AuditWrite after auth, before script execution.
2. Implement.
3. Run: pass.
4. Commit: `feat(inbound): register AuditWriteMiddleware in pipeline`.

#### M4-T9: Integration test — DB sync emission

**Files:**
- Create: `tests/ScadaLink.IntegrationTests/AuditLog/DatabaseSyncEmissionTests.cs` — script invokes `Database.Connection().Execute("INSERT ...")` and `Query<T>("SELECT ...")`; assert central AuditLog has one `DbOutbound.SyncWrite` row and one `DbOutbound.SyncRead` row.

**Steps:**
1. Sketch, iterate, commit: `test(auditlog): DB sync emission integration test`.

#### M4-T10: Integration test — Notify dispatcher audit trail

**Files:**
- Create: `tests/ScadaLink.IntegrationTests/AuditLog/NotifyDispatcherAuditTrailTests.cs` — script calls `Notify.To(list).Send(...)`; stub SMTP returns transient then success; assert AuditLog has `Enqueued` + 2 `Attempt` (one transient, one success) + 1 `Terminal(Delivered)`.

**Steps:**
1. Sketch, iterate, commit: `test(auditlog): Notify dispatcher audit trail end-to-end`.

#### M4-T11: Integration test — Inbound API request audit

**Files:**
- Create: `tests/ScadaLink.IntegrationTests/AuditLog/InboundApiAuditTests.cs` — POST to `/api/{method}` with a valid API key; assert one `ApiInbound.Completed` row with the expected `Actor` (key name), `HttpStatus=200`, request/response bodies captured.
- Also test: POST with a bad API key → row with `Actor=NULL` (or "<unknown>"), `HttpStatus=401`, `Extra` carries `remoteIp`.

**Steps:**
1. Sketch, iterate, commit: `test(auditlog): Inbound API request audit end-to-end`.

#### M4-T12: Integration test — audit-write failure never aborts the action

**Files:**
- Create: `tests/ScadaLink.IntegrationTests/AuditLog/AuditWriteFailureSafetyTests.cs` — inject a broken `ICentralAuditWriter` (always throws) for one test; assert that ESG sync calls, ESG cached calls, DB writes, Inbound API calls, and Notification dispatch all still complete successfully and the script/caller sees the normal outcome.

**Steps:**
1. Sketch test with broken-writer DI override per scenario.
2. Run, fix any spots where audit-write exceptions leak.
3. Commit: `test(auditlog): audit failures never abort user-facing actions`.
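
The "audit must never abort the action" rule that recurs across M4 can be sketched as one small wrapper. This is a minimal illustration, not the repo's implementation: `AuditSafe`, `RunAsync`, and the `onAuditFailure` callback are hypothetical names, and the sketch covers only the success path (a failing action propagates unchanged, and auditing that failure is left out).

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper: the roadmap does not name a shared wrapper like this.
public static class AuditSafe
{
    // Runs the audited action first, then attempts the audit write.
    // An exception from the audit write is handed to onAuditFailure and swallowed,
    // so the caller always sees the action's own outcome.
    public static async Task<T> RunAsync<T>(
        Func<Task<T>> action,
        Func<T, Task> writeAudit,
        Action<Exception> onAuditFailure)
    {
        T result = await action(); // action failures propagate unchanged
        try
        {
            await writeAudit(result);
        }
        catch (Exception ex)
        {
            onAuditFailure(ex); // e.g. bump a write-failure counter
        }
        return result;
    }
}
```

A caller would pass the SQL call, HTTP handler, or delivery attempt as `action` and the writer call as `writeAudit`; the callback is a natural place for the failure counters introduced in M6.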

### M4 — Risk callouts

- **Inbound API correlation IDs:** if upstream tracing headers (W3C `traceparent`) are present, prefer them as `CorrelationId`; otherwise generate. Confirm whether existing middleware sets a request ID we can reuse.
- **`AuditWriteMiddleware` placement:** must run AFTER authentication so the API key NAME is in `HttpContext.User`. Verify with the middleware-order test in M4-T8.
- **Notification dispatcher loop hot-path:** audit emission must NOT extend per-attempt latency materially. Bench in M4-T10 if there's any concern.
- **DB parameter capture:** parameter values are captured verbatim by default (per design); redaction is opt-in (M5). For M4, just capture — don't try to second-guess what's sensitive.

---

## M5 — Payload + redaction policy

> **M4 realities to honor:**
> - **Decorator surfaces to filter**: `AuditingDbConnection`/`AuditingDbCommand`/`AuditingDbDataReader` (Bundle A) emit `RequestSummary` as raw SQL + parameters today. M5's `IAuditPayloadFilter` runs between event construction and writer call; the AuditingDb decorators must call into the filter before `WriteAsync`.
> - **CentralAuditWriter wraps `IAuditLogRepository.InsertIfNotExistsAsync`** (Bundle B). M5 should plug the filter into BOTH the site-side `FallbackAuditWriter` and the central-side `CentralAuditWriter` so direct-write paths (NotificationOutboxActor, AuditWriteMiddleware) are also filtered. Plugin location: in each writer's `WriteAsync` BEFORE the storage call.
> - **InboundAPI middleware `RequestSummary` already populates, `ResponseSummary = null`** (Bundle D punted response-body capture). M5 should add response-body buffering OR document that ResponseSummary stays null for v1 (acceptable per the spec — captures are best-effort).
> - **`AuditWriteMiddleware` path-scoped via `UseWhen(/api/)`** — M5 may want to introduce per-target redaction overrides; that path-scoped setup gives a natural hook for per-route redaction (e.g., `/api/secrets/*` has stricter caps).
> - **Error-row vocabulary**: cap raised to 64 KB on rows with `Status NOT IN ('Delivered', 'Submitted', 'Forwarded')`. The new vocabulary (`Failed/Parked/Discarded/Attempted/Skipped`) is what triggers the elevated cap. NOT "non-Success" wording from the original spec.
> - **InternalsVisibleTo precedent**: AuditLog.Tests can reach internals of SiteRuntime + NotificationOutbox + (newly) AuditLog. M5 redaction tests can exercise internal helpers similarly.

**Goal:** Payload capture is bounded (8 KB / 64 KB on error), headers are redacted by default, SQL parameter values are captured by default with per-connection opt-in redaction, body redactor regexes are configurable per target, and the safety net over-redacts on misconfiguration.

**Affected projects:** `AuditLog` (policy engine + options), `ExternalSystemGateway` (HTTP header redactors, SQL param redaction hook), `InboundAPI` (header redactors, body capture), `NotificationOutbox` (subject/body capture follows existing rules). Tests.

**Acceptance criteria:**
- An `IAuditPayloadFilter` service is invoked between event construction and write. It truncates to the default cap; raises to the error cap on rows whose `Status` is not `Delivered`, `Submitted`, or `Forwarded` (the reconciled vocabulary from the M4 realities above, not the original non-`Success` wording); applies header redactors; applies body regex redactors; applies SQL parameter redactors (per-connection); and over-redacts on regex error, incrementing `AuditRedactionFailure`.
- Configuration test: changing `appsettings.json` redactors changes runtime behaviour (no rebuild needed for regex changes).
- Bench: 95th-percentile audit emission latency on the hot path stays under N µs at default cap (target to be set during M5 brainstorm).

### M5 — Tasks (TDD-detail)

#### M5-T1: `IAuditPayloadFilter` interface

**Files:**
- Create: `src/ScadaLink.AuditLog/Payload/IAuditPayloadFilter.cs` — single method `AuditEvent Apply(AuditEvent rawEvent)` that returns a filtered copy (truncation + redaction applied).
- Create: `tests/ScadaLink.AuditLog.Tests/Payload/PayloadFilterContractTests.cs`.

**Steps:**
1. Failing test: interface exists, method signature matches.
2. Implement.
3. Run: pass.
4. Commit: `feat(auditlog): IAuditPayloadFilter contract`.

#### M5-T2: `DefaultAuditPayloadFilter` — truncation (default + error cap)

**Files:**
- Create: `src/ScadaLink.AuditLog/Payload/DefaultAuditPayloadFilter.cs` — composes `TruncationStage` + redactors (M5-T3/T4/T5). Truncation rule: default cap = `AuditLogOptions.DefaultCapBytes` (8 KB); error cap = `ErrorCapBytes` (64 KB) applied when `Status` is NOT in {`Success`, `Delivered`, `Enqueued`}. UTF-8 byte-safe boundary (no mid-character cuts). Set `PayloadTruncated = true` when applied.
- Create: `tests/ScadaLink.AuditLog.Tests/Payload/TruncationTests.cs`.

**Steps:**
1. Failing test: 10 KB success body → truncated to 8 KB; `PayloadTruncated = true`.
2. Failing test: 10 KB body on `Status=TransientFailure` → not truncated (under 64 KB cap); `PayloadTruncated = false`.
3. Failing test: 70 KB body on `Status=PermanentFailure` → truncated to 64 KB; `PayloadTruncated = true`.
4. Failing test: multi-byte UTF-8 character that would straddle the cap is not split mid-character.
5. Implement.
6. Run: pass.
7. Commit: `feat(auditlog): DefaultAuditPayloadFilter truncation with UTF-8 boundary safety`.
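
The UTF-8 boundary rule in step 4 can be sketched in a few lines. This is an illustration under assumptions: `Utf8Truncation` and `Truncate` are hypothetical names, the cap is passed in by the caller (selecting the 8 KB or 64 KB cap from `Status` is deliberately left out), and the returned flag stands in for `PayloadTruncated`.

```csharp
using System;
using System.Text;

// Hypothetical helper illustrating the byte-safe cut; not the repo's TruncationStage.
public static class Utf8Truncation
{
    // Returns the longest prefix of text whose UTF-8 encoding fits in capBytes,
    // never cutting through a multi-byte character.
    public static (string Text, bool Truncated) Truncate(string text, int capBytes)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        if (bytes.Length <= capBytes) return (text, false);

        int cut = capBytes;
        // Bytes of the form 10xxxxxx are continuation bytes. If the byte at the cut
        // is one, the cut falls inside a character, so back up to that character's start.
        while (cut > 0 && (bytes[cut] & 0xC0) == 0x80) cut--;
        return (Encoding.UTF8.GetString(bytes, 0, cut), true);
    }
}
```

The result may be a few bytes shorter than the cap when a character would straddle it, which is the trade the task description asks for.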

#### M5-T3: HTTP header redaction

**Files:**
- Modify: `src/ScadaLink.AuditLog/Payload/DefaultAuditPayloadFilter.cs` — add header-redaction stage. Strips header values for names in `AuditLogOptions.HeaderRedactList` (default: `Authorization`, `Cookie`, `Set-Cookie`, `X-API-Key`) and any matching configured regex. Replacement: `<redacted>`.
- Headers travel in `RequestSummary` / `ResponseSummary` (JSON of headers + body) OR in `Extra` — confirm format during M5 brainstorm and document.
- Create: `tests/ScadaLink.AuditLog.Tests/Payload/HeaderRedactionTests.cs`.

**Steps:**
1. Failing test: `Authorization: Bearer xyz` in `RequestSummary` becomes `Authorization: <redacted>`.
2. Failing test: case-insensitive match (`authorization` redacted too).
3. Failing test: custom redact-list extension works (operator adds `X-Custom-Token`).
4. Implement.
5. Run: pass.
6. Commit: `feat(auditlog): HTTP header redaction`.
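
A minimal sketch of the case-insensitive header stage follows. It assumes headers have already been parsed into a name/value dictionary (the wire format is still an open M5 question), and `HeaderRedactor` is a hypothetical name. The regex-based name matching mentioned above is omitted.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; the real stage lives inside DefaultAuditPayloadFilter.
public static class HeaderRedactor
{
    public const string Replacement = "<redacted>";

    // Default list taken from the task description above.
    public static readonly string[] DefaultRedactList =
        { "Authorization", "Cookie", "Set-Cookie", "X-API-Key" };

    // Returns a copy of headers with the values of listed names replaced.
    // Name matching ignores case, so "authorization" is redacted as well.
    public static Dictionary<string, string> Redact(
        IReadOnlyDictionary<string, string> headers,
        IEnumerable<string> redactList)
    {
        var names = new HashSet<string>(redactList, StringComparer.OrdinalIgnoreCase);
        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (var header in headers)
            result[header.Key] = names.Contains(header.Key) ? Replacement : header.Value;
        return result;
    }
}
```

An operator-supplied extension such as `X-Custom-Token` would simply be appended to the list passed as `redactList`.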

#### M5-T4: Body regex redaction with safety net

**Files:**
- Modify: `src/ScadaLink.AuditLog/Payload/DefaultAuditPayloadFilter.cs` — add body-regex stage. Global redactors apply to all bodies; per-target redactors apply to matching `Target`. Patterns precompiled at startup; rejected if compile takes >100ms.
- Safety net: if a regex throws at runtime, replace the body with `<redacted: redactor error>` and increment `AuditRedactionFailure` (M5-T7).
- Create: `tests/ScadaLink.AuditLog.Tests/Payload/BodyRegexRedactionTests.cs`.

**Steps:**
1. Failing test: `"password":"hunter2"` in a JSON body → `"password":"<redacted>"` when the default global redactor pattern matches.
2. Failing test: per-target redactor only applies to matching `Target`.
3. Failing test: a redactor that throws → body becomes `<redacted: redactor error>` AND the counter increments.
4. Failing test: catastrophic backtracking regex rejected at startup.
5. Implement.
6. Run: pass.
7. Commit: `feat(auditlog): body regex redaction with over-redaction safety net`.
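
The safety net can be sketched as a stage that treats any redactor exception as a reason to over-redact. The names `BodyRedactionStage` and `FromRegex` are hypothetical, and the password pattern in the usage note is only an example, not the project's default redactor. One assumption beyond the text above: since backtracking blow-ups show up at match time rather than compile time, the sketch gives each `Regex` a match timeout, so a runaway match surfaces as an exception and falls into the same safety net.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Threading;

// Hypothetical stage; redactors are modelled as string -> string functions.
public sealed class BodyRedactionStage
{
    public const string OverRedacted = "<redacted: redactor error>";

    private readonly IReadOnlyList<Func<string, string>> _redactors;
    private long _failures; // stand-in for the AuditRedactionFailure counter

    public BodyRedactionStage(IReadOnlyList<Func<string, string>> redactors)
    {
        _redactors = redactors;
    }

    public long Failures => Interlocked.Read(ref _failures);

    // Builds a redactor from a pattern. The match timeout makes a runaway match
    // throw RegexMatchTimeoutException instead of stalling the hot path.
    public static Func<string, string> FromRegex(string pattern, string replacement, TimeSpan matchTimeout)
    {
        var regex = new Regex(pattern, RegexOptions.CultureInvariant, matchTimeout);
        return body => regex.Replace(body, replacement);
    }

    // Applies every redactor in order. Any exception replaces the whole body,
    // so a broken redactor can never let the raw payload through.
    public string Apply(string body)
    {
        try
        {
            foreach (var redact in _redactors) body = redact(body);
            return body;
        }
        catch (Exception)
        {
            Interlocked.Increment(ref _failures);
            return OverRedacted;
        }
    }
}
```

For example, `FromRegex(@"(""password""\s*:\s*)""[^""]*""", @"$1""<redacted>""", TimeSpan.FromMilliseconds(50))` rewrites a JSON password value while leaving the rest of the body intact.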

#### M5-T5: SQL parameter redaction (per-connection opt-in)

**Files:**
- Modify: `src/ScadaLink.AuditLog/Payload/DefaultAuditPayloadFilter.cs` — for `Channel=DbOutbound` events, parse `Extra.params` and redact parameter VALUES whose NAME matches the connection's configured regex (from `AuditLogOptions.PerTargetOverrides[<connection name>].RedactSqlParamsMatching`).
- Create: `tests/ScadaLink.AuditLog.Tests/Payload/SqlParamRedactionTests.cs`.

**Steps:**
1. Failing test: no opt-in config → params captured verbatim (default behaviour).
2. Failing test: opt-in regex `@apikey|@token` redacts those param VALUES but keeps OTHER param values intact.
3. Failing test: regex applies to parameter NAMES (not values) and is case-insensitive.
4. Implement.
5. Run: pass.
6. Commit: `feat(auditlog): per-connection SQL parameter redaction (opt-in)`.
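
The name-based matching in steps 1 to 3 might look like the sketch below. It assumes `Extra.params` has already been parsed into a name/value dictionary; `SqlParamRedactor` is a hypothetical name, and an empty pattern is used here to represent "no opt-in configured".

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper; the real stage reads the pattern from PerTargetOverrides.
public static class SqlParamRedactor
{
    // Returns a copy of parameters in which values are replaced when the
    // parameter NAME matches namePattern (case-insensitive). Values are never inspected.
    public static Dictionary<string, object> Redact(
        IReadOnlyDictionary<string, object> parameters,
        string namePattern)
    {
        var result = new Dictionary<string, object>();
        bool optedIn = !string.IsNullOrEmpty(namePattern);
        var matcher = optedIn
            ? new Regex(namePattern, RegexOptions.IgnoreCase | RegexOptions.CultureInvariant)
            : new Regex("(?!)"); // never matches: default behaviour is verbatim capture
        foreach (var p in parameters)
            result[p.Key] = matcher.IsMatch(p.Key) ? "<redacted>" : p.Value;
        return result;
    }
}
```

Matching on names keeps the check cheap and avoids running regexes over arbitrary value payloads.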

#### M5-T6: Wire filter into emission paths

**Files:**
- Modify: ESG (M2-T10, M3-T11/12/13, M4-T1/T2), InboundAPI middleware (M4-T7), NotificationOutbox (M4-T4/T5), NotificationService site path (M4-T6) — every emission site receives `IAuditPayloadFilter` from DI and calls `filter.Apply(rawEvent)` before handing to the writer.
- Modify: `src/ScadaLink.AuditLog/ServiceCollectionExtensions.cs` — register `DefaultAuditPayloadFilter` as `IAuditPayloadFilter` singleton.
- Create: `tests/ScadaLink.AuditLog.Tests/Payload/FilterIntegrationTests.cs` — assert each emitter calls through the filter before the writer.

**Steps:**
1. Failing test: ESG emission writes the filter-applied event (not the raw one).
2. Failing test: same for each other emitter.
3. Implement by injecting the filter into each emitter and routing through it.
4. Run: pass.
5. Commit: `feat(auditlog): wire payload filter into all emission paths`.

#### M5-T7: `AuditRedactionFailure` health metric

**Files:**
- Modify: `src/ScadaLink.HealthMonitoring/CentralHealthAggregator.cs` (or equivalent) — add `AuditRedactionFailure` counter.
- Modify: `src/ScadaLink.AuditLog/Payload/DefaultAuditPayloadFilter.cs` — increment on every redactor exception.
- Create: `tests/ScadaLink.AuditLog.Tests/Payload/AuditRedactionFailureMetricTests.cs`.

**Steps:**
1. Failing test: 5 redactor exceptions → counter shows 5.
2. Implement.
3. Run: pass.
4. Commit: `feat(health): AuditRedactionFailure metric`.

#### M5-T8: Configuration test — `appsettings.json` round-trip

**Files:**
- Create: `tests/ScadaLink.AuditLog.Tests/Configuration/AuditLogOptionsBindingTests.cs` — bind a realistic `appsettings.json` block (with header-redact list, body redactors, per-target overrides, retention) and assert values appear in `IOptions<AuditLogOptions>`. Re-bind with a hot-reload simulation and assert filter behaviour changes accordingly.

**Steps:**
1. Failing test: bind + read → matches.
2. Failing test: change config → filter behaviour updates without restart (`IOptionsMonitor` pattern).
3. Implement (likely needs adjusting M1-T9 from `IOptions` to `IOptionsMonitor`).
4. Run: pass.
5. Commit: `feat(auditlog): hot-reloadable AuditLogOptions`.

#### M5-T9: Performance test — hot-path latency budget

**Files:**
- Create: `tests/ScadaLink.PerformanceTests/AuditLog/HotPathLatencyTests.cs` — bench `filter.Apply(event)` for a 4 KB JSON body with the default redactor set; target P95 < 50 µs (number set during M5 brainstorm based on baseline measurements). Also bench `SqliteAuditWriter.WriteAsync` end-to-end target P95 < 500 µs.

**Steps:**
1. Sketch test using BenchmarkDotNet or the existing performance test harness.
2. Run baseline; if over budget, profile + optimise.
3. Commit: `test(auditlog): hot-path latency budget`.

#### M5-T10: Safety-net test — bad regex over-redacts

**Files:**
- Create: `tests/ScadaLink.AuditLog.Tests/Payload/RedactionSafetyNetTests.cs` — register a deliberately bad regex that throws; assert the body is over-redacted (`<redacted: redactor error>`) rather than under-redacted (passing through unmodified).

**Steps:**
1. Failing test.
2. Verify the safety net from M5-T4 covers this.
3. Commit: `test(auditlog): redaction safety net over-redacts on regex failure`.

### M5 — Risk callouts

- **Regex catastrophic backtracking:** validate patterns at startup with a short-running compile test; reject patterns that exceed a timeout. Document the rejection behaviour.
- **Order of stages matters:** truncation BEFORE redaction means a redaction target halfway through the cap could get cut. Confirm the chosen order during M5 brainstorm; current draft applies redaction FIRST, then truncation — that way the redacted-replacement text is what gets truncated, not a half-secret.
- **Body capture format:** decide whether headers travel in `RequestSummary`/`ResponseSummary` or `Extra`. Affects M5-T3's redaction strategy. Lock during the M5 brainstorm.
- **Hot-reload semantics:** `IOptionsMonitor` snapshots — ensure pre-compiled regex cache invalidates when config changes.

---

## M6 — Reconciliation, purge, partition maintenance, health metrics

**Goal:** Self-healing telemetry, monthly partition rollover, daily purge, all five new health metrics live and feeding the existing health-report pipeline.

**Affected projects:** `AuditLog` (3 new actors: `SiteAuditReconciliationActor`, `AuditLogPurgeActor`, partition-maintenance worker), `Communication` (the `PullAuditEvents` RPC), `HealthMonitoring` (5 new metrics), `ConfigurationDatabase` (partition-roll-forward SQL helper).

**Acceptance criteria:**
- `SiteAuditReconciliationActor` runs every 5 minutes per site; pulls events the site reports as `Pending`; central performs `InsertIfNotExistsAsync` then signals the site to flip those rows to `Reconciled`.
- `AuditLogPurgeActor` runs daily; for each partition older than `RetentionDays`, switches it out to a staging table and drops the staging table. Emits an `AuditLog:Purged` event with rowcount + duration.
- Partition-maintenance job runs at month boundary to add the next month's partition function range and ensure the scheme has a destination filegroup.
- 5 new health metrics published. Three per site: `SiteAuditBacklog` (count + oldest + bytes), `SiteAuditWriteFailures`, `SiteAuditTelemetryStalled`. Two per central node: `CentralAuditWriteFailures`, `AuditRedactionFailure`.
- Integration test: simulated 5-minute central outage → telemetry catches up after recovery via reconciliation, no rows lost; site backlog metric reflects the queue depth and drops as it drains.

### M6 — Tasks (TDD-detail)

#### M6-T1: Extend `sitestream.proto` with `PullAuditEvents` RPC

**Files:**
- Modify: `src/ScadaLink.Communication/Protos/sitestream.proto` — add `rpc PullAuditEvents(PullAuditEventsRequest) returns (PullAuditEventsResponse);` and the corresponding request/response messages (`sinceUtc`, `batchSize`, `events`, `more_available`).
- Build: regenerate stubs.
- Create: `tests/ScadaLink.Communication.Tests/Protos/PullAuditEventsProtoTests.cs`.

**Steps:**
1. Failing test: round-trip request and response messages.
2. Add proto + rebuild.
3. Run: pass.
4. Commit: `feat(comms): PullAuditEvents RPC for audit reconciliation`.

#### M6-T2: Site-side handler for `PullAuditEvents`

**Files:**
- Modify: `src/ScadaLink.Communication/SiteStreamGrpc/SiteStreamGrpcServer.cs` (the site-side server inside each site cluster) — handle `PullAuditEvents` by reading `Pending` rows older than `SinceUtc` from `SqliteAuditWriter` (read-only path) and streaming them back. After ack, mark them `Reconciled`.
- Create: `tests/ScadaLink.Communication.Tests/SiteStreamPullAuditEventsTests.cs`.

**Steps:**
1. Failing test: a pull request with N pending rows returns those rows; rows flip to `Reconciled` after the response is acked.
2. Implement.
3. Run: pass.
4. Commit: `feat(comms): site-side PullAuditEvents handler`.

#### M6-T3: `SiteAuditReconciliationActor` — central, timer-driven

**Files:**
- Create: `src/ScadaLink.AuditLog/Central/SiteAuditReconciliationActor.cs` — central singleton; on a 5-minute timer (configurable), for each known site, ask: "what's your oldest `Pending` row?" If the site reports a non-draining backlog (compared with the previous tick), issue a `PullAuditEvents` and ingest the returned rows via `IAuditLogRepository.InsertIfNotExistsAsync`. Keeps a per-site `LastReconciledAt` cursor.
- Create: `tests/ScadaLink.AuditLog.Tests/Central/SiteAuditReconciliationActorTests.cs`.

**Steps:**
1. Failing test: actor's timer fires every 5 minutes (test via `TestKit` virtual scheduler).
2. Failing test: when site reports non-draining backlog over two consecutive ticks, the actor issues a pull and ingests results.
3. Failing test: idempotency — re-running the pull doesn't double-insert (relies on AuditLog PK).
4. Implement.
5. Run: pass.
6. Commit: `feat(auditlog): SiteAuditReconciliationActor`.

#### M6-T4: `AuditLogPurgeActor` — daily partition-switch purge

> **M1 reality**: `IAuditLogRepository.SwitchOutPartitionAsync` ships in M1 as a `NotSupportedException` stub because the non-aligned `UX_AuditLog_EventId` unique index (necessary for first-write-wins idempotency without including `OccurredAtUtc` in the unique key) blocks `ALTER TABLE … SWITCH PARTITION`. **M6 must replace the stub with the drop-and-rebuild dance**: (1) `DROP INDEX UX_AuditLog_EventId ON dbo.AuditLog;` (2) create the staging table on `[PRIMARY]` with identical schema; (3) `ALTER TABLE dbo.AuditLog SWITCH PARTITION <n> TO dbo.<staging>;` (4) `DROP TABLE dbo.<staging>;` (5) `CREATE UNIQUE NONCLUSTERED INDEX UX_AuditLog_EventId ON dbo.AuditLog(EventId) ON [PRIMARY];`. The small unique-index outage window during the switch is acceptable — partition switches are O(seconds) and `InsertIfNotExistsAsync` callers will see a transient retry surface; document this in the actor.

**Files:**
- Create: `src/ScadaLink.AuditLog/Central/AuditLogPurgeActor.cs` — central singleton; daily timer. For each partition whose latest `OccurredAtUtc` is older than `AuditLogOptions.RetentionDays`, call `IAuditLogRepository.SwitchOutPartitionAsync(partitionBoundary)`. Emit an `AuditLogPurged` event (logged + metricked) with partition range, row count, and duration.
- Modify: `src/ScadaLink.ConfigurationDatabase/Repositories/AuditLogRepository.cs` — replace the M1 `NotSupportedException` stub with the drop-and-rebuild dance described above. Wrap in a transaction. Add a regression test asserting the unique index is rebuilt and the data left behind matches the un-switched partitions.
- Create: `tests/ScadaLink.AuditLog.Tests/Central/AuditLogPurgeActorTests.cs`.

**Steps:**
1. Failing test: with retention = 30 days, partitions older than 30 days are switched out; newer partitions are kept.
2. Failing test: purge emits the `AuditLogPurged` event with correct row count.
3. Failing test: partition switch under the `scadalink_audit_purger` role completes successfully (requires the role to ALSO be granted permission to DROP/CREATE the `UX_AuditLog_EventId` index — extend the role grants in this milestone if not in M1's role definition; M1 granted `ALTER ON SCHEMA::dbo` which should cover this).
4. Failing test: post-switch, `InsertIfNotExistsAsync` continues to enforce first-write-wins (unique index successfully rebuilt).
5. Implement.
6. Run: pass.
7. Commit: `feat(auditlog): AuditLogPurgeActor with partition-switch purge (drop-and-rebuild around UX_AuditLog_EventId)`.
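
The retention check in step 1 can be expressed as a pure function over the partition boundaries. This sketch rests on assumptions: `PurgePlanner` is a hypothetical name, boundaries are taken to be ascending lower bounds (a RANGE RIGHT layout), and a partition's upper bound is used as a conservative proxy for "latest `OccurredAtUtc`" so that no `MAX` query is needed.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical planner; the actor would pass each result to SwitchOutPartitionAsync.
public static class PurgePlanner
{
    // boundaries: ascending lower bounds of the monthly partitions.
    // Partition i spans [boundaries[i], boundaries[i + 1]); it is eligible once its
    // upper bound is at or before the retention cutoff, i.e. every row in it is old enough.
    // The last, open-ended partition is never eligible.
    public static List<DateTime> EligibleLowerBounds(
        IReadOnlyList<DateTime> boundaries, DateTime utcNow, int retentionDays)
    {
        DateTime cutoff = utcNow.AddDays(-retentionDays);
        var eligible = new List<DateTime>();
        for (int i = 0; i + 1 < boundaries.Count; i++)
            if (boundaries[i + 1] <= cutoff) eligible.Add(boundaries[i]);
        return eligible;
    }
}
```

With boundaries for January to April 2026, a 30-day retention, and a run on 15 April 2026, the January and February partitions qualify while March and April are kept.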

#### M6-T5: `AuditLogPartitionMaintenanceService` — monthly roll-forward

> **M1 reality**: the partition function `pf_AuditLog_Month` ships with 24 explicit monthly boundaries (Jan 2026 through Dec 2027) on filegroup `[PRIMARY]`. M6's hosted service must keep this rolling — split a new boundary for the upcoming month and (if a separate hot/cold filegroup strategy is adopted later) drop oldest boundaries via MERGE after purge.

**Files:**
- Create: `src/ScadaLink.AuditLog/Central/AuditLogPartitionMaintenanceService.cs` — `IHostedService` that runs on startup AND every month: ensures the next month's partition range exists on `pf_AuditLog_Month` and the partition scheme has a destination filegroup. Implemented via raw SQL (`ALTER PARTITION FUNCTION pf_AuditLog_Month() SPLIT RANGE (<next-month-boundary>)`); ensure the scheme stays `ALL TO ([PRIMARY])` unless production deployment overrides per-filegroup.
- Create: `tests/ScadaLink.AuditLog.Tests/Central/PartitionMaintenanceServiceTests.cs` (integration via `MsSqlMigrationFixture`; runs against a temp DB).

**Steps:**
1. Failing test: against a DB seeded with the M1 migration (covering through Dec 2027), running the service in Dec 2027 splits a Jan 2028 boundary so the function has a range for "current month + at least the next month".
2. Implement.
3. Failing test: subsequent monthly runs add successive future boundaries (idempotent: already-split boundaries are no-ops, not errors).
4. Run: pass.
5. Commit: `feat(auditlog): partition maintenance HostedService (SPLIT RANGE roll-forward)`.
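
The roll-forward rule ("current month + at least the next month", idempotent on repeat runs) reduces to computing which boundaries are missing. In this sketch `PartitionRollForward` is a hypothetical name and `ps_AuditLog_Month` is an assumed partition scheme name that does not appear in the roadmap. The generated SQL sets `NEXT USED` before each split because SQL Server expects the scheme to have a designated filegroup for the new partition.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper; the hosted service would execute the generated SQL.
public static class PartitionRollForward
{
    // Returns the first-of-month boundaries that still need a SPLIT so the function
    // covers the current month and the month after it. An empty list means the
    // function is already far enough ahead, which keeps repeat runs a no-op.
    public static List<DateTime> MissingBoundaries(DateTime latestBoundary, DateTime utcNow)
    {
        DateTime target = new DateTime(utcNow.Year, utcNow.Month, 1).AddMonths(1);
        var missing = new List<DateTime>();
        for (DateTime b = latestBoundary.AddMonths(1); b <= target; b = b.AddMonths(1))
            missing.Add(b);
        return missing;
    }

    // ps_AuditLog_Month is an assumed scheme name; substitute the real one.
    public static string SplitSql(DateTime boundary)
    {
        string literal = boundary.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);
        return "ALTER PARTITION SCHEME ps_AuditLog_Month NEXT USED [PRIMARY]; " +
               "ALTER PARTITION FUNCTION pf_AuditLog_Month() SPLIT RANGE ('" + literal + "');";
    }
}
```

Computing the gap rather than always adding one month also covers a service that was down for several months: a first run in April 2028 against the M1 boundaries would split January through May 2028 in one pass.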

#### M6-T6: Health metric `SiteAuditBacklog`

**Files:**
- Modify: `src/ScadaLink.AuditLog/Site/SqliteAuditWriter.cs` — expose `GetBacklogStatsAsync()` returning `(pendingCount, oldestPendingUtc, onDiskBytes)`.
- Modify: `src/ScadaLink.HealthMonitoring/SiteHealthState.cs` — add `SiteAuditBacklog` metric (3-tuple), populated per site-health-report tick.
- Create: `tests/ScadaLink.HealthMonitoring.Tests/SiteAuditBacklogMetricTests.cs`.

**Steps:**
1. Failing test: with 100 pending rows in SQLite, the metric reports `pendingCount=100`.
2. Failing test: oldest pending age is reported in seconds since `OccurredAtUtc`.
3. Failing test: on-disk bytes ≈ SQLite file size.
4. Implement.
5. Run: pass.
6. Commit: `feat(health): SiteAuditBacklog metric (count + age + bytes)`.

#### M6-T7: Health metric `SiteAuditTelemetryStalled`

**Files:**
- Modify: `src/ScadaLink.HealthMonitoring/SiteHealthState.cs` — add boolean `SiteAuditTelemetryStalled`.
- Modify: `src/ScadaLink.AuditLog/Central/SiteAuditReconciliationActor.cs` — set the flag when reconciliation detects a non-draining backlog over two consecutive cycles.
- Create: `tests/ScadaLink.HealthMonitoring.Tests/SiteAuditTelemetryStalledTests.cs`.

**Steps:**
1. Failing test: two consecutive non-draining cycles → flag set.
2. Failing test: a subsequent draining cycle → flag cleared.
3. Implement.
4. Run: pass.
5. Commit: `feat(health): SiteAuditTelemetryStalled flag`.
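
The stalled-flag rule can be sketched as a small per-site state machine. The roadmap does not define "non-draining" precisely, so this sketch assumes it means a non-empty backlog that has not shrunk since the previous cycle; `StallDetector` and `Observe` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical detector; the reconciliation actor would call Observe once per cycle.
public sealed class StallDetector
{
    private sealed class SiteState
    {
        public bool Seen;
        public long LastPending;
        public int NonDrainingCycles;
    }

    private readonly Dictionary<string, SiteState> _sites = new Dictionary<string, SiteState>();
    private readonly int _threshold;

    public StallDetector(int threshold = 2)
    {
        _threshold = threshold;
    }

    // Records this cycle's pending count and returns whether the site is stalled.
    // The first observation for a site only sets the baseline.
    public bool Observe(string siteId, long pendingCount)
    {
        if (!_sites.TryGetValue(siteId, out var state))
        {
            state = new SiteState();
            _sites[siteId] = state;
        }
        bool nonDraining = state.Seen && pendingCount > 0 && pendingCount >= state.LastPending;
        state.NonDrainingCycles = nonDraining ? state.NonDrainingCycles + 1 : 0;
        state.LastPending = pendingCount;
        state.Seen = true;
        return state.NonDrainingCycles >= _threshold;
    }
}
```

A single draining cycle resets the streak, which matches step 2: the flag clears as soon as the backlog starts shrinking again.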

#### M6-T8: Health metric `CentralAuditWriteFailures`

**Files:**
- Modify: `src/ScadaLink.HealthMonitoring/CentralHealthAggregator.cs` — add `CentralAuditWriteFailures` counter.
- Modify: every `ICentralAuditWriter` call site (Inbound API middleware M4-T7, NotificationOutboxActor M4-T4/T5) — increment on caught exceptions.
- Create: `tests/ScadaLink.HealthMonitoring.Tests/CentralAuditWriteFailuresTests.cs`.

**Steps:**
1. Failing test: 3 forced central direct-write failures → counter reports 3.
2. Implement.
3. Run: pass.
4. Commit: `feat(health): CentralAuditWriteFailures metric`.

#### M6-T9: Surface `AuditRedactionFailure` in central health
|
||
|
||
**Files:**
|
||
- Modify: `src/ScadaLink.HealthMonitoring/CentralHealthAggregator.cs` — register the counter created in M5-T7 so it appears in the central health report payload.
|
||
- Create: `tests/ScadaLink.HealthMonitoring.Tests/AuditRedactionFailureSurfacingTests.cs`.
|
||
|
||
**Steps:**
|
||
1. Failing test: incrementing the counter is visible in the next central health snapshot.
|
||
2. Implement.
|
||
3. Run: pass.
|
||
4. Commit: `feat(health): surface AuditRedactionFailure in central health`.
|
||
|
||
#### M6-T10: Integration test — central outage + reconciliation recovery

**Files:**
- Create: `tests/ScadaLink.IntegrationTests/AuditLog/OutageReconciliationTests.cs` — site + central; simulate a 5-minute central gRPC outage; during outage, site emits 200 events; restore central; assert reconciliation pulls catch up within one cycle and all 200 events land in central AuditLog with no duplicates.

**Steps:**
1. Sketch, iterate, commit: `test(auditlog): outage + reconciliation recovery end-to-end`.

#### M6-T11: Integration test — partition switch purge

**Files:**
- Create: `tests/ScadaLink.IntegrationTests/AuditLog/PartitionPurgeTests.cs` — pre-populate AuditLog with rows in three monthly partitions (one older than retention, two newer); trigger `AuditLogPurgeActor`; assert the oldest partition's rows are gone and newer partitions are untouched.

**Steps:**
1. Sketch, iterate, commit: `test(auditlog): partition-switch purge end-to-end`.

#### M6-T12: Integration test — partition maintenance roll-forward

**Files:**
- Create: `tests/ScadaLink.IntegrationTests/AuditLog/PartitionMaintenanceTests.cs` — assert that after `AuditLogPartitionMaintenanceService` runs, the partition function covers the next month's range.

**Steps:**
1. Sketch, iterate, commit: `test(auditlog): partition maintenance roll-forward end-to-end`.

### M6 — Risk callouts

- **Partition switch on a live table:** SQL Server `ALTER TABLE ... SWITCH PARTITION` is metadata-only when source and target match in structure and filegroup; verify with a load test that ingest isn't paused during purge.
- **Pull cadence vs ingest rate:** a site that sustains more than `BatchSize` events per 5 s may never let telemetry catch up, so reconciliation must close the gap. The non-draining detection in M6-T3 is the safety net.
- **Site SQLite `ForwardState` flip after reconciliation:** must be atomic with the central ack; otherwise a site crash mid-flip can re-send rows. Central ingest is idempotent, so the re-send is harmless, but it is worth noting.
- **HostedService scheduling:** ensure the partition maintenance service runs on the ACTIVE central node only; running it on both nodes would cause SQL errors from adding the same range twice.

---

## M7 — Central UI: new Audit Log page + drill-ins + KPI tiles

**Goal:** User-visible Audit Log: filter bar, results grid (custom Blazor + Bootstrap, no third-party grid), drilldown drawer with cURL / "show all events" / redaction indicators / pretty-printed payloads. 6 drill-in entry points from existing pages. 3 KPI tiles on Health dashboard.

**Affected projects:** `CentralUI`, `CentralUI.Tests`, `CentralUI.PlaywrightTests`.

**Acceptance criteria:**
- New `Components/Pages/Audit/AuditLogPage.razor` exists; new "Audit" nav group sibling to Notifications.
- All 10 filter elements, 10 grid columns, keyset pagination with a default page size of 100, and the drilldown drawer per `Component-AuditLog.md` §10.
- Existing `Components/Pages/Monitoring/AuditLog.razor` (the IAuditService config-change viewer) **renamed in code** to `ConfigurationAuditLog.razor`, with URL `/audit/configuration` to match the earlier documentation rename.
- Drill-ins added from existing pages (Notifications, Site Calls, External Systems, Inbound API Keys, Sites, Instances).
- 3 KPI tiles added to the Health dashboard; data sourced from `HealthMonitoring`.
- Playwright tests cover: filter narrowing, drilldown drawer, "Copy as cURL" on `ApiInbound` rows, drill-in from Notifications to filtered Audit Log.
- Page access is gated on the `OperationalAudit` read permission; the Export button additionally requires `AuditExport`.

### M7 — Tasks (TDD-detail)

#### M7-T1: New `AuditLogPage.razor` scaffold + route + Audit nav group

**Files:**
- Create: `src/ScadaLink.CentralUI/Components/Pages/Audit/AuditLogPage.razor` + `.razor.cs` + `.razor.css`. Route `/audit/log`. Empty body for now beyond `<h1>Audit Log</h1>`.
- Modify: `src/ScadaLink.CentralUI/Components/Layout/NavMenu.razor` (or equivalent) — add a new top-level **Audit** nav group sibling to Notifications, containing this page.
- Create: `tests/ScadaLink.CentralUI.Tests/Pages/AuditLogPageScaffoldTests.cs` — Blazor component test (bUnit if it's used in the codebase; else Playwright).

**Steps:**
1. Failing test: navigating to `/audit/log` renders the page (heading present).
2. Failing test: nav menu shows the Audit group.
3. Implement.
4. Run: pass.
5. Commit: `feat(ui): scaffold Audit Log page + Audit nav group`.

#### M7-T2: `<AuditFilterBar>` component

**Files:**
- Create: `src/ScadaLink.CentralUI/Components/Audit/AuditFilterBar.razor` + `.razor.cs` — 10 filter elements per `Component-AuditLog.md` §10. Multi-select chips for Channel/Kind/Status/Site (Bootstrap custom; NO third-party UI library). Time-range relative dropdown + custom date picker. Text search for Instance/Script/Target/Actor/CorrelationId. "Errors only" toggle.
- Create: `tests/ScadaLink.CentralUI.Tests/Components/Audit/AuditFilterBarTests.cs`.

**Steps:**
1. Failing test: rendering shows all 10 elements.
2. Failing test: selecting filters and clicking "Apply" raises a `FilterChanged` event with the right `AuditQuery` payload.
3. Failing test: Kind options narrow when Channels are selected.
4. Implement.
5. Run: pass.
6. Commit: `feat(ui): AuditFilterBar component`.

#### M7-T3: `<AuditResultsGrid>` component with keyset paging

**Files:**
- Create: `src/ScadaLink.CentralUI/Components/Audit/AuditResultsGrid.razor` + `.razor.cs` — custom Bootstrap table (no third-party grid). 10 columns per `Component-AuditLog.md`. Resizable + reorderable + persistable-per-user (persistence via existing user-settings store).
- Keyset paging via `(OccurredAtUtc desc, EventId desc)` cursor; default page size 100.
- Data source: server-side via `IAuditLogRepository.QueryAsync` (M1-T8). Wire through an `IAuditLogQueryService` (new) that the page injects.
- Create: `tests/ScadaLink.CentralUI.Tests/Components/Audit/AuditResultsGridTests.cs`.

**Steps:**
1. Failing test: grid renders rows from a stub query service; columns match the documented set.
2. Failing test: clicking "next page" calls the service with the keyset cursor of the last row.
3. Failing test: column reordering persists across navigations (user-settings).
4. Failing test: row click emits a `RowSelected` event with the selected `AuditEvent`.
5. Implement.
6. Run: pass.
7. Commit: `feat(ui): AuditResultsGrid with keyset paging`.

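The `(OccurredAtUtc desc, EventId desc)` cursor in M7-T3 can be illustrated in memory. This is a sketch under assumptions: `AuditRow`, `KeysetCursor`, and `KeysetPaging` are hypothetical names, `EventId` is assumed to be a `Guid`, and the real query would run server-side through `IAuditLogRepository.QueryAsync` rather than over an `IEnumerable`:

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shapes; the real AuditEvent carries many more fields.
public sealed record AuditRow(DateTime OccurredAtUtc, Guid EventId);
public sealed record KeysetCursor(DateTime OccurredAtUtc, Guid EventId);

public static class KeysetPaging
{
    // Returns the next page in (OccurredAtUtc desc, EventId desc) order,
    // strictly after the cursor. A null cursor means "first page".
    public static IReadOnlyList<AuditRow> NextPage(
        IEnumerable<AuditRow> source, KeysetCursor? cursor, int pageSize = 100)
    {
        var query = source;
        if (cursor is { } c)
        {
            // Row-value comparison: an earlier timestamp, or the same
            // timestamp with a smaller EventId as the tie-breaker.
            query = query.Where(r =>
                r.OccurredAtUtc < c.OccurredAtUtc ||
                (r.OccurredAtUtc == c.OccurredAtUtc &&
                 r.EventId.CompareTo(c.EventId) < 0));
        }

        return query
            .OrderByDescending(r => r.OccurredAtUtc)
            .ThenByDescending(r => r.EventId)
            .Take(pageSize)
            .ToList();
    }

    // The cursor for the following page is the last row of the current page.
    public static KeysetCursor? CursorAfter(IReadOnlyList<AuditRow> page) =>
        page.Count == 0 ? null : new KeysetCursor(page[^1].OccurredAtUtc, page[^1].EventId);
}
```

One caveat for the server-side version: SQL Server orders `uniqueidentifier` values differently from `Guid.CompareTo`, so the tie-breaker comparison and the `ORDER BY` should both be evaluated in the database to stay consistent with each other.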
#### M7-T4: `<AuditDrilldownDrawer>` — JSON pretty-print

**Files:**
- Create: `src/ScadaLink.CentralUI/Components/Audit/AuditDrilldownDrawer.razor` + `.razor.cs` — slide-in drawer triggered by `RowSelected`. Renders all fields of the selected `AuditEvent`. JSON detection: if `RequestSummary` or `ResponseSummary` is valid JSON, pretty-print with indentation.
- Create: `tests/ScadaLink.CentralUI.Tests/Components/Audit/AuditDrilldownDrawerJsonTests.cs`.

**Steps:**
1. Failing test: opening drawer with an event whose `RequestSummary` is valid JSON renders an indented version.
2. Failing test: non-JSON body renders verbatim.
3. Implement.
4. Run: pass.
5. Commit: `feat(ui): drilldown drawer JSON pretty-print`.

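The JSON detection in M7-T4 could be done by attempting a parse and falling back to the raw text. A minimal sketch using `System.Text.Json`; the `PayloadFormatter` name is hypothetical, and the choice of the relaxed encoder is an assumption made so that markers such as `<redacted>` are not rewritten to `\u003C...\u003E` escapes before the drawer inspects them:

```csharp
#nullable enable
using System.Text.Encodings.Web;
using System.Text.Json;

// Hypothetical helper; not a type from the codebase.
public static class PayloadFormatter
{
    private static readonly JsonSerializerOptions Indented = new()
    {
        WriteIndented = true,
        // Assumption: keep '<' and '>' literal so redaction markers survive.
        // Blazor HTML-encodes rendered text, so this is for display only.
        Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping
    };

    // Returns an indented rendering when the payload parses as JSON,
    // otherwise returns the input unchanged.
    public static string PrettyPrintIfJson(string? payload)
    {
        if (string.IsNullOrWhiteSpace(payload)) return payload ?? string.Empty;
        try
        {
            using var doc = JsonDocument.Parse(payload);
            return JsonSerializer.Serialize(doc.RootElement, Indented);
        }
        catch (JsonException)
        {
            return payload; // not JSON (or truncated JSON): show verbatim
        }
    }
}
```

A payload that was truncated by a capture cap will usually fail to parse and therefore render verbatim, which matches step 2.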
#### M7-T5: Drilldown — SQL syntax highlighting

**Files:**
- Modify: `src/ScadaLink.CentralUI/Components/Audit/AuditDrilldownDrawer.razor.cs` — for `Channel=DbOutbound` events, treat `RequestSummary` as SQL; apply syntax highlighting via a lightweight client-side library (Prism.js or Highlight.js if already in the project; else a small custom highlighter — confirm during M7 brainstorm).
- Modify: `src/ScadaLink.CentralUI/wwwroot/` — add the highlighter assets if needed.

**Steps:**
1. Failing test: a `DbOutbound` event's `RequestSummary` is rendered inside a `<code class="language-sql">` block.
2. Implement.
3. Run: pass.
4. Commit: `feat(ui): drilldown SQL syntax highlighting`.

#### M7-T6: Drilldown — "Copy as cURL" for ApiOutbound / ApiInbound

**Files:**
- Modify: `src/ScadaLink.CentralUI/Components/Audit/AuditDrilldownDrawer.razor.cs` — for `Channel ∈ {ApiOutbound, ApiInbound}` events, render a "Copy as cURL" button. Clicking generates a cURL command from the event's URL/headers/body and copies to clipboard via `IJSRuntime`.

**Steps:**
1. Failing test: button appears only for HTTP-bearing events.
2. Failing test: clicking generates the correct cURL string (verified against a known event fixture).
3. Implement.
4. Run: pass.
5. Commit: `feat(ui): drilldown Copy as cURL action`.

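The cURL string for M7-T6 is mostly a quoting exercise. The sketch below assumes the method, URL, headers, and body have already been extracted from the event; how they are stored on `AuditEvent` is not specified here, and `CurlCommandBuilder` is a hypothetical name:

```csharp
#nullable enable
using System.Collections.Generic;
using System.Text;

// Hypothetical helper; not a type from the codebase.
public static class CurlCommandBuilder
{
    // Wraps a value in single quotes for a POSIX shell, escaping embedded quotes.
    private static string ShellQuote(string value) =>
        "'" + value.Replace("'", "'\\''") + "'";

    public static string Build(
        string method, string url,
        IEnumerable<KeyValuePair<string, string>> headers, string? body)
    {
        var sb = new StringBuilder("curl -X ").Append(method.ToUpperInvariant());
        foreach (var (name, value) in headers)
            sb.Append(" -H ").Append(ShellQuote($"{name}: {value}"));
        if (!string.IsNullOrEmpty(body))
            sb.Append(" --data-raw ").Append(ShellQuote(body));
        return sb.Append(' ').Append(ShellQuote(url)).ToString();
    }
}
```

Redacted header or body values are copied as stored, so a generated command that contains `<redacted>` will not replay successfully until the operator substitutes real values.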
#### M7-T7: Drilldown — "Show all events for this operation"

**Files:**
- Modify: `src/ScadaLink.CentralUI/Components/Audit/AuditDrilldownDrawer.razor.cs` — when the event has a non-null `CorrelationId`, render a link "Show all events for this operation" that re-applies the page's filter set with `CorrelationId = <value>` (other filters cleared).

**Steps:**
1. Failing test: link appears only when CorrelationId is non-null.
2. Failing test: clicking re-navigates to the Audit Log page with the filter applied.
3. Implement.
4. Run: pass.
5. Commit: `feat(ui): drilldown "Show all events" by CorrelationId`.

#### M7-T8: Drilldown — redaction indicators

**Files:**
- Modify: `src/ScadaLink.CentralUI/Components/Audit/AuditDrilldownDrawer.razor` — wherever a payload contains the string `<redacted>` or `<redacted: redactor error>`, render a small badge indicating the field was redacted. Show a tooltip linking to "Payload Capture Policy" in the Component-AuditLog docs.

**Steps:**
1. Failing test: a payload with `<redacted>` shows the badge.
2. Implement.
3. Run: pass.
4. Commit: `feat(ui): drilldown redaction indicators`.

#### M7-T9: Rename `AuditLog.razor` → `ConfigurationAuditLog.razor`

**Files:**
- Rename: `src/ScadaLink.CentralUI/Components/Pages/Monitoring/AuditLog.razor` → `Components/Pages/Audit/ConfigurationAuditLog.razor`.
- Update: the file's `@page` directive to `/audit/configuration`.
- Update: all `<NavLink>` and any other inbound references to the old path.
- Update: tests referencing the old name.
- Modify: nav menu — sit `ConfigurationAuditLog` under the Audit group as a sibling to the new Audit Log page.

**Steps:**
1. Failing test: navigating to `/audit/configuration` renders the (renamed) page.
2. Failing test: the old `/monitoring/auditlog` returns 404 (or a redirect — choose during M7 brainstorm; redirect is safer for any external bookmarks).
3. Implement rename + path updates.
4. Run: pass.
5. Commit: `refactor(ui): rename Audit Log Viewer to Configuration Audit Log Viewer`.

#### M7-T10: Drill-in from Notifications page

**Files:**
- Modify: `src/ScadaLink.CentralUI/Components/Pages/Notifications/NotificationReport.razor` (or row-action panel) — add "View audit history" action to each row. Navigates to `/audit/log?correlationId={NotificationId}`.

**Steps:**
1. Failing test: row action exists.
2. Failing test: click navigates with the right query string.
3. Implement.
4. Run: pass.
5. Commit: `feat(ui): drill-in from Notifications to Audit Log`.

#### M7-T11: Drill-in from Site Calls page

**Files:**
- Modify: the Site Calls listing page, if it exists. If it does not exist yet, defer this drill-in to a follow-up rather than creating the page here; Site Call Audit #22 UI work is mostly out of scope. For M7 acceptance, drill-ins are required only from pages that exist.
- If the page exists, mirror M7-T10's pattern with `?correlationId={TrackedOperationId}`.

**Steps:**
1. Conditional on page existence — confirm during M7 brainstorm.
2. Implement.
3. Commit: `feat(ui): drill-in from Site Calls to Audit Log`.

#### M7-T12: Drill-in from External Systems / Inbound API Keys / Sites / Instances detail pages

**Files:**
- Modify (per page): External Systems detail, Inbound API Keys detail, Sites detail, Instances detail. Each gets a "Recent activity" / "Recent calls" / "Audit feed" link or tab navigating to `/audit/log` with the appropriate pre-filter (`target=<system>` / `actor=<key name> AND channel=ApiInbound` / `site=<site>` / `instance=<instance>`).
- Tests: one per drill-in.

**Steps:**
1. Failing tests per page.
2. Implement.
3. Run: pass.
4. Commit: `feat(ui): drill-ins from detail pages to Audit Log`.

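The pre-filtered links in M7-T10 through M7-T12 all share one shape: `/audit/log` plus URL-encoded query parameters. A small sketch of a shared builder, assuming the query-string keys match the filter names used above (`correlationId`, `target`, `actor`, `channel`, `site`, `instance`); the `AuditLogLinks` helper itself is hypothetical:

```csharp
#nullable enable
using System;
using System.Linq;

// Hypothetical helper; not a type from the codebase.
public static class AuditLogLinks
{
    // Builds /audit/log?key=value&... and skips filters with no value.
    public static string Build(params (string Key, string? Value)[] filters)
    {
        var parts = filters
            .Where(f => !string.IsNullOrEmpty(f.Value))
            .Select(f => $"{Uri.EscapeDataString(f.Key)}={Uri.EscapeDataString(f.Value!)}");
        var query = string.Join("&", parts);
        return query.Length == 0 ? "/audit/log" : "/audit/log?" + query;
    }
}
```

Encoding every value matters here because key names, targets, and instance names are operator-chosen and may contain spaces, slashes, or ampersands.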
#### M7-T13: 3 KPI tiles on the Health dashboard

**Files:**
- Modify: `src/ScadaLink.CentralUI/Components/Pages/Health/HealthDashboard.razor` (or equivalent) — add three tiles under a new "Audit" group: Audit volume, Audit error rate, Audit backlog. Data fed from the metrics defined in M5-T7 and M6-T6/T7/T8/T9.
- Create: `tests/ScadaLink.CentralUI.Tests/Pages/Health/AuditKpiTilesTests.cs`.

**Steps:**
1. Failing test: tiles render with stub data; clicking each navigates to the relevant Audit Log filtered view (or to a per-site breakdown for the backlog tile).
2. Implement.
3. Run: pass.
4. Commit: `feat(ui): Audit KPI tiles on Health dashboard`.

#### M7-T14: Server-side CSV export streaming

**Files:**
- Create: `src/ScadaLink.CentralUI/Services/AuditLogExportService.cs` — accepts the current filter, streams server-side CSV via `IAuditLogRepository.QueryAsync` paged enumeration; writes to the HTTP response without buffering the whole result in memory.
- Modify: `AuditLogPage.razor` — Export button calls the service. Requires `AuditExport` permission (M7-T15).
- Create: `tests/ScadaLink.CentralUI.Tests/Services/AuditLogExportServiceTests.cs`.

**Steps:**
1. Failing test: exporting 10,000 rows streams as CSV; memory usage stays bounded.
2. Failing test: default cap of 100k rows enforced; larger requests get a "use the CLI" error.
3. Implement.
4. Run: pass.
5. Commit: `feat(ui): server-side streaming CSV export of Audit Log`.

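The streaming and row-cap behaviour in M7-T14 can be sketched with `IAsyncEnumerable`. Everything named here (`ExportRow`, `CsvExport`, the four-column layout) is a simplifying assumption; the real service would enumerate pages from `IAuditLogRepository.QueryAsync` and write to the HTTP response body:

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical, trimmed-down row shape for illustration.
public sealed record ExportRow(DateTime OccurredAtUtc, string Channel, string Status, string? Target);

public static class CsvExport
{
    // RFC 4180-style quoting: quote when the field contains , " CR or LF.
    public static string Escape(string? field)
    {
        if (string.IsNullOrEmpty(field)) return string.Empty;
        return field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) < 0
            ? field
            : "\"" + field.Replace("\"", "\"\"") + "\"";
    }

    // Writes each row as it arrives, so memory use does not grow with the
    // result size. Throws once the row cap is exceeded.
    public static async Task<long> WriteAsync(
        IAsyncEnumerable<ExportRow> rows, TextWriter output,
        long maxRows = 100_000, CancellationToken ct = default)
    {
        await output.WriteLineAsync("OccurredAtUtc,Channel,Status,Target");
        long written = 0;
        await foreach (var row in rows.WithCancellation(ct))
        {
            if (written >= maxRows)
                throw new InvalidOperationException(
                    $"Export exceeds {maxRows} rows; use the CLI for larger exports.");
            await output.WriteLineAsync(string.Join(',',
                row.OccurredAtUtc.ToString("O"),
                Escape(row.Channel), Escape(row.Status), Escape(row.Target)));
            written++;
        }
        await output.FlushAsync();
        return written;
    }
}
```

Note the trade-off in this shape: the cap is only detected mid-stream, after response headers have been sent, so the client sees a truncated download rather than a clean error. Checking the row count before streaming starts is one way to return the "use the CLI" error up front.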
#### M7-T15: `OperationalAudit` + `AuditExport` permission gating

**Files:**
- Modify: `src/ScadaLink.Security/` (or wherever the role/permission model lives) — add `OperationalAudit` and `AuditExport` permissions; map them to the Audit role (existing) by default.
- Modify: `AuditLogPage.razor` — gate page access on `OperationalAudit`; gate the Export button on `AuditExport`.
- Create: `tests/ScadaLink.CentralUI.Tests/Pages/AuditLogPagePermissionTests.cs`.

**Steps:**
1. Failing test: a user without `OperationalAudit` gets a 403 / hidden page.
2. Failing test: a user with `OperationalAudit` but no `AuditExport` can read but Export button is hidden.
3. Implement permission checks.
4. Run: pass.
5. Commit: `feat(security): OperationalAudit + AuditExport permissions for the Audit Log surface`.

#### M7-T16: Playwright E2E tests

**Files:**
- Create: `tests/ScadaLink.CentralUI.PlaywrightTests/Audit/AuditLogPageTests.cs` — covers: filter narrowing, drilldown drawer JSON pretty-print, "Copy as cURL" on ApiInbound, drill-in from Notifications to filtered Audit Log, CSV export end-to-end, permission gating.

**Steps:**
1. Sketch tests using the existing Playwright harness.
2. Iterate until all green.
3. Commit: `test(ui): Audit Log Playwright E2E coverage`.

### M7 — Risk callouts

- **Custom data grid scope:** keyset paging + reorderable columns + per-user persistence is non-trivial. Bench the existing `NotificationReport.razor` grid to see whether it can be generalised vs forking it. Decision during M7 brainstorm.
- **SignalR + large drawer payloads:** the drilldown payload (up to 64 KB on errors) is rendered server-side and pushed to the browser over the Blazor Server SignalR circuit. Confirm the hub message-size limits are large enough (`HubOptions.MaximumReceiveMessageSize`, default 32 KB, caps the messages the server receives from the client); bump if needed.
- **Permission infrastructure assumptions:** confirm during M7 brainstorm that the codebase already supports per-permission gates at the page level, not just role-level. If only role-level, fall back to gating via the existing Audit role with a feature flag for the export.
- **The rename to `ConfigurationAuditLog.razor`** breaks any external bookmarks. Decide redirect vs 404 explicitly during M7 brainstorm.

---

## M8 — CLI: `scadalink audit query | export | verify-chain`

**Goal:** Operator surface for the centralized Audit Log.

**Affected projects:** `CLI`, `CLI.Tests`, `ManagementService` (new HTTP endpoint), `IntegrationTests`.

**Acceptance criteria:**
- `scadalink audit query` mirrors the UI filter set; results stream as JSON (default) or table.
- `scadalink audit export` streams server-side to CSV / JSONL / Parquet; requires `AuditExport` permission.
- `scadalink audit verify-chain --month YYYY-MM` is a no-op stub returning a "hash-chain not yet enabled in this release" message and exit code 0 (per v1.x deferral).
- Existing `audit-log query` (IAuditService config-change viewer) **renamed** in code to `audit-config query` to disambiguate; old name kept as a deprecated alias for one minor version.
- Permissions: `audit query` and `audit verify-chain` require `OperationalAudit`; `audit export` additionally requires `AuditExport`.

### M8 — Tasks (TDD-detail)

#### M8-T1: Create `AuditCommands.cs` (separate from existing `AuditLogCommands.cs`)

**Files:**
- Create: `src/ScadaLink.CLI/Commands/AuditCommands.cs` — `static class AuditCommands { public static Command Build() }` following the System.CommandLine pattern from `AuditLogCommands.cs:1–53`. Sets up the `audit` parent command with three subcommands (T2/T3/T4).
- Modify: `src/ScadaLink.CLI/Program.cs` — register `AuditCommands.Build()` alongside the existing command groups.
- Create: `tests/ScadaLink.CLI.Tests/Commands/AuditCommandsScaffoldTests.cs`.

**Steps:**
1. Failing test: `scadalink audit --help` lists three subcommands (query, export, verify-chain).
2. Implement.
3. Run: pass.
4. Commit: `feat(cli): scaffold scadalink audit command group`.

#### M8-T2: `audit query` subcommand

**Files:**
- Modify: `src/ScadaLink.CLI/Commands/AuditCommands.cs` — add `query` subcommand with the flag set matching the Central UI Audit Log filter set (post-Bundle-D fix): `--since`, `--until`, `--channel`, `--kind`, `--status`, `--site`, `--instance`, `--target`, `--actor`, `--correlation-id`, `--errors-only`, `--page`, `--page-size`. Output JSON by default; `--format table` opt-in.
- Create: `src/ScadaLink.Commons/Messages/Cli/QueryAuditLogCommand.cs` (or wherever the CLI↔Management messages live — confirm via repo).
- Create: `tests/ScadaLink.CLI.Tests/Commands/AuditQueryCommandTests.cs`.

**Steps:**
1. Failing test: parsing the documented flag set produces a `QueryAuditLogCommand` with the expected fields.
2. Failing test: `--format table` switches the output formatter.
3. Failing test: unknown flag returns non-zero exit code with a helpful error.
4. Implement.
5. Run: pass.
6. Commit: `feat(cli): scadalink audit query subcommand`.

#### M8-T3: `audit export` subcommand

**Files:**
- Modify: `src/ScadaLink.CLI/Commands/AuditCommands.cs` — add `export` subcommand with flags `--since` (required), `--until` (required), `--format csv|jsonl|parquet` (required), `--output <path>` (required), `--channel`, `--kind`, `--status`, `--site`, `--target`, `--actor`.
- Create: `src/ScadaLink.Commons/Messages/Cli/ExportAuditLogCommand.cs`.
- Create: `tests/ScadaLink.CLI.Tests/Commands/AuditExportCommandTests.cs`.

**Steps:**
1. Failing test: missing required flag returns helpful error.
2. Failing test: valid invocation creates an `ExportAuditLogCommand` with all fields.
3. Failing test: streams results to `--output`; doesn't buffer entire export in memory (test with 100k+ rows).
4. Implement.
5. Run: pass.
6. Commit: `feat(cli): scadalink audit export subcommand (csv|jsonl|parquet)`.

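For step 3 of M8-T3, the CLI side avoids buffering by asking `HttpClient` to return as soon as the response headers arrive and then copying the body in chunks. A minimal sketch, assuming the export is fetched over HTTP from the ManagementService; the `ExportDownloader` name and the 80 KB buffer size are illustrative choices:

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper; not a type from the codebase.
public static class ExportDownloader
{
    // Copies the response body to the destination in chunks and returns the
    // number of bytes written. The body is never held in memory as a whole.
    public static async Task<long> DownloadAsync(
        HttpClient client, string requestUri, Stream destination,
        CancellationToken ct = default)
    {
        using var request = new HttpRequestMessage(HttpMethod.Get, requestUri);
        // ResponseHeadersRead returns before the body has been downloaded.
        using var response = await client.SendAsync(
            request, HttpCompletionOption.ResponseHeadersRead, ct);
        response.EnsureSuccessStatusCode();

        await using var body = await response.Content.ReadAsStreamAsync(ct);
        var buffer = new byte[81920];
        long total = 0;
        int read;
        while ((read = await body.ReadAsync(buffer, ct)) > 0)
        {
            await destination.WriteAsync(buffer.AsMemory(0, read), ct);
            total += read;
        }
        return total;
    }
}
```

`EnsureSuccessStatusCode` throws on a 403, so the caller is the natural place to translate a permission failure into the CLI's exit code and message.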
#### M8-T4: `audit verify-chain` subcommand (no-op stub)

**Files:**
- Modify: `src/ScadaLink.CLI/Commands/AuditCommands.cs` — add `verify-chain --month <YYYY-MM>` subcommand. In v1, returns a documented "hash chain not yet enabled in this release; see Component-AuditLog.md Security & Tamper-Evidence for the v1.x roadmap" message with exit code 0.
- Create: `tests/ScadaLink.CLI.Tests/Commands/AuditVerifyChainCommandTests.cs`.

**Steps:**
1. Failing test: `scadalink audit verify-chain --month 2026-05` exits 0 with the documented message.
2. Failing test: malformed month string (e.g., `2026-13`) exits non-zero with a parse error.
3. Implement.
4. Run: pass.
5. Commit: `feat(cli): scadalink audit verify-chain subcommand (v1 no-op)`.

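Even as a no-op, `verify-chain` still has to validate `--month` for step 2 of M8-T4. One way to do that is an exact-format parse, sketched here with a hypothetical `MonthArgument` helper; wiring it into a System.CommandLine option is left out:

```csharp
#nullable enable
using System;
using System.Globalization;

// Hypothetical helper; not a type from the codebase.
public static class MonthArgument
{
    // Parses "YYYY-MM" into the first instant of that month in UTC.
    // Returns false for anything that is not a real calendar month.
    public static bool TryParse(string? text, out DateTime monthStartUtc) =>
        DateTime.TryParseExact(
            text, "yyyy-MM", CultureInfo.InvariantCulture,
            DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal,
            out monthStartUtc);
}
```

Using the invariant culture keeps the result independent of the operator's locale settings.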
#### M8-T5: ManagementService HTTP endpoints

**Files:**
- Create: `src/ScadaLink.ManagementService/Controllers/AuditController.cs` — REST endpoints `GET /api/audit/query` (paged) and `GET /api/audit/export` (streaming). Both require `OperationalAudit`; `/export` additionally requires `AuditExport` (matching the UI's permission split from M7-T15).
- Create: `tests/ScadaLink.ManagementService.Tests/Controllers/AuditControllerTests.cs`.

**Steps:**
1. Failing test: `GET /api/audit/query` with valid params returns JSON page of audit events.
2. Failing test: `GET /api/audit/export` streams CSV/JSONL/Parquet without buffering.
3. Failing test: a request without `OperationalAudit` returns 403.
4. Failing test: `/export` without `AuditExport` returns 403.
5. Implement.
6. Run: pass.
7. Commit: `feat(mgmt): /api/audit/{query,export} endpoints with permission gates`.

#### M8-T6: Output formatters (JSON + table)

**Files:**
- Modify: `src/ScadaLink.CLI/Output/` — add an `AuditEventTableFormatter` that renders results as an aligned table with sensible defaults (truncate long fields with `…`).
- The JSON formatter follows existing CLI patterns (one event per line for streaming, or array for paged results — confirm during M8 brainstorm).
- Create: `tests/ScadaLink.CLI.Tests/Output/AuditEventFormatterTests.cs`.

**Steps:**
1. Failing test: table format includes columns: OccurredAtUtc, Channel, Kind, Status, Target, Actor, DurationMs.
2. Failing test: JSON format is one event per line.
3. Implement.
4. Run: pass.
5. Commit: `feat(cli): JSON + table formatters for audit events`.

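The "aligned table, truncate long fields with `…`" rule in M8-T6 comes down to fitting each cell to a fixed width. A sketch of that one step, with a hypothetical `TableCell.Fit` helper; column widths and how the formatter chooses them are not specified here:

```csharp
#nullable enable

// Hypothetical helper; not a type from the codebase.
public static class TableCell
{
    // Pads short values and truncates long ones so every cell is exactly
    // `width` characters; truncated values end with an ellipsis.
    public static string Fit(string? value, int width)
    {
        if (width <= 0) return string.Empty;
        var text = value ?? string.Empty;
        if (text.Length > width)
            text = text.Substring(0, width - 1) + "…";
        return text.PadRight(width);
    }
}
```

This counts UTF-16 code units, so values containing wide or combining characters may still misalign in a terminal; that is probably acceptable for identifiers such as targets and actors.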
#### M8-T7: Rename existing `audit-log query` → `audit-config query` with deprecation alias

**Files:**
- Modify: `src/ScadaLink.CLI/Commands/AuditLogCommands.cs` — rename the top-level command from `audit-log` to `audit-config` (clearer disambiguation from the new `audit` group). Add an alias `audit-log` that prints a deprecation warning and forwards to `audit-config` for one minor version.
- Modify: `src/ScadaLink.CLI/README.md` and CLI help text to document the rename and the deprecation timeline.
- Create: `tests/ScadaLink.CLI.Tests/Commands/AuditConfigDeprecationTests.cs`.

**Steps:**
1. Failing test: `scadalink audit-config query --user alice` works.
2. Failing test: `scadalink audit-log query --user alice` works but emits a deprecation warning to stderr.
3. Failing test: `scadalink audit query --since ...` (the NEW operational command) and `scadalink audit-config query --user ...` (the renamed config command) are clearly different surfaces and do not conflict.
4. Implement.
5. Run: pass.
6. Commit: `refactor(cli): rename audit-log → audit-config with deprecation alias`.

#### M8-T8: CLI README + help text updates

**Files:**
- Modify: `src/ScadaLink.CLI/README.md` — document the new `audit` group, the renamed `audit-config` group, the permission requirements, the `verify-chain` no-op note, and the CLI ↔ UI filter parity.
- Modify: each subcommand's `--help` description for clarity.

**Steps:**
1. Inline doc edits.
2. Verify `scadalink audit --help` and `scadalink audit-config --help` produce the documented output.
3. Commit: `docs(cli): document new scadalink audit group and audit-config rename`.

#### M8-T9: CLI integration test — end-to-end query + export

**Files:**
- Create: `tests/ScadaLink.IntegrationTests/Cli/AuditCliEndToEndTests.cs` — boots central with a populated AuditLog table; invokes `scadalink audit query --since ...` against the running ManagementService; asserts results match the database. Same for export.

**Steps:**
1. Sketch test using existing IntegrationTests harness.
2. Iterate until all flag combinations work end-to-end.
3. Commit: `test(cli): scadalink audit end-to-end against running ManagementService`.

### M8 — Risk callouts

- **Operator script breakage from the `audit-log` rename:** the deprecation alias is the safety net but only for one minor version; document the deprecation timeline clearly in the CLI README. Coordinate with anyone running `audit-log` in CI/cron.
- **Parquet output:** requires a Parquet writer library. If one isn't already in `Directory.Packages.props`, add the smallest viable dependency (`ParquetSharp` or `Parquet.Net`). Decide during M8 brainstorm.
- **Streaming export from CLI:** the CLI invokes the ManagementService HTTP endpoint, which itself streams. Confirm `HttpClient.SendAsync` with `HttpCompletionOption.ResponseHeadersRead` is used so the CLI doesn't buffer the whole response.
- **Permission model parity:** ensure the CLI's permission errors mirror the UI's (HTTP 403 → CLI exit code 2 with a clear message).

---

## Cross-cutting concerns (apply at every milestone)

- **Branching:** every milestone gets its own `feature/audit-log-mN-<slice>` branch; merged with `--no-ff` to `main` on milestone completion. No pushes without explicit user authorization.
- **Tests:** Every task adds tests first (failing test → impl → passing test). Existing tests must keep passing.
- **Commits:** small and frequent. Bite-sized per writing-plans skill.
- **Reviews:** per the bundling cadence in user memory — group small adjacent tasks into a single implementer dispatch, run one combined spec+quality review per bundle, then a final cross-bundle review at end of milestone.
- **Docs:** if implementation reveals a design gap, fix the design doc FIRST (in `docs/requirements/Component-AuditLog.md` and/or `alog.md`), commit, then implement. Don't let the code and docs drift.
- **Infra:** the 3 `infra/*` working-tree modifications still uncommitted on `main` are unrelated and stay that way unless the user explicitly addresses them. Use explicit `git add <path>` throughout, never `git commit -am`.

---

## Per-milestone execution flow (template)

When a milestone is about to start, run this sequence:

1. **Brainstorm**: short skill invocation to nail any code-level decisions not fixed in the spec (test fixture placement, migration helper choice, etc.).
2. **Writing-plans**: produce a milestone-specific plan with TDD detail per task — saved to `docs/plans/2026-XX-XX-auditlog-mN-<slice>.md` + peer `.tasks.json`.
3. **Subagent-driven execution**: bundle small tasks per cadence preference; per-bundle implementer + combined reviewer; cross-milestone review at end; merge to `main` with `--no-ff`.

The roadmap is the contract for what each milestone ships; the per-milestone plan is the contract for how it gets built.