diff --git a/docs/plans/2026-05-20-audit-log-code-roadmap.md b/docs/plans/2026-05-20-audit-log-code-roadmap.md index 575e83c..33005b7 100644 --- a/docs/plans/2026-05-20-audit-log-code-roadmap.md +++ b/docs/plans/2026-05-20-audit-log-code-roadmap.md @@ -262,22 +262,179 @@ The design for both is merged on `main` (`alog.md` cached-call tracking section; - Integration test in `tests/ScadaLink.IntegrationTests/` boots a site + central pair, executes a sync script that calls an external system, and asserts a corresponding row appears in the central `AuditLog` within N seconds. - No regressions in existing ExternalSystemGateway or Communication tests. -**Task headlines** (each expanded to TDD detail in its own writing-plans pass before execution): -1. Site-local `SqliteAuditWriter` implementing `IAuditWriter` — schema bootstrap, hot-path INSERT, write lock, ring-buffer fallback. Pattern from `SiteEventLogger.cs:28–98`. -2. Bounded in-memory `RingBufferFallback` that drains into the SQLite writer when health returns. -3. `SiteAuditTelemetryActor` actor — periodic drain loop (5s busy / 30s idle), batch INSERT-IF-NOT-EXISTS via gRPC, `ForwardState` transitions. -4. Extend `sitestream.proto`: add `IngestAuditEvents(stream AuditEventBatch) returns (IngestAck)`. Regenerate. Update `SiteStreamGrpcServer.cs` to handle the new RPC. -5. `AuditLogIngestActor` (central singleton) — handles ingest message, calls `IAuditLogRepository.InsertIfNotExistsAsync` per event in a single transaction. -6. Host wiring: register `SiteAuditTelemetryActor` as a site singleton on a **dedicated dispatcher** (per `alog.md` §6.2); register `AuditLogIngestActor` as a central singleton. Reference pattern at `AkkaHostedService.cs:272–280`. -7. ESG sync `Call()` emission hook — add `IAuditWriter` injection; emit `AuditEvent` (channel=ApiOutbound, kind=SyncCall) before returning. Audit-write failures never throw to the script. -8. 
End-to-end integration test in `IntegrationTests/AuditLog/SyncCallEmissionTests.cs` — site + central wired, script invokes ESG `Call()`, central row appears. -9. Health metric `SiteAuditWriteFailures` (this milestone defines it; M6 surfaces the tile). -10. Update `docker/deploy.sh` / `infra/reseed.sh` if needed so dev clusters can verify locally. +### M2 — Tasks (TDD-detail) -**Risk callouts:** -- Site SQLite write throughput under load — bench against existing SiteEventLogger numbers. -- gRPC additive evolution: the existing proto uses a `oneof`. Adding a new top-level RPC is safe; embedding new oneof variants is also safe. Confirm message-ordering guarantees aren't violated. -- Don't accidentally bind `SiteAuditTelemetryActor` to the same dispatcher used by script blocking I/O; that's a real perf issue (per spec). +#### M2-T1: `SqliteAuditWriter` — schema + connection bootstrap + +**Files:** +- Create: `src/ScadaLink.AuditLog/Site/SqliteAuditWriter.cs` — implements `IAuditWriter`. Constructor takes a `SqliteOptions` (path); single `SqliteConnection` per instance gated by `SemaphoreSlim(1,1)`. Calls `InitializeSchema()` on first use. Pattern from `src/ScadaLink.SiteEventLogging/SiteEventLogger.cs:28–98`. +- Create: `tests/ScadaLink.AuditLog.Tests/Site/SqliteAuditWriterSchemaTests.cs`. + +**Steps:** +1. Failing test: opening a writer against a `:memory:` SQLite produces an `AuditLog` table with the documented columns (the 20 central columns minus `IngestedAtUtc`, plus `ForwardState`). +2. Run: fail (class doesn't exist). +3. Implement `InitializeSchema()` with `CREATE TABLE IF NOT EXISTS AuditLog (...)`. Use SQLite column types matching the EF mapping where reasonable (`TEXT` for IDs, `INTEGER` for status enums, `BLOB` not used). +4. Run: pass. +5. Commit: `feat(auditlog): SqliteAuditWriter schema bootstrap`. + +#### M2-T2: `SqliteAuditWriter` — hot-path `WriteAsync` + +**Files:** +- Modify: `src/ScadaLink.AuditLog/Site/SqliteAuditWriter.cs`. 
+- Create: `tests/ScadaLink.AuditLog.Tests/Site/SqliteAuditWriterWriteTests.cs`. + +**Steps:** +1. Failing test: `WriteAsync(event)` inserts one row with `ForwardState = Pending`. +2. Failing test: 1,000 concurrent `WriteAsync` calls all complete without exception and produce exactly 1,000 rows (write-lock correctness). +3. Run: fail. +4. Implement using a parameterized `INSERT` under `SemaphoreSlim` lock. +5. Run: pass. +6. Commit: `feat(auditlog): SqliteAuditWriter hot-path INSERT with write lock`. + +#### M2-T3: `RingBufferFallback` — in-memory fallback + +**Files:** +- Create: `src/ScadaLink.AuditLog/Site/RingBufferFallback.cs` — `Channel` with `BoundedChannelFullMode.DropOldest`, default capacity 1024. +- Create: `tests/ScadaLink.AuditLog.Tests/Site/RingBufferFallbackTests.cs`. + +**Steps:** +1. Failing test: enqueueing 1,025 events into a 1,024-cap ring drops the oldest and emits a `RingBufferOverflow` notification (incrementing a passed-in counter). +2. Failing test: `DrainTo(writer)` writes all buffered events in FIFO order and clears the ring. +3. Implement. +4. Run: pass. +5. Commit: `feat(auditlog): RingBufferFallback with drop-oldest overflow`. + +#### M2-T4: `FallbackAuditWriter` — compose primary + ring behind `IAuditWriter` + +**Files:** +- Create: `src/ScadaLink.AuditLog/Site/FallbackAuditWriter.cs` — primary writer is `SqliteAuditWriter`; on transient exception, enqueues into `RingBufferFallback` and increments `SiteAuditWriteFailures` (M2-T11). On the next successful primary write, drains the ring back through the primary. +- Create: `tests/ScadaLink.AuditLog.Tests/Site/FallbackAuditWriterTests.cs`. + +**Steps:** +1. Failing test: when the primary throws, the event lands in the ring and the call returns successfully. +2. Failing test: when primary writes succeed again, the ring drains in FIFO order. +3. Implement. +4. Run: pass. +5. Commit: `feat(auditlog): FallbackAuditWriter composing SQLite + ring`. 
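The drop-oldest ring described in M2-T3 and drained in M2-T4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the planned implementation: `AuditEvent` is reduced to a one-field record, and `DemoRingBuffer`, `TryEnqueue`, `DrainTo(Action<AuditEvent>)`, and `OverflowCount` are hypothetical names. The `Channel.CreateBounded` overload that takes an `itemDropped` callback requires .NET 6 or later.

```csharp
using System;
using System.Threading;
using System.Threading.Channels;

// Hypothetical stand-in for the real AuditEvent type.
public sealed record AuditEvent(Guid EventId);

// Sketch of a drop-oldest ring: a bounded channel that counts evictions.
public sealed class DemoRingBuffer
{
    private readonly Channel<AuditEvent> _channel;
    private long _overflowCount;

    public DemoRingBuffer(int capacity = 1024)
    {
        var options = new BoundedChannelOptions(capacity)
        {
            FullMode = BoundedChannelFullMode.DropOldest,
        };

        // The itemDropped callback runs once per evicted event (.NET 6+).
        _channel = Channel.CreateBounded<AuditEvent>(
            options,
            _ => Interlocked.Increment(ref _overflowCount));
    }

    // Number of events evicted because the ring was full.
    public long OverflowCount => Interlocked.Read(ref _overflowCount);

    // With DropOldest, a full channel evicts its oldest item instead of rejecting the write.
    public bool TryEnqueue(AuditEvent evt) => _channel.Writer.TryWrite(evt);

    // Drains buffered events in FIFO order into the supplied sink and
    // returns how many were drained; the ring is empty afterwards.
    public int DrainTo(Action<AuditEvent> sink)
    {
        var drained = 0;
        while (_channel.Reader.TryRead(out var evt))
        {
            sink(evt);
            drained++;
        }
        return drained;
    }
}
```

In this sketch, enqueueing 1,025 events into a 1,024-capacity ring would leave `OverflowCount` at 1, which is the shape of the first failing test in M2-T3. The real `DrainTo(writer)` would presumably call the primary `IAuditWriter` rather than a delegate.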
+ +#### M2-T5: Extend `sitestream.proto` with `IngestAuditEvents` RPC + +**Files:** +- Modify: `src/ScadaLink.Communication/Protos/sitestream.proto` — add `message AuditEventDto { string event_id = 1; google.protobuf.Timestamp occurred_at_utc = 2; ... }` (all 20 central fields), `message AuditEventBatch { repeated AuditEventDto events = 1; }`, `message IngestAck { repeated string accepted_event_ids = 1; }`, and `rpc IngestAuditEvents(AuditEventBatch) returns (IngestAck);` on `SiteStreamService`. +- Build: `dotnet build src/ScadaLink.Communication/` regenerates the C# stubs. +- Create: `tests/ScadaLink.Communication.Tests/Protos/AuditEventProtoTests.cs`. + +**Steps:** +1. Failing test: round-trip serialize/deserialize a populated `AuditEventDto`; assert all fields survive. +2. Edit proto; rebuild. +3. Run: pass. +4. Commit: `feat(comms): add IngestAuditEvents RPC + AuditEvent proto messages`. + +#### M2-T6: `AuditEvent` ↔ `AuditEventDto` mapper + +**Files:** +- Create: `src/ScadaLink.AuditLog/Telemetry/AuditEventMapper.cs` — static `ToDto(AuditEvent)` and `FromDto(AuditEventDto)`. +- Create: `tests/ScadaLink.AuditLog.Tests/Telemetry/AuditEventMapperTests.cs`. + +**Steps:** +1. Failing test: round-trip a populated `AuditEvent` through `ToDto` → `FromDto`; assert equality on all 20 columns. +2. Implement. +3. Run: pass. +4. Commit: `feat(auditlog): AuditEvent ↔ proto Dto mapper`. + +#### M2-T7: `SiteAuditTelemetryActor` — drain loop + +**Files:** +- Create: `src/ScadaLink.AuditLog/Site/Telemetry/SiteAuditTelemetryActor.cs` — `ReceiveActor` with a `Drain` self-tick. On `Drain`: read up to `BatchSize` `Pending` rows from SQLite; send via gRPC; mark accepted rows `Forwarded`. +- Create: `src/ScadaLink.AuditLog/Site/Telemetry/SiteAuditTelemetryOptions.cs` — `BatchSize = 256`, `BusyIntervalSeconds = 5`, `IdleIntervalSeconds = 30`. +- Create: `tests/ScadaLink.AuditLog.Tests/Site/Telemetry/SiteAuditTelemetryActorTests.cs` using `TestKit` + NSubstitute for the gRPC client. 
+ +**Steps:** +1. Failing test: when SQLite has 50 pending rows, a `Drain` tick sends one batch via the mocked gRPC client. +2. Failing test: on ack, the corresponding rows flip to `Forwarded` in SQLite. +3. Failing test: when gRPC throws, rows stay `Pending` and the next tick retries. +4. Failing test: cadence is 5s after a tick that drained ≥1 row, 30s after a tick that drained 0. +5. Implement. +6. Run: pass. +7. Commit: `feat(auditlog): SiteAuditTelemetryActor drain loop`. + +#### M2-T8: `AuditLogIngestActor` + gRPC server handler + +**Files:** +- Create: `src/ScadaLink.AuditLog/Central/AuditLogIngestActor.cs` — `ReceiveActor` accepting `IngestAuditEventsCommand(batch)`; calls `IAuditLogRepository.InsertIfNotExistsAsync` for each event inside a single `DbContext` transaction; replies with `IngestAck(acceptedEventIds)`. +- Modify: `src/ScadaLink.Communication/SiteStreamGrpc/SiteStreamGrpcServer.cs` — implement the new `IngestAuditEvents` method as a thin gRPC↔Akka adapter (`Ask` against the central singleton's proxy, mapped to the gRPC reply). +- Create: `tests/ScadaLink.AuditLog.Tests/Central/AuditLogIngestActorTests.cs`. + +**Steps:** +1. Failing test: actor receives a batch of 5 events; repo is called 5 times; reply lists all 5 EventIds as accepted. +2. Failing test: when 2 of 5 events already exist (repo returns `Inserted = false`), the reply still lists all 5 as accepted (idempotent semantics). +3. Failing test: gRPC handler routes to actor and returns its reply. +4. Implement. +5. Run: pass. +6. Commit: `feat(auditlog): AuditLogIngestActor + gRPC server handler`. + +#### M2-T9: Host registration with dedicated dispatcher + +**Files:** +- Modify: `src/ScadaLink.Host/Actors/AkkaHostedService.cs` — alongside the existing wiring at `:272–280`, register `AuditLogIngestActor` as central singleton and `SiteAuditTelemetryActor` as site singleton bound to `audit-telemetry-dispatcher`. Manager + proxy pair for both. 
+- Modify: Host HOCON (likely `src/ScadaLink.Host/Configuration/akka.conf` or similar) — add `audit-telemetry-dispatcher { type = ForkJoinDispatcher; parallelism-min = 1; parallelism-max = 2; }`. +- Modify: `src/ScadaLink.AuditLog/ServiceCollectionExtensions.cs` — register actor `Props` factories so Host can resolve them. +- Create: `tests/ScadaLink.Host.Tests/AkkaHostedServiceAuditWiringTests.cs`. + +**Steps:** +1. Failing test: starting the host with the audit module loaded produces healthy `IActorRef` proxies for both singletons. +2. Failing test: `SiteAuditTelemetryActor` is bound to `audit-telemetry-dispatcher` (assert via Akka actor cell inspection or via a known-good dispatcher-tagged behaviour). +3. Implement. +4. Run: pass. +5. Commit: `feat(host): register AuditLog singletons with dedicated dispatcher`. + +#### M2-T10: ESG `ExternalSystemClient.CallAsync` audit emission + +**Files:** +- Modify: `src/ScadaLink.ExternalSystemGateway/ExternalSystemClient.cs` (sync `CallAsync` around line 45–70) — inject `IAuditWriter` via constructor. After the call completes (success OR exception), build an `AuditEvent` (channel=`ApiOutbound`, kind=`SyncCall`, status from outcome, `DurationMs`, `HttpStatus`, target = system+method, provenance from `ScriptExecutionContext`). Call `_auditWriter.WriteAsync(evt)` inside a `try`/`catch` that swallows + logs + increments `SiteAuditWriteFailures`. +- Modify: `src/ScadaLink.ExternalSystemGateway/ServiceCollectionExtensions.cs` — accept `IAuditWriter` from DI. +- Create: `tests/ScadaLink.ExternalSystemGateway.Tests/ExternalSystemClientAuditEmissionTests.cs`. + +**Steps:** +1. Failing test: sync `CallAsync` success → exactly one event with `Status=Success`, `Channel=ApiOutbound`, `Kind=SyncCall`. +2. Failing test: sync `CallAsync` HTTP 500 → `Status=TransientFailure`, `HttpStatus=500`. +3. Failing test: sync `CallAsync` HTTP 400 → `Status=PermanentFailure`, `HttpStatus=400`. +4. 
Failing test: when `IAuditWriter.WriteAsync` throws, the script call still completes normally and the script sees the original (non-audit) result. +5. Implement. +6. Run: pass. +7. Commit: `feat(esg): emit ApiOutbound.SyncCall audit event on every sync call`. + +#### M2-T11: `SiteAuditWriteFailures` health metric + +**Files:** +- Modify: `src/ScadaLink.HealthMonitoring/SiteHealthState.cs` — add a `SiteAuditWriteFailures` counter; expose it in the site health report payload. +- Modify: `src/ScadaLink.AuditLog/Site/FallbackAuditWriter.cs` (M2-T4) — accept `IHealthMetrics` (or the project's existing health counter abstraction) and increment per failed primary write. +- Create: `tests/ScadaLink.AuditLog.Tests/Site/SiteAuditWriteFailuresMetricTests.cs`. + +**Steps:** +1. Failing test: 3 simulated SQLite failures → counter reports 3 in the next snapshot. +2. Implement. +3. Run: pass. +4. Commit: `feat(health): SiteAuditWriteFailures metric`. + +#### M2-T12: End-to-end integration test + +**Files:** +- Create: `tests/ScadaLink.IntegrationTests/AuditLog/SyncCallEmissionTests.cs` — boots a site + central pair via the existing IntegrationTests harness; deploys a tiny script that calls a stub external system; asserts the central `AuditLog` table has exactly one row with the expected channel/kind/status within 10s. +- Possibly modify: `infra/reseed.sh` if integration tests need a fresh AuditLog table per run. + +**Steps:** +1. Sketch the test using existing IntegrationTests fixtures. +2. Run: fail somewhere (gaps in earlier tasks surface here). +3. Iterate fixes back through M2-T1..M2-T11 until end-to-end passes. +4. Commit: `test(auditlog): end-to-end sync call emission integration test`. + +### M2 — Risk callouts + +- **SiteStream proto evolution:** adding a new top-level RPC is wire-compatible; confirm generated `Sitestream.cs` rebuilds cleanly and existing tests still pass. 
+- **Dedicated dispatcher misconfiguration:** if `SiteAuditTelemetryActor` lands on the script blocking-I/O dispatcher, scripts will starve during telemetry bursts. Add a runtime assertion in `M2-T9` that the actor's dispatcher matches expectation. +- **Script execution context plumbing:** ESG emission (M2-T10) needs `SourceInstanceId` / `SourceScript`; confirm these are reachable via the existing `ScriptExecutionContext` (or equivalent in SiteRuntime) before starting M2-T10. +- **Integration-test DB isolation:** target an isolated MS SQL database (or a dedicated schema) so the test doesn't clash with other integration tests. --- @@ -297,21 +454,250 @@ The design for both is merged on `main` (`alog.md` cached-call tracking section; - Retry attempt → `Kind = CachedAttempt` audit row + `SiteCalls` status transition. Terminal → `Kind = CachedTerminal` audit row + `SiteCalls` terminal status. - Integration test asserts: triggering a `CachedCall` that fails transient-then-succeeds produces 3 AuditLog rows + 1 SiteCalls row with `Status = Delivered`, all sharing the same `TrackedOperationId` correlation key. -**Task headlines:** -1. `TrackedOperationId` GUID newtype in Commons. -2. Site-local SQLite operation-tracking table + repo (matches `alog.md` cached-call tracking design). -3. `CachedCallTelemetry` Commons message carrying both operational fields and `AuditEvent` payload. -4. `SiteCalls` MS SQL table + EF mapping + migration + `ISiteCallAuditRepository` + repo impl. -5. `SiteCallAuditActor` skeleton (singleton, central) — receives telemetry, owns `SiteCalls` upsert via repo. -6. Extend `AuditLogIngestActor` to detect combined telemetry and execute both writes (`AuditLog` insert + `SiteCalls` upsert) in a single `DbContext` transaction. -7. ESG `CachedCall()` emission — produce combined telemetry on every lifecycle transition (enqueue, attempt, terminal). -8. 
Extend gRPC proto with the combined-telemetry RPC if it's distinct from `IngestAuditEvents`, or fold it into the existing one with a discriminator field (decision in milestone brainstorm). -9. Integration test in `IntegrationTests/AuditLog/CachedCallCombinedTelemetryTests.cs`. +### M3 — Tasks (TDD-detail) -**Risk callouts:** -- Combined telemetry packet evolution: design the packet so future cached audit-kind additions are non-breaking (oneof or open-field map). -- Single transaction at central spans two tables; ensure connection retry behaviour is correct. -- Idempotency: AuditLog dedups on `EventId`; SiteCalls dedups on `TrackedOperationId`. If telemetry retries and AuditLog already has the row, ensure SiteCalls upsert still runs (no short-circuit). +#### M3-T1: `TrackedOperationId` strong-typed ID + +**Files:** +- Create: `src/ScadaLink.Commons/Types/TrackedOperationId.cs` — readonly record struct wrapping `Guid`; `New()` / `Parse(string)` / `ToString()`. +- Create: `tests/ScadaLink.Commons.Tests/Types/TrackedOperationIdTests.cs`. + +**Steps:** +1. Failing test: round-trip via `ToString()` / `Parse()` and equality semantics. +2. Implement. +3. Run: pass. +4. Commit: `feat(commons): TrackedOperationId strong type`. + +#### M3-T2: Site-local operation-tracking SQLite table + repo + +**Files:** +- Create: `src/ScadaLink.SiteRuntime/Tracking/OperationTrackingStore.cs` — SQLite-backed store with columns: `TrackedOperationId`, `Kind`, `TargetSummary`, `Status`, `RetryCount`, `LastError`, `CreatedAtUtc`, `UpdatedAtUtc`, `TerminalAtUtc`, source provenance. Schema bootstrap on first use; uses the same write-lock pattern as `SqliteAuditWriter`. Implements `IOperationTrackingStore` (interface in Commons). +- Create: `src/ScadaLink.Commons/Interfaces/IOperationTrackingStore.cs` — `RecordEnqueueAsync`, `RecordAttemptAsync`, `RecordTerminalAsync`, `GetStatusAsync(TrackedOperationId)`, `PurgeTerminalAsync(olderThanUtc)`. 
+- Create: `tests/ScadaLink.SiteRuntime.Tests/Tracking/OperationTrackingStoreTests.cs`. + +**Steps:** +1. Failing test: schema bootstrap creates the table. +2. Failing test: `RecordEnqueueAsync` inserts a `Pending` row; `RecordAttemptAsync` updates `Status`/`RetryCount`/`LastError`; `RecordTerminalAsync` finalises. +3. Failing test: `GetStatusAsync` returns the latest snapshot (answers `Tracking.Status(id)` site-locally). +4. Failing test: `PurgeTerminalAsync` removes terminal rows older than threshold; non-terminal rows are kept regardless of age. +5. Implement. +6. Run: pass. +7. Commit: `feat(siteruntime): site-local operation tracking SQLite store`. + +#### M3-T3: `Tracking.Status(id)` API surface in SiteRuntime + +**Files:** +- Modify: `src/ScadaLink.SiteRuntime/Scripting/TrackingApi.cs` (new or existing — confirm via repo) — public `Status(TrackedOperationId)` method routed through `IOperationTrackingStore`. +- Modify: script trust-model allow-list to include the new `Tracking.*` surface (confirm via grep). +- Create: `tests/ScadaLink.SiteRuntime.Tests/Scripting/TrackingApiTests.cs`. + +**Steps:** +1. Failing test: `Tracking.Status(unknownId)` returns a documented "not found" sentinel. +2. Failing test: `Tracking.Status(knownId)` returns the latest snapshot. +3. Implement. +4. Run: pass. +5. Commit: `feat(siteruntime): Tracking.Status(id) script API`. + +#### M3-T4: `CachedCallTelemetry` Commons message — carries both operational + audit content + +**Files:** +- Create: `src/ScadaLink.Commons/Messages/Integration/CachedCallTelemetry.cs` — fields: `TrackedOperationId`, `Kind` (`CachedEnqueued`/`CachedAttempt`/`CachedTerminal` audit kind), operational status, retry count, last error, timestamps, and a nested `AuditEvent` carrying the audit row content. Documented as additive-only per Commons REQ-COM-5a. +- Create: `tests/ScadaLink.Commons.Tests/Messages/Integration/CachedCallTelemetryTests.cs`. + +**Steps:** +1. 
Failing test: construct a telemetry packet for each of the three lifecycle kinds; verify the nested AuditEvent's channel/kind alignment (e.g., a `CachedAttempt` packet must carry an `AuditEvent` with `Kind = CachedAttempt`). +2. Failing test: serialization round-trip preserves both layers. +3. Implement. +4. Run: pass. +5. Commit: `feat(commons): CachedCallTelemetry carrying combined operational + audit content`. + +#### M3-T5: `SiteCalls` MS SQL table — EF mapping + +**Files:** +- Create: `src/ScadaLink.Commons/Entities/Audit/SiteCall.cs` — POCO record per Component-SiteCallAudit.md. +- Create: `src/ScadaLink.ConfigurationDatabase/Entities/SiteCallEntityTypeConfiguration.cs` — `IEntityTypeConfiguration<SiteCall>` with PK on `TrackedOperationId`, indexes on `(SourceSite, CreatedAtUtc)` and `(Status, UpdatedAtUtc)`. +- Modify: `ScadaLinkDbContext.cs` — `public DbSet<SiteCall> SiteCalls => Set<SiteCall>();`. +- Create: `tests/ScadaLink.ConfigurationDatabase.Tests/Entities/SiteCallEntityTypeConfigurationTests.cs`. + +**Steps:** +1. Failing test: model exposes `SiteCalls` table with documented columns and indexes. +2. Implement. +3. Run: pass. +4. Commit: `feat(configdb): map SiteCall to SiteCalls table`. + +#### M3-T6: `SiteCalls` migration + +**Files:** +- Create: `src/ScadaLink.ConfigurationDatabase/Migrations/_AddSiteCallsTable.cs` via `dotnet ef migrations add AddSiteCallsTable`. +- Create: `tests/ScadaLink.ConfigurationDatabase.Tests/Migrations/AddSiteCallsTableMigrationTests.cs`. + +**Steps:** +1. Failing test: applying the migration creates the `SiteCalls` table with PK + indexes. +2. Generate + adjust migration. +3. Run: pass. +4. Commit: `feat(configdb): add SiteCalls migration`. 
+ +#### M3-T7: `ISiteCallAuditRepository` + EF impl + +**Files:** +- Create: `src/ScadaLink.Commons/Interfaces/Repositories/ISiteCallAuditRepository.cs` — `UpsertAsync(SiteCall)` (insert-if-not-exists by `TrackedOperationId`, otherwise update-on-newer-status using monotonic status progression), `GetAsync(TrackedOperationId)`, `QueryAsync(filter, paging)`, `PurgeTerminalAsync(olderThanUtc)`. +- Create: `src/ScadaLink.ConfigurationDatabase/Repositories/SiteCallAuditRepository.cs`. +- Modify: `ServiceCollectionExtensions.cs` — register. +- Create: `tests/ScadaLink.ConfigurationDatabase.Tests/Repositories/SiteCallAuditRepositoryTests.cs`. + +**Steps:** +1. Failing test: first `UpsertAsync` inserts; second `UpsertAsync` with an advanced status updates; an `UpsertAsync` with an older status is a no-op (monotonic progression). +2. Failing test: paged query supports the documented filter set. +3. Implement. +4. Run: pass. +5. Commit: `feat(configdb): ISiteCallAuditRepository + EF impl`. + +#### M3-T8: `SiteCallAuditActor` skeleton (central singleton) + +**Files:** +- Create: `src/ScadaLink.SiteCallAudit/` (new project) — `SiteCallAuditActor.cs` + `ScadaLink.SiteCallAudit.csproj` + `ServiceCollectionExtensions.cs`. Actor handles `UpsertSiteCallCommand` messages by calling `ISiteCallAuditRepository.UpsertAsync`. Note: full reconciliation, KPIs, and Retry/Discard relay are explicitly deferred — this is the minimum-viable surface for M3. +- Modify: `ScadaLink.slnx` to include the new project. +- Create: `tests/ScadaLink.SiteCallAudit.Tests/SiteCallAuditActorTests.cs`. + +**Steps:** +1. Failing test: actor receives `UpsertSiteCallCommand`, calls repo, replies with ack. +2. Failing test: actor swallows transient DB errors and surfaces them as health metrics (does NOT crash the central singleton). +3. Implement. +4. Run: pass. +5. Commit: `feat(scaudit): SiteCallAuditActor minimum viable surface`. 
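The monotonic upsert rule in M3-T7 (insert if absent, update only on a newer status, ignore stale telemetry) can be sketched in memory as follows. This is an illustration only: `SiteCallStatus` shows an assumed two-value subset, the numeric order is an assumed progression rank, and `InMemorySiteCallStore` is a hypothetical stand-in for the EF-backed repository.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Illustrative subset of statuses; the numeric order is an ASSUMED progression rank.
public enum SiteCallStatus { Pending = 0, Delivered = 1 }

// Hypothetical, trimmed-down SiteCall carrying only the fields the rule needs.
public sealed record SiteCall(Guid TrackedOperationId, SiteCallStatus Status);

public sealed class InMemorySiteCallStore
{
    private readonly Dictionary<Guid, SiteCall> _rows = new();

    // Returns true when the row was inserted or advanced,
    // false when the incoming status is older than or equal to the stored one.
    public bool Upsert(SiteCall incoming)
    {
        if (!_rows.TryGetValue(incoming.TrackedOperationId, out var existing))
        {
            _rows[incoming.TrackedOperationId] = incoming; // first sight: insert
            return true;
        }

        if (incoming.Status <= existing.Status)
        {
            return false; // stale or duplicate telemetry: no-op
        }

        _rows[incoming.TrackedOperationId] = incoming; // newer status: update
        return true;
    }

    public SiteCall? Get(Guid id) => _rows.TryGetValue(id, out var row) ? row : null;
}
```

Under these assumptions, upserting `Pending`, then `Delivered`, then `Pending` again for the same `TrackedOperationId` leaves the row at `Delivered`, which matches the three cases in the first failing test of M3-T7. A database implementation would additionally need the comparison and write to happen atomically.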
+ +#### M3-T9: Extend `sitestream.proto` with `IngestCachedTelemetry` RPC OR extend `IngestAuditEvents` + +**Files:** +- Modify: `src/ScadaLink.Communication/Protos/sitestream.proto` — preferred approach: add a new top-level RPC `rpc IngestCachedTelemetry(CachedTelemetryBatch) returns (IngestAck);` and a `message CachedTelemetryPacket { AuditEventDto audit_event = 1; SiteCallOperationalDto operational = 2; }` plus `message CachedTelemetryBatch { repeated CachedTelemetryPacket packets = 1; }`. Decision should be confirmed during M3's brainstorm. +- Build to regenerate. +- Create: `tests/ScadaLink.Communication.Tests/Protos/CachedTelemetryProtoTests.cs`. + +**Steps:** +1. Failing test: round-trip a populated `CachedTelemetryPacket`. +2. Add proto + rebuild. +3. Run: pass. +4. Commit: `feat(comms): IngestCachedTelemetry RPC + combined telemetry messages`. + +#### M3-T10: Extend `AuditLogIngestActor` for combined telemetry — dual-write transaction + +**Files:** +- Modify: `src/ScadaLink.AuditLog/Central/AuditLogIngestActor.cs` — add a handler for the cached telemetry message. Inside a **single `DbContext` transaction**: (a) call `IAuditLogRepository.InsertIfNotExistsAsync(auditEvent)`, then (b) call `ISiteCallAuditRepository.UpsertAsync(operationalState)`. Both must succeed or both must roll back. +- Modify: `src/ScadaLink.Communication/SiteStreamGrpc/SiteStreamGrpcServer.cs` — route the new RPC to the central actor. +- Create: `tests/ScadaLink.AuditLog.Tests/Central/AuditLogIngestActorCombinedTelemetryTests.cs`. + +**Steps:** +1. Failing test: a single combined packet produces one AuditLog row AND one SiteCalls row (or upsert). +2. Failing test: when the SiteCalls upsert throws, the AuditLog insert is rolled back (no orphan rows). +3. Failing test: when the AuditLog insert is a no-op (duplicate `EventId`), the SiteCalls upsert still runs. +4. Failing test: when both rows already exist with monotonic-equal statuses, the operation is a no-op overall (full idempotency). 
+5. Implement. +6. Run: pass. +7. Commit: `feat(auditlog): combined telemetry dual-write transaction`. + +#### M3-T11: ESG `CachedCallAsync` — emit `CachedEnqueued` on enqueue + +**Files:** +- Modify: `src/ScadaLink.ExternalSystemGateway/ExternalSystemClient.cs:75–136` (cached call) — at the moment of buffering into S&F: build an `AuditEvent` (channel=`ApiOutbound`, kind=`CachedEnqueued`) AND a `SiteCallOperationalDto` (status=`Pending`); package as a `CachedTelemetryPacket`; hand to the combined-telemetry forwarder. +- Create: `src/ScadaLink.ExternalSystemGateway/Cached/CachedCallTelemetryForwarder.cs` — accumulates packets and posts to `SiteAuditTelemetryActor` (or a sibling actor — decision in milestone brainstorm). +- Create: `tests/ScadaLink.ExternalSystemGateway.Tests/CachedCallEnqueueEmissionTests.cs`. + +**Steps:** +1. Failing test: an enqueued cached call produces exactly one packet with `kind=CachedEnqueued`. +2. Implement. +3. Run: pass. +4. Commit: `feat(esg): CachedCall emits CachedEnqueued combined telemetry on buffering`. + +#### M3-T12: ESG `CachedCallAsync` — emit `CachedAttempt` per retry + +**Files:** +- Modify: `src/ScadaLink.StoreAndForward/` retry loop (locate the per-attempt callback site) to emit a `CachedAttempt` packet on each attempt (success OR transient failure). +- Create: `tests/ScadaLink.StoreAndForward.Tests/CachedCallAttemptEmissionTests.cs`. + +**Steps:** +1. Failing test: an attempt that returns HTTP 500 produces a packet with `kind=CachedAttempt`, `status=TransientFailure`, `HttpStatus=500`. +2. Failing test: a successful attempt produces a packet with `kind=CachedAttempt`, `status=Success`, `HttpStatus=200`. +3. Implement. +4. Run: pass. +5. Commit: `feat(snf): CachedCall emits CachedAttempt per retry`. 
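The attempt tests above pin three data points: HTTP 200 maps to `Success`, HTTP 500 to `TransientFailure`, and (from M2-T10) HTTP 400 to `PermanentFailure`. One possible classifier consistent with those points is sketched below. Treating every 2xx as success, every 5xx as transient, and everything else as permanent is an assumption of this sketch and is not stated in the plan; `AuditStatusClassifier` and `FromHttpStatus` are hypothetical names.

```csharp
// Status values named in the plan's failing tests.
public enum AuditStatus { Success, TransientFailure, PermanentFailure }

public static class AuditStatusClassifier
{
    // ASSUMED generalisation of the three documented cases (200, 500, 400).
    public static AuditStatus FromHttpStatus(int httpStatus) => httpStatus switch
    {
        >= 200 and <= 299 => AuditStatus.Success,          // documented case: 200
        >= 500 and <= 599 => AuditStatus.TransientFailure, // documented case: 500
        _ => AuditStatus.PermanentFailure,                 // documented case: 400
    };
}
```

Codes such as 408 or 429 are often treated as retryable in practice; whether they belong in the transient bucket here would need to be decided against the Store-and-Forward retry policy.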
+ +#### M3-T13: ESG `CachedCallAsync` — emit `CachedTerminal` on terminal state + +**Files:** +- Modify: same retry-loop terminal-transition site — on `Delivered` / `Failed` / `Parked` / `Discarded`, emit one final `CachedTerminal` packet. +- Create: `tests/ScadaLink.StoreAndForward.Tests/CachedCallTerminalEmissionTests.cs`. + +**Steps:** +1. Failing test: a cached call that succeeds on attempt 3 produces (in order): 1 `CachedEnqueued`, 3 `CachedAttempt`, 1 `CachedTerminal` (with `status=Delivered`). +2. Failing test: a cached call that exhausts retries produces a final `CachedTerminal` with `status=Parked`. +3. Implement. +4. Run: pass. +5. Commit: `feat(snf): CachedCall emits CachedTerminal on lifecycle terminal`. + +#### M3-T14: `Database.CachedWrite` — mirror the three-lifecycle emission for DB cached writes + +**Files:** +- Modify: `src/ScadaLink.ExternalSystemGateway/DatabaseGateway.cs` (or equivalent — confirm via repo) — same three-event emission pattern as ESG cached calls, but `channel=DbOutbound`. +- Create: `tests/ScadaLink.ExternalSystemGateway.Tests/CachedWriteLifecycleEmissionTests.cs`. + +**Steps:** +1. Failing test: a `CachedWrite` that succeeds first try produces `CachedEnqueued` + `CachedAttempt(Success)` + `CachedTerminal(Delivered)`. +2. Failing test: a `CachedWrite` with transient retry mirrors the ESG pattern. +3. Implement. +4. Run: pass. +5. Commit: `feat(esg): Database.CachedWrite emits three-lifecycle combined telemetry`. + +#### M3-T15: Host registration — `SiteCallAuditActor` central singleton + +**Files:** +- Modify: `src/ScadaLink.Host/Actors/AkkaHostedService.cs` — register `SiteCallAuditActor` central singleton + proxy alongside `AuditLogIngestActor`. +- Modify: `src/ScadaLink.SiteCallAudit/ServiceCollectionExtensions.cs` — register actor props. +- Modify: `tests/ScadaLink.Host.Tests/AkkaHostedServiceAuditWiringTests.cs` — extend to assert `SiteCallAuditActor` proxy resolves. + +**Steps:** +1. 
Failing test: starting host produces the new singleton's proxy. +2. Implement. +3. Run: pass. +4. Commit: `feat(host): register SiteCallAuditActor central singleton`. + +#### M3-T16: Integration test — cached external call audit (end-to-end) + +**Files:** +- Create: `tests/ScadaLink.IntegrationTests/AuditLog/CachedCallCombinedTelemetryTests.cs` — site + central; stub external system returns 500 twice then 200; script invokes `ExternalSystem.CachedCall("System","Method", args)`; assert AuditLog has 5 rows (Enqueued + 3 Attempts + Terminal) AND SiteCalls has 1 row with `Status=Delivered` AND `Tracking.Status(id)` reports the same. + +**Steps:** +1. Sketch test against IntegrationTests harness. +2. Run: fail (likely surfacing earlier-task gaps). +3. Iterate fixes until pass. +4. Commit: `test(auditlog): cached call combined telemetry end-to-end`. + +#### M3-T17: Integration test — cached DB write audit (end-to-end) + +**Files:** +- Create: `tests/ScadaLink.IntegrationTests/AuditLog/CachedWriteCombinedTelemetryTests.cs` — mirror M3-T16 against the DB cached path. + +**Steps:** +1. Sketch. +2. Iterate. +3. Commit: `test(auditlog): cached DB write combined telemetry end-to-end`. + +#### M3-T18: Idempotency test — duplicate telemetry doesn't double-insert / double-upsert + +**Files:** +- Create: `tests/ScadaLink.IntegrationTests/AuditLog/CombinedTelemetryIdempotencyTests.cs` — force the same packet to arrive twice (simulated telemetry retry); assert AuditLog still has exactly one row and SiteCalls upsert is monotonic. + +**Steps:** +1. Sketch. +2. Pass. +3. Commit: `test(auditlog): combined telemetry idempotency on retried packets`. + +### M3 — Risk callouts + +- **Combined telemetry packet evolution:** design the proto so future cached audit-kind additions are non-breaking (avoid `oneof` for fields you'll extend; use sparse field numbers). 
+- **Dual-write transaction failure modes:** the single `DbContext` transaction at central spans two tables; ensure retry behaviour on transient connection errors works as expected (existing `IDbExecutionStrategy` patterns may apply). +- **Idempotency cross-table:** AuditLog dedups on `EventId`, SiteCalls dedups on `TrackedOperationId` with status-monotonic update. A retried packet whose AuditLog row exists must still upsert SiteCalls (no short-circuit). +- **Scope discipline:** M3 inlines the *minimum* surface for #22 and cached-call tracking. Full #22 reconciliation, KPIs, and Retry/Discard relay are deferred. Note in the milestone brainstorm whether any extra #22 surface is genuinely needed for M3 acceptance criteria — if not, defer aggressively. +- **`Tracking.Status` semantics:** confirmed authoritative site-locally per design; no central round-trip. Ensure the test in M3-T3 reflects this. --- @@ -327,17 +713,159 @@ The design for both is merged on `main` (`alog.md` cached-call tracking section; - Notification Outbox dispatcher: every delivery attempt writes `Notification.Attempt`; terminal writes `Notification.Terminal`. Site-emitted `Notification.Enqueued` flows through the standard site→central audit path. Audit-write failure never affects delivery. - Inbound API middleware writes one `ApiInbound.Completed` row per request, before `await next()` returns. API key NAME captured (never material). Audit-write failure does NOT change the HTTP response. -**Task headlines:** -1. ESG `Database.Connection()` execute hook — wrap `Execute*` / `ExecuteScalar` / `ExecuteReader` to emit before/after audit events. -2. `Database.CachedWrite` combined-telemetry emission (mirror M3's ESG cached path). -3. NotificationOutboxActor extension — inject `ICentralAuditWriter`; write `Notification.Attempt` per dispatcher attempt; write `Notification.Terminal` on terminal transitions; never abort on failure. -4. 
Site-emitted `Notification.Enqueued` — when a script calls `Notify.To().Send()` (site-side via Store-and-Forward), emit a site audit row (`Notification.Enqueued`); telemetry forwards as usual. -5. Inbound API middleware: new `AuditWriteMiddleware` in `src/ScadaLink.InboundAPI/Middleware/` writing `ApiInbound.Completed` before response flush; register in the ASP.NET pipeline. -6. Tests: emission unit tests per call mode, plus 4 integration tests (one per channel). +### M4 — Tasks (TDD-detail) -**Risk callouts:** -- Inbound API: correlation-id generation needs to be consistent with any upstream tracing headers (W3C `traceparent` if present). -- Notification dispatcher: confirm `ICentralAuditWriter` errors are logged but don't block the dispatch loop. +#### M4-T1: ESG `Database.Connection().ExecuteAsync` audit emission — `DbOutbound.SyncWrite` + +**Files:** +- Modify: `src/ScadaLink.ExternalSystemGateway/DatabaseGateway.cs` (or wherever the script-facing `Execute*` lives — confirm via repo) — wrap the call site to emit an `AuditEvent` (channel=`DbOutbound`, kind=`SyncWrite`) on every `Execute`/`ExecuteScalar`. Capture statement text, parameter values (default; redaction in M5), `DurationMs`, `rowsAffected` in `Extra`. +- Create: `tests/ScadaLink.ExternalSystemGateway.Tests/DatabaseSyncWriteEmissionTests.cs`. + +**Steps:** +1. Failing test: `Execute("INSERT INTO ...", new {...})` emits one event with `Channel=DbOutbound`, `Kind=SyncWrite`, statement text + parameter values captured. +2. Failing test: `ExecuteScalar` emits the same kind. +3. Failing test: execute that throws → emission with `Status=PermanentFailure`, `ErrorMessage` populated. +4. Failing test: audit-write failure does NOT abort the SQL call (script sees the original outcome). +5. Implement. +6. Run: pass. +7. Commit: `feat(esg): emit DbOutbound.SyncWrite on script-initiated Execute*`. 
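A minimal sketch of the M4-T1 emission rule, assuming simplified stand-ins for `AuditEvent` and `IAuditWriter` (the real shapes come from earlier milestones). `AuditedExecute` and `EmitSafelyAsync` are hypothetical names used only for illustration. It shows one way to keep the script-visible outcome identical whether or not the audit write succeeds:

```csharp
#nullable enable
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical, simplified stand-ins for the plan's types.
public sealed record AuditEvent(
    string Channel, string Kind, string Status,
    string? ErrorMessage, long DurationMs, int? RowsAffected);

public interface IAuditWriter
{
    Task WriteAsync(AuditEvent auditEvent);
}

public static class AuditedExecute
{
    // Runs the SQL operation and emits one DbOutbound.SyncWrite event.
    // The caller sees the SQL outcome only, never an audit failure.
    public static async Task<int> RunAsync(Func<Task<int>> execute, IAuditWriter writer)
    {
        var stopwatch = Stopwatch.StartNew();
        int rows;
        try
        {
            rows = await execute();
        }
        catch (Exception ex)
        {
            await EmitSafelyAsync(writer, new AuditEvent(
                "DbOutbound", "SyncWrite", "PermanentFailure",
                ex.Message, stopwatch.ElapsedMilliseconds, null));
            throw; // rethrow the original SQL failure unchanged
        }

        await EmitSafelyAsync(writer, new AuditEvent(
            "DbOutbound", "SyncWrite", "Success",
            null, stopwatch.ElapsedMilliseconds, rows));
        return rows;
    }

    private static async Task EmitSafelyAsync(IAuditWriter writer, AuditEvent auditEvent)
    {
        try
        {
            await writer.WriteAsync(auditEvent);
        }
        catch (Exception)
        {
            // Swallowed on purpose: an audit failure must not abort the call.
            // A real implementation would presumably log and bump a failure counter.
        }
    }
}
```

Funnelling every emission through one swallow-all helper keeps the "audit never aborts the action" rule in a single place, which is the property step 4 tests.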
+ +#### M4-T2: ESG `Database.Connection().ExecuteReaderAsync` audit emission — `DbOutbound.SyncRead` + +**Files:** +- Modify: `src/ScadaLink.ExternalSystemGateway/DatabaseGateway.cs` — wrap `ExecuteReader` to emit `DbOutbound.SyncRead`. Capture statement, parameter values, `DurationMs`, `rowsReturned` in `Extra`. Response body capture defaults to NOT including rows; opt-in via per-connection config (M5). +- Create: `tests/ScadaLink.ExternalSystemGateway.Tests/DatabaseSyncReadEmissionTests.cs`. + +**Steps:** +1. Failing test: `Query("SELECT ...")` emits one event with `Channel=DbOutbound`, `Kind=SyncRead`. +2. Failing test: `rowsReturned` appears in `Extra`. +3. Implement. +4. Run: pass. +5. Commit: `feat(esg): emit DbOutbound.SyncRead on script-initiated reads`. + +#### M4-T3: NotificationOutboxActor — inject `ICentralAuditWriter` + +**Files:** +- Modify: `src/ScadaLink.NotificationOutbox/NotificationOutboxActor.cs:22–68` — constructor accepts `ICentralAuditWriter`. Wire into DI in `ServiceCollectionExtensions.cs`. +- Create: `tests/ScadaLink.NotificationOutbox.Tests/NotificationOutboxActorAuditInjectionTests.cs`. + +**Steps:** +1. Failing test: actor's `Props` factory accepts an `ICentralAuditWriter`; constructor stores it. +2. Implement. +3. Run: pass. +4. Commit: `feat(notif): NotificationOutboxActor accepts ICentralAuditWriter`. + +#### M4-T4: NotificationOutboxActor — emit `Notification.Attempt` per dispatcher attempt + +**Files:** +- Modify: `src/ScadaLink.NotificationOutbox/NotificationOutboxActor.cs` dispatcher attempt branch (after each delivery attempt resolves) — emit `Notification.Attempt` row with `Status` mapped from attempt result (`Success`, `TransientFailure`, `PermanentFailure`). +- Create: `tests/ScadaLink.NotificationOutbox.Tests/NotificationOutboxActorAttemptEmissionTests.cs`. + +**Steps:** +1. Failing test: a successful attempt → exactly one event with `Kind=Attempt`, `Status=Success`. +2. 
Failing test: a transient-failure attempt → `Status=TransientFailure`, `ErrorMessage` populated. +3. Failing test: when `ICentralAuditWriter.WriteAsync` throws, the dispatcher's per-attempt `Notifications` row update STILL succeeds (audit must never block delivery). +4. Implement. +5. Run: pass. +6. Commit: `feat(notif): emit Notification.Attempt per dispatcher attempt`. + +#### M4-T5: NotificationOutboxActor — emit `Notification.Terminal` on terminal transition + +**Files:** +- Modify: `src/ScadaLink.NotificationOutbox/NotificationOutboxActor.cs` terminal branches (`Delivered` / `Parked` / `Discarded` transitions) — emit `Notification.Terminal` row. +- Create: `tests/ScadaLink.NotificationOutbox.Tests/NotificationOutboxActorTerminalEmissionTests.cs`. + +**Steps:** +1. Failing test: a notification that succeeds emits one `Terminal` event with `Status=Delivered`. +2. Failing test: a Parked transition emits `Status=Parked`. +3. Failing test: an operator Discard emits `Status=Discarded`. +4. Implement. +5. Run: pass. +6. Commit: `feat(notif): emit Notification.Terminal on terminal transitions`. + +#### M4-T6: Site-emitted `Notification.Enqueued` + +**Files:** +- Modify: `src/ScadaLink.NotificationService/` (or wherever the site-side `Notify.To().Send()` runs — confirm via repo) — at the moment of buffering into the site S&F: emit a site-side `AuditEvent` (channel=`Notification`, kind=`Enqueued`) via `IAuditWriter`. Telemetry forwards as usual. +- Create: `tests/ScadaLink.NotificationService.Tests/NotifyEnqueueAuditEmissionTests.cs`. + +**Steps:** +1. Failing test: `Notify.To("list").Send("subject", "body")` emits one event with `Channel=Notification`, `Kind=Enqueued`, target=list name, body captured (subject too). +2. Failing test: audit-write failure does not abort `Send()`. +3. Implement. +4. Run: pass. +5. Commit: `feat(notif): emit Notification.Enqueued from site-side Notify.Send`. 
+ +#### M4-T7: Inbound API — `AuditWriteMiddleware` + +**Files:** +- Create: `src/ScadaLink.InboundAPI/Middleware/AuditWriteMiddleware.cs` — ASP.NET Core middleware. After `await next()` (so the response is fully resolved but BEFORE flush — using `HttpResponse.OnStarting` or buffered body), build an `AuditEvent` (channel=`ApiInbound`, kind=`Completed`, `Actor`=API key NAME from request context, `Target`=method name, `HttpStatus`, `DurationMs`, `RequestSummary`/`ResponseSummary`). Call `ICentralAuditWriter.WriteAsync` inside `try`/`catch` — failures never affect the response. +- Modify: `src/ScadaLink.InboundAPI/Startup.cs` (or wherever the pipeline is configured) — register middleware. +- Create: `tests/ScadaLink.InboundAPI.Tests/Middleware/AuditWriteMiddlewareTests.cs`. + +**Steps:** +1. Failing test: a successful POST to `/api/{method}` produces one `ApiInbound.Completed` event with `HttpStatus=200`. +2. Failing test: a 400/401/500 response produces an event with the matching `HttpStatus` and `Status` mapped (`PermanentFailure` for 4xx, `TransientFailure` for 5xx). +3. Failing test: `Actor` carries the API key NAME (never the key material). +4. Failing test: when `ICentralAuditWriter.WriteAsync` throws, the HTTP response is unchanged (success stays success). +5. Failing test: request remote IP and User-Agent appear in `Extra`. +6. Implement. +7. Run: pass. +8. Commit: `feat(inbound): AuditWriteMiddleware emitting ApiInbound.Completed per request`. + +#### M4-T8: Register middleware in the ASP.NET pipeline + +**Files:** +- Modify: `src/ScadaLink.InboundAPI/Startup.cs` / `Program.cs` — `app.UseMiddleware<AuditWriteMiddleware>()` placed AFTER auth (so `Actor` resolves) and BEFORE the script-execution handler. +- Create: `tests/ScadaLink.InboundAPI.Tests/Middleware/MiddlewareOrderTests.cs`. + +**Steps:** +1. Failing test: pipeline ordering puts AuditWrite after auth, before script execution. +2. Implement. +3. Run: pass. +4.
Commit: `feat(inbound): register AuditWriteMiddleware in pipeline`. + +#### M4-T9: Integration test — DB sync emission + +**Files:** +- Create: `tests/ScadaLink.IntegrationTests/AuditLog/DatabaseSyncEmissionTests.cs` — script invokes `Database.Connection().Execute("INSERT ...")` and `Query("SELECT ...")`; assert central AuditLog has one `DbOutbound.SyncWrite` row and one `DbOutbound.SyncRead` row. + +**Steps:** +1. Sketch, iterate, commit: `test(auditlog): DB sync emission integration test`. + +#### M4-T10: Integration test — Notify dispatcher audit trail + +**Files:** +- Create: `tests/ScadaLink.IntegrationTests/AuditLog/NotifyDispatcherAuditTrailTests.cs` — script calls `Notify.To(list).Send(...)`; stub SMTP returns transient then success; assert AuditLog has `Enqueued` + 2 `Attempt` (one transient, one success) + 1 `Terminal(Delivered)`. + +**Steps:** +1. Sketch, iterate, commit: `test(auditlog): Notify dispatcher audit trail end-to-end`. + +#### M4-T11: Integration test — Inbound API request audit + +**Files:** +- Create: `tests/ScadaLink.IntegrationTests/AuditLog/InboundApiAuditTests.cs` — POST to `/api/{method}` with a valid API key; assert one `ApiInbound.Completed` row with the expected `Actor` (key name), `HttpStatus=200`, request/response bodies captured. +- Also test: POST with a bad API key → row with `Actor=NULL` (or ""), `HttpStatus=401`, `Extra` carries `remoteIp`. + +**Steps:** +1. Sketch, iterate, commit: `test(auditlog): Inbound API request audit end-to-end`. + +#### M4-T12: Integration test — audit-write failure never aborts the action + +**Files:** +- Create: `tests/ScadaLink.IntegrationTests/AuditLog/AuditWriteFailureSafetyTests.cs` — inject a broken `ICentralAuditWriter` (always throws) for one test; assert that ESG sync calls, ESG cached calls, DB writes, Inbound API calls, and Notification dispatch all still complete successfully and the script/caller sees the normal outcome. + +**Steps:** +1. 
Sketch test with broken-writer DI override per scenario. +2. Run, fix any spots where audit-write exceptions leak. +3. Commit: `test(auditlog): audit failures never abort user-facing actions`. + +### M4 — Risk callouts + +- **Inbound API correlation IDs:** if upstream tracing headers (W3C `traceparent`) are present, prefer them as `CorrelationId`; otherwise generate. Confirm whether existing middleware sets a request ID we can reuse. +- **`AuditWriteMiddleware` placement:** must run AFTER authentication so the API key NAME is in `HttpContext.User`. Verify with the middleware-order test in M4-T8. +- **Notification dispatcher loop hot-path:** audit emission must NOT extend per-attempt latency materially. Bench in M4-T10 if there's any concern. +- **DB parameter capture:** parameter values are captured verbatim by default (per design); redaction is opt-in (M5). For M4, just capture — don't try to second-guess what's sensitive. --- @@ -352,16 +880,145 @@ The design for both is merged on `main` (`alog.md` cached-call tracking section; - Configuration test: changing `appsettings.json` redactors changes runtime behaviour (no rebuild needed for regex changes). - Bench: 95th-percentile audit emission latency on the hot path stays under N µs at default cap (target to be set during M5 brainstorm). -**Task headlines:** -1. `IAuditPayloadFilter` + default implementation (header redaction, body regex, SQL parameter redaction, safety net). -2. Wire the filter into the emission paths (M2, M3, M4 emitters all call through the filter before handing the `AuditEvent` to the writer). -3. `appsettings.json` schema for the filter (already prepared in M1-T9; M5 plugs the runtime in). -4. Tests: redaction unit tests with known-bad payloads (passwords in JSON, `Authorization` headers, SQL params named `@apikey`). -5. Performance test in `tests/ScadaLink.PerformanceTests/` for the hot-path latency budget. 
+### M5 — Tasks (TDD-detail) -**Risk callouts:** -- Regex performance — pre-compile and cache patterns; reject patterns that take too long to compile. -- Don't redact post-truncation if the truncation cut a redaction target in half. +#### M5-T1: `IAuditPayloadFilter` interface + +**Files:** +- Create: `src/ScadaLink.AuditLog/Payload/IAuditPayloadFilter.cs` — single method `AuditEvent Apply(AuditEvent rawEvent)` that returns a filtered copy (truncation + redaction applied). +- Create: `tests/ScadaLink.AuditLog.Tests/Payload/PayloadFilterContractTests.cs`. + +**Steps:** +1. Failing test: interface exists, method signature matches. +2. Implement. +3. Run: pass. +4. Commit: `feat(auditlog): IAuditPayloadFilter contract`. + +#### M5-T2: `DefaultAuditPayloadFilter` — truncation (default + error cap) + +**Files:** +- Create: `src/ScadaLink.AuditLog/Payload/DefaultAuditPayloadFilter.cs` — composes `TruncationStage` + redactors (M5-T3/T4/T5). Truncation rule: default cap = `AuditLogOptions.DefaultCapBytes` (8 KB); error cap = `ErrorCapBytes` (64 KB) applied when `Status` is NOT in {`Success`, `Delivered`, `Enqueued`}. UTF-8 byte-safe boundary (no mid-character cuts). Set `PayloadTruncated = true` when applied. +- Create: `tests/ScadaLink.AuditLog.Tests/Payload/TruncationTests.cs`. + +**Steps:** +1. Failing test: 10 KB success body → truncated to 8 KB; `PayloadTruncated = true`. +2. Failing test: 10 KB body on `Status=TransientFailure` → not truncated (under 64 KB cap); `PayloadTruncated = false`. +3. Failing test: 70 KB body on `Status=PermanentFailure` → truncated to 64 KB; `PayloadTruncated = true`. +4. Failing test: multi-byte UTF-8 character that would straddle the cap is not split mid-character. +5. Implement. +6. Run: pass. +7. Commit: `feat(auditlog): DefaultAuditPayloadFilter truncation with UTF-8 boundary safety`. + +#### M5-T3: HTTP header redaction + +**Files:** +- Modify: `src/ScadaLink.AuditLog/Payload/DefaultAuditPayloadFilter.cs` — add header-redaction stage. 
Strips header values for names in `AuditLogOptions.HeaderRedactList` (default: `Authorization`, `Cookie`, `Set-Cookie`, `X-API-Key`) and any matching configured regex. Replacement: ``. +- Headers travel in `RequestSummary` / `ResponseSummary` (JSON of headers + body) OR in `Extra` — confirm format during M5 brainstorm and document. +- Create: `tests/ScadaLink.AuditLog.Tests/Payload/HeaderRedactionTests.cs`. + +**Steps:** +1. Failing test: `Authorization: Bearer xyz` in `RequestSummary` becomes `Authorization: `. +2. Failing test: case-insensitive match (`authorization` redacted too). +3. Failing test: custom redact-list extension works (operator adds `X-Custom-Token`). +4. Implement. +5. Run: pass. +6. Commit: `feat(auditlog): HTTP header redaction`. + +#### M5-T4: Body regex redaction with safety net + +**Files:** +- Modify: `src/ScadaLink.AuditLog/Payload/DefaultAuditPayloadFilter.cs` — add body-regex stage. Global redactors apply to all bodies; per-target redactors apply to matching `Target`. Patterns precompiled at startup; rejected if compile takes >100ms. +- Safety net: if a regex throws at runtime, replace the body with `` and increment `AuditRedactionFailure` (M5-T7). +- Create: `tests/ScadaLink.AuditLog.Tests/Payload/BodyRegexRedactionTests.cs`. + +**Steps:** +1. Failing test: `"password":"hunter2"` in a JSON body → `"password":""` when the default global redactor pattern matches. +2. Failing test: per-target redactor only applies to matching `Target`. +3. Failing test: a redactor that throws → body becomes `` AND the counter increments. +4. Failing test: catastrophic backtracking regex rejected at startup. +5. Implement. +6. Run: pass. +7. Commit: `feat(auditlog): body regex redaction with over-redaction safety net`. 
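A minimal sketch of the M5-T4 safety net, assuming a hypothetical `BodyRedactor` class, a 50 ms match timeout, and placeholder marker strings (the plan's actual replacement markers and timeout are not fixed here). It illustrates over-redacting the whole body when any redactor throws at runtime:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public sealed class BodyRedactor
{
    // Both markers are assumptions for illustration only.
    private const string RedactedMarker = "<redacted>";
    private const string FailureMarker = "<redaction-failed>";

    private readonly List<Regex> _patterns = new();
    private readonly Action _onFailure;

    public BodyRedactor(IEnumerable<string> patterns, Action onFailure)
    {
        foreach (string pattern in patterns)
        {
            // The match timeout bounds the runtime cost of a badly behaved pattern;
            // exceeding it raises RegexMatchTimeoutException inside Apply.
            _patterns.Add(new Regex(
                pattern, RegexOptions.Compiled, TimeSpan.FromMilliseconds(50)));
        }
        _onFailure = onFailure;
    }

    public string Apply(string body)
    {
        try
        {
            foreach (Regex regex in _patterns)
            {
                // Every match is replaced wholesale by the marker.
                body = regex.Replace(body, RedactedMarker);
            }
            return body;
        }
        catch (Exception)
        {
            // Safety net: never return a partially redacted body.
            _onFailure(); // e.g. increment the AuditRedactionFailure counter
            return FailureMarker;
        }
    }
}
```

Because each match is replaced wholesale, a pattern that should keep the JSON key (as in step 1) would need a lookbehind or a capture group; that detail is left to the M5 brainstorm.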
+ +#### M5-T5: SQL parameter redaction (per-connection opt-in) + +**Files:** +- Modify: `src/ScadaLink.AuditLog/Payload/DefaultAuditPayloadFilter.cs` — for `Channel=DbOutbound` events, parse `Extra.params` and redact parameter VALUES whose NAME matches the connection's configured regex (from `AuditLogOptions.PerTargetOverrides[].RedactSqlParamsMatching`). +- Create: `tests/ScadaLink.AuditLog.Tests/Payload/SqlParamRedactionTests.cs`. + +**Steps:** +1. Failing test: no opt-in config → params captured verbatim (default behaviour). +2. Failing test: opt-in regex `@apikey|@token` redacts those param VALUES but keeps OTHER param values intact. +3. Failing test: regex applies to parameter NAMES (not values) and is case-insensitive. +4. Implement. +5. Run: pass. +6. Commit: `feat(auditlog): per-connection SQL parameter redaction (opt-in)`. + +#### M5-T6: Wire filter into emission paths + +**Files:** +- Modify: ESG (M2-T10, M3-T11/12/13, M4-T1/T2), InboundAPI middleware (M4-T7), NotificationOutbox (M4-T4/T5), NotificationService site path (M4-T6) — every emission site receives `IAuditPayloadFilter` from DI and calls `filter.Apply(rawEvent)` before handing to the writer. +- Modify: `src/ScadaLink.AuditLog/ServiceCollectionExtensions.cs` — register `DefaultAuditPayloadFilter` as `IAuditPayloadFilter` singleton. +- Create: `tests/ScadaLink.AuditLog.Tests/Payload/FilterIntegrationTests.cs` — assert each emitter calls through the filter before the writer. + +**Steps:** +1. Failing test: ESG emission writes the filter-applied event (not the raw one). +2. Failing test: same for each other emitter. +3. Implement by injecting the filter into each emitter and routing through it. +4. Run: pass. +5. Commit: `feat(auditlog): wire payload filter into all emission paths`. + +#### M5-T7: `AuditRedactionFailure` health metric + +**Files:** +- Modify: `src/ScadaLink.HealthMonitoring/CentralHealthAggregator.cs` (or equivalent) — add `AuditRedactionFailure` counter. 
+- Modify: `src/ScadaLink.AuditLog/Payload/DefaultAuditPayloadFilter.cs` — increment on every redactor exception. +- Create: `tests/ScadaLink.AuditLog.Tests/Payload/AuditRedactionFailureMetricTests.cs`. + +**Steps:** +1. Failing test: 5 redactor exceptions → counter shows 5. +2. Implement. +3. Run: pass. +4. Commit: `feat(health): AuditRedactionFailure metric`. + +#### M5-T8: Configuration test — `appsettings.json` round-trip + +**Files:** +- Create: `tests/ScadaLink.AuditLog.Tests/Configuration/AuditLogOptionsBindingTests.cs` — bind a realistic `appsettings.json` block (with header-redact list, body redactors, per-target overrides, retention) and assert values appear in `IOptions`. Re-bind with a hot-reload simulation and assert filter behaviour changes accordingly. + +**Steps:** +1. Failing test: bind + read → matches. +2. Failing test: change config → filter behaviour updates without restart (`IOptionsMonitor` pattern). +3. Implement (likely needs adjusting M1-T9 from `IOptions` to `IOptionsMonitor`). +4. Run: pass. +5. Commit: `feat(auditlog): hot-reloadable AuditLogOptions`. + +#### M5-T9: Performance test — hot-path latency budget + +**Files:** +- Create: `tests/ScadaLink.PerformanceTests/AuditLog/HotPathLatencyTests.cs` — bench `filter.Apply(event)` for a 4 KB JSON body with the default redactor set; target P95 < 50 µs (number set during M5 brainstorm based on baseline measurements). Also bench `SqliteAuditWriter.WriteAsync` end-to-end target P95 < 500 µs. + +**Steps:** +1. Sketch test using BenchmarkDotNet or the existing performance test harness. +2. Run baseline; if over budget, profile + optimise. +3. Commit: `test(auditlog): hot-path latency budget`. + +#### M5-T10: Safety-net test — bad regex over-redacts + +**Files:** +- Create: `tests/ScadaLink.AuditLog.Tests/Payload/RedactionSafetyNetTests.cs` — register a deliberately bad regex that throws; assert the body is over-redacted (``) rather than under-redacted (passing through unmodified). 
+ +**Steps:** +1. Failing test. +2. Verify the safety net from M5-T4 covers this. +3. Commit: `test(auditlog): redaction safety net over-redacts on regex failure`. + +### M5 — Risk callouts + +- **Regex catastrophic backtracking:** validate patterns at startup with a short-running compile test; reject patterns that exceed a timeout. Document the rejection behaviour. +- **Order of stages matters:** truncation BEFORE redaction means a redaction target halfway through the cap could get cut. Confirm the chosen order during M5 brainstorm; current draft applies redaction FIRST, then truncation — that way the redacted-replacement text is what gets truncated, not a half-secret. +- **Body capture format:** decide whether headers travel in `RequestSummary`/`ResponseSummary` or `Extra`. Affects M5-T3's redaction strategy. Lock during the M5 brainstorm. +- **Hot-reload semantics:** `IOptionsMonitor` snapshots — ensure pre-compiled regex cache invalidates when config changes. --- @@ -378,17 +1035,157 @@ The design for both is merged on `main` (`alog.md` cached-call tracking section; - 5 new health metrics published per site: `SiteAuditBacklog` (count + oldest + bytes), `SiteAuditWriteFailures`, `SiteAuditTelemetryStalled`; and per central node: `CentralAuditWriteFailures`, `AuditRedactionFailure`. - Integration test: simulated 5-minute central outage → telemetry catches up after recovery via reconciliation, no rows lost; site backlog metric reflects the queue depth and drops as it drains. -**Task headlines:** -1. `PullAuditEvents` RPC on the existing `SiteStream` gRPC server. -2. `SiteAuditReconciliationActor` actor with timer + per-site `LastReconciledAt` cursor. -3. `AuditLogPurgeActor` actor with daily schedule, partition-switch logic via `IAuditLogRepository.SwitchOutPartitionAsync`. -4. Partition-roll-forward helper (raw SQL `migrationBuilder.Sql` equivalent at runtime — likely a `HostedService` that runs once at startup and once per month). -5. 
Health metric publishing per emitter; integrate with the existing `SiteHealthState` / `CentralHealthAggregator` plumbing. -6. Integration tests for outage/recovery + purge. +### M6 — Tasks (TDD-detail) -**Risk callouts:** -- Partition switch on an active table — ensure online schema operations don't block ingest; document the window if a brief lock is unavoidable. -- Reconciliation can produce duplicate `Forwarded` ↔ `Reconciled` state flips; ensure idempotency at site SQLite layer. +#### M6-T1: Extend `sitestream.proto` with `PullAuditEvents` RPC + +**Files:** +- Modify: `src/ScadaLink.Communication/Protos/sitestream.proto` — add `rpc PullAuditEvents(PullAuditEventsRequest) returns (PullAuditEventsResponse);` and the corresponding request/response messages (`sinceUtc`, `batchSize`, `events`, `more_available`). +- Build: regenerate stubs. +- Create: `tests/ScadaLink.Communication.Tests/Protos/PullAuditEventsProtoTests.cs`. + +**Steps:** +1. Failing test: round-trip request and response messages. +2. Add proto + rebuild. +3. Run: pass. +4. Commit: `feat(comms): PullAuditEvents RPC for audit reconciliation`. + +#### M6-T2: Site-side handler for `PullAuditEvents` + +**Files:** +- Modify: `src/ScadaLink.Communication/SiteStreamGrpc/SiteStreamGrpcServer.cs` (the site-side server inside each site cluster) — handle `PullAuditEvents` by reading `Pending` rows older than `SinceUtc` from `SqliteAuditWriter` (read-only path) and streaming them back. After ack, mark them `Reconciled`. +- Create: `tests/ScadaLink.Communication.Tests/SiteStreamPullAuditEventsTests.cs`. + +**Steps:** +1. Failing test: a pull request with N pending rows returns those rows; rows flip to `Reconciled` after the response is acked. +2. Implement. +3. Run: pass. +4. Commit: `feat(comms): site-side PullAuditEvents handler`. 
+ +#### M6-T3: `SiteAuditReconciliationActor` — central, timer-driven + +**Files:** +- Create: `src/ScadaLink.AuditLog/Central/SiteAuditReconciliationActor.cs` — central singleton; on a 5-minute timer (configurable), for each known site, ask: "what's your oldest `Pending` row?" If the site reports a non-draining backlog (compared with the previous tick), issue a `PullAuditEvents` and ingest the returned rows via `IAuditLogRepository.InsertIfNotExistsAsync`. Keeps a per-site `LastReconciledAt` cursor. +- Create: `tests/ScadaLink.AuditLog.Tests/Central/SiteAuditReconciliationActorTests.cs`. + +**Steps:** +1. Failing test: actor's timer fires every 5 minutes (test via `TestKit` virtual scheduler). +2. Failing test: when site reports non-draining backlog over two consecutive ticks, the actor issues a pull and ingests results. +3. Failing test: idempotency — re-running the pull doesn't double-insert (relies on AuditLog PK). +4. Implement. +5. Run: pass. +6. Commit: `feat(auditlog): SiteAuditReconciliationActor`. + +#### M6-T4: `AuditLogPurgeActor` — daily partition-switch purge + +**Files:** +- Create: `src/ScadaLink.AuditLog/Central/AuditLogPurgeActor.cs` — central singleton; daily timer. For each partition whose latest `OccurredAtUtc` is older than `AuditLogOptions.RetentionDays`, call `IAuditLogRepository.SwitchOutPartitionAsync(partitionBoundary)`. Emit an `AuditLogPurged` event (logged + metricked) with partition range, row count, and duration. +- Create: `tests/ScadaLink.AuditLog.Tests/Central/AuditLogPurgeActorTests.cs`. + +**Steps:** +1. Failing test: with retention = 30 days, partitions older than 30 days are switched out; newer partitions are kept. +2. Failing test: purge emits the `AuditLogPurged` event with correct row count. +3. Failing test: partition switch under the `scadalink_audit_purger` role completes successfully. +4. Implement. +5. Run: pass. +6. Commit: `feat(auditlog): AuditLogPurgeActor with partition-switch purge`. 
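A minimal sketch of the purge selection in M6-T4, assuming monthly partitions identified by their lower boundary (first of the month, UTC) with an exclusive upper boundary one month later. `PurgeEligibility` is a hypothetical helper, and using the upper boundary as a stand-in for "latest `OccurredAtUtc`" is a conservative assumption, not the confirmed rule:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PurgeEligibility
{
    // Returns the partitions whose entire month lies before the retention cutoff,
    // oldest first, so they can be switched out one at a time.
    public static IReadOnlyList<DateTime> SelectExpired(
        IEnumerable<DateTime> monthStartsUtc, DateTime nowUtc, int retentionDays)
    {
        DateTime cutoff = nowUtc.AddDays(-retentionDays);
        return monthStartsUtc
            .Where(start => start.AddMonths(1) <= cutoff) // whole range is expired
            .OrderBy(start => start)
            .ToList();
    }
}
```

Comparing against the partition's upper boundary means a partition is never switched out while it could still hold a row inside the retention window, at the cost of keeping some rows up to a month longer.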
+ +#### M6-T5: `AuditLogPartitionMaintenanceService` — monthly roll-forward + +**Files:** +- Create: `src/ScadaLink.AuditLog/Central/AuditLogPartitionMaintenanceService.cs` — `IHostedService` that runs on startup AND every month: ensures the next month's partition range exists on `pf_AuditLog_Month` and the partition scheme has a destination filegroup. Implemented via raw SQL (`ALTER PARTITION FUNCTION ... SPLIT RANGE`). +- Create: `tests/ScadaLink.AuditLog.Tests/Central/PartitionMaintenanceServiceTests.cs` (integration; runs against a temp DB). + +**Steps:** +1. Failing test: after service runs, the partition function has ranges covering "current month + next month". +2. Implement. +3. Run: pass. +4. Commit: `feat(auditlog): partition maintenance HostedService`. + +#### M6-T6: Health metric `SiteAuditBacklog` + +**Files:** +- Modify: `src/ScadaLink.AuditLog/Site/SqliteAuditWriter.cs` — expose `GetBacklogStatsAsync()` returning `(pendingCount, oldestPendingUtc, onDiskBytes)`. +- Modify: `src/ScadaLink.HealthMonitoring/SiteHealthState.cs` — add `SiteAuditBacklog` metric (3-tuple), populated per site-health-report tick. +- Create: `tests/ScadaLink.HealthMonitoring.Tests/SiteAuditBacklogMetricTests.cs`. + +**Steps:** +1. Failing test: with 100 pending rows in SQLite, the metric reports `pendingCount=100`. +2. Failing test: oldest pending age is reported in seconds since `OccurredAtUtc`. +3. Failing test: on-disk bytes ≈ SQLite file size. +4. Implement. +5. Run: pass. +6. Commit: `feat(health): SiteAuditBacklog metric (count + age + bytes)`. + +#### M6-T7: Health metric `SiteAuditTelemetryStalled` + +**Files:** +- Modify: `src/ScadaLink.HealthMonitoring/SiteHealthState.cs` — add boolean `SiteAuditTelemetryStalled`. +- Modify: `src/ScadaLink.AuditLog/Central/SiteAuditReconciliationActor.cs` — set the flag when reconciliation detects a non-draining backlog over two consecutive cycles. +- Create: `tests/ScadaLink.HealthMonitoring.Tests/SiteAuditTelemetryStalledTests.cs`. 
+ +**Steps:** +1. Failing test: two consecutive non-draining cycles → flag set. +2. Failing test: a subsequent draining cycle → flag cleared. +3. Implement. +4. Run: pass. +5. Commit: `feat(health): SiteAuditTelemetryStalled flag`. + +#### M6-T8: Health metric `CentralAuditWriteFailures` + +**Files:** +- Modify: `src/ScadaLink.HealthMonitoring/CentralHealthAggregator.cs` — add `CentralAuditWriteFailures` counter. +- Modify: every `ICentralAuditWriter` call site (Inbound API middleware M4-T7, NotificationOutboxActor M4-T4/T5) — increment on caught exceptions. +- Create: `tests/ScadaLink.HealthMonitoring.Tests/CentralAuditWriteFailuresTests.cs`. + +**Steps:** +1. Failing test: 3 forced central direct-write failures → counter reports 3. +2. Implement. +3. Run: pass. +4. Commit: `feat(health): CentralAuditWriteFailures metric`. + +#### M6-T9: Surface `AuditRedactionFailure` in central health + +**Files:** +- Modify: `src/ScadaLink.HealthMonitoring/CentralHealthAggregator.cs` — register the counter created in M5-T7 so it appears in the central health report payload. +- Create: `tests/ScadaLink.HealthMonitoring.Tests/AuditRedactionFailureSurfacingTests.cs`. + +**Steps:** +1. Failing test: incrementing the counter is visible in the next central health snapshot. +2. Implement. +3. Run: pass. +4. Commit: `feat(health): surface AuditRedactionFailure in central health`. + +#### M6-T10: Integration test — central outage + reconciliation recovery + +**Files:** +- Create: `tests/ScadaLink.IntegrationTests/AuditLog/OutageReconciliationTests.cs` — site + central; simulate a 5-minute central gRPC outage; during outage, site emits 200 events; restore central; assert reconciliation pulls catch up within one cycle and all 200 events land in central AuditLog with no duplicates. + +**Steps:** +1. Sketch, iterate, commit: `test(auditlog): outage + reconciliation recovery end-to-end`. 
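The stall rule from M6-T7 could be sketched as follows, assuming "non-draining" means the pending count is non-zero and has not dropped since the previous reconciliation tick. That definition and the `StallDetector` name are assumptions; the plan leaves the exact comparison open:

```csharp
public sealed class StallDetector
{
    private long? _previousPending;
    private int _nonDrainingCycles;

    // Set after two consecutive non-draining cycles, cleared by a draining one.
    public bool Stalled => _nonDrainingCycles >= 2;

    // Call once per reconciliation tick with the site's pending-row count.
    public void Observe(long pendingCount)
    {
        bool nonDraining = pendingCount > 0
            && _previousPending.HasValue
            && pendingCount >= _previousPending.Value;

        _nonDrainingCycles = nonDraining ? _nonDrainingCycles + 1 : 0;
        _previousPending = pendingCount;
    }
}
```

Under this definition the first tick only establishes a baseline, so the flag can be set on the third observation at the earliest.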
+ +#### M6-T11: Integration test — partition switch purge + +**Files:** +- Create: `tests/ScadaLink.IntegrationTests/AuditLog/PartitionPurgeTests.cs` — pre-populate AuditLog with rows in three monthly partitions (one older than retention, two newer); trigger `AuditLogPurgeActor`; assert the oldest partition's rows are gone and newer partitions are untouched. + +**Steps:** +1. Sketch, iterate, commit: `test(auditlog): partition-switch purge end-to-end`. + +#### M6-T12: Integration test — partition maintenance roll-forward + +**Files:** +- Create: `tests/ScadaLink.IntegrationTests/AuditLog/PartitionMaintenanceTests.cs` — assert that after `AuditLogPartitionMaintenanceService` runs, the partition function covers the next month's range. + +**Steps:** +1. Sketch, iterate, commit: `test(auditlog): partition maintenance roll-forward end-to-end`. + +### M6 — Risk callouts + +- **Partition switch on a live table:** SQL Server `ALTER TABLE ... SWITCH PARTITION` is metadata-only when source and target match in structure and filegroup; verify with a load test that ingest isn't paused during purge. +- **Pull cadence vs ingest rate:** a site producing >`BatchSize`/5s sustained may never let telemetry catch up — reconciliation must close the gap. The non-draining detection in M6-T3 is the safety net. +- **Site SQLite `ForwardState` flip after reconciliation:** must be atomic with the central ack; otherwise a site crash mid-flip can re-send rows (idempotent at central, harmless but worth noting). +- **HostedService scheduling:** ensure the partition maintenance service runs on the ACTIVE central node only (not both — would cause SQL errors trying to add the same range twice). --- @@ -406,20 +1203,220 @@ The design for both is merged on `main` (`alog.md` cached-call tracking section; - Playwright tests cover: filter narrowing, drilldown drawer, "Copy as cURL" on `ApiInbound` rows, drill-in from Notifications to filtered Audit Log. 
- `OperationalAudit` read permission gating + `AuditExport` for the Export button. -**Task headlines:** -1. New `Components/Pages/Audit/AuditLogPage.razor` + matching `.razor.cs` code-behind + `.razor.css`. -2. Custom Blazor `<AuditFilterBar>` component (multi-select chips for Channel/Kind/Status, autocomplete for Instance/Script). -3. Custom Blazor `<AuditResultsGrid>` component — keyset paging via `QueryAsync` repository method (M1-T8). -4. `<AuditDrilldownDrawer>` component — JSON pretty-print, SQL syntax highlight, "Copy as cURL", "Show all events" CorrelationId filter. -5. Rename existing `AuditLog.razor` → `ConfigurationAuditLog.razor` + update routes + update internal links. -6. Drill-in additions to 6 existing pages. -7. 3 KPI tile components on Health dashboard. -8. Server-side CSV export (streaming) with `AuditExport` permission check. -9. Playwright E2E tests. +### M7 — Tasks (TDD-detail) -**Risk callouts:** -- Permission check at the page level needs to align with the existing role/permission infrastructure (Security #10). -- Keyset paging across partitioned table needs the right index; M1's `IX_AuditLog_OccurredAtUtc` is the supporting index. +#### M7-T1: New `AuditLogPage.razor` scaffold + route + Audit nav group + +**Files:** +- Create: `src/ScadaLink.CentralUI/Components/Pages/Audit/AuditLogPage.razor` + `.razor.cs` + `.razor.css`. Route `/audit/log`. Empty body for now beyond an `Audit Log` heading. +- Modify: `src/ScadaLink.CentralUI/Components/Layout/NavMenu.razor` (or equivalent) — add a new top-level **Audit** nav group sibling to Notifications, containing this page. +- Create: `tests/ScadaLink.CentralUI.Tests/Pages/AuditLogPageScaffoldTests.cs` — Blazor component test (bUnit if it's used in the codebase; else Playwright). + +**Steps:** +1. Failing test: navigating to `/audit/log` renders the page (heading present). +2. Failing test: nav menu shows the Audit group. +3. Implement. +4. Run: pass. +5. Commit: `feat(ui): scaffold Audit Log page + Audit nav group`. + +#### M7-T2: `<AuditFilterBar>` component + +**Files:** +- Create: `src/ScadaLink.CentralUI/Components/Audit/AuditFilterBar.razor` + `.razor.cs` — 10 filter elements per `Component-AuditLog.md` §10. Multi-select chips for Channel/Kind/Status/Site (Bootstrap custom; NO third-party UI library). Time-range relative dropdown + custom date picker. Text search for Instance/Script/Target/Actor/CorrelationId. "Errors only" toggle. +- Create: `tests/ScadaLink.CentralUI.Tests/Components/Audit/AuditFilterBarTests.cs`. + +**Steps:** +1. Failing test: rendering shows all 10 elements. +2. Failing test: selecting filters and clicking "Apply" raises a `FilterChanged` event with the right `AuditQuery` payload. +3. Failing test: Kind options narrow when Channels are selected. +4. Implement. +5. Run: pass. +6. Commit: `feat(ui): AuditFilterBar component`. + +#### M7-T3: `<AuditResultsGrid>` component with keyset paging + +**Files:** +- Create: `src/ScadaLink.CentralUI/Components/Audit/AuditResultsGrid.razor` + `.razor.cs` — custom Bootstrap table (no third-party grid). 10 columns per `Component-AuditLog.md`. Resizable + reorderable + persistable-per-user (persistence via existing user-settings store). +- Keyset paging via `(OccurredAtUtc desc, EventId desc)` cursor; default page 100. +- Data source: server-side via `IAuditLogRepository.QueryAsync` (M1-T8). Wire through an `IAuditLogQueryService` (new) that the page injects.
+- Create: `tests/ScadaLink.CentralUI.Tests/Components/Audit/AuditResultsGridTests.cs`. + +**Steps:** +1. Failing test: grid renders rows from a stub query service; columns match the documented set. +2. Failing test: clicking "next page" calls the service with the keyset cursor of the last row. +3. Failing test: column reordering persists across navigations (user-settings). +4. Failing test: row click emits a `RowSelected` event with the selected `AuditEvent`. +5. Implement. +6. Run: pass. +7. Commit: `feat(ui): AuditResultsGrid with keyset paging`. + +#### M7-T4: `<AuditDrilldownDrawer>` — JSON pretty-print + +**Files:** +- Create: `src/ScadaLink.CentralUI/Components/Audit/AuditDrilldownDrawer.razor` + `.razor.cs` — slide-in drawer triggered by `RowSelected`. Renders all fields of the selected `AuditEvent`. JSON detection: if `RequestSummary` or `ResponseSummary` is valid JSON, pretty-print with indentation. +- Create: `tests/ScadaLink.CentralUI.Tests/Components/Audit/AuditDrilldownDrawerJsonTests.cs`. + +**Steps:** +1. Failing test: opening drawer with an event whose `RequestSummary` is valid JSON renders an indented version. +2. Failing test: non-JSON body renders verbatim. +3. Implement. +4. Run: pass. +5. Commit: `feat(ui): drilldown drawer JSON pretty-print`. + +#### M7-T5: Drilldown — SQL syntax highlighting + +**Files:** +- Modify: `src/ScadaLink.CentralUI/Components/Audit/AuditDrilldownDrawer.razor.cs` — for `Channel=DbOutbound` events, treat `RequestSummary` as SQL; apply syntax highlighting via a lightweight client-side library (Prism.js or Highlight.js if already in the project; else a small custom highlighter — confirm during M7 brainstorm). +- Modify: `src/ScadaLink.CentralUI/wwwroot/` — add the highlighter assets if needed. + +**Steps:** +1. Failing test: a `DbOutbound` event's `RequestSummary` is rendered inside a syntax-highlighted code block. +2. Implement. +3. Run: pass. +4. Commit: `feat(ui): drilldown SQL syntax highlighting`.
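
The JSON detection described for M7-T4 could be sketched roughly as follows. This is a minimal illustration, assuming `System.Text.Json` is acceptable for the drawer; `PayloadFormatter` and `PrettyPrintIfJson` are hypothetical names, not existing code in the repository.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper for the drilldown drawer; names are illustrative only.
public static class PayloadFormatter
{
    private static readonly JsonSerializerOptions Indented = new() { WriteIndented = true };

    // Returns an indented rendering when the payload parses as JSON;
    // otherwise returns the payload unchanged (empty string for null).
    public static string PrettyPrintIfJson(string? payload)
    {
        if (string.IsNullOrWhiteSpace(payload))
        {
            return payload ?? string.Empty;
        }

        try
        {
            // Parse once to validate, then re-serialize the root element with indentation.
            using var doc = JsonDocument.Parse(payload);
            return JsonSerializer.Serialize(doc.RootElement, Indented);
        }
        catch (JsonException)
        {
            // Not valid JSON: render verbatim, matching the second failing test in M7-T4.
            return payload;
        }
    }
}
```

With default options, `System.Text.Json` escapes non-ASCII and HTML-sensitive characters (for example `<` becomes `\u003C`), so the indented output may not be byte-identical to the stored payload. Whether that matters for the drawer is a decision for the M7 brainstorm.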
+ +#### M7-T6: Drilldown — "Copy as cURL" for ApiOutbound / ApiInbound + +**Files:** +- Modify: `src/ScadaLink.CentralUI/Components/Audit/AuditDrilldownDrawer.razor.cs` — for `Channel ∈ {ApiOutbound, ApiInbound}` events, render a "Copy as cURL" button. Clicking generates a cURL command from the event's URL/headers/body and copies to clipboard via `IJSRuntime`. + +**Steps:** +1. Failing test: button appears only for HTTP-bearing events. +2. Failing test: clicking generates the correct cURL string (verified against a known event fixture). +3. Implement. +4. Run: pass. +5. Commit: `feat(ui): drilldown Copy as cURL action`. + +#### M7-T7: Drilldown — "Show all events for this operation" + +**Files:** +- Modify: `src/ScadaLink.CentralUI/Components/Audit/AuditDrilldownDrawer.razor.cs` — when the event has a non-null `CorrelationId`, render a link "Show all events for this operation" that re-applies the page's filter set with `CorrelationId` set to the selected event's value (other filters cleared). + +**Steps:** +1. Failing test: link appears only when CorrelationId is non-null. +2. Failing test: clicking re-navigates to the Audit Log page with the filter applied. +3. Implement. +4. Run: pass. +5. Commit: `feat(ui): drilldown "Show all events" by CorrelationId`. + +#### M7-T8: Drilldown — redaction indicators + +**Files:** +- Modify: `src/ScadaLink.CentralUI/Components/Audit/AuditDrilldownDrawer.razor` — wherever a payload contains the string `` or ``, render a small badge indicating the field was redacted. Show a tooltip linking to "Payload Capture Policy" in the Component-AuditLog docs. + +**Steps:** +1. Failing test: a payload with `` shows the badge. +2. Implement. +3. Run: pass. +4. Commit: `feat(ui): drilldown redaction indicators`. + +#### M7-T9: Rename `AuditLog.razor` → `ConfigurationAuditLog.razor` + +**Files:** +- Rename: `src/ScadaLink.CentralUI/Components/Pages/Monitoring/AuditLog.razor` → `Components/Pages/Audit/ConfigurationAuditLog.razor`.
+- Update: the file's `@page` directive to `/audit/configuration`. +- Update: all `` and any other inbound references to the old path. +- Update: tests referencing the old name. +- Modify: nav menu — place `ConfigurationAuditLog` under the Audit group as a sibling to the new Audit Log page. + +**Steps:** +1. Failing test: navigating to `/audit/configuration` renders the (renamed) page. +2. Failing test: the old `/monitoring/auditlog` returns 404 (or a redirect — choose during M7 brainstorm; redirect is safer for any external bookmarks). +3. Implement rename + path updates. +4. Run: pass. +5. Commit: `refactor(ui): rename Audit Log Viewer to Configuration Audit Log Viewer`. + +#### M7-T10: Drill-in from Notifications page + +**Files:** +- Modify: `src/ScadaLink.CentralUI/Components/Pages/Notifications/NotificationReport.razor` (or row-action panel) — add "View audit history" action to each row. Navigates to `/audit/log?correlationId={NotificationId}`. + +**Steps:** +1. Failing test: row action exists. +2. Failing test: click navigates with the right query string. +3. Implement. +4. Run: pass. +5. Commit: `feat(ui): drill-in from Notifications to Audit Log`. + +#### M7-T11: Drill-in from Site Calls page + +**Files:** +- Modify: the Site Calls listing page. If it doesn't exist yet, prefer deferring this drill-in to a follow-up over creating the page here, since the Site Call Audit #22 UI work is mostly out of scope. For M7 acceptance, drill-ins are required only from pages that exist. +- If the page exists, mirror M7-T10's pattern with `?correlationId={TrackedOperationId}`. + +**Steps:** +1. Conditional on page existence — confirm during M7 brainstorm. +2. Implement. +3. Commit: `feat(ui): drill-in from Site Calls to Audit Log`. + +#### M7-T12: Drill-in from External Systems / Inbound API Keys / Sites / Instances detail pages + +**Files:** +- Modify (per page): External Systems detail, Inbound API Keys detail, Sites detail, Instances detail.
Each gets a "Recent activity" / "Recent calls" / "Audit feed" link or tab navigating to `/audit/log` with the appropriate pre-filter (`target=` / `actor= AND channel=ApiInbound` / `site=` / `instance=`). +- Tests: one per drill-in. + +**Steps:** +1. Failing tests per page. +2. Implement. +3. Run: pass. +4. Commit: `feat(ui): drill-ins from detail pages to Audit Log`. + +#### M7-T13: 3 KPI tiles on the Health dashboard + +**Files:** +- Modify: `src/ScadaLink.CentralUI/Components/Pages/Health/HealthDashboard.razor` (or equivalent) — add three tiles under a new "Audit" group: Audit volume, Audit error rate, Audit backlog. Data fed from the metrics defined in M5-T7 and M6-T6/T7/T8/T9. +- Create: `tests/ScadaLink.CentralUI.Tests/Pages/Health/AuditKpiTilesTests.cs`. + +**Steps:** +1. Failing test: tiles render with stub data; clicking each navigates to the relevant Audit Log filtered view (or to a per-site breakdown for the backlog tile). +2. Implement. +3. Run: pass. +4. Commit: `feat(ui): Audit KPI tiles on Health dashboard`. + +#### M7-T14: Server-side CSV export streaming + +**Files:** +- Create: `src/ScadaLink.CentralUI/Services/AuditLogExportService.cs` — accepts the current filter, streams server-side CSV via `IAuditLogRepository.QueryAsync` paged enumeration; writes to the HTTP response without buffering the whole result in memory. +- Modify: `AuditLogPage.razor` — Export button calls the service. Requires `AuditExport` permission (M7-T15). +- Create: `tests/ScadaLink.CentralUI.Tests/Services/AuditLogExportServiceTests.cs`. + +**Steps:** +1. Failing test: exporting 10,000 rows streams as CSV; memory usage stays bounded. +2. Failing test: default cap of 100k rows enforced; larger requests get a "use the CLI" error. +3. Implement. +4. Run: pass. +5. Commit: `feat(ui): server-side streaming CSV export of Audit Log`. 
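
The streaming behaviour M7-T14 asks for might look something like the sketch below. It assumes the repository's paged enumeration can be exposed as an `IAsyncEnumerable`; `AuditCsvRow` and `AuditCsvWriter` are hypothetical names with a trimmed-down column set, and the 100k row cap and HTTP response wiring are deliberately left out.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical, trimmed-down row shape; the real AuditEvent carries more columns.
public sealed record AuditCsvRow(
    DateTime OccurredAtUtc, string Channel, string Kind, string Status, string? Target);

public static class AuditCsvWriter
{
    // Writes rows to the output stream one at a time, so memory use is bounded by
    // a single row plus the writer's buffer. Returns the number of data rows written.
    public static async Task<int> WriteAsync(
        IAsyncEnumerable<AuditCsvRow> rows, Stream output, CancellationToken ct = default)
    {
        await using var writer = new StreamWriter(
            output, new UTF8Encoding(false), 16 * 1024, leaveOpen: true)
        {
            NewLine = "\r\n" // RFC 4180 line ending
        };
        await writer.WriteLineAsync("OccurredAtUtc,Channel,Kind,Status,Target");

        var count = 0;
        await foreach (var row in rows.WithCancellation(ct))
        {
            var line = string.Join(",",
                row.OccurredAtUtc.ToString("O", CultureInfo.InvariantCulture),
                Escape(row.Channel), Escape(row.Kind), Escape(row.Status), Escape(row.Target));
            await writer.WriteLineAsync(line);
            count++;
        }

        await writer.FlushAsync();
        return count;
    }

    // Quotes a field when it contains a comma, quote, or line break; doubles embedded quotes.
    private static string Escape(string? field)
    {
        if (string.IsNullOrEmpty(field)) return string.Empty;
        if (field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) < 0) return field;
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }
}
```

In the real service the `output` stream would presumably be the HTTP response body, and the row cap would need to be checked before the first byte is written so that an oversized request can still return the "use the CLI" error cleanly.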
+ +#### M7-T15: `OperationalAudit` + `AuditExport` permission gating + +**Files:** +- Modify: `src/ScadaLink.Security/` (or wherever the role/permission model lives) — add `OperationalAudit` and `AuditExport` permissions; map them to the Audit role (existing) by default. +- Modify: `AuditLogPage.razor` — gate page access on `OperationalAudit`; gate the Export button on `AuditExport`. +- Create: `tests/ScadaLink.CentralUI.Tests/Pages/AuditLogPagePermissionTests.cs`. + +**Steps:** +1. Failing test: a user without `OperationalAudit` gets a 403 / hidden page. +2. Failing test: a user with `OperationalAudit` but no `AuditExport` can read the page, but the Export button is hidden. +3. Implement permission checks. +4. Run: pass. +5. Commit: `feat(security): OperationalAudit + AuditExport permissions for the Audit Log surface`. + +#### M7-T16: Playwright E2E tests + +**Files:** +- Create: `tests/ScadaLink.CentralUI.PlaywrightTests/Audit/AuditLogPageTests.cs` — covers: filter narrowing, drilldown drawer JSON pretty-print, "Copy as cURL" on ApiInbound, drill-in from Notifications to filtered Audit Log, CSV export end-to-end, permission gating. + +**Steps:** +1. Sketch tests using the existing Playwright harness. +2. Iterate until all green. +3. Commit: `test(ui): Audit Log Playwright E2E coverage`. + +### M7 — Risk callouts + +- **Custom data grid scope:** keyset paging + reorderable columns + per-user persistence is non-trivial. Evaluate the existing `NotificationReport.razor` grid to decide between generalising it and forking it. Decision during M7 brainstorm. +- **SignalR + large drawer payloads:** the drilldown payload (up to 64 KB on errors) is rendered server-side via SignalR. Confirm the SignalR message-size limit (`HubOptions.MaximumReceiveMessageSize`) is large enough; bump if needed. +- **Permission infrastructure assumptions:** confirm during M7 brainstorm that the codebase already supports per-permission gates at the page level, not just role-level.
If only role-level, fall back to gating via the existing Audit role with a feature flag for the export. +- **The rename to `ConfigurationAuditLog.razor`** breaks any external bookmarks. Decide redirect vs 404 explicitly during M7 brainstorm. --- @@ -436,16 +1433,135 @@ The design for both is merged on `main` (`alog.md` cached-call tracking section; - Existing `audit-log query` (IAuditService config-change viewer) **renamed** in code to `audit-config query` to disambiguate; old name kept as a deprecated alias for one minor version. - Permissions: `audit query` and `audit verify-chain` require `OperationalAudit`; `audit export` additionally requires `AuditExport`. -**Task headlines:** -1. New `AuditCommands.cs` (separate file from `AuditLogCommands.cs` — the latter stays for the renamed config audit). -2. Build the three subcommands with their flag sets (per CLI doc & `alog.md` §15.1, post-Bundle-D fix). -3. ManagementService HTTP endpoints backing each subcommand. -4. Output formatters (JSON, table) reused from existing CLI patterns. -5. CLI integration tests in `tests/ScadaLink.CLI.Tests/` + `tests/ScadaLink.IntegrationTests/`. -6. Update CLI README + help text. +### M8 — Tasks (TDD-detail) -**Risk callouts:** -- The CLI rename (`audit-log query` → `audit-config query`) breaks any operator scripts; provide a deprecation alias and document the migration. +#### M8-T1: Create `AuditCommands.cs` (separate from existing `AuditLogCommands.cs`) + +**Files:** +- Create: `src/ScadaLink.CLI/Commands/AuditCommands.cs` — `static AuditCommands { public static Command Build() }` following the System.CommandLine pattern from `AuditLogCommands.cs:1–53`. Sets up the `audit` parent command with three subcommands (T2/T3/T4). +- Modify: `src/ScadaLink.CLI/Program.cs` — register `AuditCommands.Build()` alongside the existing command groups. +- Create: `tests/ScadaLink.CLI.Tests/Commands/AuditCommandsScaffoldTests.cs`. + +**Steps:** +1. 
Failing test: `scadalink audit --help` lists three subcommands (query, export, verify-chain). +2. Implement. +3. Run: pass. +4. Commit: `feat(cli): scaffold scadalink audit command group`. + +#### M8-T2: `audit query` subcommand + +**Files:** +- Modify: `src/ScadaLink.CLI/Commands/AuditCommands.cs` — add `query` subcommand with the flag set matching the Central UI Audit Log filter set (post-Bundle-D fix): `--since`, `--until`, `--channel`, `--kind`, `--status`, `--site`, `--instance`, `--target`, `--actor`, `--correlation-id`, `--errors-only`, `--page`, `--page-size`. Output JSON by default; `--format table` opt-in. +- Create: `src/ScadaLink.Commons/Messages/Cli/QueryAuditLogCommand.cs` (or wherever the CLI↔Management messages live — confirm via repo). +- Create: `tests/ScadaLink.CLI.Tests/Commands/AuditQueryCommandTests.cs`. + +**Steps:** +1. Failing test: parsing the documented flag set produces a `QueryAuditLogCommand` with the expected fields. +2. Failing test: `--format table` switches the output formatter. +3. Failing test: unknown flag returns non-zero exit code with a helpful error. +4. Implement. +5. Run: pass. +6. Commit: `feat(cli): scadalink audit query subcommand`. + +#### M8-T3: `audit export` subcommand + +**Files:** +- Modify: `src/ScadaLink.CLI/Commands/AuditCommands.cs` — add `export` subcommand with flags `--since` (required), `--until` (required), `--format csv|jsonl|parquet` (required), `--output ` (required), `--channel`, `--kind`, `--status`, `--site`, `--target`, `--actor`. +- Create: `src/ScadaLink.Commons/Messages/Cli/ExportAuditLogCommand.cs`. +- Create: `tests/ScadaLink.CLI.Tests/Commands/AuditExportCommandTests.cs`. + +**Steps:** +1. Failing test: missing required flag returns helpful error. +2. Failing test: valid invocation creates an `ExportAuditLogCommand` with all fields. +3. Failing test: streams results to `--output`; doesn't buffer entire export in memory (test with 100k+ rows). +4. Implement. +5. Run: pass. +6. 
Commit: `feat(cli): scadalink audit export subcommand (csv|jsonl|parquet)`. + +#### M8-T4: `audit verify-chain` subcommand (no-op stub) + +**Files:** +- Modify: `src/ScadaLink.CLI/Commands/AuditCommands.cs` — add `verify-chain --month <YYYY-MM>` subcommand. In v1, returns a documented "hash chain not yet enabled in this release; see Component-AuditLog.md Security & Tamper-Evidence for the v1.x roadmap" message with exit code 0. +- Create: `tests/ScadaLink.CLI.Tests/Commands/AuditVerifyChainCommandTests.cs`. + +**Steps:** +1. Failing test: `scadalink audit verify-chain --month 2026-05` exits 0 with the documented message. +2. Failing test: malformed month string (e.g., `2026-13`) exits non-zero with a parse error. +3. Implement. +4. Run: pass. +5. Commit: `feat(cli): scadalink audit verify-chain subcommand (v1 no-op)`. + +#### M8-T5: ManagementService HTTP endpoints + +**Files:** +- Create: `src/ScadaLink.ManagementService/Controllers/AuditController.cs` — REST endpoints `GET /api/audit/query` (paged) and `GET /api/audit/export` (streaming). `/query` is gated on `OperationalAudit`; `/export` additionally requires `AuditExport` (matching the UI's permission split from M7-T15). +- Create: `tests/ScadaLink.ManagementService.Tests/Controllers/AuditControllerTests.cs`. + +**Steps:** +1. Failing test: `GET /api/audit/query` with valid params returns a JSON page of audit events. +2. Failing test: `GET /api/audit/export` streams CSV/JSONL/Parquet without buffering. +3. Failing test: a request without `OperationalAudit` returns 403. +4. Failing test: `/export` without `AuditExport` returns 403. +5. Implement. +6. Run: pass. +7. Commit: `feat(mgmt): /api/audit/{query,export} endpoints with permission gates`. + +#### M8-T6: Output formatters (JSON + table) + +**Files:** +- Modify: `src/ScadaLink.CLI/Output/` — add an `AuditEventTableFormatter` that renders results as an aligned table with sensible defaults (truncate long fields with `…`).
+- The JSON formatter follows existing CLI patterns (one event per line for streaming, or array for paged results — confirm during M8 brainstorm). +- Create: `tests/ScadaLink.CLI.Tests/Output/AuditEventFormatterTests.cs`. + +**Steps:** +1. Failing test: table format includes columns: OccurredAtUtc, Channel, Kind, Status, Target, Actor, DurationMs. +2. Failing test: JSON format is one event per line. +3. Implement. +4. Run: pass. +5. Commit: `feat(cli): JSON + table formatters for audit events`. + +#### M8-T7: Rename existing `audit-log query` → `audit-config query` with deprecation alias + +**Files:** +- Modify: `src/ScadaLink.CLI/Commands/AuditLogCommands.cs` — rename the top-level command from `audit-log` to `audit-config` (clearer disambiguation from the new `audit` group). Add an alias `audit-log` that prints a deprecation warning and forwards to `audit-config` for one minor version. +- Modify: `src/ScadaLink.CLI/README.md` and CLI help text to document the rename and the deprecation timeline. +- Create: `tests/ScadaLink.CLI.Tests/Commands/AuditConfigDeprecationTests.cs`. + +**Steps:** +1. Failing test: `scadalink audit-config query --user alice` works. +2. Failing test: `scadalink audit-log query --user alice` works but emits a deprecation warning to stderr. +3. Failing test: `scadalink audit query --since ...` (the NEW operational command) and `scadalink audit-config query --user ...` (the renamed config command) are clearly different surfaces and do not conflict. +4. Implement. +5. Run: pass. +6. Commit: `refactor(cli): rename audit-log → audit-config with deprecation alias`. + +#### M8-T8: CLI README + help text updates + +**Files:** +- Modify: `src/ScadaLink.CLI/README.md` — document the new `audit` group, the renamed `audit-config` group, the permission requirements, the `verify-chain` no-op note, and the CLI ↔ UI filter parity. +- Modify: each subcommand's `--help` description for clarity. + +**Steps:** +1. Inline doc edits. +2. 
Verify `scadalink audit --help` and `scadalink audit-config --help` produce the documented output. +3. Commit: `docs(cli): document new scadalink audit group and audit-config rename`. + +#### M8-T9: CLI integration test — end-to-end query + export + +**Files:** +- Create: `tests/ScadaLink.IntegrationTests/Cli/AuditCliEndToEndTests.cs` — boots central with a populated AuditLog table; invokes `scadalink audit query --since ...` against the running ManagementService; asserts results match the database. Same for export. + +**Steps:** +1. Sketch test using existing IntegrationTests harness. +2. Iterate until all flag combinations work end-to-end. +3. Commit: `test(cli): scadalink audit end-to-end against running ManagementService`. + +### M8 — Risk callouts + +- **Operator script breakage from the `audit-log` rename:** the deprecation alias is the safety net but only for one minor version; document the deprecation timeline clearly in the CLI README. Coordinate with anyone running `audit-log` in CI/cron. +- **Parquet output:** requires a Parquet writer library. If one isn't already in `Directory.Packages.props`, add the smallest viable dependency (`ParquetSharp` or `Parquet.Net`). Decide during M8 brainstorm. +- **Streaming export from CLI:** the CLI invokes the ManagementService HTTP endpoint, which itself streams. Confirm `HttpClient.SendAsync` with `HttpCompletionOption.ResponseHeadersRead` is used so the CLI doesn't buffer the whole response. +- **Permission model parity:** ensure the CLI's permission errors mirror the UI's (HTTP 403 → CLI exit code 2 with a clear message). ---