scadalink-design/docs/plans/2026-05-21-audit-log-followups.md

Audit Log #23 — Deferred Follow-ups Implementation Plan

For Claude: REQUIRED SUB-SKILL: Use superpowers-extended-cc:subagent-driven-development to implement this plan task-by-task (bundled cadence — one implementer + one review pass per task).

Goal: Close the five deferred implementation follow-ups from the Audit Log #23 roadmap so site audit events actually reach central, the audit/SiteCall surfaces are complete, and known tech debt is paid down.

Architecture: Five independent-ish workstreams against the existing ScadaLink codebase. The headline change: site→central audit forwarding moves from the production NoOpSiteStreamAuditClient stub to a real ClusterClient-based push — the same transport notifications already use (SiteCommunicationActor → ClusterClient.Send("/user/central-communication", …) → CentralCommunicationActor), avoiding a new central-hosted gRPC server. The remaining four follow-ups are scoped tech-debt / UI / contract changes.

Tech Stack: .NET 10, Akka.NET (ClusterClient, ClusterClientReceptionist, cluster singletons, TestKit), EF Core 10 (MS SQL + SQLite providers), Blazor Server + Bootstrap CSS (no third-party UI libs), System.CommandLine, xUnit + Akka.TestKit.Xunit2 + bUnit + NSubstitute, Playwright.

Spec sources: alog.md, docs/requirements/Component-AuditLog.md, docs/requirements/Component-SiteCallAudit.md, docs/plans/2026-05-20-audit-log-code-roadmap.md (header lines 14–19 enumerate the deferred items).

Ground rules (carry into every task):

  • Branch off main before any code change; never commit on main.
  • Edit in place. Never touch infra/*. The docker/* cluster config is touched only if a task explicitly says so (none here do).
  • Stage with explicit git add <path> — never git add ., never git commit -am.
  • TDD: failing test → minimal code → green → commit. Full solution stays green (dotnet build ScadaLink.slnx, dotnet test ScadaLink.slnx).
  • Additive message-contract evolution where possible; where a contract shape must change (Task 8), update every call site in the same task.
  • Do not push to origin — the user authorizes pushes separately.

Task 0: Prep — feature branch

Files: none (git only).

Step 1: From a clean main, create the working branch:

git checkout main && git status --porcelain   # expect clean
git checkout -b feature/audit-log-followups

Step 2: Confirm baseline green:

dotnet build ScadaLink.slnx

Expected: build succeeds. (A full dotnet test baseline is optional but recommended.)

Acceptance: on branch feature/audit-log-followups, solution builds.


Task 1: Audit push — central ingest routing over ClusterClient

What: Make the receptionist-registered CentralCommunicationActor accept IngestAuditEventsCommand (and IngestCachedTelemetryCommand) from a site ClusterClient, forward to the AuditLogIngestActor cluster-singleton proxy, and pipe the reply back. Mirror the existing NotificationSubmit / RegisterNotificationOutbox pattern exactly.

Files:

  • Modify: src/ScadaLink.Communication/Actors/CentralCommunicationActor.cs — add Receive<IngestAuditEventsCommand> + Receive<IngestCachedTelemetryCommand> handlers; add a RegisterAuditIngest registration message handler holding the AuditLogIngestActor proxy IActorRef (mirror RegisterNotificationOutbox at line ~120 / HandleNotificationSubmit at line ~130).
  • Create: src/ScadaLink.Commons/Messages/Audit/RegisterAuditIngest.cs: public sealed record RegisterAuditIngest(IActorRef AuditIngestActor); (mirror RegisterNotificationOutbox).
  • Modify: src/ScadaLink.Host/Actors/AkkaHostedService.cs — after the central AuditLogIngestActor singleton + proxy are created (~lines 355–379), Tell the RegisterAuditIngest to the CentralCommunicationActor (mirror how the Notification Outbox proxy is registered).
  • Test: tests/ScadaLink.Communication.Tests/Actors/CentralCommunicationActorAuditTests.cs (new).

Approach:

  • Handler Asks the registered audit-ingest proxy and PipeTos the IngestAuditEventsReply back to the original Sender (the ClusterClient round-trips it to the site). Use the existing audit-ingest Ask-timeout convention (30s — see SiteStreamGrpcServer AuditIngestAskTimeout); add a bound option if no constant is reachable.
  • If no audit-ingest proxy is registered yet (startup race), reply with an empty IngestAuditEventsReply([]) — the site keeps the rows Pending and retries, exactly as the gRPC handler does today.
  • IngestCachedTelemetryCommand is routed the same way (its reply type is the same IngestAuditEventsReply per AuditLogIngestActor).
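
The routing rule in this list (forward when a proxy is registered, otherwise acknowledge nothing) can be sketched without Akka. This is a minimal illustration only: AuditIngestRouter is a hypothetical name, the delegate stands in for the Ask + PipeTo against the singleton proxy, and the record shapes are simplified assumptions (the real command carries AuditEvent rows, not bare ids).

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical, simplified stand-ins for the Commons contracts.
public sealed record IngestAuditEventsCommand(IReadOnlyList<Guid> EventIds);
public sealed record IngestAuditEventsReply(IReadOnlyList<Guid> AcceptedEventIds);

public sealed class AuditIngestRouter
{
    // Stands in for the audit-ingest proxy IActorRef; null until registration arrives.
    private Func<IngestAuditEventsCommand, Task<IngestAuditEventsReply>>? _ingest;

    // Equivalent of handling the RegisterAuditIngest message.
    public void Register(Func<IngestAuditEventsCommand, Task<IngestAuditEventsReply>> ingest)
        => _ingest = ingest;

    public Task<IngestAuditEventsReply> RouteAsync(IngestAuditEventsCommand command)
    {
        // Startup race: nothing registered yet, so accept nothing.
        // The site keeps its rows Pending and retries later.
        if (_ingest is null)
            return Task.FromResult(new IngestAuditEventsReply(Array.Empty<Guid>()));

        // Otherwise forward the command and return the proxy's reply unchanged.
        return _ingest(command);
    }
}
```

In the actor itself the same two branches would live inside the Receive handler, with the reply piped to the original Sender.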

Tests (TestKit + NSubstitute):

  1. IngestAuditEventsCommand with an audit-ingest probe registered → probe receives the command, actor replies the probe's IngestAuditEventsReply to the sender.
  2. IngestAuditEventsCommand with no audit-ingest registered → sender gets IngestAuditEventsReply with empty AcceptedEventIds.
  3. IngestCachedTelemetryCommand routes to the same proxy.

Steps: write failing tests → run (fail) → implement record + handlers + Host registration → run (pass) → dotnet build ScadaLink.slnx → commit.

Commit: feat(communication): route audit ingest commands through CentralCommunicationActor


Task 2: Audit push — real site client, Host wiring, integration test

What: Replace NoOpSiteStreamAuditClient (production binding) with a real ISiteStreamAuditClient that pushes over ClusterClient via the site's SiteCommunicationActor. After this task the site auditlog.db Pending backlog drains to central.

Files:

  • Create: src/ScadaLink.AuditLog/Site/Telemetry/ClusterClientSiteAuditClient.cs — implements ISiteStreamAuditClient; ctor takes the SiteCommunicationActor IActorRef + an Ask timeout.
  • Modify: src/ScadaLink.Communication/Actors/SiteCommunicationActor.cs — ensure IngestAuditEventsCommand / IngestCachedTelemetryCommand are forwarded over ClusterClient.Send("/user/central-communication", …) with the reply routed back to the Ask (mirror the NotificationSubmit forward at lines ~190/214/224).
  • Modify: src/ScadaLink.Host/Actors/AkkaHostedService.cs — in the site telemetry wiring (~lines 648–681), construct ClusterClientSiteAuditClient with the SiteCommunicationActor ref and pass it to SiteAuditTelemetryActor instead of the DI-resolved NoOpSiteStreamAuditClient.
  • Modify: src/ScadaLink.AuditLog/ServiceCollectionExtensions.cs (lines ~124–129) — keep NoOpSiteStreamAuditClient as the DI default (it remains correct for central/test composition roots that have no SiteCommunicationActor); update the stale comment that says "M6's reconciliation work brings the real implementation".
  • Test: tests/ScadaLink.AuditLog.Tests/Site/Telemetry/ClusterClientSiteAuditClientTests.cs (new); extend tests/ScadaLink.IntegrationTests/AuditLog/ with a ClusterClient-push end-to-end test.

Approach:

  • IngestAuditEventsAsync(AuditEventBatch, ct) maps the batch to IngestAuditEventsCommand(IReadOnlyList<AuditEvent>), Asks the SiteCommunicationActor for IngestAuditEventsReply, maps the reply's AcceptedEventIds back into the IngestAck the SiteAuditTelemetryActor expects.
  • An Ask timeout / failure must throw: SiteAuditTelemetryActor's drain loop already treats a thrown exception as transient (rows stay Pending, retried next tick). Keep that contract.
  • IngestCachedTelemetryAsync does the same with IngestCachedTelemetryCommand. (CachedCallTelemetryForwarder already resolves ISiteStreamAuditClient — no change there.)
  • AuditEvent already crosses the wire as the NotificationSubmit records do; confirm the Akka serializer handles IReadOnlyList<AuditEvent> (notification messages prove the pattern).
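
A minimal sketch of the client contract described above, assuming simplified record shapes. SketchSiteAuditClient and the `ask` delegate are hypothetical; the delegate stands in for the Ask against the SiteCommunicationActor. The point is the two behaviours the drain loop relies on: a late reply surfaces as an exception, and only accepted ids are acknowledged.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical shapes; the real AuditEventBatch / IngestAck records may differ.
public sealed record AuditEvent(Guid EventId, string Kind);
public sealed record AuditEventBatch(IReadOnlyList<AuditEvent> Events);
public sealed record IngestAuditEventsCommand(IReadOnlyList<AuditEvent> Events);
public sealed record IngestAuditEventsReply(IReadOnlyList<Guid> AcceptedEventIds);
public sealed record IngestAck(IReadOnlyList<Guid> AcceptedEventIds);

public sealed class SketchSiteAuditClient
{
    private readonly Func<IngestAuditEventsCommand, Task<IngestAuditEventsReply>> _ask;
    private readonly TimeSpan _askTimeout;

    public SketchSiteAuditClient(
        Func<IngestAuditEventsCommand, Task<IngestAuditEventsReply>> ask,
        TimeSpan askTimeout)
    {
        _ask = ask;
        _askTimeout = askTimeout;
    }

    public async Task<IngestAck> IngestAuditEventsAsync(AuditEventBatch batch, CancellationToken ct)
    {
        var command = new IngestAuditEventsCommand(batch.Events);

        // WaitAsync throws TimeoutException when the reply is late. Nothing is
        // caught here, so the caller sees the failure and leaves rows Pending.
        var reply = await _ask(command).WaitAsync(_askTimeout, ct);

        // Only the ids central accepted are acknowledged.
        return new IngestAck(reply.AcceptedEventIds.ToList());
    }
}
```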

Tests:

  1. IngestAuditEventsAsync → batch becomes one IngestAuditEventsCommand; mocked actor reply's accepted ids map onto IngestAck.
  2. Partial ack (3 of 5 ids) → IngestAck lists only the 3.
  3. Ask timeout → method throws (drain loop keeps rows Pending).
  4. Integration: boot a site+central pair via the IntegrationTests harness, write an audit event on the site hot-path, assert a central AuditLog row appears within ~10s and the site row flips to Forwarded.

Commit: feat(auditlog): real ClusterClient-based site audit push client


Task 3: Consolidate the duplicated audit DTO mappers

What: Collapse the 4 near-duplicate AuditEvent ↔ AuditEventDto mapping copies into one canonical mapper. To respect the project-reference direction (AuditLog → Communication, never the reverse) and avoid a cycle, the canonical mapper lives in ScadaLink.Communication: it owns the generated AuditEventDto and references Commons for AuditEvent, and AuditLog already references Communication.

Files:

  • Create: src/ScadaLink.Communication/Grpc/AuditEventDtoMapper.cs: public static class with ToDto(AuditEvent) → AuditEventDto and FromDto(AuditEventDto) → AuditEvent (lift the canonical logic from AuditLog/Telemetry/AuditEventMapper.cs).
  • Modify: src/ScadaLink.Communication/Grpc/SiteStreamGrpcServer.cs — replace the inlined IngestAuditEvents loop (~lines 265–295), AuditEventToDto (~490–517) and MapAuditEventFromDto (~537–561) with calls to AuditEventDtoMapper.
  • Delete: src/ScadaLink.AuditLog/Telemetry/AuditEventMapper.cs; update its callers in ScadaLink.AuditLog to use Communication's AuditEventDtoMapper.
  • Leave untouched: SqliteAuditWriter.MapRow (SQLite DataReader → AuditEvent, not a DTO mapper — different source type) and MapSiteCallFromDto (SiteCall, not audit). Note this in the commit body.
  • Test: move/merge tests/ScadaLink.AuditLog.Tests/Telemetry/AuditEventMapperTests.cs into tests/ScadaLink.Communication.Tests/Grpc/AuditEventDtoMapperTests.cs; keep round-trip coverage (FromDto(ToDto(x)) == x).

Approach: Pure refactor — no behaviour change. Verify field-by-field parity against all 3 inlined copies before deleting them (null handling, enum parsing, Int32Value/Timestamp wrapping).
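
As an illustration of the round-trip property the tests should keep (FromDto(ToDto(x)) == x), here is a small sketch over deliberately simplified, hypothetical shapes. The real AuditEvent has more fields and the generated AuditEventDto uses protobuf wrappers (Int32Value, Timestamp); the enum members and the fallback-to-Unknown behaviour below are assumptions, not the existing mapper's contract.

```csharp
using System;

// Hypothetical, simplified shapes for illustration only.
public enum AuditChannel { Unknown, Api, Notification }
public sealed record AuditEvent(Guid EventId, AuditChannel Channel, int? HttpStatus, DateTime OccurredAtUtc);
public sealed record AuditEventDto(string EventId, string Channel, int? HttpStatus, long OccurredAtUtcTicks);

public static class AuditEventDtoMapper
{
    public static AuditEventDto ToDto(AuditEvent e) =>
        new(e.EventId.ToString(), e.Channel.ToString(), e.HttpStatus, e.OccurredAtUtc.Ticks);

    public static AuditEvent FromDto(AuditEventDto d) =>
        new(Guid.Parse(d.EventId),
            // Unparseable channel names fall back to Unknown instead of throwing (assumed).
            Enum.TryParse<AuditChannel>(d.Channel, out var channel) ? channel : AuditChannel.Unknown,
            d.HttpStatus,
            // Ticks keep full DateTime precision, so the timestamp survives the round trip.
            new DateTime(d.OccurredAtUtcTicks, DateTimeKind.Utc));
}
```

Null handling, enum parsing and timestamp precision are the three places where the inlined copies are most likely to have drifted, so a parity test per field is cheap insurance before deleting them.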

Steps: create mapper + tests → run → swap call sites → delete old copies → dotnet build + dotnet test ScadaLink.slnx (all green, no behaviour drift) → commit.

Commit: refactor(auditlog): consolidate AuditEvent DTO mappers into Communication


Task 4: Site Call Audit — query / KPI / detail backend

What: Build the missing read-side backend for the Site Calls UI: Commons message contracts, SiteCallAuditActor query/KPI/detail handlers, and CommunicationService methods. Mirror NotificationOutboxQueries.cs + the Notification Outbox actor/service shape. Spec: Component-SiteCallAudit.md §KPIs and §queryable list.

Files:

  • Create: src/ScadaLink.Commons/Messages/Audit/SiteCallQueries.cs — records mirroring NotificationOutboxQueries.cs:
    • SiteCallQueryRequest (CorrelationId, status/site/kind/target filters, date range, page cursor fields, PageSize)
    • SiteCallSummary (TrackedOperationId, SourceSite, Kind, TargetSummary, Status, RetryCount, LastError, provenance, CreatedAtUtc, UpdatedAtUtc, TerminalAtUtc)
    • SiteCallQueryResponse (CorrelationId, Success, ErrorMessage, IReadOnlyList<SiteCallSummary>, next-cursor fields)
    • SiteCallKpiRequest / SiteCallKpiResponse (BufferedCount, ParkedCount, FailedLastInterval, DeliveredLastInterval, OldestPendingAge, StuckCount — mirror the Notification Outbox KPI shape; also a per-site variant)
    • SiteCallDetailRequest / SiteCallDetailResponse / SiteCallDetail (full row incl. LastError, all timestamps).
  • Modify: src/ScadaLink.SiteCallAudit/SiteCallAuditActor.cs — add ReceiveAsync handlers for the query / KPI / detail requests; query handler calls ISiteCallAuditRepository.QueryAsync (keyset paging on (CreatedAtUtc DESC, TrackedOperationId DESC)); KPI handler computes point-in-time counts from the SiteCalls table (stuck = Pending/Retrying older than the configurable threshold, default 10 min). Use the per-message DI scope pattern already in the actor.
  • Add repo support if needed: src/ScadaLink.ConfigurationDatabase/Repositories/SiteCallAuditRepository.cs may need a KPI-count method + a detail GetAsync (a GetAsync(TrackedOperationId) already exists).
  • Modify: src/ScadaLink.Communication/CommunicationService.cs — add QuerySiteCallsAsync, GetSiteCallKpisAsync, GetPerSiteSiteCallKpisAsync, GetSiteCallDetailAsync (mirror QueryNotificationOutboxAsync etc.: Ask the SiteCallAuditActor proxy with _options.QueryTimeout).
  • Test: tests/ScadaLink.SiteCallAudit.Tests/ (actor handlers), tests/ScadaLink.Commons.Tests/ (contract shape), tests/ScadaLink.ConfigurationDatabase.Tests/Repositories/SiteCallAuditRepositoryTests.cs (extend for KPI counts).
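
The keyset paging and stuck-count rules named for SiteCallAuditActor can be sketched over an in-memory list. Assumptions, all hypothetical: the SiteCallRow shape, a string TrackedOperationId, Status as a plain string, and age measured from CreatedAtUtc. The real repository would express the same predicate in EF Core against the SiteCalls table.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row shape for illustration.
public sealed record SiteCallRow(string TrackedOperationId, string Status, DateTime CreatedAtUtc);

public static class SiteCallPaging
{
    // Returns the page that follows the cursor in
    // (CreatedAtUtc DESC, TrackedOperationId DESC) order. A null cursor means first page.
    public static List<SiteCallRow> NextPage(
        IEnumerable<SiteCallRow> rows, (DateTime CreatedAtUtc, string Id)? cursor, int pageSize)
    {
        var query = rows;
        if (cursor.HasValue)
        {
            var c = cursor.Value;
            // Keyset predicate: strictly after the cursor row in descending order.
            query = query.Where(r =>
                r.CreatedAtUtc < c.CreatedAtUtc ||
                (r.CreatedAtUtc == c.CreatedAtUtc &&
                 string.CompareOrdinal(r.TrackedOperationId, c.Id) < 0));
        }

        return query
            .OrderByDescending(r => r.CreatedAtUtc)
            .ThenByDescending(r => r.TrackedOperationId, StringComparer.Ordinal)
            .Take(pageSize)
            .ToList();
    }

    // Stuck = still Pending/Retrying and older than the threshold.
    public static int CountStuck(IEnumerable<SiteCallRow> rows, DateTime nowUtc, TimeSpan threshold) =>
        rows.Count(r => (r.Status == "Pending" || r.Status == "Retrying")
                        && nowUtc - r.CreatedAtUtc > threshold);
}
```

The tie-breaker on TrackedOperationId matters: without it, rows sharing a CreatedAtUtc value could be skipped or repeated across pages.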

Commit: feat(sitecallaudit): query, KPI and detail backend for the Site Calls page


Task 5: Site Call Audit — Retry/Discard relay to owning site

What: Central UI Retry/Discard on a parked Site Call must relay RetryParkedOperation / DiscardParkedOperation to the owning site (sites are the source of truth — central never mutates the SiteCalls row directly; the corrected row arrives back via telemetry). Spec: Component-SiteCallAudit.md §actions-on-parked-rows.

Files:

  • Create: src/ScadaLink.Commons/Messages/Audit/SiteCallRelayMessages.cs: RetryParkedOperationRequest/Response, DiscardParkedOperationRequest/Response (carry TrackedOperationId, SourceSite, CorrelationId; response carries Success + a "site unreachable" error case).
  • Modify: src/ScadaLink.SiteCallAudit/SiteCallAuditActor.cs (or a small relay collaborator) — on a relay request, look up the owning site and forward RetryParkedOperation/DiscardParkedOperation to that site over the central→site ClusterClient (the central side already maintains one ClusterClient per site; reuse the CentralCommunicationActor site-addressing path). On no/late reply → respond "site unreachable".
  • Modify: src/ScadaLink.Communication/Actors/SiteCommunicationActor.cs — receive RetryParkedOperation/DiscardParkedOperation and hand to the site operation-tracking subsystem.
  • Modify the site operation-tracking owner (S&F operation-tracking store / ParkedMessageHandlerActor in src/ScadaLink.StoreAndForward/) — Retry resets a parked tracked operation to Pending for the retry loop; Discard marks it Discarded. Reuse the parked-message handling that already backs notification Retry/Discard.
  • Modify: src/ScadaLink.Communication/CommunicationService.cs — add RetrySiteCallAsync / DiscardSiteCallAsync.
  • Test: tests/ScadaLink.SiteCallAudit.Tests/ (relay routing + unreachable path), tests/ScadaLink.StoreAndForward.Tests/ (site-side parked op reset/discard), tests/ScadaLink.Communication.Tests/.

Note for implementer: this is the meatiest backend task — the central→site relay direction and the site-side parked-operation mutation are both required. If the site operation-tracking Retry/Discard primitive already exists for cached calls, reuse it; only add the message plumbing.
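
Unlike the audit push client, the relay must not throw on a late reply; it answers with a "site unreachable" response instead. A minimal sketch of that conversion follows. SiteCallRelay, the `askSite` delegate (standing in for the central→site ClusterClient Ask) and the simplified response record are hypothetical, and the "declined" message is invented for illustration.

```csharp
#nullable enable
using System;
using System.Threading.Tasks;

// Hypothetical, simplified response shape.
public sealed record RetryParkedOperationResponse(Guid CorrelationId, bool Success, string? ErrorMessage);

public static class SiteCallRelay
{
    public static async Task<RetryParkedOperationResponse> RelayRetryAsync(
        Guid correlationId, Func<Task<bool>> askSite, TimeSpan timeout)
    {
        try
        {
            // The site reports whether it applied the retry.
            var applied = await askSite().WaitAsync(timeout);
            return new RetryParkedOperationResponse(
                correlationId, applied, applied ? null : "site declined the retry");
        }
        catch (TimeoutException)
        {
            // No or late reply: report it to the UI, never mutate the central row.
            return new RetryParkedOperationResponse(correlationId, false, "site unreachable");
        }
    }
}
```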

Commit: feat(sitecallaudit): central→site Retry/Discard relay for parked operations


Task 6: Site Calls UI page + nav + Audit drill-in

What: Build the Central UI Site Calls page — a near-mirror of NotificationReport.razor. Spec: Component-SiteCallAudit.md.

Files:

  • Create: src/ScadaLink.CentralUI/Components/Pages/SiteCalls/SiteCallsReport.razor (+ .razor.cs) — route @page "/site-calls/report", RequireDeployment (or OperationalAudit) auth to match the Notifications report gating. Structure (per the form-layout memory: header, filter card, results table, paging, modal):
    • Filter card: Status, Kind, Source site, Target keyword, date range, "Stuck only" checkbox, Clear/Query.
    • Results table columns: TrackedOperationId, Source site, Kind, Target, Status (badge + Stuck indicator), Retries, Last error, Created, Updated, Actions.
    • Actions column: a "View audit history" link href="/audit/log?correlationId=@row.TrackedOperationId" (the TrackedOperationId is the audit CorrelationId) — mirror NotificationReport.razor:172; plus Retry/Discard buttons shown only on Parked rows (none on Failed).
    • Keyset Previous/Next paging; double-click row → detail modal (body shows full row + LastError; reuse the Notifications detail-modal idiom — never MarkupString).
  • Modify: src/ScadaLink.CentralUI/Components/Layout/NavMenu.razor — register the Site Calls page (own "Site Calls" section, or under an existing group, consistent with the Notifications / Audit section pattern at lines ~65–129).
  • Modify: src/ScadaLink.CentralUI/Components/Pages/Audit/AuditLogPage.razor.cs — confirm ?correlationId= drill-in already covers this (it does); no change expected — just verify.
  • Test: tests/ScadaLink.CentralUI.Tests/Pages/ (bUnit — scaffold, paging, parked-only actions, drill-in link), tests/ScadaLink.CentralUI.PlaywrightTests/SiteCalls/SiteCallsPageTests.cs (new).

Use the frontend-design skill for page/component styling guidance. Blazor Server + Bootstrap only; custom components; clean corporate aesthetic.

Commit: feat(centralui): Site Calls page with Retry/Discard and Audit drill-in


Task 7: Site Call KPI tiles + Health dashboard integration

What: Surface Site Call Audit KPIs on the Health dashboard, mirroring the Notification Outbox tiles + AuditKpiTiles.

Files:

  • Create: src/ScadaLink.CentralUI/Components/Health/SiteCallKpiTiles.razor (+ .razor.cs) — mirror Components/Health/AuditKpiTiles.razor; tiles for Buffered, Parked (danger border if >0), Stuck (warning border if >0); each tile navigates to /site-calls/report with a query-string filter.
  • Modify: src/ScadaLink.CentralUI/Components/Pages/Monitoring/Health.razor (+ code-behind) — add a "Site Calls" KPI section between the Notification Outbox and Audit Log sections; load via CommunicationService.GetSiteCallKpisAsync (Task 4).
  • Test: tests/ScadaLink.CentralUI.Tests/ (bUnit — tile rendering, threshold borders, navigation targets).

Commit: feat(centralui): Site Call KPI tiles on the Health dashboard


Task 8: Multi-value AuditLogQueryFilter — contract + repository

What: Widen AuditLogQueryFilter from single-value to multi-value on the Channel, Kind, Status, SourceSiteId dimensions, and translate them to IN (...) in the repository. Target, Actor, CorrelationId, FromUtc, ToUtc stay as-is. Keyset paging must not change.

Files:

  • Modify: src/ScadaLink.Commons/Types/Audit/AuditLogQueryFilter.cs — change Channel/Kind/Status/SourceSiteId to IReadOnlyList<…>? (e.g. IReadOnlyList<AuditChannel>? Channels). Keep the record's other params. This is a breaking shape change — update every call site in this task.
  • Modify: src/ScadaLink.ConfigurationDatabase/Repositories/AuditLogRepository.cs (QueryAsync, ~lines 119–165) — each widened dimension becomes if (filter.Channels is { Count: > 0 }) query = query.Where(e => filter.Channels.Contains(e.Channel));. Empty/null list = no filter. Keyset predicate + OrderByDescending untouched.
  • Update all other AuditLogQueryFilter constructors in this task so the solution compiles (ManagementService ParseFilter, CentralUI AuditQueryModel.ToFilter, CLI helpers, tests) — the deep behaviour of those consumers is Task 9; here just make them compile (e.g. wrap a single value in a one-element list).
  • Test: tests/ScadaLink.ConfigurationDatabase.Tests/Repositories/AuditLogRepositoryTests.cs — add QueryAsync_FilterByMultipleChannels_ReturnsUnion, multi-status, multi-site; keep the existing single-value and keyset tests green.
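
The widened filter semantics (values within a dimension are unioned, dimensions are intersected, null or empty means no filter) can be sketched against an in-memory IQueryable. The enum members, the AuditRow shape and the two-dimension filter record are hypothetical reductions of the real types; the real query runs through EF Core, which translates Contains over a collection into a SQL membership test.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, reduced shapes for illustration.
public enum AuditChannel { Api, Notification, Script }
public sealed record AuditRow(AuditChannel Channel, string SourceSiteId);
public sealed record AuditLogQueryFilter(
    IReadOnlyList<AuditChannel>? Channels = null,
    IReadOnlyList<string>? SourceSiteIds = null);

public static class AuditFilterSketch
{
    public static IQueryable<AuditRow> Apply(IQueryable<AuditRow> query, AuditLogQueryFilter filter)
    {
        // Null or empty list: the dimension is not filtered at all.
        if (filter.Channels is { Count: > 0 } channels)
            query = query.Where(e => channels.Contains(e.Channel));

        if (filter.SourceSiteIds is { Count: > 0 } sites)
            query = query.Where(e => sites.Contains(e.SourceSiteId));

        return query;
    }
}
```

Because each dimension only adds a Where clause, the keyset predicate and ordering applied afterwards are unaffected.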

Commit: feat(auditlog): multi-value AuditLogQueryFilter dimensions


Task 9: Multi-value filters — ManagementService, CLI, Central UI

What: Make the three consumers actually emit/accept multiple values per dimension instead of collapsing to the first.

Files:

  • Modify: src/ScadaLink.ManagementService/AuditEndpoints.cs (ParseFilter, ~lines 369–414) — read repeated query params with .ToArray() (not .ToString()); parse each into the enum list; unparseable values silently dropped (keep the existing lax contract).
  • Modify: src/ScadaLink.CentralUI/Components/Audit/AuditQueryModel.cs (ToFilter, ~lines 110–126) — stop collapsing to .First(); pass the full Channels/Kinds/Statuses/SiteIdentifiers sets. Adjust the ErrorsOnly logic (lines ~128–145) for multi-value Status. The chip UI already supports multi-select — no .razor change expected; verify.
  • Modify: src/ScadaLink.CentralUI/Components/Pages/Audit/AuditLogPage.razor.cs export-URL builder (~lines 175–227) — emit repeated query-string params per selected value.
  • Modify: src/ScadaLink.CLI/Commands/AuditCommands.cs (~lines 29–41) — make --channel/--kind/--status/--site accept multiple values (System.CommandLine multi-arity options; keep AcceptOnlyFromAmong for the enum-like ones). Modify src/ScadaLink.CLI/Commands/AuditQueryHelpers.cs: AuditQueryArgs fields become arrays; BuildQueryString emits one key per value.
  • Test: extend tests/ScadaLink.ManagementService.Tests/AuditEndpointsTests.cs, tests/ScadaLink.CLI.Tests/Commands/AuditQueryCommandTests.cs, tests/ScadaLink.CentralUI.Tests/ filter-model tests for multi-value round-trips.
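
Both ends of the multi-value wire format can be sketched in a few lines: the emitter writes one key per value, and the parser drops anything that does not name an enum member. AuditQuerySketch, the AuditStatus members and the case-insensitive matching are assumptions for illustration; the real BuildQueryString and ParseFilter signatures may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical enum members for illustration.
public enum AuditStatus { Pending, Forwarded, Failed }

public static class AuditQuerySketch
{
    // Emits one key=value pair per selected value, e.g. status=Failed&status=Pending.
    public static string BuildQueryString(params (string Key, string[] Values)[] dimensions) =>
        string.Join("&",
            dimensions.SelectMany(d => d.Values.Select(
                v => $"{Uri.EscapeDataString(d.Key)}={Uri.EscapeDataString(v)}")));

    // Lax parse: values that do not name a defined enum member are silently dropped.
    public static List<TEnum> ParseMany<TEnum>(IEnumerable<string> raw) where TEnum : struct, Enum
    {
        var result = new List<TEnum>();
        foreach (var value in raw)
        {
            // IsDefined rejects numeric strings such as "7" that TryParse would accept.
            if (Enum.TryParse<TEnum>(value, ignoreCase: true, out var parsed) && Enum.IsDefined(parsed))
                result.Add(parsed);
        }
        return result;
    }
}
```

A round-trip test per consumer (build the query string, parse it back, compare the sets) covers the "collapsing to the first value" regression this task is meant to remove.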

Commit: feat(audit): multi-value filters across ManagementService, CLI and Central UI


Task 10: Audit results grid — column resize + reorder UX

What: Add drag-to-resize and drag-to-reorder column UX to AuditResultsGrid, persisted in sessionStorage. Blazor + Bootstrap + minimal JS interop only (no third-party libs).

Files:

  • Create: src/ScadaLink.CentralUI/wwwroot/js/audit-grid.js — a window.auditGrid namespace: column-resize drag handlers, header drag-reorder handlers, and save(key,json) / load(key) over sessionStorage (mirror treeview-storage.js).
  • Modify: src/ScadaLink.CentralUI/Components/Audit/AuditResultsGrid.razor (+ .razor.cs) — render a resize handle in each <th>; make headers draggable; apply persisted widths (inline style/CSS var) and column order (the ColumnOrder parameter + OrderedColumns() already exist — wire it to persisted state); IJSRuntime calls to load on first render and save on change.
  • Create: src/ScadaLink.CentralUI/Components/Audit/AuditResultsGrid.razor.css — resize-handle styling, drag-over feedback (mirror AuditDrilldownDrawer.razor.css / TreeView.razor.css idioms).
  • Reference the script from the host page (App.razor / _Host / layout — match where monaco-init.js / session-expiry.js are referenced).
  • Test: extend tests/ScadaLink.CentralUI.PlaywrightTests/Audit/AuditLogPageTests.cs (or new AuditGridColumnTests.cs) — resize changes a column width, reorder changes header order, both survive a reload via sessionStorage.

Use the frontend-design skill for the resize-handle / drag-feedback visual treatment.

Commit: feat(centralui): column resize and reorder for the audit results grid


Final review

After Task 10: dispatch a final cross-cutting code review of the whole branch against this plan, then run the full solution build + test once more. Update docs/plans/2026-05-20-audit-log-code-roadmap.md header lines 14–19 to strike the five now-completed follow-ups (leaving the three v1.x items). Hand back to the user for the push decision (do not push).


Task dependency summary

  • Task 0 blocks everything.
  • Task 2 blocked by Task 1.
  • Task 3 independent (after Task 0).
  • Task 5 blocked by Task 4.
  • Task 6 blocked by Tasks 4 and 5.
  • Task 7 blocked by Task 4.
  • Task 9 blocked by Task 8.
  • Task 10 independent (after Task 0).

Execution order: 0 → 1 → 2 → 3 → 4 → 5 → 6 → 7 → 8 → 9 → 10 → final review.
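
If the execution order is ever reshuffled, the dependency summary can be checked mechanically. The sketch below encodes the "blocked by" edges listed above (with Task 0 added as a prerequisite of every other task) and verifies that a proposed order never starts a task before its blockers. TaskOrderCheck is a hypothetical helper, not part of the codebase.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TaskOrderCheck
{
    // BlockedBy[t] lists the tasks that must finish before task t starts.
    public static readonly Dictionary<int, int[]> BlockedBy = new()
    {
        [1] = new[] { 0 },    [2] = new[] { 0, 1 }, [3] = new[] { 0 },
        [4] = new[] { 0 },    [5] = new[] { 0, 4 }, [6] = new[] { 0, 4, 5 },
        [7] = new[] { 0, 4 }, [8] = new[] { 0 },    [9] = new[] { 0, 8 },
        [10] = new[] { 0 },
    };

    // True when every task appears only after all of its blockers.
    public static bool Respects(IReadOnlyList<int> order)
    {
        var done = new HashSet<int>();
        foreach (var task in order)
        {
            if (BlockedBy.TryGetValue(task, out var deps) && !deps.All(d => done.Contains(d)))
                return false;
            done.Add(task);
        }
        return true;
    }
}
```

The linear order 0 → 10 satisfies these edges; so would, for example, running Task 3 or Task 10 earlier, since they depend only on Task 0.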