789 Commits

Author SHA1 Message Date
Joseph Doherty
8fd0cf355b Merge branch 'feature/notification-report-detail-modal': row double-click detail modal
Double-clicking a row on /notifications/report opens a Bootstrap modal
showing the notification's full detail (untruncated ID, full LastError,
SourceInstanceId, exact timestamps) — fields the grid truncates or omits.
Parked notifications also get Retry/Discard buttons in the modal footer.
Inline no-JS modal, in-memory NotificationSummary, no extra query.
2026-05-21 02:40:07 -04:00
Joseph Doherty
ef5cf76026 feat(ui): notification report row double-click opens detail modal 2026-05-21 02:39:41 -04:00
Joseph Doherty
80076a3951 Merge branch 'chore/dev-cluster-dispatch-tuning': raise dev-cluster notification dispatch throughput 2026-05-21 02:35:22 -04:00
Joseph Doherty
1c9b2445ad chore(dev-cluster): raise NotificationOutbox dispatch throughput
Both central nodes ran on the NotificationOutboxOptions code defaults
(100 / 10s = 600/min) because the mounted per-node appsettings.Central.json
had no ScadaLink:NotificationOutbox section. Add the section with
DispatchBatchSize 1000 + DispatchInterval 5s — measured ~6,000/min after
restart (sweep duration becomes the binding constraint, which is fine:
the no-overlap guard self-regulates). Dev-cluster tuning only.
2026-05-21 02:35:22 -04:00
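The throughput figures in 1c9b2445ad follow from simple arithmetic. The sketch below reproduces it, assuming one full batch is dispatched per interval, as the commit's "100 / 10s = 600/min" implies. The function names are hypothetical and are not part of NotificationOutboxOptions.

```python
def dispatch_ceiling_per_minute(batch_size: int, interval_seconds: float) -> float:
    """Upper bound on dispatches per minute if every sweep sends a full batch."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return batch_size * (60.0 / interval_seconds)


def implied_sweep_period_seconds(batch_size: int, measured_per_minute: float) -> float:
    """Effective time between full batches implied by a measured rate."""
    return batch_size * 60.0 / measured_per_minute


defaults = dispatch_ceiling_per_minute(100, 10)   # code defaults named in the commit
tuned = dispatch_ceiling_per_minute(1000, 5)      # dev-cluster values from the commit
period = implied_sweep_period_seconds(1000, 6000) # from the measured ~6,000/min

print(defaults, tuned, period)
```

Under these assumptions the tuned ceiling is 12,000/min, so the measured ~6,000/min corresponds to roughly one full batch every 10 seconds rather than every 5. That is consistent with the commit's note that sweep duration, not the interval, becomes the binding constraint.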
Joseph Doherty
163446948d Merge branch 'feature/smtp-config-tls-credentials': make SMTP TlsMode + Credentials configurable
Closes the gap found while debugging notification delivery: SmtpConfiguration
has TlsMode + Credentials fields, but no non-SQL path could set them.

- UpdateSmtpConfigCommand carries optional TlsMode + Credentials (nullable,
  preserve-if-null in HandleUpdateSmtpConfig — non-breaking for existing
  5-arg callers).
- CLI 'notification smtp update' gains optional --tls-mode (validated
  None/StartTLS/SSL) and --credentials.
- Central UI SMTP form gains a TLS Mode select (None/StartTLS/SSL) —
  previously the form had no TlsMode field at all.
- docs/test_infra/test_infra_smtp.md: replaced the invalid AuthMode:None
  example with a working Basic config (TlsMode None, dummy credentials);
  corrected the prose (delivery requires Basic/OAuth2, no anonymous mode)
  and noted the scadalink-smtp container hostname for in-cluster use.

4 commits, 13 new tests. Full solution green.
2026-05-21 02:16:23 -04:00
Joseph Doherty
e58e038db9 docs(test-infra): correct SMTP example — Basic auth, TlsMode None, container hostname
The appsettings example used AuthMode 'None', which the delivery code
(MailKitSmtpClientWrapper) rejects — only Basic and OAuth2 are valid.
Switch to a working Basic config with Credentials and TlsMode None, and
document that Server must be the container name scadalink-smtp when the
Notification Service runs inside the docker cluster.
2026-05-21 02:13:19 -04:00
Joseph Doherty
c66ef71017 feat(ui): SMTP config form TlsMode field
Add a TlsMode read-only row and a None/StartTLS/SSL select to the SMTP
Configuration page edit form. New configs default to None; edits load
and persist the chosen mode through the repository.
2026-05-21 02:13:02 -04:00
Joseph Doherty
399b4aac92 feat(cli): notification smtp update --tls-mode / --credentials options
Expose the two previously-unreachable SmtpConfiguration fields on the
CLI. Both flags are optional — omitting them sends null so the server
preserves the existing value. --tls-mode is constrained to the canonical
{None, StartTLS, SSL} set via AcceptOnlyFromAmong for fast-fail.
2026-05-21 02:11:51 -04:00
Joseph Doherty
ec92d55ebf feat(smtp): UpdateSmtpConfigCommand carries TlsMode + Credentials
Add two optional nullable fields (TlsMode, Credentials) to the
UpdateSmtpConfigCommand record. The handler applies preserve-if-null
semantics: an update that omits a field leaves the existing value
intact, so existing 5-arg callers remain non-breaking.
2026-05-21 02:11:03 -04:00
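The preserve-if-null semantics described in ec92d55ebf can be sketched as follows. This is an illustration in Python, not the C# handler itself; the SmtpConfig and UpdateSmtpConfig types and their field names are simplified stand-ins.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class SmtpConfig:
    server: str
    tls_mode: str
    credentials: Optional[str]


@dataclass(frozen=True)
class UpdateSmtpConfig:
    server: str
    tls_mode: Optional[str] = None      # None means "leave unchanged"
    credentials: Optional[str] = None   # None means "leave unchanged"


def apply_update(existing: SmtpConfig, cmd: UpdateSmtpConfig) -> SmtpConfig:
    """Overwrite required fields; keep optional fields when the command omits them."""
    return replace(
        existing,
        server=cmd.server,
        tls_mode=existing.tls_mode if cmd.tls_mode is None else cmd.tls_mode,
        credentials=existing.credentials if cmd.credentials is None else cmd.credentials,
    )


current = SmtpConfig(server="scadalink-smtp", tls_mode="StartTLS", credentials="user:pw")
updated = apply_update(current, UpdateSmtpConfig(server="mail.example.test"))
print(updated)
```

One consequence of this design, at least in the sketch, is that a caller cannot reset an optional field back to null through the update command, because null is reserved for "not supplied".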
Joseph Doherty
932fda5594 infra(seed): dump encrypted secret columns as NULL, restore via CLI
ASP.NET Data Protection ciphertext is non-deterministic and bound to the
source key ring, so encrypted secret columns (ExternalSystemDefinitions
.AuthConfiguration, SmtpConfigurations.Credentials, DatabaseConnection
Definitions.ConnectionString) cannot be replayed from a static SQL dump —
the app would fail to decrypt them. dump_seed.py now emits those columns
as NULL; reseed.sh adds a post-seed stage that recreates the values
through the ScadaLink CLI so the EF value converter re-encrypts against
the target cluster's key ring.
2026-05-21 01:29:51 -04:00
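The dump-side half of 932fda5594 amounts to replacing the encrypted columns with NULL before rows are written out. A minimal sketch, assuming rows are available as column-to-value dictionaries; the scrub_row helper and the dictionary layout are hypothetical and do not reflect the actual structure of dump_seed.py. The table and column names are the ones listed in the commit.

```python
from typing import Any, Dict

# Encrypted secret columns named in the commit message.
ENCRYPTED_COLUMNS = {
    "ExternalSystemDefinitions": {"AuthConfiguration"},
    "SmtpConfigurations": {"Credentials"},
    "DatabaseConnectionDefinitions": {"ConnectionString"},
}


def scrub_row(table: str, row: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the row with encrypted columns replaced by None (SQL NULL)."""
    secret = ENCRYPTED_COLUMNS.get(table, set())
    return {col: (None if col in secret else value) for col, value in row.items()}


row = {"Id": 1, "Server": "scadalink-smtp", "Credentials": "CfDJ8...ciphertext"}
print(scrub_row("SmtpConfigurations", row))
```

The values themselves are then recreated on the target cluster through the CLI, as the commit describes, so that they are encrypted against that cluster's key ring.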
Joseph Doherty
5492c94e2f docs(audit): roadmap closeout — all 8 milestones complete (#23)
Audit Log #23 implementation complete. M1-M8 merged to main. Full
solution 2,993 tests green, 0 failures. Records final state, the
v1.x deferrals (hash chain, Parquet, per-channel retention), and the
follow-ups noted during implementation (real gRPC push client, mapper
consolidation, Site Calls UI, multi-value filters, grid drag UX).
2026-05-20 22:16:53 -04:00
Joseph Doherty
7a1c974839 Merge branch 'feature/audit-log-m8-cli': Audit Log #23 M8 CLI
M8 — the final milestone — ships the operator CLI surface:
- ManagementService /api/audit/query + /api/audit/export endpoints
  (minimal-API, HTTP Basic + LDAP auth, OperationalAudit / AuditExport
  role gates reusing M7's AuthorizationPolicies role sets).
- scadalink audit command group:
  - audit query: full UI-parity filter set, keyset-cursor paging
    (--all follows nextCursor), JSON (default) / table output,
    AcceptOnlyFromAmong fast-fail validation on channel/kind/status.
  - audit export: streams server CSV / JSONL to --output (parquet → 501
    per v1.x deferral), required-flag enforcement.
  - audit verify-chain: v1 no-op stub (hash-chain deferred to v1.x);
    validates --month, prints the deferral message, exits 0.
- Table output formatter for audit events.
- Pre-existing audit-log config-change command renamed audit-config;
  audit-log retained as a deprecation alias (stderr warning).
- CLI README documents the audit group, the rename, permissions.

Review fix: corrected --channel/--kind/--status help + README to the
real enum names (ApiOutbound/DbOutbound/Notification/ApiInbound, etc.)
and added AcceptOnlyFromAmong so a bad value fails fast instead of
silently returning unfiltered results; removed the dead --instance flag
(AuditLogQueryFilter has no instance column).

Shipped: 10 commits, ~62 net new tests.

=== Audit Log #23 — implementation complete (M1-M8) ===
Full solution: 24 test projects, 2,993 tests, 0 failures, 0 skipped.
dotnet build ScadaLink.slnx clean. infra/* never touched on any of the
8 milestone branches. alog.md changed exactly once (M1 vocabulary
reconciliation, committed before the dependent code merge per the
ordering invariant).
2026-05-20 22:16:23 -04:00
Joseph Doherty
ff004e2e48 fix(cli): correct audit query channel/kind/status enum names + drop dead --instance flag (#23 M8) 2026-05-20 22:13:26 -04:00
Joseph Doherty
36d58e8988 docs(cli): document scadalink audit group + audit-config rename (#23 M8) 2026-05-20 22:03:32 -04:00
Joseph Doherty
ba8ddcc032 refactor(cli): rename audit-log to audit-config with deprecation alias (#23 M8) 2026-05-20 22:02:19 -04:00
Joseph Doherty
d40ee85e14 feat(cli): table output formatter for audit events (#23 M8) 2026-05-20 22:00:57 -04:00
Joseph Doherty
4b3a692170 feat(cli): scadalink audit verify-chain subcommand v1 no-op (#23 M8) 2026-05-20 21:57:16 -04:00
Joseph Doherty
91682cd862 feat(cli): scadalink audit export subcommand (#23 M8) 2026-05-20 21:56:20 -04:00
Joseph Doherty
2fa46ed400 feat(cli): scadalink audit query subcommand (#23 M8) 2026-05-20 21:55:38 -04:00
Joseph Doherty
3263b39477 feat(cli): scaffold scadalink audit command group (#23 M8) 2026-05-20 21:52:37 -04:00
Joseph Doherty
a1bdd94d4c feat(mgmt): /api/audit/{query,export} endpoints with permission gates (#23 M8) 2026-05-20 21:49:14 -04:00
Joseph Doherty
263884fa63 docs(audit): add M8 CLI implementation plan (#23)
3 bundles: CLI audit command group (scaffold/query/export/verify-chain),
ManagementService endpoints, formatters + audit-config rename + README.
verify-chain is a v1 no-op stub; hash chain deferred to v1.x.
2026-05-20 21:39:29 -04:00
Joseph Doherty
9ba453191b Merge branch 'feature/audit-log-m7-central-ui': Audit Log #23 M7 Central UI
M7 ships the user-visible Audit Log surface in the Central UI
(Blazor Server + Bootstrap, no third-party UI libraries):
- AuditLogPage at /audit/log under a new Audit nav group.
- Pre-existing config-change viewer renamed AuditLog.razor ->
  ConfigurationAuditLog.razor at /audit/configuration.
- AuditFilterBar: Channel/Kind/Status/Site chips (Channel narrows Kind),
  time-range presets + custom range, Instance/Script/Target/Actor text
  search, Errors-only toggle.
- AuditResultsGrid: 10-column custom Bootstrap table, keyset paging
  (OccurredAtUtc desc, EventId desc), status badges, row-select.
- AuditDrilldownDrawer: Bootstrap offcanvas; JSON pretty-print, SQL
  code block, Copy-as-cURL (ApiOutbound/ApiInbound), Show-all-events
  by CorrelationId, redaction badges.
- Drill-ins: Notifications row link + External Systems / Sites / API
  Keys / Instances detail-page header links. (Site Calls drill-in
  deferred — no Site Calls UI page exists yet.)
- AuditLogPage query-string filters (correlationId/target/actor/site/
  channel/instance) with auto-load.
- 3 Health-dashboard KPI tiles: Audit volume, error rate, backlog.
- Server-side streaming CSV export via minimal-API endpoint.
- OperationalAudit + AuditExport role-claim policies; Audit + new
  AuditReadOnly roles; page + export + nav gated.
- 7 Playwright E2E + bUnit coverage throughout.

Fix: AuditLogQueryService now uses scope-per-query (IServiceScopeFactory)
so the drill-in auto-load no longer races AuditFilterBar's site query
on the shared circuit-scoped DbContext (EF 'second operation' error).

Known M7-scope limitations (documented): AuditLogQueryFilter is
single-value per dimension, so multi-select chips collapse to the first
value; column resize/reorder ships as model + parameter only (no drag
UX); SQL highlighting is CSS-class-only (no JS highlighter library).

Shipped: 14 commits, ~95 net new tests. CentralUI.Tests 418, Playwright
52. Full solution green (one isolated Host.Tests parallel-runner flake,
passes 200/200 in isolation). infra/* untouched on any branch commit.
2026-05-20 21:39:00 -04:00
Joseph Doherty
fac31c6018 fix(ui): AuditLogQueryService uses scope-per-query to avoid DbContext race (#23 M7) 2026-05-20 21:33:38 -04:00
Joseph Doherty
9c955da2e7 test(ui): Audit Log Playwright E2E coverage (#23 M7) 2026-05-20 21:24:19 -04:00
Joseph Doherty
6dea84cd28 feat(security): OperationalAudit + AuditExport permissions for Audit Log surface (#23 M7)
Bundle G (#23 M7-T15): replace the temporary Admin-only gate on the Audit
Log surface with two new permission policies — OperationalAudit (read) and
AuditExport (bulk-export) — so the read path and the forensic-export path
can be delegated independently.

ScadaLink.Security
- AuthorizationPolicies: add OperationalAudit + AuditExport policy
  constants; register them via RequireClaim with an explicit role allow-list
  (OperationalAuditRoles, AuditExportRoles) so the role-to-permission
  mapping is documented in one place.
- Default mapping: Admin and Audit roles grant both policies; AuditReadOnly
  grants OperationalAudit only (read access without bulk export); Design
  and Deployment grant neither.

ScadaLink.CentralUI
- AuditLogPage: switch the page-level [Authorize] to the OperationalAudit
  policy and wrap the Export-CSV button in an AuthorizeView gated on
  AuditExport so an OperationalAudit-only operator still sees the page +
  filters but cannot trigger the CSV pull.
- ConfigurationAuditLog: switch from RequireAdmin to OperationalAudit so
  both pages under the Audit nav group share the same gate.
- NavMenu: the Audit nav group now gates on OperationalAudit so the
  section header + both child links match the per-page policies.
- AuditExportEndpoints: switch RequireAuthorization from RequireAdmin to
  AuditExport — this is the authoritative gate; the AuthorizeView on the
  button is just a UX affordance.

Tests
- New AuditLogPagePermissionTests covers the 5 brief-mandated cases plus
  defence-in-depth for Admin-alone and AuditReadOnly users on the endpoint.
- SecurityTests: add policy-level coverage for the new role→permission
  matrix (Theory rows pin every role/policy combination).
- AuditExportEndpointsTests: switch to AddScadaLinkAuthorization() so the
  test host exercises the real production wiring under the new gate.
- AuditLogPageScaffoldTests: wrap the page render in a
  CascadingAuthenticationState so the new in-page AuthorizeView resolves
  the principal.
2026-05-20 21:09:42 -04:00
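The default role-to-policy mapping stated in 6dea84cd28 can be written down as a small allow-list table. The sketch below is a Python illustration of that matrix only; the POLICY_ROLES name and is_authorized helper are hypothetical and do not mirror the ASP.NET Core registration code.

```python
from typing import Iterable

# Role allow-lists per policy, as described in the commit message.
POLICY_ROLES = {
    "OperationalAudit": frozenset({"Admin", "Audit", "AuditReadOnly"}),
    "AuditExport": frozenset({"Admin", "Audit"}),
}


def is_authorized(policy: str, roles: Iterable[str]) -> bool:
    """A principal satisfies a policy if it holds at least one allowed role."""
    return not POLICY_ROLES[policy].isdisjoint(roles)


for role in ("Admin", "Audit", "AuditReadOnly", "Design", "Deployment"):
    print(role, is_authorized("OperationalAudit", [role]), is_authorized("AuditExport", [role]))
```

Keeping the mapping in one table is what lets a read-only auditor see the page while the export endpoint remains the authoritative gate.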
Joseph Doherty
8744630adb feat(ui): server-side streaming CSV export of Audit Log (#23 M7) 2026-05-20 20:57:01 -04:00
Joseph Doherty
943c2ced39 feat(ui): Audit KPI tiles on Health dashboard (#23 M7)
Adds three KPI tiles to the central Health dashboard for the Audit channel:
volume (rows in the last hour), error rate (Failed/Parked/Discarded over
total), and backlog (sum of SiteAuditBacklog.PendingCount across all sites).

Repo + service:
- IAuditLogRepository.GetKpiSnapshotAsync(window, nowUtc) — single aggregate
  SELECT over the trailing window returning total + error counts; nowUtc is
  optional for production callers and pinned by integration tests against the
  shared MSSQL fixture so the global counts are deterministic.
- AuditLogQueryService.GetKpiSnapshotAsync() — composes the repo aggregate
  with a sum of SiteAuditBacklog.PendingCount read from ICentralHealthAggregator.
- AuditLogKpiSnapshot record in Commons/Types/.

UI:
- New AuditKpiTiles Blazor component (Components/Health/) — three Bootstrap
  card-tiles, click navigates to /audit/log with the matching pre-filter.
- Health.razor wires the tiles in alongside the existing Notification Outbox
  KPIs; LoadAuditKpis() runs on every 10s refresh tick and degrades to em
  dashes + inline error if the query fails.
- AuditLogPage extended to parse ?status= so the error-rate tile drill-in
  (?status=Failed) auto-loads the grid.

Tests:
- AuditLogRepositoryTests: GetKpiSnapshotAsync mixed-status + empty-window
  cases against the MSSQL migration fixture.
- AuditLogQueryServiceTests: forwarding + backlog composition; sites with
  null SiteAuditBacklog contribute zero.
- AuditKpiTilesTests: 9 bUnit tests covering tile render, error-rate maths
  with safe zero-events handling, em-dash unavailable path, click-through
  navigation, and warning/danger border thresholds.
- HealthPageTests: new Renders_AuditKpiTiles_WithValues plus IAuditLogQueryService
  stub registration in the constructor so existing outbox tests still pass.
- AuditLogPageScaffoldTests: ?status=Failed auto-load + unknown status drop.
2026-05-20 20:43:57 -04:00
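The tile arithmetic in 943c2ced39 is small enough to sketch. The example below assumes that an empty window reports an error rate of zero; the commit only says zero events are handled safely, so that choice is a guess. Function names are hypothetical.

```python
from typing import Mapping, Optional


def error_rate(total: int, errors: int) -> float:
    """Share of Failed/Parked/Discarded events; zero when the window is empty."""
    return 0.0 if total <= 0 else errors / total


def total_backlog(pending_by_site: Mapping[str, Optional[int]]) -> int:
    """Sum PendingCount across sites; sites with no backlog report contribute zero."""
    return sum(count or 0 for count in pending_by_site.values())


print(error_rate(200, 5))
print(error_rate(0, 0))
print(total_backlog({"site-a": 3, "site-b": None, "site-c": 4}))
```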
Joseph Doherty
38fc9b4102 feat(ui): drill-ins from detail pages to Audit Log (#23 M7)
Adds "Recent audit activity" deep links from four edit/detail pages into
the central Audit Log, each with a pre-filter encoded in the query string
that the Audit Log page (Bundle D0) now parses on initialization:

  - External Systems (Design/ExternalSystemForm)      → ?target={Name}
  - API Keys         (Admin/ApiKeyForm)                → ?actor={Name}&channel=ApiInbound
  - Sites            (Admin/SiteForm)                  → ?site={SiteIdentifier}
  - Instances        (Deployment/InstanceConfigure)    → ?instance={UniqueName}

The link is suppressed on create/new flows where there is nothing to
drill into yet. Instance is UI-only on the filter bar (the repository
filter contract has no instance column), so the page-side prefill threads
through the InitialInstanceSearch seam on AuditFilterBar.

Site Calls (#22 M7-T11) drill-in is DEFERRED: the Central UI does not
yet host a Site Calls listing page, per M3 reality notes. Add the
drill-in when that page lands.

#23 M7-T12
2026-05-20 20:26:28 -04:00
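The drill-in links in 38fc9b4102 are ordinary query strings on /audit/log. A minimal sketch of building and parsing them, using the parameter names from the M7 merge description; the helper names are hypothetical, and dropping unknown keys is an assumption about how the page treats them.

```python
from typing import Dict
from urllib.parse import parse_qs, urlencode, urlsplit

KNOWN_KEYS = ("correlationId", "target", "actor", "site", "channel", "instance")


def audit_log_link(**filters: str) -> str:
    """Build a pre-filtered Audit Log URL, skipping empty values."""
    query = urlencode({key: value for key, value in filters.items() if value})
    return "/audit/log" + ("?" + query if query else "")


def parse_filters(url: str) -> Dict[str, str]:
    """Read the known filter keys from a URL; anything else is ignored."""
    raw = parse_qs(urlsplit(url).query)
    return {key: raw[key][0] for key in KNOWN_KEYS if key in raw}


link = audit_log_link(actor="svc key", channel="ApiInbound")
print(link)
print(parse_filters(link))
```

Encoding the values matters for names that contain spaces or reserved characters, such as an API key display name.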
Joseph Doherty
1c20e81d77 feat(ui): drill-in from Notifications to Audit Log (#23 M7) 2026-05-20 20:20:54 -04:00
Joseph Doherty
450f8bca28 feat(ui): AuditLogPage parses query-string filters for drill-ins (#23 M7) 2026-05-20 20:19:47 -04:00
Joseph Doherty
ae4480e7aa feat(ui): AuditDrilldownDrawer with JSON/SQL render, cURL, drill-back, redaction badges (#23 M7)
Implements Bundle C (M7-T4 through M7-T8) of the Audit Log #23 M7
Central UI work: a right-side off-canvas drawer that opens from
AuditResultsGrid row clicks and renders one AuditEvent in full.

Cohesive single-component delivery:
- Read-only fields stacked (form-layout memory): Channel/Kind, Status,
  HttpStatus, Target, Actor, Source* provenance, CorrelationId,
  OccurredAtUtc, IngestedAtUtc, DurationMs.
- Channel-aware body renderer: DbOutbound {sql, parameters} payloads
  render a code-block with CSS-only .language-sql class plus a
  parameter <dl>; other channels JSON-pretty-print when parseable and
  fall back to verbatim <pre>.
- Redaction badges on Request/Response when the body contains the
  <redacted> or <redacted: redactor error> sentinels.
- Copy-as-cURL (API channels only) builds a curl command from Target
  + optional {method, headers, body} RequestSummary JSON and writes
  it via navigator.clipboard.writeText.
- Show-all-events drill-back navigates to /audit/log?correlationId={id}
  when the event carries a CorrelationId.
- Close button + backdrop-click both raise OnClose.

AuditLogPage wires Event/IsOpen/OnClose; row clicks now open the
drawer (HandleRowSelected pins _selectedEvent + _drawerOpen=true).

11 bUnit tests cover field rendering, JSON pretty-print, verbatim
fallback, SQL block, conditional buttons, redaction badges,
navigation drill-back, and clipboard interop. No third-party UI
libraries: Bootstrap offcanvas + scoped razor.css only.
2026-05-20 20:13:33 -04:00
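The Copy-as-cURL behaviour in ae4480e7aa builds a command from Target plus an optional {method, headers, body} JSON summary. The sketch below shows one way to do that in Python with shell quoting. It assumes the summary is either valid JSON with those keys or absent; how the real component handles malformed summaries is not stated in the commit.

```python
import json
import shlex
from typing import Optional


def build_curl(target: str, request_summary: Optional[str] = None) -> str:
    """Build a shell-safe curl command from a target URL and optional JSON summary."""
    try:
        summary = json.loads(request_summary) if request_summary else {}
    except json.JSONDecodeError:
        summary = {}  # assumption: fall back to a bare request
    if not isinstance(summary, dict):
        summary = {}

    parts = ["curl"]
    method = summary.get("method")
    if method:
        parts += ["-X", str(method)]
    headers = summary.get("headers")
    if isinstance(headers, dict):
        for name, value in headers.items():
            parts += ["-H", f"{name}: {value}"]
    body = summary.get("body")
    if body is not None:
        parts += ["--data-raw", body if isinstance(body, str) else json.dumps(body)]
    parts.append(target)
    return " ".join(shlex.quote(part) for part in parts)


summary = json.dumps({"method": "POST", "headers": {"Accept": "application/json"}, "body": "{\"a\":1}"})
print(build_curl("https://example.test/api", summary))
```

Quoting each argument separately keeps header values and bodies with spaces or quotes intact when the command is pasted into a shell.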
Joseph Doherty
e052aa4ff8 feat(ui): AuditResultsGrid + AuditLogQueryService with keyset paging (#23 M7)
Adds the results grid + query facade for the central Audit Log page
(#23 M7-T3):

* IAuditLogQueryService / AuditLogQueryService — CentralUI facade over
  IAuditLogRepository.QueryAsync so the grid can be tested with a stubbed
  query source. Default page size is 100; callers can override per call.

* AuditResultsGrid.razor + .razor.cs — Blazor Server component (Bootstrap
  only, no third-party UI libs). Renders the 10 columns from
  Component-AuditLog.md §10 (OccurredAtUtc, Site, Channel, Kind, Status,
  Target, Actor, DurationMs, HttpStatus, ErrorMessage). Keyset-paged via
  the last visible row's (OccurredAtUtc, EventId) as the cursor; Next-page
  button disabled when the current page is short (no count query). Row
  clicks emit OnRowSelected(AuditEvent) for Bundle C's drilldown drawer.
  Status badges are colour-coded (Delivered=green; Failed/Parked/Discarded
  =red; other=gray). Error messages truncated to 80 chars with full text
  on hover.

* Column model framework: a ColumnOrder [Parameter] reorders columns by
  stable string keys; unknown keys are dropped. M7 scope decision (in the
  class doc): the framework is in place but drag-reorder / resize UX is
  not implemented — M7.x can add persisted-per-user reordering without
  rewriting the column model.

* AuditLogPage wired: hosts AuditFilterBar + AuditResultsGrid, threads
  the filter through and stubs OnRowSelected for Bundle C.

* AuditLogQueryService registered as scoped in AddCentralUI.

* Tests: 4 grid bUnit tests (10 columns rendered, next-page cursor
  carries last row, row click raises callback, badge classes for
  Failed vs Delivered), 2 service tests (filter+paging pass-through,
  default page size of 100). AuditLogPageScaffoldTests updated to
  provide the new ISiteRepository + IAuditLogQueryService stubs the
  page now resolves.
2026-05-20 20:02:46 -04:00
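The keyset paging described in e052aa4ff8 can be illustrated in memory. The sketch orders by (OccurredAtUtc desc, EventId desc), uses the last visible row as the cursor, and treats a short page as the end of the results. The Event type, integer EventId, and query_page function are assumptions for illustration; the real query runs in the repository against SQL.

```python
from typing import List, NamedTuple, Optional, Tuple

Cursor = Tuple[str, int]


class Event(NamedTuple):
    occurred_at_utc: str  # ISO-8601 UTC strings of equal format sort chronologically
    event_id: int         # assumption: any totally ordered id works as a tiebreaker


def query_page(events: List[Event], page_size: int,
               cursor: Optional[Cursor] = None) -> Tuple[List[Event], Optional[Cursor], bool]:
    """Return one page, the cursor for the next page, and whether Next is enabled."""
    ordered = sorted(events, key=lambda e: (e.occurred_at_utc, e.event_id), reverse=True)
    if cursor is not None:
        # Keyset predicate: rows strictly after the cursor in descending order.
        ordered = [e for e in ordered if (e.occurred_at_utc, e.event_id) < cursor]
    page = ordered[:page_size]
    next_cursor = (page[-1].occurred_at_utc, page[-1].event_id) if page else None
    has_next = len(page) == page_size  # a short page means no further rows
    return page, next_cursor, has_next


events = [
    Event("2026-05-20T20:00:00Z", 1),
    Event("2026-05-20T20:00:00Z", 2),
    Event("2026-05-20T19:00:00Z", 3),
    Event("2026-05-20T18:00:00Z", 4),
    Event("2026-05-20T17:00:00Z", 5),
]
page, cursor, has_next = query_page(events, 2)
while has_next:
    print([e.event_id for e in page])
    page, cursor, has_next = query_page(events, 2, cursor)
print([e.event_id for e in page])
```

Because there is no count query, a result set whose size is an exact multiple of the page size leaves Next enabled on the last full page, and the following page comes back empty. The EventId tiebreaker keeps rows with identical timestamps from being skipped or repeated.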
Joseph Doherty
13e84a76a7 feat(ui): AuditFilterBar component (#23 M7)
Adds the filter bar for the central Audit Log page (#23 M7-T2):

* AuditQueryModel — UI binding model with chip-style multi-select state for
  Channel/Kind/Status/Site, a Channel→Kind narrowing map (CachedSubmit and
  CachedResolve appear under both ApiOutbound and DbOutbound per
  Component-AuditLog.md §4), time-range presets (5min/1h/24h/Custom),
  free-text Instance/Script/Target/Actor searches and an Errors-only toggle.
  Collapses to the single-value AuditLogQueryFilter on ToFilter(utcNow);
  multi-select chips take the first selected per dimension and the
  Errors-only toggle pins Failed when Status chips are empty (chip-set wins
  otherwise) — documented Bundle B scope decision.

* AuditFilterBar.razor + .razor.cs — Blazor Server component (Bootstrap
  only, no third-party UI libs). Renders the 10 spec elements plus the
  Errors-only toggle, populates Site chips from ISiteRepository at
  initialisation, exposes [Parameter] EventCallback<AuditLogQueryFilter>
  OnFilterChanged and an optional NowUtcProvider seam for time-window tests.

* AuditFilterBarTests — 5 bUnit tests pinning element presence, Apply
  callback payload, Channel→Kind narrowing, Errors-only toggle precedence
  and the LastHour time-window collapse.
2026-05-20 19:56:49 -04:00
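The collapse from multi-select UI state to the single-value filter in 13e84a76a7 can be sketched as below. The dictionary keys and the to_filter function are hypothetical stand-ins for AuditQueryModel.ToFilter; only the rules themselves (first selected chip wins, Errors-only pins Failed when no Status chip is set, presets resolve against the supplied clock) come from the commit.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict

PRESETS = {
    "5min": timedelta(minutes=5),
    "1h": timedelta(hours=1),
    "24h": timedelta(hours=24),
}


def to_filter(model: Dict[str, Any], utc_now: datetime) -> Dict[str, Any]:
    """Collapse chip-style multi-select state to one value per dimension."""
    channels = model.get("channels") or []
    statuses = model.get("statuses") or []
    if statuses:
        status = statuses[0]              # chip set wins over the toggle
    elif model.get("errors_only"):
        status = "Failed"                 # toggle pins Failed when no chips are set
    else:
        status = None
    preset = model.get("preset")
    from_utc = utc_now - PRESETS[preset] if preset in PRESETS else model.get("custom_from")
    return {
        "channel": channels[0] if channels else None,
        "status": status,
        "from_utc": from_utc,
    }


now = datetime(2026, 5, 20, 20, 0, tzinfo=timezone.utc)
print(to_filter({"channels": ["ApiOutbound", "DbOutbound"], "errors_only": True, "preset": "1h"}, now))
```

Passing the clock in as a parameter corresponds to the NowUtcProvider seam the commit mentions for time-window tests.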
Joseph Doherty
12b86bea7a feat(ui): scaffold Audit Log page + Audit nav group (#23 M7)
Adds the central-side Audit Log page scaffold at /audit/log (M7-T1) and
introduces a new Audit nav group (M7-T9) that hosts both the new page and
the renamed Configuration Audit Log. The page body is intentionally a
heading + two placeholders — Bundle B will land the AuditFilterBar and
AuditResultsGrid behind them.

The Audit nav group sits between Monitoring and the per-user footer; both
items inside are Admin-only, so the section header lives inside the
RequireAdmin AuthorizeView (non-admins see no orphan header).

bUnit scaffold tests pin the page heading, the section header order, and
the two child links; the existing 338 CentralUI tests continue to pass.
2026-05-20 19:49:11 -04:00
Joseph Doherty
a9f45b0861 refactor(ui): rename AuditLog viewer to ConfigurationAuditLog under /audit/configuration (#23 M7)
The pre-M1 IAuditService config-change viewer moves out of the Monitoring
nav group to make room for the new Audit nav group (issue #23 M7). The
old route /monitoring/audit-log returns 404 (no redirect, per plan) — the
viewer is now reachable at /audit/configuration and labelled
"Configuration Audit Log" to disambiguate from the new Audit Log page
(arriving in #23 M7-T9). Inbound references in NavMenu, Dashboard, and
the Playwright nav tests are updated to the new route and label.
2026-05-20 19:46:09 -04:00
Joseph Doherty
2d13886286 docs(audit): add M7 Central UI implementation plan (#23)
8 bundles: page scaffold + rename, filter bar + grid, drilldown drawer,
drill-ins, KPI tiles, CSV export, permissions, Playwright. UI memory
constraints locked: Blazor Server + Bootstrap, no third-party libs.
2026-05-20 19:43:30 -04:00
Joseph Doherty
8c2382c2bc docs(audit): roadmap corrections after M6
M7 head records M6 realities:
- IAuditCentralHealthSnapshot exists; M7 dashboard reads it.
- SiteHealthReport.SiteAuditBacklog ready for per-site tiles.
- IAuditLogRepository.QueryAsync is the page's data source.
- Pre-existing AuditLog.razor rename to ConfigurationAuditLog.razor
  needs verification.
- OperationalAudit + AuditExport permission strings need to exist.
- Real gRPC pull client still deferred; doesn't gate M7.
2026-05-20 19:42:54 -04:00
Joseph Doherty
6d7a03e099 Merge branch 'feature/audit-log-m6-reconciliation-purge': Audit Log #23 M6 Reconciliation + Purge + Partition Maintenance + Health Metrics
M6 ships the self-healing + lifecycle-maintenance layer:
- PullAuditEvents RPC + site-side handler (sitestream.proto extended;
  ISiteAuditQueue.ReadPendingSinceAsync + MarkReconciledAsync).
- SiteAuditReconciliationActor central singleton: per-site 5-min cursor,
  pulls via mockable IPullAuditEventsClient seam (real gRPC client deferred
  to a follow-up), ingests via existing AuditLogIngestActor path.
- AuditLogPurgeActor + repository fix: SwitchOutPartitionAsync replaced
  with DROP INDEX → SWITCH PARTITION → DROP staging → CREATE INDEX dance.
  M1 NotSupportedException stub retired.
- AuditLogPartitionMaintenanceService IHostedService: monthly SPLIT
  RANGE roll-forward; explicit ALTER PARTITION SCHEME NEXT USED before
  each SPLIT (critical fix — ALL TO PRIMARY auto-populates NEXT USED
  only on the first SPLIT).
- Health metrics: SiteAuditBacklog (count + age + bytes) per site;
  SiteAuditTelemetryStalledTracker subscribes to EventStream;
  CentralAuditWriteFailures counter + IAuditCentralHealthSnapshot
  aggregator; central-side AuditRedactionFailure routed to the snapshot.

Site-side AuditRedactionFailure counter (M5) stays on the site bridge;
central uses the new CentralAuditRedactionFailureCounter.

Integration tests: outage+reconciliation (200 events buffered, drained
on recovery, no duplicates), partition-switch purge (drop-and-rebuild
verified), partition maintenance roll-forward (idempotent across two
ticks).

Shipped: 16 commits, ~75 net new tests. Full solution 24/24 test
projects green. Pre-existing M5-era Host.Tests CTS-disposal flake
incidentally fixed by the Bundle E lock-guarded _trackedDisposables
enumeration.

Latent EventStream-after-await bug in SiteAuditReconciliationActor
fixed in Bundle D (D0). Production wiring of real gRPC pull client
deferred to a follow-up bundle — actor + abstractions are testable
via the mockable seam. infra/* untouched on any branch commit.
2026-05-20 19:42:26 -04:00
Joseph Doherty
eb5fa8f2bc test(auditlog): partition maintenance roll-forward end-to-end (#23 M6) 2026-05-20 19:38:07 -04:00
Joseph Doherty
2138534581 test(auditlog): partition-switch purge end-to-end (#23 M6) 2026-05-20 19:36:17 -04:00
Joseph Doherty
66f6724c5d test(auditlog): outage + reconciliation recovery end-to-end (#23 M6) 2026-05-20 19:32:01 -04:00
Joseph Doherty
ef49b55cf6 fix(health): decouple AuditCentralHealthSnapshot from ActorSystem (#23 M6)
The snapshot's per-site stalled latch now lives on the snapshot itself
and is fed by SiteAuditTelemetryStalledTracker via ApplyStalled, removing
the chain that required ActorSystem at DI composition time. The tracker
is now constructed by AkkaHostedService once ActorSystem.Create returns,
with a lock-guarded auxiliary-disposable list so concurrent host
start/stop in tests cannot race the enumeration.
2026-05-20 19:25:28 -04:00
Joseph Doherty
2744011ce9 feat(health): surface AuditRedactionFailure in central snapshot (#23 M6) 2026-05-20 19:13:19 -04:00
Joseph Doherty
70ed8d4557 feat(health): CentralAuditWriteFailures + AuditCentralHealthSnapshot (#23 M6) 2026-05-20 19:11:52 -04:00
Joseph Doherty
42333a72ed feat(health): SiteAuditTelemetryStalledTracker subscribes to EventStream (#23 M6) 2026-05-20 19:07:44 -04:00
Joseph Doherty
e93f655ce4 feat(health): SiteAuditBacklog metric (count + age + bytes) (#23 M6) 2026-05-20 19:02:01 -04:00
Joseph Doherty
75b060e0a8 feat(auditlog): AuditLogPartitionMaintenanceService monthly roll-forward (#23 M6) 2026-05-20 18:51:43 -04:00
Joseph Doherty
cc2d6e91f1 fix(auditlog): SiteAuditReconciliationActor captures EventStream before await (#23 M6) 2026-05-20 18:39:19 -04:00
Joseph Doherty
660fdc4e93 feat(auditlog): AuditLogPurgeActor daily partition-switch purge (#23 M6)
Central singleton (M6-T4 Bundle C) that drives the daily AuditLog partition
purge. On a configurable timer (default 24 hours) the actor:
  1. Queries IAuditLogRepository.GetPartitionBoundariesOlderThanAsync for
     monthly boundaries whose latest OccurredAtUtc is older than
     DateTime.UtcNow - AuditLogOptions.RetentionDays.
  2. For each eligible boundary calls SwitchOutPartitionAsync, which runs
     the drop-and-rebuild dance around UX_AuditLog_EventId.
  3. Publishes AuditLogPurgedEvent(boundary, rowsDeleted, durationMs) on
     the actor-system EventStream so the Bundle E central health collector
     and ops surfaces can subscribe without coupling to this actor.

Co-changes:
* SwitchOutPartitionAsync returns long (rows deleted) — sampled BEFORE the
  switch via COUNT_BIG over the per-partition filter so the count
  reflects what the switch removed, not a post-purge scan of a table that
  no longer exists. All stub implementations updated.
* AuditLogPurgeOptions: IntervalHours (default 24), IntervalOverride for
  tests, Interval property resolving either.
* AuditLogPurgedEvent: record with MonthBoundary, RowsDeleted, DurationMs.

Behavior:
* Continue-on-error per boundary — one partition that throws does NOT
  abandon the rest of the tick.
* DI scope opened per tick (IAuditLogRepository is a SCOPED EF Core
  service); mirrors SiteAuditReconciliationActor and AuditLogIngestActor.
* SupervisorStrategy Resume keeps the singleton alive across leaked
  exceptions.
* EventStream capture BEFORE the first await — Context is unsafe after
  await in async receive handlers (same pattern as Sender-capture in
  AuditLogIngestActor.OnIngestAsync).

Tests:
* Tick_Fires_OnDailyInterval — visible timer side effect.
* Tick_OldPartitions_SwitchedOut — both seeded boundaries purged.
* Tick_NewerPartitions_Untouched — empty enumerator → no switches.
* Tick_PublishesPurgedEvent_WithRowCount — AuditLogPurgedEvent carries
  RowsDeleted and DurationMs.
* Tick_SwitchThrows_OtherPartitionsStillProcessed — continue-on-error.
* Threshold_UsesAuditLogOptionsRetentionDays — non-default 30-day window
  computed from UtcNow - RetentionDays.
* EndToEnd_RealPartition_RowsRemoved_PurgedEventPublished — TestKit +
  MsSqlMigrationFixture: real partitioned table, Jan-2026 row purged,
  Apr-2026 row kept, AuditLogPurgedEvent observed via probe.
2026-05-20 18:36:31 -04:00
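The tick logic in 660fdc4e93 (retention threshold plus continue-on-error per boundary) can be sketched without the actor machinery. The run_purge_tick function and its inputs are hypothetical: partitions are passed in as (boundary, latest OccurredAtUtc) pairs, and switch_out stands in for SwitchOutPartitionAsync.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, List, Tuple

Partition = Tuple[datetime, datetime]  # (month boundary, latest OccurredAtUtc)


def run_purge_tick(partitions: List[Partition],
                   switch_out: Callable[[datetime], int],
                   retention_days: int,
                   now_utc: datetime):
    """Purge partitions older than the retention window; one failure does not stop the rest."""
    threshold = now_utc - timedelta(days=retention_days)
    purged, failed = [], []
    for boundary, latest in partitions:
        if latest >= threshold:
            continue  # still inside the retention window
        try:
            rows_deleted = switch_out(boundary)
        except Exception as exc:  # continue-on-error per boundary
            failed.append((boundary, exc))
            continue
        purged.append((boundary, rows_deleted))
    return purged, failed


def utc(year: int, month: int, day: int) -> datetime:
    return datetime(year, month, day, tzinfo=timezone.utc)


def fake_switch_out(boundary: datetime) -> int:
    if boundary.month == 2:
        raise RuntimeError("simulated switch failure")
    return 100


parts = [(utc(2026, 1, 1), utc(2026, 1, 31)), (utc(2026, 2, 1), utc(2026, 2, 27)),
         (utc(2026, 3, 1), utc(2026, 3, 30)), (utc(2026, 5, 1), utc(2026, 5, 19))]
purged, failed = run_purge_tick(parts, fake_switch_out, 30, utc(2026, 5, 20))
print([b.month for b, _ in purged], [b.month for b, _ in failed])
```

In the actor each purged entry would correspond to one AuditLogPurgedEvent; here the results are simply returned so the behaviour can be inspected.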