# Component: Site Call Audit

## Purpose

Provides central, queryable audit and operational visibility for cached calls made by site scripts — `ExternalSystem.CachedCall()` and `Database.CachedWrite()`. Each such call carries a `TrackedOperationId`; sites report lifecycle telemetry to this component, which maintains a central audit record, computes KPIs, and relays Retry/Discard actions back to the owning site.

This is the second centrally-hosted observability component for site store-and-forward activity (the Notification Outbox is the first). Unlike the Notification Outbox, Site Call Audit is **not a dispatcher** — it never delivers anything. Cached calls are delivered by the site's Store-and-Forward Engine against site-local external systems and databases, which central cannot reach.

## Location

Central cluster only. A singleton actor (`SiteCallAuditActor`) on the active central node. Registered as component #22 in the Host role configuration.

## Responsibilities

- Ingest cached-call lifecycle telemetry from sites into the central `SiteCalls` table.
- Run periodic per-site reconciliation pulls so missed telemetry self-heals.
- Compute point-in-time KPIs (global and per-site) from the `SiteCalls` table.
- Relay operator Retry/Discard actions for parked cached calls to the owning site over the command/control channel.
- Purge terminal audit rows after a configurable retention window.

## The `SiteCalls` Table

Lives in the central MS SQL configuration database — a sibling of the `Notifications` table. One row per `TrackedOperationId`:

- **TrackedOperationId** — GUID, primary key. Generated site-side at call time.
- **SourceSite** — site that issued the call.
- **Kind** — `TrackedOperationKind` enum: `ExternalCall` or `DatabaseWrite`.
- **TargetSummary** — external system + method name for an `ExternalCall`; for a `DatabaseWrite`, just the database connection name — intentionally not the SQL statement or table, a deliberate scoping choice.
- **Status** — `Pending`, `Retrying`, `Delivered`, `Parked`, `Failed`, `Discarded`.
- **RetryCount** — attempts so far.
- **LastError** — most recent error detail, if any.
- **Provenance** — source instance / script.
- **CreatedAtUtc**, **UpdatedAtUtc**, **TerminalAtUtc** — key timestamps.

## Status Lifecycle

`Pending → Retrying → Delivered / Parked / Failed / Discarded`

- **Pending** — non-terminal: buffered after a transient failure, awaiting its first retry.
- **Retrying** — non-terminal: undergoing retry attempts.
- **Delivered** — terminal, success. A cached call that succeeds on its first immediate attempt is recorded directly as `Delivered`.
- **Parked** — non-terminal: transient retries exhausted; awaiting manual action.
- **Failed** — terminal: permanent failure (e.g. HTTP 4xx). The error was also returned synchronously to the calling script; the record captures it. `Failed` rows are **not operator-actionable** — see Retry / Discard Relay.
- **Discarded** — terminal, reached **only by operator action** on a `Parked` row. The row is kept (not deleted) so the table remains a complete audit record.

The site is the source of truth. The `SiteCalls` row is an eventually-consistent mirror — never queried by scripts (`Tracking.Status()` is answered site-locally).

## Ingest & Idempotency

Telemetry ingestion is **insert-if-not-exists** keyed on `TrackedOperationId`, then **upsert-on-newer-status**. The lifecycle is monotonic, so status only advances and never regresses; at-least-once and out-of-order telemetry are therefore harmless.

## Reconciliation

Because telemetry is best-effort, `SiteCallAuditActor` periodically — and on site reconnect — pulls "all tracking rows changed since cursor X" from each site. Gaps left by lost telemetry self-heal. Central converges to the site; the site never depends on central.

## Retry / Discard Relay

Parked cached calls live in the owning site's S&F buffer.
Operator Retry/Discard from the Central UI is relayed to that site as a `RetryParkedOperation` / `DiscardParkedOperation` command over the command/control channel. The site applies the change and emits telemetry reflecting the new state; central never mutates the `SiteCalls` row directly. If the site is offline the command fails fast and the UI surfaces a "site unreachable" message.

Only `Parked` rows are operator-actionable. `Failed` rows offer no Retry or Discard: a permanent failure (e.g. HTTP 4xx) would simply fail again, and the error was already returned synchronously to the calling script — there is nothing for an operator to recover.

## KPIs

Point-in-time, computed from the `SiteCalls` table, global and per-source-site, mirroring the Notification Outbox KPI shape:

- Buffered count (`Pending` + `Retrying`)
- Parked count
- Failed-last-interval
- Delivered-last-interval
- Oldest-pending age
- Stuck count — `Pending`/`Retrying` older than a configurable threshold (default 10 minutes); display-only, no escalation.

## Retention

Daily purge of terminal rows (`Delivered`, `Failed`, `Discarded`) after a configurable window (default 365 days), matching the `Notifications` purge.

## Dependencies

- **Configuration Database**: hosts the `SiteCalls` table and its repository.
- **Central–Site Communication**: receives cached-call telemetry and reconciliation responses; sends Retry/Discard commands.
- **Store-and-Forward Engine**: the site-side origin of cached-call telemetry and the executor of relayed Retry/Discard commands.
- **Commons**: `TrackedOperationId`, status enum, telemetry message contracts.

## Interactions

- **Central UI**: the Site Calls page queries this component and issues Retry/Discard actions.
- **Health Monitoring**: surfaces Site Call Audit KPI tiles on the dashboard.
- **Cluster Infrastructure**: hosts the `SiteCallAuditActor` singleton with active/standby failover.
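
## Illustrative Sketch: Idempotent Ingest

The rule in Ingest & Idempotency (insert-if-not-exists, then upsert-on-newer-status) can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the component's implementation: the numeric ranks in `STATUS_RANK`, the use of `RetryCount` as a tie-breaker, and the names `SiteCallRow` and `ingest` are all hypothetical. The source says only that status advances and never regresses; it does not say how an operator Retry on a `Parked` row is represented in telemetry, so that case is not modelled here.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical ordering. The source states only that status advances and
# never regresses; these ranks are an assumption made for illustration.
STATUS_RANK = {
    "Pending": 0,
    "Retrying": 1,
    "Parked": 2,
    "Delivered": 3,
    "Failed": 3,
    "Discarded": 3,
}


@dataclass(frozen=True)
class SiteCallRow:
    """Subset of the SiteCalls columns needed to show the ingest rule."""
    tracked_operation_id: str
    source_site: str
    status: str
    retry_count: int = 0
    last_error: Optional[str] = None


def _progress(row: SiteCallRow) -> tuple:
    # Assumption: within the same status, a higher retry count is newer.
    return (STATUS_RANK[row.status], row.retry_count)


def ingest(table: Dict[str, SiteCallRow], event: SiteCallRow) -> bool:
    """Apply one telemetry event. Returns True if the table changed."""
    existing = table.get(event.tracked_operation_id)
    if existing is None:
        # insert-if-not-exists, keyed on TrackedOperationId
        table[event.tracked_operation_id] = event
        return True
    if _progress(event) > _progress(existing):
        # upsert-on-newer-status
        table[event.tracked_operation_id] = event
        return True
    # Duplicate or stale (out-of-order) telemetry: ignore it.
    return False


if __name__ == "__main__":
    table: Dict[str, SiteCallRow] = {}
    ingest(table, SiteCallRow("op-1", "site-a", "Retrying", retry_count=2))
    ingest(table, SiteCallRow("op-1", "site-a", "Pending"))  # late arrival
    ingest(table, SiteCallRow("op-1", "site-a", "Delivered", retry_count=2))
    ingest(table, SiteCallRow("op-1", "site-a", "Delivered", retry_count=2))  # duplicate
    print(table["op-1"].status)
```

Because a stale or repeated event never overwrites a more advanced row, the same function could in principle serve both live telemetry and reconciliation pulls; whether the real component shares one code path for the two is not stated in the source.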