# Cached Call Tracking Implementation Plan

> **For Claude:** REQUIRED SUB-SKILL: Use superpowers-extended-cc:executing-plans to implement this plan task-by-task.

**Goal:** Give cached external system calls and cached database writes a trackable `TrackedOperationId`, backed by a site-local tracking table and a new central `Site Call Audit` component, under a tracking model unified with `Notify.Send`.

**Architecture:** Approach B from the design doc — a sibling central component (`Site Call Audit`), not a merged outbox. The site stays the source of truth for cached-call status; central audit is an eventually-consistent mirror fed by best-effort telemetry plus a reconciliation pull. Delivery of cached calls remains site-local.

**Tech Stack:** This is a design-documentation change. "Implementation" means editing Markdown design documents under `docs/requirements/`, plus `README.md` and `CLAUDE.md`. No source code is touched. The authoritative design is `docs/plans/2026-05-19-cached-call-tracking-design.md` — read it before starting.

**Working conventions (from `CLAUDE.md`):**

- Edit documents in place; no copies or backups.
- Component docs follow: Purpose, Location, Responsibilities, design sections, Dependencies, Interactions.
- Keep cross-references accurate across all docs.
- Use `git diff` to review before committing.

**Per-task workflow (replaces TDD for this docs project):**

1. Read the target file in full first.
2. Make the edits described.
3. **Verify**: run `git diff` on the target file and confirm the change reads correctly and matches the design doc.
4. **Cross-reference check**: run the grep given in the task; confirm no stale references.
5. **Commit** with the given message.

---

### Task 1: Create the Site Call Audit component document

**Files:**

- Create: `docs/requirements/Component-SiteCallAudit.md`

**Step 1: Write the new component doc**

Create the file following the standard component structure.
Content:

```markdown
# Component: Site Call Audit

## Purpose

Provides central, queryable audit and operational visibility for cached calls made by site scripts — `ExternalSystem.CachedCall()` and `Database.CachedWrite()`. Each such call carries a `TrackedOperationId`; sites report lifecycle telemetry to this component, which maintains a central audit record, computes KPIs, and relays Retry/Discard actions back to the owning site.

This is the second centrally-hosted observability component for site store-and-forward activity (the Notification Outbox is the first). Unlike the Notification Outbox, Site Call Audit is **not a dispatcher** — it never delivers anything. Cached calls are delivered by the site's Store-and-Forward Engine against site-local external systems and databases, which central cannot reach.

## Location

Central cluster only. A singleton actor (`SiteCallAuditActor`) on the active central node. Registered as component #22 in the Host role configuration.

## Responsibilities

- Ingest cached-call lifecycle telemetry from sites into the central `SiteCalls` table.
- Run periodic per-site reconciliation pulls so missed telemetry self-heals.
- Compute point-in-time KPIs (global and per-site) from the `SiteCalls` table.
- Relay operator Retry/Discard actions for parked cached calls to the owning site over the command/control channel.
- Purge terminal audit rows after a configurable retention window.

## The `SiteCalls` Table

Lives in the central MS SQL configuration database — a sibling of the `Notifications` table. One row per `TrackedOperationId`:

- **TrackedOperationId** — GUID, primary key. Generated site-side at call time.
- **SourceSite** — site that issued the call.
- **Kind** — `ExternalCall` or `DatabaseWrite`.
- **TargetSummary** — external system + method name, or database connection name.
- **Status** — `Pending`, `Retrying`, `Delivered`, `Parked`, `Failed`, `Discarded`.
- **RetryCount** — attempts so far.
- **LastError** — most recent error detail, if any.
- **Provenance** — source instance / script.
- **CreatedAtUtc**, **UpdatedAtUtc**, **TerminalAtUtc** — key timestamps.

## Status Lifecycle

`Pending → Retrying → Delivered / Parked / Failed / Discarded`

- **Delivered** — succeeded. A cached call that succeeds on its first immediate attempt is recorded directly as `Delivered`.
- **Parked** — transient retries exhausted; awaiting manual action.
- **Failed** — permanent failure (e.g. HTTP 4xx). The error was also returned synchronously to the calling script; the record captures it.
- **Discarded** — an operator discarded a parked operation.

The site is the source of truth. The `SiteCalls` row is an eventually-consistent mirror — never queried by scripts (`Tracking.Status()` is answered site-locally).

## Ingest & Idempotency

Telemetry ingestion is **insert-if-not-exists** keyed on `TrackedOperationId`, then **upsert-on-newer-status**. The lifecycle is monotonic, so status only advances and never regresses; at-least-once and out-of-order telemetry are therefore harmless.

## Reconciliation

Because telemetry is best-effort, `SiteCallAuditActor` periodically — and on site reconnect — pulls "all tracking rows changed since cursor X" from each site. Gaps left by lost telemetry self-heal. Central converges to the site; the site never depends on central.

## Retry / Discard Relay

Parked cached calls live in the owning site's S&F buffer. Operator Retry/Discard from the Central UI is relayed to that site as a `RetryParkedOperation` / `DiscardParkedOperation` command over the command/control channel. The site applies the change and emits telemetry reflecting the new state; central never mutates the `SiteCalls` row directly. If the site is offline the command fails fast and the UI surfaces a "site unreachable" message.

## KPIs

Point-in-time, computed from the `SiteCalls` table, global and per-source-site, mirroring the Notification Outbox KPI shape:

- Buffered count (`Pending` + `Retrying`)
- Parked count
- Failed-last-interval
- Delivered-last-interval
- Oldest-pending age
- Stuck count — `Pending`/`Retrying` older than a configurable threshold (default 10 minutes); display-only, no escalation.

## Retention

Daily purge of terminal rows (`Delivered`, `Failed`, `Discarded`) after a configurable window (default 365 days), matching the `Notifications` purge.

## Dependencies

- **Configuration Database**: hosts the `SiteCalls` table and its repository.
- **Central–Site Communication**: receives cached-call telemetry and reconciliation responses; sends Retry/Discard commands.
- **Store-and-Forward Engine**: the site-side origin of cached-call telemetry and the executor of relayed Retry/Discard commands.
- **Commons**: `TrackedOperationId`, status enum, telemetry message contracts.

## Interactions

- **Central UI**: the Site Calls page queries this component and issues Retry/Discard actions.
- **Health Monitoring**: surfaces Site Call Audit KPI tiles on the dashboard.
- **Cluster Infrastructure**: hosts the `SiteCallAuditActor` singleton with active/standby failover.
```

**Step 2: Verify**

Run: `git status --short` to confirm the new file is listed (an untracked file does not appear in `git diff --stat`), then open it.
Expected: structure matches other `Component-*.md` files (Purpose → Interactions).

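
The ingest rule in the component doc's `## Ingest & Idempotency` section ("insert-if-not-exists", then "upsert-on-newer-status") is easier to review with a small model in hand. The sketch below is a reading aid only and is not added to any document. It is written in Python rather than the project's own stack, the `SiteCalls` table is a plain dict, and the numeric ranking in `STATUS_RANK` is an assumption made here to give "newer status" a concrete meaning; the design doc remains the authority on the real ordering.

```python
from dataclasses import dataclass

# Hypothetical ranking: higher means further along the lifecycle.
# The three terminal states share the top rank, so none can overwrite another.
STATUS_RANK = {
    "Pending": 0,
    "Retrying": 1,
    "Parked": 2,
    "Delivered": 3,
    "Failed": 3,
    "Discarded": 3,
}


@dataclass
class SiteCallRow:
    tracked_operation_id: str
    status: str
    retry_count: int = 0


def ingest(site_calls: dict, op_id: str, status: str, retry_count: int = 0) -> bool:
    """Apply one telemetry event; return True if the table changed."""
    row = site_calls.get(op_id)
    if row is None:
        # Insert-if-not-exists: first sighting of this TrackedOperationId.
        site_calls[op_id] = SiteCallRow(op_id, status, retry_count)
        return True
    newer_stage = STATUS_RANK[status] > STATUS_RANK[row.status]
    same_stage_more_retries = status == row.status and retry_count > row.retry_count
    if newer_stage or same_stage_more_retries:
        # Upsert-on-newer-status: only forward progress may overwrite the row.
        row.status = status
        row.retry_count = retry_count
        return True
    # Duplicate or out-of-order event: ignore it.
    return False


table = {}
ingest(table, "op-1", "Retrying", retry_count=1)
ingest(table, "op-1", "Pending")                   # late arrival, ignored
ingest(table, "op-1", "Retrying", retry_count=2)   # same stage, more retries
ingest(table, "op-1", "Delivered", retry_count=2)
ingest(table, "op-1", "Delivered", retry_count=2)  # duplicate, ignored
print(table["op-1"].status, table["op-1"].retry_count)  # prints: Delivered 2
```

If an operator Retry causes a site to report a `Parked` call as `Retrying` again, this particular ranking would drop that event. How that transition is ordered is a question for the design doc, not something this sketch settles.
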
**Step 3: Commit**

```bash
git add docs/requirements/Component-SiteCallAudit.md
git commit -m "docs(requirements): add Site Call Audit component (#22)"
```

---

### Task 2: Add shared tracking contracts to Commons

**Files:**

- Modify: `docs/requirements/Component-Commons.md` — sections `REQ-COM-1` (data types), `REQ-COM-5` (message contracts)

**Step 1: Edit the doc**

In `### REQ-COM-1: Shared Data Type System`, add `TrackedOperationId` as a shared type: a GUID identifying any tracked store-and-forward operation (`CachedCall`, `CachedWrite`, `Notify.Send`), generated caller-side at the site at call time, doubling as the telemetry idempotency key. Note that the existing `NotificationId` is the notification-domain name for this same concept. Add a shared `TrackedOperationStatus` enum: `Pending`, `Retrying`, `Delivered`, `Parked`, `Failed`, `Discarded`.

In `### REQ-COM-5: Cross-Component Message Contracts`, add the cached-call telemetry and command contracts (additive-only, per REQ-COM-5a):

- `CachedCallTelemetry` — `TrackedOperationId`, source site, `Kind`, target summary, status, retry count, last error, timestamps, provenance.
- `CachedCallReconcileRequest` / `CachedCallReconcileResponse` — cursor-based per-site pull of changed tracking rows.
- `RetryParkedOperation` / `DiscardParkedOperation` — central→site commands keyed by `TrackedOperationId` (generalize naming so they cover cached calls, not only legacy "parked message" wording).

**Step 2: Verify**

Run: `git diff docs/requirements/Component-Commons.md`
Expected: additive only; no existing type or contract removed/renamed.

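
When reviewing the Commons wording it can help to see the shape the contract text implies. The following is a hedged Python sketch, not the project's type definitions: the snake_case field names, the `Kind` enum name, and the sample values (`site-a`, `Billing.PostInvoice`, the provenance string) are all assumptions for illustration, while the status and kind values are taken from the lists in this plan.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class TrackedOperationStatus(Enum):
    PENDING = "Pending"
    RETRYING = "Retrying"
    DELIVERED = "Delivered"
    PARKED = "Parked"
    FAILED = "Failed"
    DISCARDED = "Discarded"


class Kind(Enum):
    EXTERNAL_CALL = "ExternalCall"
    DATABASE_WRITE = "DatabaseWrite"


@dataclass(frozen=True)
class CachedCallTelemetry:
    """One lifecycle event; the id doubles as the idempotency key."""
    tracked_operation_id: uuid.UUID
    source_site: str
    kind: Kind
    target_summary: str
    status: TrackedOperationStatus
    retry_count: int
    last_error: Optional[str]
    created_at_utc: datetime
    updated_at_utc: datetime
    terminal_at_utc: Optional[datetime]
    provenance: str


# The id is generated at the site when the script makes the call.
now = datetime.now(timezone.utc)
event = CachedCallTelemetry(
    tracked_operation_id=uuid.uuid4(),
    source_site="site-a",                   # hypothetical site name
    kind=Kind.EXTERNAL_CALL,
    target_summary="Billing.PostInvoice",   # hypothetical system + method
    status=TrackedOperationStatus.DELIVERED,
    retry_count=0,
    last_error=None,
    created_at_utc=now,
    updated_at_utc=now,
    terminal_at_utc=now,                    # set because Delivered is terminal
    provenance="instance-1/script-7",       # hypothetical
)
print(event.status.value, event.kind.value)
```

Making the record immutable (`frozen=True`) is a choice made for the sketch; it fits a message that is sent once and replayed unchanged, but the plan itself does not require it.
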
**Step 3: Commit**

```bash
git add docs/requirements/Component-Commons.md
git commit -m "docs(requirements): add TrackedOperationId and cached-call contracts to Commons"
```

---

### Task 3: Update the Store-and-Forward Engine doc

**Files:**

- Modify: `docs/requirements/Component-StoreAndForward.md` — `Responsibilities`, `Message Lifecycle`, `Persistence`, `Parked Message Management`, `Message Format`

**Step 1: Edit the doc**

- **Responsibilities / Persistence**: introduce the **site-local operation tracking table** — a SQLite table alongside the S&F buffer DB, holding one row per `TrackedOperationId` for cached calls regardless of outcome. It is the status record; the S&F buffer remains only the retry mechanism. State that `Tracking.Status(id)` reads this table, that it is the source of truth, and that terminal rows are purged after a configurable window (default 7 days).
- **Message Lifecycle**: a cached call that succeeds on its first immediate attempt is written directly as a terminal `Delivered` tracking row and never enters the S&F buffer. A buffered cached-call message references its `TrackedOperationId`.
- Add a **telemetry emission** note: on every lifecycle transition the site emits `CachedCallTelemetry` to central (best-effort, at-least-once, idempotent on the ID) and responds to `CachedCallReconcileRequest` pulls.
- **Parked Message Management**: note that Retry/Discard of parked cached calls can be driven by central via `RetryParkedOperation`/`DiscardParkedOperation`, after which the site emits telemetry reflecting the new state.
- **Message Format**: add `TrackedOperationId` to the listed per-message fields.

Leave the notification category behavior unchanged.

**Step 2: Verify**

Run: `git diff docs/requirements/Component-StoreAndForward.md`
Expected: cached-call and DB-write categories gain tracking; notification flow untouched.

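
The split between the tracking table (status record) and the S&F buffer (retry mechanism) described in Step 1 can be pictured with a toy model. This Python sketch is an illustration under assumptions, not the engine's design: `TransientError`, `cached_call`, `tracking_status`, and the two in-memory containers are hypothetical names, the initial buffered status is taken to be `Pending`, and permanent failures are deliberately left out.

```python
import uuid


class TransientError(Exception):
    """Hypothetical marker for a retryable failure (e.g. a timeout)."""


tracking = {}    # stand-in for the site-local tracking table
sf_buffer = []   # stand-in for the S&F retry buffer


def cached_call(send) -> str:
    """Attempt delivery once, record the outcome, return the tracking id."""
    op_id = str(uuid.uuid4())
    try:
        send()
    except TransientError as err:
        # Transient failure: a tracking row is written AND the message is
        # buffered for retry, carrying its TrackedOperationId.
        tracking[op_id] = {"status": "Pending", "retry_count": 0, "last_error": str(err)}
        sf_buffer.append({"tracked_operation_id": op_id, "send": send})
        return op_id
    # First-attempt success: terminal row only, the buffer is never touched.
    tracking[op_id] = {"status": "Delivered", "retry_count": 0, "last_error": None}
    return op_id


def tracking_status(op_id: str) -> dict:
    """Answered from the site-local table, never from central."""
    return tracking[op_id]


def ok():
    pass


def flaky():
    raise TransientError("timeout")


first = cached_call(ok)
second = cached_call(flaky)
print(tracking_status(first)["status"],
      tracking_status(second)["status"],
      len(sf_buffer))  # prints: Delivered Pending 1
```

Both calls end up with a tracking row, but only the one that failed transiently occupies the buffer, which is the property the Message Lifecycle bullet asks the doc to state.
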
**Step 3: Commit**

```bash
git add docs/requirements/Component-StoreAndForward.md
git commit -m "docs(requirements): add site-local tracking table and telemetry to Store-and-Forward"
```

---

### Task 4: Update the External System Gateway doc

**Files:**

- Modify: `docs/requirements/Component-ExternalSystemGateway.md` — `Cached Write`, `External System Call Modes`, `Call Timeout & Error Handling`

**Step 1: Edit the doc**

- `### Cached (Store-and-Forward)` and `### Cached Write (Store-and-Forward)`: state that `CachedCall`/`CachedWrite` now return a `TrackedOperationId`. They are no longer "fire-and-forget" with no handle — replace that wording with "deferred-delivery, returns a tracking handle". Immediate success → terminal `Delivered` record; transient failure → buffered, `Pending`/`Retrying`.
- Permanent failure: the error is still returned synchronously to the script (unchanged) **and** recorded as a terminal `Failed` tracking record.
- Keep the idempotency note — duplicate delivery on retry is still the caller's responsibility.
- Add a one-line pointer that status is observable via `Tracking.Status(id)` and centrally via the Site Call Audit component.

**Step 2: Verify**

Run: `grep -n "fire-and-forget\|TrackedOperationId" docs/requirements/Component-ExternalSystemGateway.md`
Expected: "fire-and-forget" no longer describes cached calls; `TrackedOperationId` present.

**Step 3: Commit**

```bash
git add docs/requirements/Component-ExternalSystemGateway.md
git commit -m "docs(requirements): cached calls return TrackedOperationId in ESG"
```

---

### Task 5: Update the Site Runtime Script Runtime API

**Files:**

- Modify: `docs/requirements/Component-SiteRuntime.md` — `### External Systems`, `### Notifications`, `### Database Access` under `## Script Runtime API`

**Step 1: Edit the doc**

- `### External Systems`: `ExternalSystem.CachedCall(...)` now returns a `TrackedOperationId`; drop "fire-and-forget", say it returns a tracking handle.
- `### Database Access`: `Database.CachedWrite(...)` now returns a `TrackedOperationId`.
- Add the unified accessor `Tracking.Status("trackedOperationId")` — returns a status record (status, retry count, last error, key timestamps) for any tracked operation, answered site-locally and authoritatively for cached calls.
- `### Notifications`: note that `Notify.Status(...)` is retained as a thin alias of `Tracking.Status(...)`; `Notify.Send` returns a `TrackedOperationId` (the value historically called `NotificationId`).

**Step 2: Verify**

Run: `git diff docs/requirements/Component-SiteRuntime.md`
Expected: all three cached/async producers return `TrackedOperationId`; `Tracking.Status` documented.

**Step 3: Commit**

```bash
git add docs/requirements/Component-SiteRuntime.md
git commit -m "docs(requirements): add Tracking.Status and cached-call handles to Script Runtime API"
```

---

### Task 6: Update the Central–Site Communication doc

**Files:**

- Modify: `docs/requirements/Component-Communication.md` — `### 8. Remote Queries`, and add a new pattern for cached-call telemetry

**Step 1: Edit the doc**

- Add a new communication pattern (e.g. `### 10. Cached Call Telemetry (Site → Central)`): the site S&F Engine pushes `CachedCallTelemetry` on every lifecycle transition; best-effort, at-least-once, idempotent on `TrackedOperationId`; transport is ClusterClient command/control. Also describe the reconciliation pull (`CachedCallReconcileRequest`/`Response`) initiated by `SiteCallAuditActor`.
- `### 8. Remote Queries (Central → Site)`: generalize the "Retry or discard parked messages" command line to also cover cached calls keyed by `TrackedOperationId` (`RetryParkedOperation` / `DiscardParkedOperation`).

**Step 2: Verify**

Run: `grep -n "Telemetry\|RetryParkedOperation" docs/requirements/Component-Communication.md`
Expected: new telemetry pattern and generalized command present.

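
The reconciliation pull in Step 1 is what lets central recover from lost telemetry, and a short model makes the cursor behaviour concrete. This Python sketch is a reading aid under assumptions: the plan only says the pull is "cursor-based", so the integer `version` change counter, the `TrackingRow` type, and the `reconcile_response` / `pull` function names are all hypothetical. Central also overwrites its copy directly here, whereas the real component would route pulled rows through its idempotent ingest.

```python
from dataclasses import dataclass


@dataclass
class TrackingRow:
    tracked_operation_id: str
    status: str
    version: int  # hypothetical change counter that grows on every update


def reconcile_response(site_rows, cursor):
    """Site side: rows changed since `cursor`, plus the next cursor."""
    changed = sorted(
        (row for row in site_rows if row.version > cursor),
        key=lambda row: row.version,
    )
    next_cursor = changed[-1].version if changed else cursor
    return changed, next_cursor


def pull(site_rows, central, cursor):
    """Central side: apply one pull and return the cursor for the next one."""
    changed, next_cursor = reconcile_response(site_rows, cursor)
    for row in changed:
        central[row.tracked_operation_id] = row.status
    return next_cursor


site = [TrackingRow("op-1", "Delivered", 1), TrackingRow("op-2", "Retrying", 2)]
central = {}
cursor = pull(site, central, 0)

# op-2 is parked at the site, but the telemetry for that change is lost.
site[1] = TrackingRow("op-2", "Parked", 3)

cursor = pull(site, central, cursor)
print(central, cursor)  # prints: {'op-1': 'Delivered', 'op-2': 'Parked'} 3
```

The second pull returns only the row whose counter moved past the stored cursor, so central converges on the site's state without re-reading rows it already has.
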
**Step 3: Commit**

```bash
git add docs/requirements/Component-Communication.md
git commit -m "docs(requirements): add cached-call telemetry pattern to Communication"
```

---

### Task 7: Update the Configuration Database doc

**Files:**

- Modify: `docs/requirements/Component-ConfigurationDatabase.md` — `## Database Schema` (add a `### Site Calls` subsection), `## Scheduled Maintenance`

**Step 1: Edit the doc**

- Under `## Database Schema`, add a `### Site Calls` subsection describing the `SiteCalls` table (columns per Task 1's "The `SiteCalls` Table" list), noting it is populated only by Site Call Audit telemetry/reconciliation, and that ingestion is insert-if-not-exists + upsert-on-newer-status.
- Under `## Scheduled Maintenance`, add a `### SiteCalls Table Purge` subsection mirroring the `### Notifications Table Purge` wording: daily purge of terminal rows after a configurable window (default 365 days).

**Step 2: Verify**

Run: `grep -n "SiteCalls" docs/requirements/Component-ConfigurationDatabase.md`
Expected: schema subsection and purge subsection both present.

**Step 3: Commit**

```bash
git add docs/requirements/Component-ConfigurationDatabase.md
git commit -m "docs(requirements): add SiteCalls table and purge to Configuration Database"
```

---

### Task 8: Update the Central UI doc

**Files:**

- Modify: `docs/requirements/Component-CentralUI.md` — `## Workflows / Pages`

**Step 1: Edit the doc**

Add a `### Site Calls (Deployment Role)` page after the `### Notification Outbox (Deployment Role)` section:

- Queryable list of cached calls (`ExternalCall` + `DatabaseWrite` only — notifications keep their own Notification Outbox page).
- Filters: site, kind, status, time range.
- Columns: timestamp, site, kind, target summary, status badge, retry count, last error.
- Retry / Discard actions on `Parked` rows; "site unreachable" handling when the owning site is offline.
- Custom Blazor Server + Bootstrap components, no third-party frameworks.

**Step 2: Verify**

Run: `grep -n "Site Calls" docs/requirements/Component-CentralUI.md`
Expected: new page section present, scoped to cached calls.

**Step 3: Commit**

```bash
git add docs/requirements/Component-CentralUI.md
git commit -m "docs(requirements): add Site Calls page to Central UI"
```

---

### Task 9: Update the Health Monitoring doc

**Files:**

- Modify: `docs/requirements/Component-HealthMonitoring.md` — add a `## Site Call Audit KPIs` section after `## Notification Outbox KPIs`

**Step 1: Edit the doc**

Add a `## Site Call Audit KPIs` section mirroring `## Notification Outbox KPIs`: the dashboard surfaces Site Call Audit headline KPI tiles (buffered, parked, failed-last-interval, delivered-last-interval, oldest-pending age, stuck count), computed point-in-time by the Site Call Audit component, global and per-site. Stuck is display-only.

**Step 2: Verify**

Run: `grep -n "Site Call Audit KPIs" docs/requirements/Component-HealthMonitoring.md`
Expected: section present.

**Step 3: Commit**

```bash
git add docs/requirements/Component-HealthMonitoring.md
git commit -m "docs(requirements): add Site Call Audit KPIs to Health Monitoring"
```

---

### Task 10: Note the shared model in Notification docs

**Files:**

- Modify: `docs/requirements/Component-NotificationService.md` — `## Script API`
- Modify: `docs/requirements/Component-NotificationOutbox.md` — `## Purpose` or `### Status Lifecycle`

**Step 1: Edit the doc**

- `Component-NotificationService.md` `## Script API`: note that `Notify.Send`'s `NotificationId` is a `TrackedOperationId` (shared Commons type) and `Notify.Status` is an alias of the unified `Tracking.Status`.
- `Component-NotificationOutbox.md`: add a sentence that the Notification Outbox and the Site Call Audit component share the `TrackedOperationId` tracking model and status lifecycle, but differ in delivery locality — the Notification Outbox delivers; Site Call Audit only audits.

Do not change any notification behavior.

**Step 2: Verify**

Run: `git diff docs/requirements/Component-NotificationService.md docs/requirements/Component-NotificationOutbox.md`
Expected: additive notes only, no behavior change.

**Step 3: Commit**

```bash
git add docs/requirements/Component-NotificationService.md docs/requirements/Component-NotificationOutbox.md
git commit -m "docs(requirements): note shared TrackedOperationId model in notification docs"
```

---

### Task 11: Update the README component table

**Files:**

- Modify: `README.md` — component table and any architecture diagram component count

**Step 1: Edit the doc**

Add row 22 — **Site Call Audit** — to the component table: "Central component auditing site cached calls (`CachedCall`/`CachedWrite`); `SiteCalls` table, telemetry ingest, reconciliation, KPIs, central→site Retry/Discard relay." Update any "21 components" count to 22.

**Step 2: Verify**

Run: `grep -rn "21 component\|22 component" README.md`
Expected: count reads 22; no stale "21".

**Step 3: Commit**

```bash
git add README.md
git commit -m "docs: add Site Call Audit to README component table"
```

---

### Task 12: Update CLAUDE.md

**Files:**

- Modify: `CLAUDE.md` — `## Current Component List`, `## Key Design Decisions`

**Step 1: Edit the doc**

- Change the heading `## Current Component List (21 components)` to `(22 components)` and add item 22 — **Site Call Audit** — with a one-line description.
- Under `## Key Design Decisions`, in `### Store-and-Forward` (or `### UI & Monitoring`), add bullets summarizing: cached calls return a `TrackedOperationId`; site-local tracking table is the status source of truth; new central Site Call Audit component mirrors status via best-effort telemetry + reconciliation; cached-call delivery stays site-local; unified `Tracking.Status` accessor; `Failed` terminal state for permanent failures.

**Step 2: Verify**

Run: `grep -n "22 components\|Site Call Audit" CLAUDE.md`
Expected: count is 22; component listed; design decisions present.

**Step 3: Commit**

```bash
git add CLAUDE.md
git commit -m "docs: record cached-call tracking in CLAUDE.md"
```

---

### Task 13: Final cross-reference consistency pass

**Files:**

- Potentially any `docs/requirements/Component-*.md`, `README.md`, `CLAUDE.md`

**Step 1: Sweep for stale or missing references**

Run each and review:

```bash
grep -rn "fire-and-forget" docs/requirements/
grep -rn "21 component" README.md CLAUDE.md
grep -rln "Site Call Audit" docs/requirements/ README.md CLAUDE.md
grep -rn "TrackedOperationId" docs/requirements/
```

Expected: no "fire-and-forget" describing cached calls; no "21 component" left; Site Call Audit referenced by its dependents (Communication, Configuration Database, Central UI, Health Monitoring, Commons); `TrackedOperationId` used consistently.

**Step 2: Confirm new component's Dependencies/Interactions are reciprocated**

Verify each component named in `Component-SiteCallAudit.md` Dependencies/Interactions also references Site Call Audit where appropriate.

**Step 3: Fix any gaps found, then commit**

```bash
git add -A
git commit -m "docs(requirements): reconcile cross-references for Site Call Audit"
```

If no gaps are found, skip the commit and note the plan is complete.

---

## Done

All cached-call tracking design changes are recorded. The design rationale lives in `docs/plans/2026-05-19-cached-call-tracking-design.md`.