docs: record cached-call tracking in CLAUDE.md

Joseph Doherty
2026-05-19 12:03:17 -04:00
parent d8dfbc79f4
commit 38b51ef894


@@ -36,7 +36,7 @@ This project contains design documentation for a distributed SCADA system built
- Use `git diff` to review changes before committing.
- Commit related changes together with a descriptive message summarizing the design decision.
-## Current Component List (21 components)
+## Current Component List (22 components)
1. Template Engine — Template modeling, inheritance, composition, validation, flattening, diffs.
2. Deployment Manager — Central-side deployment pipeline, system-wide artifact deployment, instance lifecycle.
@@ -59,6 +59,7 @@ This project contains design documentation for a distributed SCADA system built
19. CLI — Command-line tool using HTTP Management API, System.CommandLine, JSON/table output.
20. Traefik Proxy — Reverse proxy/load balancer fronting central cluster, active node routing via `/health/active`, automatic failover.
21. Notification Outbox — Central component ingesting store-and-forwarded notifications, `Notifications` audit table, dispatcher loop, retry/parking, delivery KPIs.
22. Site Call Audit — Central component auditing site cached calls (`CachedCall`/`CachedWrite`); `SiteCalls` audit table, telemetry ingest, reconciliation, KPIs, central→site Retry/Discard relay; sites remain the source of truth.
## Key Design Decisions (for context across sessions)
@@ -120,6 +121,11 @@ This project contains design documentation for a distributed SCADA system built
- Site→central handoff is at-least-once: ack-after-persist plus insert-if-not-exists on `NotificationId`.
- No Akka replication — MS SQL is the HA store; daily purge of terminal rows after a configurable window (default 365 days).
- Notification Outbox retry reuses central SMTP max-retry-count and fixed interval.
- Cached calls (`ExternalSystem.CachedCall`, `Database.CachedWrite`) return a `TrackedOperationId` tracking handle, unified with `Notify.Send`'s existing tracking model (`Notify.Status` retained as a thin alias).
- A site-local operation tracking table (SQLite, alongside the S&F buffer) is the source of truth for cached-call status; `Tracking.Status(id)` reads it site-locally and authoritatively; terminal rows purged after a configurable window (default 7 days).
- Unified tracking status lifecycle `Pending → Retrying → Delivered / Parked / Failed / Discarded`; `Failed` = permanent failure (also returned synchronously to the calling script). No `Forwarding` state for cached calls.
- Site Call Audit (#22): central `SiteCallAuditActor` singleton with a `SiteCalls` audit table (central MS SQL) fed by best-effort site telemetry plus periodic reconciliation pulls — an eventually-consistent mirror, NOT a dispatcher; cached-call delivery stays site-local. Ingest is insert-if-not-exists then upsert-on-newer-status.
- Central UI Site Calls page + central→site `RetryParkedOperation`/`DiscardParkedOperation` relay for parked cached calls; central never mutates the `SiteCalls` row directly.
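
The ingest rule above ("insert-if-not-exists then upsert-on-newer-status") can be sketched in a few lines. This is a minimal illustration, not the project's implementation: the source does not say how "newer" is decided, so the sketch assumes each telemetry record carries a hypothetical per-operation `site_version` counter that the site increments on every status change. `SiteCallUpdate` and `SiteCallsMirror` are illustrative names, and an in-memory dict stands in for the `SiteCalls` table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteCallUpdate:
    """One telemetry or reconciliation record sent by a site (hypothetical shape)."""
    tracked_operation_id: str
    status: str          # e.g. "Pending", "Retrying", "Parked", "Delivered"
    site_version: int    # ASSUMPTION: incremented by the site on each status change


class SiteCallsMirror:
    """In-memory stand-in for the central SiteCalls audit table."""

    def __init__(self) -> None:
        self.rows: dict[str, SiteCallUpdate] = {}

    def ingest(self, update: SiteCallUpdate) -> str:
        current = self.rows.get(update.tracked_operation_id)
        if current is None:
            # Insert-if-not-exists: first sighting of this operation.
            self.rows[update.tracked_operation_id] = update
            return "inserted"
        if update.site_version > current.site_version:
            # Upsert-on-newer-status: only a strictly newer site record wins.
            self.rows[update.tracked_operation_id] = update
            return "updated"
        # Duplicate or out-of-order delivery: safe to drop, the site stays authoritative.
        return "ignored"
```

Because duplicates and stale records are ignored, replaying the same telemetry or overlapping it with a reconciliation pull leaves the mirror unchanged, which is what lets best-effort delivery converge on the site's state.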
### Security & Auth
- Authentication: direct LDAP bind (username/password), no Kerberos/NTLM. LDAPS/StartTLS required.
@@ -144,6 +150,7 @@ This project contains design documentation for a distributed SCADA system built
- Notification Outbox KPIs are central-computed point-in-time from the `Notifications` table (global + per-source-site): queue depth, stuck count, parked count, delivered-last-interval, oldest-pending age.
- Stuck = `Pending`/`Retrying` older than a configurable age threshold (default 10 min) — display-only (KPI count + row badge), no escalation/alerting.
- Headline KPI tiles surface on the Health dashboard; a new Central UI Notification Outbox page offers a queryable list with Retry/Discard actions on parked notifications.
- Site Call Audit KPIs are central-computed point-in-time from the `SiteCalls` table (global + per-site), mirroring the Notification Outbox KPI shape; tiles surface on the Health dashboard alongside a queryable Central UI Site Calls page with Retry/Discard on parked rows.
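
As a rough illustration of the "stuck" KPI described above, the following sketch counts rows in `Pending` or `Retrying` whose age exceeds the threshold. It rests on assumptions: the source does not say which timestamp the age is measured from, so a hypothetical per-row creation time is used, and `stuck_count` and the tuple row shape are illustrative only.

```python
from datetime import datetime, timedelta

# Statuses that can be "stuck" per the definition above.
STUCK_STATUSES = {"Pending", "Retrying"}


def stuck_count(rows, now, threshold=timedelta(minutes=10)):
    """Count rows that are Pending/Retrying and older than the threshold.

    rows: iterable of (status, created_at) tuples.
    ASSUMPTION: age is measured from created_at; the source does not specify.
    """
    return sum(
        1
        for status, created_at in rows
        if status in STUCK_STATUSES and now - created_at > threshold
    )
```

The value is computed point-in-time from whatever rows are passed in, so the same function could serve a global tile or a per-site tile by filtering the rows first.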
### Code Organization
- Entity classes are persistence-ignorant POCOs in Commons; EF mappings in Configuration Database.