docs(audit): align alog.md + Component-AuditLog.md vocab with M1 enums (#23)

The M1 implementation (Bundle A) committed concrete AuditChannel /
AuditKind / AuditStatus enums that reflect CLAUDE.md's locked
cached-call lifecycle decisions. The older alog.md and
Component-AuditLog.md narratives still used pre-M1 vocabulary
(Success / TransientFailure / PermanentFailure / Enqueued / Retrying /
SyncCall / CachedEnqueued / Attempt / Terminal / Completed). This
commit reconciles both docs to the M1 vocabulary:

  AuditChannel  : ApiOutbound, DbOutbound, Notification, ApiInbound
  AuditKind (10): ApiCall, ApiCallCached, DbWrite, DbWriteCached,
                  NotifySend, NotifyDeliver, InboundRequest,
                  InboundAuthFailure, CachedSubmit, CachedResolve
  AuditStatus(8): Submitted, Forwarded, Attempted, Delivered, Failed,
                  Parked, Discarded, Skipped

Updates:
  - Status column description + worked examples use the new 8 values.
  - Kind table flattened from per-channel groupings to a single flat
    list of the 10 discriminators (no more SyncCall / Cached* /
    Attempt / Terminal / Completed).
  - Cached-call lifecycle examples rewritten to the
    CachedSubmit -> Forwarded -> Attempted... -> CachedResolve shape.
  - Notification lifecycle examples rewritten to
    NotifySend(Submitted) -> NotifyDeliver(Attempted) ->
    NotifyDeliver(Delivered/Parked/Discarded).
  - Inbound API examples split into InboundRequest (success path) and
    InboundAuthFailure (401 path).
  - 'Errors only' UI toggle, audit-error-rate KPI, and payload-cap
    decision (#6 in §16) all switched from 'non-Success' to
    Status IN ('Failed', 'Parked', 'Discarded').
  - Per-site event-rate table in §13.1 renamed to the new kinds.

Pure design correction; no operational behavior change. Per the
goal-prompt invariant #6, alog.md may change when a design correction
is committed before the affected code change — this commit is that
correction, landed ahead of the M1 merge so the merge order reads
design-first, code-second.

No code, test, or infra file changes.
Joseph Doherty
2026-05-20 11:56:34 -04:00
parent da68a2af7b
commit 3592e74085
2 changed files with 98 additions and 73 deletions

alog.md (95 lines changed)

@@ -109,14 +109,14 @@ Single wide table, polymorphic by `Channel` + `Kind` discriminators, JSON payloa
| `OccurredAtUtc` | `datetime2` | When the event happened (call returned, retry attempted, etc.). |
| `IngestedAtUtc` | `datetime2` | When central persisted the row (lags `OccurredAtUtc` for site-originated rows). |
| `Channel` | `varchar(32)` | `ApiOutbound` \| `DbOutbound` \| `Notification` \| `ApiInbound`. |
| `Kind` | `varchar(32)` | Channel-specific event kind (see table below). |
| `Kind` | `varchar(32)` | Event kind discriminator (see kinds list below). |
| `CorrelationId` | `uniqueidentifier` NULL | Ties multi-event operations together. `TrackedOperationId` for cached calls, `NotificationId` for notifications, request-id for inbound API. NULL for sync one-shot calls. |
| `SourceSiteId` | `varchar(64)` NULL | NULL for central-originated events (inbound API, central notification dispatch). |
| `SourceInstanceId` | `varchar(128)` NULL | Instance whose script initiated the action (when applicable). |
| `SourceScript` | `varchar(128)` NULL | Script name within the instance. |
| `Actor` | `varchar(128)` NULL | Inbound API: API key name. Outbound: script identity. Central: system user. |
| `Target` | `varchar(256)` NULL | Outbound API: external system + method. DB: connection name. Notification: list name. Inbound API: method name. |
| `Status` | `varchar(32)` | Outcome of *this event*: `Success`, `TransientFailure`, `PermanentFailure`, `Enqueued`, `Retrying`, `Delivered`, `Parked`, `Discarded`. |
| `Status` | `varchar(32)` | Outcome of *this event*: `Submitted`, `Forwarded`, `Attempted`, `Delivered`, `Failed`, `Parked`, `Discarded`, `Skipped`. |
| `HttpStatus` | `int` NULL | HTTP-bearing events only. |
| `DurationMs` | `int` NULL | Call/attempt duration. |
| `ErrorMessage` | `nvarchar(1024)` NULL | Truncated; `ErrorDetail` for full text. |
@@ -135,14 +135,20 @@ Single wide table, polymorphic by `Channel` + `Kind` discriminators, JSON payloa
- `IX_AuditLog_Target_Occurred (Target, OccurredAtUtc)` — "what did we send to system X."
- Partitioning by month on `OccurredAtUtc` from day one (purge becomes a partition switch instead of a delete storm).
**`Kind` values by channel:**
**`Kind` values (flat — 10 discriminators across all channels):**
| Channel | Kinds |
| Kind | Fires when |
|---|---|
| `ApiOutbound` | `SyncCall`, `CachedEnqueued`, `CachedAttempt`, `CachedTerminal` |
| `DbOutbound` | `SyncWrite`, `SyncRead`, `CachedEnqueued`, `CachedAttempt`, `CachedTerminal` |
| `Notification` | `Enqueued`, `Attempt`, `Terminal` |
| `ApiInbound` | `Completed` (one row per request, written at request end with final status) |
| `ApiCall` | Sync `ExternalSystem.Call(...)` returns (success or permanent failure). One row per call. |
| `ApiCallCached` | A cached outbound-API attempt records its forward-ack (`Forwarded`) or each retry (`Attempted`). |
| `DbWrite` | Sync `Database.Connection().Execute*(...)` / `ExecuteReader(...)` completes. One row per call. |
| `DbWriteCached` | A cached outbound-DB attempt records its forward-ack (`Forwarded`) or each retry (`Attempted`). |
| `NotifySend` | Script's `Notify.Send(...)` is enqueued on the site — first row in a notification's lifecycle (`Status=Submitted`). |
| `NotifyDeliver` | Central Notification Outbox dispatcher records a delivery attempt (`Attempted`) or terminal outcome (`Delivered`/`Parked`/`Discarded`). |
| `InboundRequest` | An inbound API request completes — one row per request, written at request end with final status. |
| `InboundAuthFailure` | An inbound API request was rejected at the auth boundary (bad/missing key). One row, `Status=Failed`, `HttpStatus=401`. |
| `CachedSubmit` | Script-side enqueue of a cached call (`ExternalSystem.CachedCall` / `Database.CachedWrite`); first row in the cached-call lifecycle, written to site SQLite before any forward attempt. |
| `CachedResolve` | Terminal row for a cached operation — `Status` = `Delivered` / `Failed` / `Parked` / `Discarded`. |
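
As a rough illustration of the flat vocabulary, the three enums could be mirrored as below. This is a Python sketch only: the M1 type definitions are not shown in this diff, and the `ERROR_STATUSES` / `is_error` helper names are hypothetical. The member values are copied from the lists above.

```python
from enum import Enum


class AuditChannel(Enum):
    API_OUTBOUND = "ApiOutbound"
    DB_OUTBOUND = "DbOutbound"
    NOTIFICATION = "Notification"
    API_INBOUND = "ApiInbound"


class AuditKind(Enum):
    API_CALL = "ApiCall"
    API_CALL_CACHED = "ApiCallCached"
    DB_WRITE = "DbWrite"
    DB_WRITE_CACHED = "DbWriteCached"
    NOTIFY_SEND = "NotifySend"
    NOTIFY_DELIVER = "NotifyDeliver"
    INBOUND_REQUEST = "InboundRequest"
    INBOUND_AUTH_FAILURE = "InboundAuthFailure"
    CACHED_SUBMIT = "CachedSubmit"
    CACHED_RESOLVE = "CachedResolve"


class AuditStatus(Enum):
    SUBMITTED = "Submitted"
    FORWARDED = "Forwarded"
    ATTEMPTED = "Attempted"
    DELIVERED = "Delivered"
    FAILED = "Failed"
    PARKED = "Parked"
    DISCARDED = "Discarded"
    SKIPPED = "Skipped"


# Hypothetical helper: the statuses the docs treat as error rows.
ERROR_STATUSES = frozenset(
    {AuditStatus.FAILED, AuditStatus.PARKED, AuditStatus.DISCARDED}
)


def is_error(status: AuditStatus) -> bool:
    """True for rows matching Status IN ('Failed', 'Parked', 'Discarded')."""
    return status in ERROR_STATUSES
```

Because the kinds are flat, a row's channel is no longer implied by its kind name alone; `CachedSubmit` and `CachedResolve` appear under both `ApiOutbound` and `DbOutbound`.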
### Site: `AuditLog` (SQLite)
@@ -215,13 +221,13 @@ This is the same self-healing pattern Site Call Audit uses for `SiteCalls`.
Events that originate at central never touch site SQLite:
- **Inbound API** — request completed at central; one `ApiInbound`/`Completed` row written via `ICentralAuditWriter` synchronously inside the request handler middleware before the HTTP response is flushed.
- **Notification Outbox dispatcher** — each delivery attempt writes a `Notification`/`Attempt` row; terminal status writes a `Notification`/`Terminal` row. (The site-originated `Notification`/`Enqueued` row arrives via §6.2.)
- **Inbound API** — request completed at central; one `ApiInbound`/`InboundRequest` row written via `ICentralAuditWriter` synchronously inside the request handler middleware before the HTTP response is flushed. Auth failures emit `ApiInbound`/`InboundAuthFailure` instead.
- **Notification Outbox dispatcher** — each delivery attempt writes a `Notification`/`NotifyDeliver` row with `Status=Attempted`; terminal status writes a `Notification`/`NotifyDeliver` row with `Status=Delivered`/`Parked`/`Discarded`. (The site-originated `Notification`/`NotifySend` row, `Status=Submitted`, arrives via §6.2.)
Central direct-writes use the same insert-if-not-exists semantics keyed on `EventId`, so a retried request handler can't produce duplicates.
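
The insert-if-not-exists behavior keyed on `EventId` can be sketched as follows. This is an assumption-laden illustration: it uses an in-memory SQLite database as a stand-in for the central MS SQL table, a reduced column set, and hypothetical helper names (`make_db`, `insert_if_not_exists`). It is not the `ICentralAuditWriter` implementation.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create a throwaway AuditLog table with EventId as the primary key."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE AuditLog ("
        " EventId TEXT PRIMARY KEY,"
        " Channel TEXT NOT NULL,"
        " Kind TEXT NOT NULL,"
        " Status TEXT NOT NULL)"
    )
    return conn


def insert_if_not_exists(conn, event_id, channel, kind, status) -> bool:
    """Insert the row unless EventId is already present.

    Returns True when a row was written, False when the insert was a no-op.
    """
    cur = conn.execute(
        "INSERT OR IGNORE INTO AuditLog (EventId, Channel, Kind, Status)"
        " VALUES (?, ?, ?, ?)",
        (event_id, channel, kind, status),
    )
    return cur.rowcount == 1
```

A retried request handler that reuses the same `EventId` then hits the no-op path, so the second call returns `False` and the table still holds one row.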
### 6.5 Cached operations — site emits, central writes twice
For `ExternalSystem.CachedCall` and `Database.CachedWrite`, the **site** is the source of truth for every audit row. The site writes each lifecycle event (`CachedEnqueued`, `CachedAttempt`, `CachedTerminal`) to its local SQLite `AuditLog` on the hot path (or on the retry tick for `CachedAttempt`), then forwards via the same telemetry channel described in §6.2. The telemetry message format gains the audit-row fields additively — one packet per lifecycle transition carries both the operational state update AND the audit row content.
For `ExternalSystem.CachedCall` and `Database.CachedWrite`, the **site** is the source of truth for every audit row. The site writes each lifecycle event to its local SQLite `AuditLog`: `CachedSubmit` (`Status=Submitted`), then `ApiCallCached`/`DbWriteCached` rows for the forward-ack (`Status=Forwarded`) and each retry (`Status=Attempted`), then a terminal `CachedResolve` row (`Status=Delivered`/`Failed`/`Parked`/`Discarded`). Writes happen on the hot path (or on the retry tick for `Attempted` rows); the site then forwards each row via the same telemetry channel described in §6.2. The telemetry message format gains the audit-row fields additively: one packet per lifecycle transition carries both the operational state update AND the audit row content.
On receipt, central does two things in **one transaction**:
@@ -243,13 +249,13 @@ Worked examples — what each `Channel`/`Kind` row actually looks like. (Other c
```
EventId = <new guid>
Channel = ApiOutbound
Kind = SyncCall
Kind = ApiCall
CorrelationId = NULL -- one-shot, no operation to correlate
SourceSiteId = "site-01"
SourceInstance = "Plant1.Boiler"
SourceScript = "OnHourly"
Target = "Weather/GetForecast"
Status = Success
Status = Delivered
HttpStatus = 200
DurationMs = 142
RequestSummary = '{"city":"Dublin"}' -- truncated to cap
@@ -259,11 +265,12 @@ ResponseSummary= '{"tempC":11.4,...}' -- truncated to cap
**Cached call** (`ExternalSystem.CachedCall(...)`, hits a 500, retries, succeeds on attempt 3):
```
1. Kind=CachedEnqueued Status=Enqueued CorrelationId=<tracked-op-id>
2. Kind=CachedAttempt Status=TransientFailure HttpStatus=500 CorrelationId=<same>
3. Kind=CachedAttempt Status=TransientFailure HttpStatus=500 CorrelationId=<same>
4. Kind=CachedAttempt Status=Success HttpStatus=200 CorrelationId=<same>
5. Kind=CachedTerminal Status=Delivered CorrelationId=<same>
1. Kind=CachedSubmit Status=Submitted CorrelationId=<tracked-op-id>
2. Kind=ApiCallCached Status=Forwarded CorrelationId=<same>
3. Kind=ApiCallCached Status=Attempted HttpStatus=500 CorrelationId=<same>
4. Kind=ApiCallCached Status=Attempted HttpStatus=500 CorrelationId=<same>
5. Kind=ApiCallCached Status=Attempted HttpStatus=200 CorrelationId=<same>
6. Kind=CachedResolve Status=Delivered CorrelationId=<same>
```
The shadow of the `SiteCalls` row's lifecycle, but immutable and time-ordered.
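
The row sequence above follows the `CachedSubmit -> Forwarded -> Attempted... -> CachedResolve` shape. A minimal checker for that shape might look like the sketch below. The function name is hypothetical, and accepting zero `Attempted` rows is an assumption rather than something the design states.

```python
TERMINAL_STATUSES = {"Delivered", "Failed", "Parked", "Discarded"}
CACHED_ATTEMPT_KINDS = {"ApiCallCached", "DbWriteCached"}


def is_valid_cached_lifecycle(rows):
    """Check a time-ordered list of (kind, status) pairs for one CorrelationId."""
    if len(rows) < 3:
        return False
    first, forward, *attempts, last = rows
    # Row 1: script-side enqueue.
    if first != ("CachedSubmit", "Submitted"):
        return False
    # Row 2: forward-ack, on the API or DB cached kind.
    kind, status = forward
    if kind not in CACHED_ATTEMPT_KINDS or status != "Forwarded":
        return False
    # Middle rows: attempts, all on the same kind as the forward-ack.
    if any(row != (kind, "Attempted") for row in attempts):
        return False
    # Final row: terminal resolution.
    return last[0] == "CachedResolve" and last[1] in TERMINAL_STATUSES
```

Applied to the six rows of the worked example, the check passes; dropping the `Forwarded` row or ending on an `Attempted` row makes it fail.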
@@ -274,10 +281,10 @@ The shadow of the `SiteCalls` row's lifecycle, but immutable and time-ordered.
```
Channel = DbOutbound
Kind = SyncWrite
Kind = DbWrite
Target = "PlantDB" -- connection name only, not server
CorrelationId = NULL
Status = Success
Status = Delivered
DurationMs = 9
RequestSummary = "INSERT INTO Readings(ts,val) VALUES (@p0,@p1)" -- SQL text
Extra = '{"rowsAffected":1,"params":{"p0":"2026-05-20T14:00Z","p1":42.7}}' -- values captured by default
@@ -288,23 +295,25 @@ Extra = '{"rowsAffected":1,"params":{"p0":"2026-05-20T14:00Z","p1":42.7
```
Channel = DbOutbound
Kind = SyncRead
Status = Success
Kind = DbWrite
Status = Delivered
DurationMs = 31
RequestSummary = "SELECT id, value FROM Readings WHERE ts > @p0"
Extra = '{"rowsReturned":42}'
ResponseSummary= NULL -- rows not captured by default; opt-in per connection
```
**Cached write** — same five-row lifecycle as the cached API example.
(Reads and writes share the `DbWrite` kind — the kind distinguishes the trust-boundary call shape, not the SQL verb. Distinguish by `RequestSummary` / `Extra.rowsAffected` vs `Extra.rowsReturned` when needed.)
**Cached write** — same multi-row lifecycle as the cached API example, using `Kind=DbWriteCached` for the `Forwarded` / `Attempted` rows in place of `ApiCallCached`.
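
Since reads and writes share `Kind=DbWrite`, a consumer that needs the distinction has to look inside `Extra`. One possible way to do that is sketched here; the function name and the `"unknown"` fallback are assumptions for illustration.

```python
import json


def classify_db_row(extra_json):
    """Classify a DbWrite audit row as 'write', 'read', or 'unknown' from Extra."""
    extra = json.loads(extra_json) if extra_json else {}
    if "rowsAffected" in extra:
        return "write"
    if "rowsReturned" in extra:
        return "read"
    # Neither key present: Extra was NULL or did not record a row count.
    return "unknown"
```

With the two sync examples above, `'{"rowsAffected":1,...}'` classifies as a write and `'{"rowsReturned":42}'` as a read.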
### 7.3 `Notification` — outbound notifications
```
1. Kind=Enqueued Status=Enqueued CorrelationId=<NotificationId> SourceSiteId="site-01" SourceInstance="Plant1.Boiler"
2. Kind=Attempt Status=TransientFailure ErrorMessage="SMTP 451 ..." CorrelationId=<same> SourceSiteId=NULL (dispatch is central)
3. Kind=Attempt Status=Success CorrelationId=<same>
4. Kind=Terminal Status=Delivered CorrelationId=<same>
1. Kind=NotifySend Status=Submitted CorrelationId=<NotificationId> SourceSiteId="site-01" SourceInstance="Plant1.Boiler"
2. Kind=NotifyDeliver Status=Attempted ErrorMessage="SMTP 451 ..." CorrelationId=<same> SourceSiteId=NULL (dispatch is central)
3. Kind=NotifyDeliver Status=Attempted CorrelationId=<same>
4. Kind=NotifyDeliver Status=Delivered CorrelationId=<same>
Target = "OpsTeamEmail" -- notification list name
Extra = '{"resolvedTargets":["a@x.com","b@x.com"], "subject":"Boiler high temp"}'
RequestSummary = '...body, truncated...'
@@ -318,20 +327,20 @@ One row per request, written at request completion:
```
Channel = ApiInbound
Kind = Completed
Kind = InboundRequest
CorrelationId = <request-id> -- the request's correlation header (or generated)
SourceSiteId = NULL -- central-originated event
Actor = "AcmeSCADA" -- API key name (NOT the key itself)
Target = "RecordReading" -- inbound method name
Status = Success | PermanentFailure -- mapped from final HTTP outcome
HttpStatus = 200 | 400 | 401 | 500
Status = Delivered | Failed -- mapped from final HTTP outcome
HttpStatus = 200 | 400 | 500
DurationMs = 73
RequestSummary = '{"siteId":"...","value":12.4}' -- truncated; secrets/PII per redaction policy
ResponseSummary= '{"ok":true}' -- full body on 5xx
Extra = '{"remoteIp":"203.0.113.42","userAgent":"...","scriptInvoked":"RecordReading.Handle"}'
```
A bad API key → row with `Status=PermanentFailure`, `HttpStatus=401`, `Actor=NULL`, `Extra` carries `remoteIp` for abuse triage.
A bad API key → separate kind: `Kind=InboundAuthFailure`, `Status=Failed`, `HttpStatus=401`, `Actor=NULL`, `Extra` carries `remoteIp` for abuse triage.
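
The split between the two inbound kinds can be sketched as a mapping from the final HTTP status. This is a simplification under stated assumptions: the real middleware presumably decides at the auth boundary rather than by inspecting the status code, and treating every non-2xx outcome as `Failed` is a guess based on the `Delivered | Failed` and `200 | 400 | 500` values above. The function name is hypothetical.

```python
def classify_inbound(http_status):
    """Return (kind, status) for an inbound API audit row."""
    if http_status == 401:
        # Rejected at the auth boundary: separate kind, always Failed.
        return ("InboundAuthFailure", "Failed")
    if 200 <= http_status < 300:
        return ("InboundRequest", "Delivered")
    # Assumption: any other outcome (e.g. 400, 500) is recorded as Failed.
    return ("InboundRequest", "Failed")
```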
---
@@ -339,7 +348,7 @@ A bad API key → row with `Status=PermanentFailure`, `HttpStatus=401`, `Actor=N
### 8.1 Truncation
- Default cap: **8 KB** for each of `RequestSummary` and `ResponseSummary`. Configurable globally; per-target overrides allowed (§8.4).
- On any non-`Success` row, the cap is raised to **64 KB** for that row — error context is precious.
- On any error row (`Status IN ('Failed', 'Parked', 'Discarded')`), the cap is raised to **64 KB** for that row — error context is precious.
- When a body is truncated, `PayloadTruncated = 1` and the captured prefix is preserved verbatim (UTF-8 byte-safe truncation, no mid-character cuts).
- Bodies exceeding the larger cap are still truncated; full bodies are never stored.
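
The truncation rules above could be implemented along these lines. This is a sketch, not the audit writer's code: the constant and function names are hypothetical, and per-target overrides (§8.4) are ignored.

```python
DEFAULT_CAP_BYTES = 8 * 1024   # 8 KB default cap
ERROR_CAP_BYTES = 64 * 1024    # 64 KB cap on error rows
ERROR_STATUSES = {"Failed", "Parked", "Discarded"}


def truncate_payload(body, status):
    """Return (captured_text, payload_truncated) for one summary column."""
    cap = ERROR_CAP_BYTES if status in ERROR_STATUSES else DEFAULT_CAP_BYTES
    raw = body.encode("utf-8")
    if len(raw) <= cap:
        return body, False
    # Cut at the byte cap, then drop any partial multi-byte character left
    # at the end so the result is a verbatim, valid UTF-8 prefix.
    prefix = raw[:cap].decode("utf-8", errors="ignore")
    return prefix, True
```

Cutting on bytes rather than characters keeps the stored size bounded regardless of how many multi-byte characters the body contains.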
@@ -426,7 +435,7 @@ Lives under a new **Audit** nav group in Central UI (sibling to **Notifications*
- Target (text search — system+method, DB connection, list name).
- Actor (text search — inbound API key name).
- CorrelationId (paste a `TrackedOperationId` / `NotificationId` / request-id to see its full event sequence).
- "Errors only" toggle (`Status NOT IN (Success, Delivered, Enqueued)`).
- "Errors only" toggle (`Status IN ('Failed', 'Parked', 'Discarded')`).
**Results grid:**
- Columns (resizable, reorderable, persisted per user): `OccurredAtUtc`, `Site`, `Channel`, `Kind`, `Status`, `Target`, `Actor`, `DurationMs`, `HttpStatus`, `ErrorMessage`.
@@ -450,7 +459,7 @@ Lives under a new **Audit** nav group in Central UI (sibling to **Notifications*
### 10.3 Health dashboard tiles
Three new tiles in an "Audit" KPI group:
- **Audit volume** — events/min global + per-site sparkline.
- **Audit error rate** — % non-`Success` rows, rolling 5 min.
- **Audit error rate** — % rows where `Status IN ('Failed', 'Parked', 'Discarded')`, rolling 5 min.
- **Audit backlog** — sum of `Pending` site rows; click → per-site breakdown.
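
For reference, the error-rate tile's arithmetic can be sketched as below. It assumes the window is keyed on `OccurredAtUtc` (the docs do not say whether `OccurredAtUtc` or `IngestedAtUtc` is used) and that an empty window reports 0%; the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

ERROR_STATUSES = {"Failed", "Parked", "Discarded"}


def error_rate_percent(rows, now, window=timedelta(minutes=5)):
    """Percent of rows in the rolling window whose Status is an error status.

    rows is an iterable of (occurred_at_utc, status) pairs.
    """
    recent = [status for occurred, status in rows if now - window <= occurred <= now]
    if not recent:
        return 0.0  # assumption: an empty window reports 0%
    errors = sum(1 for status in recent if status in ERROR_STATUSES)
    return 100.0 * errors / len(recent)
```

Note that `Attempted` rows count toward the denominator but not the numerator, so a retry that fails with a 5xx does not raise the rate until the operation resolves as `Failed`, `Parked`, or `Discarded`.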
### 10.4 Export
@@ -516,12 +525,12 @@ Rough back-of-envelope; load testing will confirm.
### 13.1 Per-site event rate (assumed nominal site)
| Channel/Kind | Typ events/min | Peak events/min |
|---|---:|---:|
| `ApiOutbound.SyncCall` | 10 | 100 |
| `ApiOutbound.Cached*` (~4 rows/op) | 4 | 20 |
| `DbOutbound.SyncWrite` | 30 | 300 |
| `DbOutbound.SyncRead` | 60 | 600 |
| `DbOutbound.Cached*` (~4 rows/op) | 4 | 20 |
| `Notification.Enqueued` (site-emit) | 1 | 10 |
| `ApiOutbound.ApiCall` | 10 | 100 |
| `ApiOutbound.ApiCallCached` (~4 rows/op incl. `CachedSubmit`/`CachedResolve`) | 4 | 20 |
| `DbOutbound.DbWrite` (writes) | 30 | 300 |
| `DbOutbound.DbWrite` (reads) | 60 | 600 |
| `DbOutbound.DbWriteCached` (~4 rows/op incl. `CachedSubmit`/`CachedResolve`) | 4 | 20 |
| `Notification.NotifySend` (site-emit) | 1 | 10 |
| **Per-site total** | **~110** | **~1,050** |
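
As a quick check on the totals row, the per-kind figures in the table can be summed directly. The dictionary keys below are labels for this sketch only.

```python
# (typical events/min, peak events/min) copied from the table above.
rates = {
    "ApiOutbound.ApiCall": (10, 100),
    "ApiOutbound.ApiCallCached": (4, 20),
    "DbOutbound.DbWrite (writes)": (30, 300),
    "DbOutbound.DbWrite (reads)": (60, 600),
    "DbOutbound.DbWriteCached": (4, 20),
    "Notification.NotifySend": (1, 10),
}

typ_total = sum(typ for typ, _ in rates.values())     # 109, shown as ~110
peak_total = sum(peak for _, peak in rates.values())  # 1050, shown as ~1,050
```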
### 13.2 Central total (50-site deployment)
@@ -545,7 +554,7 @@ MS SQL handles this with batched ingest and the time-aligned indexes.
### 13.6 Levers
- Reduce `DefaultCapBytes` per §8.1.
- Tighten per-channel retention per §12.1 (especially `DbOutbound.SyncRead`).
- Tighten per-channel retention per §12.1 (especially `DbOutbound.DbWrite` read traffic).
- Defer to v1.x: Parquet archival to object storage before purge (§15.2).
---
@@ -554,7 +563,7 @@ MS SQL handles this with batched ingest and the time-aligned indexes.
### 14.1 New Audit Log KPIs
- **Volume** — events/min, global + per-site.
- **Error rate** — % non-`Success` rows, rolling 5 min.
- **Error rate** — % rows where `Status IN ('Failed', 'Parked', 'Discarded')`, rolling 5 min.
- **Backlog** — sum of `Pending` site rows.
- **Top inbound callers** — top-10 `Actor` by request count, last 1h.
- **Top outbound 5xx** — top-10 `Target` by 5xx-status count, last 1h.
@@ -590,6 +599,6 @@ A monthly job dumps the closing partition to Parquet on operator-configured obje
| 3 | Hash-chain tamper evidence (§11.4) | Deferred to v1.x. v1 enforces append-only via DB grants only. |
| 4 | Parquet archival to object storage (§15.2) | Deferred to v1.x. |
| 5 | Per-channel retention overrides (§12.1) | Deferred to v1.x. v1 uses a single global `RetentionDays`. |
| 6 | Default payload cap | **8 KB** for `RequestSummary` / `ResponseSummary`; **64 KB** on non-`Success` rows. |
| 6 | Default payload cap | **8 KB** for `RequestSummary` / `ResponseSummary`; **64 KB** on error rows (`Status IN ('Failed', 'Parked', 'Discarded')`). |
All earlier design decisions (purpose, topology, scope, payload depth, lifecycle granularity, retention default, site→central path, UI shape, cached-call audit emission, SQL parameter capture, never-fail-on-audit-failure) are also locked. See §1–§15.


@@ -81,14 +81,14 @@ row per lifecycle event across all channels.
| `OccurredAtUtc` | `datetime2` | When the event happened (call returned, retry attempted, etc.). |
| `IngestedAtUtc` | `datetime2` | When central persisted the row (lags `OccurredAtUtc` for site-originated rows). |
| `Channel` | `varchar(32)` | `ApiOutbound` \| `DbOutbound` \| `Notification` \| `ApiInbound`. |
| `Kind` | `varchar(32)` | Channel-specific event kind (see below). |
| `Kind` | `varchar(32)` | Event kind discriminator (see kinds list below). |
| `CorrelationId` | `uniqueidentifier` NULL | Ties multi-event operations together. `TrackedOperationId` for cached calls, `NotificationId` for notifications, request-id for inbound API. NULL for sync one-shot calls. |
| `SourceSiteId` | `varchar(64)` NULL | NULL for central-originated events. |
| `SourceInstanceId` | `varchar(128)` NULL | Instance whose script initiated the action (when applicable). |
| `SourceScript` | `varchar(128)` NULL | Script name within the instance. |
| `Actor` | `varchar(128)` NULL | Inbound API: API key name. Outbound: script identity. Central: system user. |
| `Target` | `varchar(256)` NULL | Outbound API: external system + method. DB: connection name. Notification: list name. Inbound API: method name. |
| `Status` | `varchar(32)` | Outcome of *this event*: `Success`, `TransientFailure`, `PermanentFailure`, `Enqueued`, `Retrying`, `Delivered`, `Parked`, `Discarded`. |
| `Status` | `varchar(32)` | Outcome of *this event*: `Submitted`, `Forwarded`, `Attempted`, `Delivered`, `Failed`, `Parked`, `Discarded`, `Skipped`. |
| `HttpStatus` | `int` NULL | HTTP-bearing events only. |
| `DurationMs` | `int` NULL | Call / attempt duration. |
| `ErrorMessage` | `nvarchar(1024)` NULL | Truncated; `ErrorDetail` for full text. |
@@ -107,17 +107,24 @@ row per lifecycle event across all channels.
- `IX_AuditLog_Target_Occurred (Target, OccurredAtUtc)` — "what did we send to system X".
- Monthly partitioning on `OccurredAtUtc` from day one; purge is a partition switch (see Retention & Purge).
**`Kind` values by channel:**
**`Kind` values (flat — 10 discriminators across all channels):**
| Channel | Kinds |
| Kind | Fires when |
|---|---|
| `ApiOutbound` | `SyncCall`, `CachedEnqueued`, `CachedAttempt`, `CachedTerminal` |
| `DbOutbound` | `SyncWrite`, `SyncRead`, `CachedEnqueued`, `CachedAttempt`, `CachedTerminal` |
| `Notification` | `Enqueued`, `Attempt`, `Terminal` |
| `ApiInbound` | `Completed` — one row per request, written at request end with final status |
| `ApiCall` | Sync `ExternalSystem.Call(...)` returns (success or permanent failure). One row per call. |
| `ApiCallCached` | A cached outbound-API attempt records its forward-ack (`Forwarded`) or each retry (`Attempted`). |
| `DbWrite` | Sync `Database.Connection().Execute*(...)` / `ExecuteReader(...)` completes. One row per call. |
| `DbWriteCached` | A cached outbound-DB attempt records its forward-ack (`Forwarded`) or each retry (`Attempted`). |
| `NotifySend` | Script's `Notify.Send(...)` is enqueued on the site — first row in a notification's lifecycle (`Status=Submitted`). |
| `NotifyDeliver` | Central Notification Outbox dispatcher records a delivery attempt (`Attempted`) or terminal outcome (`Delivered`/`Parked`/`Discarded`). |
| `InboundRequest` | An inbound API request completes — one row per request, written at request end with final status. |
| `InboundAuthFailure` | An inbound API request was rejected at the auth boundary (bad/missing key). One row, `Status=Failed`, `HttpStatus=401`. |
| `CachedSubmit` | Script-side enqueue of a cached call (`ExternalSystem.CachedCall` / `Database.CachedWrite`); first row in the cached-call lifecycle, written to site SQLite before any forward attempt. |
| `CachedResolve` | Terminal row for a cached operation — `Status` = `Delivered` / `Failed` / `Parked` / `Discarded`. |
Inbound API is intentionally collapsed to a single `Completed` row per request
rather than a multi-event lifecycle.
Inbound API is intentionally collapsed to a single `InboundRequest` (or
`InboundAuthFailure` for auth rejections) row per request rather than a
multi-event lifecycle.
## The Site-Local `AuditLog` (SQLite)
@@ -178,18 +185,24 @@ pattern as Site Call Audit's reconciliation of `SiteCalls`.
### Central direct-write (central-originated events)
Events originating at central never touch site SQLite. Inbound API writes one
`ApiInbound.Completed` row via `ICentralAuditWriter` synchronously inside the
request-handler middleware, before the HTTP response is flushed. The
Notification Outbox dispatcher writes `Notification.Attempt` per delivery
attempt and `Notification.Terminal` on terminal status. Central direct-writes
use the same insert-if-not-exists semantics keyed on `EventId`.
`ApiInbound.InboundRequest` row via `ICentralAuditWriter` synchronously inside
the request-handler middleware, before the HTTP response is flushed; auth-layer
rejections emit `ApiInbound.InboundAuthFailure` (`Status=Failed`, HTTP 401)
instead. The Notification Outbox dispatcher writes
`Notification.NotifyDeliver` with `Status=Attempted` per delivery attempt and
`Notification.NotifyDeliver` with `Status=Delivered`/`Parked`/`Discarded` on
terminal status. Central direct-writes use the same insert-if-not-exists
semantics keyed on `EventId`.
## Cached Operations — Combined Telemetry
For `ExternalSystem.CachedCall` and `Database.CachedWrite`, the **site** is the
source of truth for every audit row. The site writes each lifecycle event
(`CachedEnqueued`, `CachedAttempt`, `CachedTerminal`) to its local SQLite
`AuditLog` on the hot path (or on the retry tick for `CachedAttempt`), then
source of truth for every audit row. The site writes each lifecycle event to
its local SQLite `AuditLog`: `CachedSubmit` (`Status=Submitted`), then
`ApiCallCached`/`DbWriteCached` rows for the forward-ack (`Status=Forwarded`)
and each retry (`Status=Attempted`), then a terminal `CachedResolve` row
(`Status=Delivered`/`Failed`/`Parked`/`Discarded`). Writes happen on the hot
path (or on the retry tick for `Attempted` rows); the site then
forwards via the same telemetry channel. The telemetry message format gains the
audit-row fields additively — one packet per lifecycle transition carries both
the operational state update AND the audit row content.
@@ -207,7 +220,7 @@ operational `SiteCalls` shape for the dispatcher and UI.
## Payload Capture Policy
- **Default cap** — 8 KB for each of `RequestSummary` and `ResponseSummary`;
raised to 64 KB on any non-`Success` row.
raised to 64 KB on any error row (`Status IN ('Failed', 'Parked', 'Discarded')`).
- **Truncation** — UTF-8 byte-safe; `PayloadTruncated = 1` when applied. Full
bodies are never stored.
- **HTTP headers** — `Authorization`, `Cookie`, `Set-Cookie`, `X-API-Key`, and
@@ -292,7 +305,7 @@ MS SQL for direct-write events). Unredacted secrets never persist.
Point-in-time, computed from the central `AuditLog` table; global and per-site.
- **Audit volume** — events/min landing in the central `AuditLog`; global plus per-site sparkline.
- **Audit error rate** — % of central `AuditLog` rows with `Status` NOT IN (`Success`, `Delivered`, `Enqueued`) over a rolling 5-minute window. This is the operational error rate of audited operations (HTTP 5xx, transient failures, parked deliveries) — NOT audit-writer health, which surfaces separately via `CentralAuditWriteFailures` and `AuditRedactionFailure`.
- **Audit error rate** — % of central `AuditLog` rows with `Status IN ('Failed', 'Parked', 'Discarded')` over a rolling 5-minute window. This is the operational error rate of audited operations (HTTP 5xx, permanent failures, parked deliveries) — NOT audit-writer health, which surfaces separately via `CentralAuditWriteFailures` and `AuditRedactionFailure`.
- **Audit backlog** — sum of `Pending` site rows across sites; click drills into a per-site breakdown.
[Notification Outbox](Component-NotificationOutbox.md) and
@@ -350,19 +363,22 @@ global value in v1; per-channel overrides are deferred to v1.x.
## Interactions
- **[External System Gateway (#7)](Component-ExternalSystemGateway.md)** —
emits `ApiOutbound.SyncCall` rows on every sync `Call()`. For `CachedCall`,
emits `ApiOutbound.ApiCall` rows on every sync `Call()`. For `CachedCall`,
emits the combined cached telemetry packet (audit row + operational update)
per Cached Operations — Combined Telemetry.
- **[External System Gateway (#7)](Component-ExternalSystemGateway.md) — Database layer** — the database access modes inside ESG emit `DbOutbound.SyncWrite` and `DbOutbound.SyncRead` on script-initiated `Connection()` calls; `Database.CachedWrite` emits the cached-write lifecycle rows via the combined-telemetry packet (same path as `ApiOutbound.Cached*`). Site Runtime is the API surface that exposes the `Database.*` calls to scripts; the audit emission itself lives in ESG.
per Cached Operations — Combined Telemetry, using kinds
`CachedSubmit` / `ApiCallCached` / `CachedResolve`.
- **[External System Gateway (#7)](Component-ExternalSystemGateway.md) — Database layer** — the database access modes inside ESG emit `DbOutbound.DbWrite` rows on script-initiated `Connection()` calls (writes and reads share the kind; distinguish via `Extra.rowsAffected` vs `Extra.rowsReturned`); `Database.CachedWrite` emits the cached-write lifecycle rows via the combined-telemetry packet using kinds `CachedSubmit` / `DbWriteCached` / `CachedResolve` (same shape as `ApiOutbound`). Site Runtime is the API surface that exposes the `Database.*` calls to scripts; the audit emission itself lives in ESG.
- **[Inbound API (#14)](Component-InboundAPI.md)** — emits one
`ApiInbound.Completed` row per request from request-handler middleware,
written directly to central via `ICentralAuditWriter` before the response is
flushed.
`ApiInbound.InboundRequest` row per authenticated request from request-handler
middleware, written directly to central via `ICentralAuditWriter` before the
response is flushed. Auth-layer rejections emit
`ApiInbound.InboundAuthFailure` instead (`Status=Failed`, HTTP 401).
- **[Notification Outbox (#21)](Component-NotificationOutbox.md)** — the
site-emitted `Notification.Enqueued` row flows via audit telemetry; the
central dispatcher writes `Notification.Attempt` (per delivery attempt) and
`Notification.Terminal` (on terminal status) directly via
`ICentralAuditWriter`. The operational `Notifications` table is unchanged.
site-emitted `Notification.NotifySend` row (`Status=Submitted`) flows via
audit telemetry; the central dispatcher writes `Notification.NotifyDeliver`
rows directly via `ICentralAuditWriter`: `Status=Attempted` per delivery
attempt, `Status=Delivered`/`Parked`/`Discarded` on terminal status. The
operational `Notifications` table is unchanged.
- **[Site Call Audit (#22)](Component-SiteCallAudit.md)** — shares the
cached-call telemetry packet. Central ingest of that packet performs both the
`AuditLog` insert and the `SiteCalls` upsert in one transaction. `SiteCalls`