# Audit: `project()` callers and non-idempotent projector handlers **Date:** 2026-04-27 **Triggering incident:** commit `0f8bf94` — `kickoff_post` 500'd with `sqlite3.IntegrityError: UNIQUE constraint failed: chats.id` after a second bot's kickoff. Root cause: the route appended events with `append_event()` and then called `project(conn)`, which replays the *entire* event log. The `chat_created` handler in `chat/state/world.py` uses raw `INSERT INTO chats ...` (no `OR REPLACE`/`OR IGNORE`), so on a DB that already had a first bot's chat row, the replay re-hit that row and raised. This audit walks the rest of the live request paths to make sure no other route has the same shape, and inventories every projector handler that uses raw `INSERT` so the trade-offs are documented for future hardening passes. --- ## Step 1 — `project()` callers `grep -rn "project(" chat/ --include="*.py"` (excluding the definition itself in `chat/eventlog/projector.py:17` and the local `project_id` type variables that the regex doesn't actually catch): | File:line | Caller | Classification | |---|---|---| | `chat/web/bots.py:113` | `bot_create` route — append `bot_authored` then `project(conn)` | **Unsafe (live path) — fixed** | | `chat/web/settings.py:44` | `settings_post` route — append `you_authored` then `project(conn)` | **Unsafe (live path) — fixed** | | `chat/services/rewind.py:110` | `execute_rewind` — clears every projected table then re-projects from the truncated log | **Safe (replay-only)** | | `chat/eventlog/projector.py:17` | Definition site, not a call | n/a | | `tests/test_*.py` (~50 tests) | Test setup pattern: append a sequence of synthetic events into a fresh DB, then `project(conn)` to materialise | **Safe (replay-only)** — projects against an empty/fresh DB; not a live request path | ### Safe (replay-only) - **`chat/services/rewind.py:110`** — `execute_rewind` is the canonical "rebuild the projection" entry point. Lines 95–104 explicitly `DELETE FROM` every projected table (`memories`, `activity`, `scenes`, `containers`, `chat_state`, `chats`, `edges`, `bots`, `you_entity`, `classifier_failures`) before calling `project(conn)`. The handler registry then walks the truncated log against empty tables, so even the raw-INSERT handlers run safely on a clean slate. The module docstring (lines 1–21) calls out exactly why a full replay (rather than a "revert delta") is the right move here: the `edge_update` handler is a delta accumulator with no clean inverse. **Do not change.** - **Test suite** — every `from chat.eventlog.projector import project` in `tests/` is a setup helper. They open a fresh in-memory or tmp-path DB, append a hand-crafted sequence of events, and call `project(conn)` once. There is no second-replay risk because the DB starts empty. These are not live paths. ### Unsafe (live-path) — fixed in this audit Both fixes follow the pattern established by `0f8bf94`: drop the `append_event` + `project` pair in favour of `append_and_apply` (defined in `chat/eventlog/log.py:32`), which appends and runs *only the brand-new event* through its registered handler. - **`chat/web/bots.py:113` — `bot_create`** Was: `append_event(conn, kind="bot_authored", ...); project(conn)`. Now: `append_and_apply(conn, kind="bot_authored", ...)`. In isolation, `_apply_bot_authored` is itself idempotent (`INSERT OR REPLACE INTO bots`), so the *route* didn't fail today. The bug is latent: as soon as any kickoff ran first (which produces `chat_created` events), the next call to `bot_create` would replay that prior `chat_created` and trip the same UNIQUE constraint. We saw this happen in `0f8bf94` — fixing the symmetric route prevents the next variant of the same incident. Removed unused imports: `append_event`, `project`. - **`chat/web/settings.py:44` — `settings_post`** Was: `append_event(conn, kind="you_authored", ...); project(conn)`. Now: `append_and_apply(conn, kind="you_authored", ...)`. Same shape as `bot_create`. `_apply_you_authored` is idempotent on its own (`INSERT OR REPLACE INTO you_entity`), but `project()` walks the *whole* log, including any `chat_created` / `container_created` / `scene_opened` events that have accumulated. Editing the user's own settings on a non-empty DB would 500 with the same UNIQUE constraint error — not because the new event is unsafe, but because the replay is. Fixed by per-event apply. Removed unused imports: `append_event`, `project`. ### Unsafe — still to fix None. The two unsafe live-path call sites identified above were both fixed in this commit. Future hardening: a CI lint that flags `project(` outside `chat/services/rewind.py` and `tests/` would catch a regression, but that's out of scope here. --- ## Step 2 — non-idempotent projector handler inventory Output of `grep -n "INSERT INTO\|INSERT OR REPLACE\|INSERT OR IGNORE" chat/state/*.py`, classified. ### Replay-safe handlers These either use `INSERT OR REPLACE` / `INSERT OR IGNORE` (so a second apply is a no-op or an overwrite of identical data), or are pure `UPDATE` against rows the prior event created. | Handler | File | Statement | Why safe | |---|---|---|---| | `_apply_bot_authored` | `chat/state/entities.py:12` | `INSERT OR REPLACE INTO bots` | `id` is the natural PK; replay overwrites with identical payload. | | `_apply_you_authored` | `chat/state/entities.py:29` | `INSERT OR REPLACE INTO you_entity` | Singleton row keyed on `id=1`. | | `_apply_activity_change` | `chat/state/world.py:98` | `INSERT OR REPLACE INTO activity` | Activity is keyed on `entity_id` — last write wins is exactly the intended semantics. | | `_apply_thread_opened` | `chat/state/threads.py:12` | `INSERT OR IGNORE INTO threads` | `thread_id` is the natural PK. | | `_apply_event_planned` | `chat/state/events.py:16` | `INSERT OR IGNORE INTO events` | `event_id` is the natural PK. | | `_apply_branch_created` | `chat/state/branches.py:27` | `INSERT OR IGNORE INTO branches` | Branch `name` is unique. | | `_apply_group_node_initialized` | `chat/state/group_node.py:12` | `INSERT OR REPLACE INTO group_node` | One row per `chat_id`. | | `_apply_embedding_indexed` | `chat/state/embeddings.py:28` | `INSERT OR REPLACE INTO embeddings` | One vector per `memory_id`. | | Pure-`UPDATE` handlers | various — `_apply_time_skip_*`, `_apply_guest_added`/`_removed`, `_apply_scene_closed`, `_apply_memory_significance_set`, `_apply_memory_pin_changed`, `_apply_meanwhile_scene_closed`, `_apply_meanwhile_digest_consumed`, `_apply_thread_updated`, `_apply_event_started`/`_completed`/`_cancelled` (etc.), `_apply_group_node_updated` | n/a | Idempotent: re-applying the same UPDATE produces the same row state. | ### Unsafe-on-replay handlers (raw `INSERT`) | Handler | File | Statement | Failure mode on replay | |---|---|---|---| | `_apply_chat_created` | `chat/state/world.py:14` | `INSERT INTO chats`, `INSERT INTO chat_state` | `chats.id` is PK — second insert raises `IntegrityError: UNIQUE constraint failed: chats.id`. **This is the `0f8bf94` bug.** `chat_state.chat_id` is also unique; would raise too. | | `_apply_container_created` | `chat/state/world.py:78` | `INSERT INTO containers` | `containers.id` is `INTEGER PRIMARY KEY AUTOINCREMENT` — replay does NOT raise (a new id is assigned), but it silently creates a duplicate row, fragmenting downstream lookups by `(chat_id, name)`. **Silent corruption, not a crash.** | | `_apply_scene_opened` | `chat/state/world.py:115` | `INSERT INTO scenes` | Same shape: autoincrement `id`. Replay creates a duplicate scene row and re-points `chat_state.active_scene_id` to the new copy. **Silent corruption.** | | `_apply_memory_written` | `chat/state/memory.py:14` | `INSERT INTO memories` | Autoincrement `id`. Replay duplicates the memory; FTS5 trigger then double-indexes the same `pov_summary`. **Silent corruption + double-counting in retrieval.** | | `_apply_meanwhile_scene_started` | `chat/state/meanwhile.py:29` | `INSERT INTO scenes` (with explicit `scene_id`) | Caller supplies `scene_id` (deterministic). Replay raises `IntegrityError: UNIQUE constraint failed: scenes.id`. **Hard crash, like `chat_created`.** | | `_apply_meanwhile_digest_created` | `chat/state/meanwhile.py:67` | `INSERT INTO meanwhile_digest_pending` | Autoincrement `id`. Replay creates a duplicate pending digest, surfacing the same summary twice in the next you-scene's prompt. **Silent corruption.** | | `_apply_edge_update` | `chat/state/edges.py:12` | `INSERT OR IGNORE INTO edges` followed by `UPDATE … SET affinity = ? + delta` | The `INSERT OR IGNORE` is fine, but the handler is *delta-shaped* — each replay re-adds `affinity_delta` and `trust_delta`, and re-extends `knowledge_json`. **Silent corruption: scores drift up; knowledge facts duplicate.** Already called out in `chat/eventlog/log.py:39-46` as the canonical reason `append_and_apply` exists. | ### Trade-offs — why we are NOT switching every handler to `INSERT OR REPLACE` This is the part the audit is here to nail down before someone "fixes it" with a one-line s/`INSERT INTO`/`INSERT OR REPLACE INTO`/. 1. **Autoincrement-id handlers (`containers`, `scenes`, `memories`, `meanwhile_digest_pending`)** — `INSERT OR REPLACE` doesn't help. Each event's payload doesn't carry the row's eventual id — the id comes from `lastrowid` *at projection time*. There is no key for `OR REPLACE` to match on. The fix here is either (a) make the event carry a deterministic id derived from the event's own id (large refactor — payload schemas, downstream FK lookups, FTS rowid alignment), or (b) keep the handler raw-INSERT and ensure every live path uses `append_and_apply` (the path we're on). We are on path (b), and this audit makes it explicit. 2. **`chat_created`** — `chats.id` IS keyed on the natural PK, so `INSERT OR REPLACE INTO chats ...` would technically work for the chat row. *But* it would silently overwrite `chat_state` columns that other events legitimately mutate later: `chat_state.time` is bumped by `time_skip_elision`, `active_scene_id` is set/cleared by `scene_opened`/`scene_closed`. On replay the `chat_created` overwrite would clobber those subsequent updates, then later events would re-set them — *if* the events themselves appear in order (they do today). It would work in practice, but it would erase the invariant that "each handler is responsible for one table-shape change" and make the projector's correctness depend on strict event-order replay through `chat_state`. Not worth the subtle coupling; keep the raw INSERT and treat replay as an explicit "wipe + replay" operation (the rewind path does exactly that). 3. **`meanwhile_scene_started`** — could be made idempotent (the payload supplies `scene_id`), but it shares the `scenes` table with `_apply_scene_opened` (autoincrement) — making one half of the table writers `OR REPLACE` and the other half raw-INSERT is asking for a future bug. Keep both raw, lean on `append_and_apply`. 4. **`edge_update`** — fundamentally cannot be made idempotent under replay without either changing the event schema (carry absolute values, not deltas) or recording per-event-id "already applied" flags. Either is a multi-week project. The current contract is "edge_update is a delta event; never apply it twice"; the `append_and_apply` rule enforces that contract from the call site. **Conclusion:** the handler layer is *correctly* non-idempotent for event-sourcing semantics. The defect class lives in the *caller* layer (routes that mistakenly call `project()` instead of `append_and_apply`). This audit fixes the two known offenders and pins the contract with a regression test (see Step 3). --- ## Step 3 — regression test Added `tests/test_chat_created_non_idempotent.py`. The test: 1. Opens a fresh DB and runs the migration chain. 2. Appends one `chat_created` event and projects — first projection succeeds. 3. Appends a *second* `chat_created` for the same chat id and projects again — asserts that the second projection raises `sqlite3.IntegrityError`. The point isn't that the test catches a future "make it idempotent" change automatically; it's that any such change MUST update this test, forcing a deliberate review of all the trade-offs documented above. --- ## Files changed - `chat/web/bots.py` — swap `append_event`+`project` → `append_and_apply`, drop unused imports. - `chat/web/settings.py` — same swap. - `tests/test_chat_created_non_idempotent.py` — new regression test. - `docs/audits/2026-04-27-project-callers.md` — this file.