Same defect class as 0f8bf94: routes that ``append_event`` then
``project(conn)`` 500 once any prior event makes the full-log replay
hit a raw-INSERT handler (chat_created, container_created,
scene_opened, memory_written, meanwhile_scene_started, etc.).
Fixes the two remaining live-path callers:
- chat/web/bots.py (bot_create) — bot_authored
- chat/web/settings.py (settings_post) — you_authored
Both swap ``append_event`` + ``project`` → ``append_and_apply`` so only
the new event is applied through its registered handler. Unused
imports of ``append_event`` and ``project`` removed from each file.
The rewind path (chat/services/rewind.py) intentionally calls
``project()`` after wiping every projected table — that's the
canonical "rebuild from log against an empty DB" entry point and is
left unchanged.
Inventory of every projector handler that uses raw INSERT
(chat_created, container_created, scene_opened, memory_written,
meanwhile_scene_started, meanwhile_digest_created, edge_update) is
documented with the trade-offs of why we don't blindly switch them to
INSERT OR REPLACE — for autoincrement-id rows there is no key to match
on, and for chat_created a lossy overwrite would silently clobber
chat_state mutations from later events. The handler layer stays
correctly non-idempotent under event-sourcing semantics; the rule is
enforced at the call site.
Adds a regression test (tests/test_chat_created_non_idempotent.py)
that pins the contract: appending two chat_created events for the same
id and then ``project()``ing a second time MUST raise
``IntegrityError`` on chats.id. Any future "make it idempotent" change
must update the test, forcing a deliberate review.
Suite: 471 passed in 11.82s (was 470 + this regression test).
Report: docs/audits/2026-04-27-project-callers.md
12 KiB
Audit: project() callers and non-idempotent projector handlers
Date: 2026-04-27
Triggering incident: commit 0f8bf94 — kickoff_post 500'd with
sqlite3.IntegrityError: UNIQUE constraint failed: chats.id after a
second bot's kickoff. Root cause: the route appended events with
append_event() and then called project(conn), which replays the
entire event log. The chat_created handler in chat/state/world.py
uses raw INSERT INTO chats ... (no OR REPLACE/OR IGNORE), so on a
DB that already had a first bot's chat row, the replay re-hit that row
and raised.
This audit walks the rest of the live request paths to make sure no
other route has the same shape, and inventories every projector handler
that uses raw INSERT so the trade-offs are documented for future
hardening passes.
Step 1 — project() callers
grep -rn "project(" chat/ --include="*.py" (excluding the definition
itself in chat/eventlog/projector.py:17 and the local project_id
type variables that the regex doesn't actually catch):
| File:line | Caller | Classification |
|---|---|---|
chat/web/bots.py:113 |
bot_create route — append bot_authored then project(conn) |
Unsafe (live path) — fixed |
chat/web/settings.py:44 |
settings_post route — append you_authored then project(conn) |
Unsafe (live path) — fixed |
chat/services/rewind.py:110 |
execute_rewind — clears every projected table then re-projects from the truncated log |
Safe (replay-only) |
chat/eventlog/projector.py:17 |
Definition site, not a call | n/a |
tests/test_*.py (~50 tests) |
Test setup pattern: append a sequence of synthetic events into a fresh DB, then project(conn) to materialise |
Safe (replay-only) — projects against an empty/fresh DB; not a live request path |
Safe (replay-only)
-
chat/services/rewind.py:110—execute_rewindis the canonical "rebuild the projection" entry point. Lines 95–104 explicitlyDELETE FROMevery projected table (memories,activity,scenes,containers,chat_state,chats,edges,bots,you_entity,classifier_failures) before callingproject(conn). The handler registry then walks the truncated log against empty tables, so even the raw-INSERT handlers run safely on a clean slate. The module docstring (lines 1–21) calls out exactly why a full replay (rather than a "revert delta") is the right move here: theedge_updatehandler is a delta accumulator with no clean inverse. Do not change. -
Test suite — every
from chat.eventlog.projector import projectintests/is a setup helper. They open a fresh in-memory or tmp-path DB, append a hand-crafted sequence of events, and callproject(conn)once. There is no second-replay risk because the DB starts empty. These are not live paths.
Unsafe (live-path) — fixed in this audit
Both fixes follow the pattern established by 0f8bf94: drop the
append_event + project pair in favour of append_and_apply (defined
in chat/eventlog/log.py:32), which appends and runs only the
brand-new event through its registered handler.
-
chat/web/bots.py:113—bot_createWas:append_event(conn, kind="bot_authored", ...); project(conn). Now:append_and_apply(conn, kind="bot_authored", ...). In isolation,_apply_bot_authoredis itself idempotent (INSERT OR REPLACE INTO bots), so the route didn't fail today. The bug is latent: as soon as any kickoff ran first (which produceschat_createdevents), the next call tobot_createwould replay that priorchat_createdand trip the same UNIQUE constraint. We saw this happen in0f8bf94— fixing the symmetric route prevents the next variant of the same incident. Removed unused imports:append_event,project. -
chat/web/settings.py:44—settings_postWas:append_event(conn, kind="you_authored", ...); project(conn). Now:append_and_apply(conn, kind="you_authored", ...). Same shape asbot_create._apply_you_authoredis idempotent on its own (INSERT OR REPLACE INTO you_entity), butproject()walks the whole log, including anychat_created/container_created/scene_openedevents that have accumulated. Editing the user's own settings on a non-empty DB would 500 with the same UNIQUE constraint error — not because the new event is unsafe, but because the replay is. Fixed by per-event apply. Removed unused imports:append_event,project.
Unsafe — still to fix
None. The two unsafe live-path call sites identified above were both
fixed in this commit. Future hardening: a CI lint that flags
project( outside chat/services/rewind.py and tests/ would catch a
regression, but that's out of scope here.
Step 2 — non-idempotent projector handler inventory
Output of grep -n "INSERT INTO\|INSERT OR REPLACE\|INSERT OR IGNORE" chat/state/*.py, classified.
Replay-safe handlers
These either use INSERT OR REPLACE / INSERT OR IGNORE (so a second
apply is a no-op or an overwrite of identical data), or are pure
UPDATE against rows the prior event created.
| Handler | File | Statement | Why safe |
|---|---|---|---|
_apply_bot_authored |
chat/state/entities.py:12 |
INSERT OR REPLACE INTO bots |
id is the natural PK; replay overwrites with identical payload. |
_apply_you_authored |
chat/state/entities.py:29 |
INSERT OR REPLACE INTO you_entity |
Singleton row keyed on id=1. |
_apply_activity_change |
chat/state/world.py:98 |
INSERT OR REPLACE INTO activity |
Activity is keyed on entity_id — last write wins is exactly the intended semantics. |
_apply_thread_opened |
chat/state/threads.py:12 |
INSERT OR IGNORE INTO threads |
thread_id is the natural PK. |
_apply_event_planned |
chat/state/events.py:16 |
INSERT OR IGNORE INTO events |
event_id is the natural PK. |
_apply_branch_created |
chat/state/branches.py:27 |
INSERT OR IGNORE INTO branches |
Branch name is unique. |
_apply_group_node_initialized |
chat/state/group_node.py:12 |
INSERT OR REPLACE INTO group_node |
One row per chat_id. |
_apply_embedding_indexed |
chat/state/embeddings.py:28 |
INSERT OR REPLACE INTO embeddings |
One vector per memory_id. |
Pure-UPDATE handlers |
various — _apply_time_skip_*, _apply_guest_added/_removed, _apply_scene_closed, _apply_memory_significance_set, _apply_memory_pin_changed, _apply_meanwhile_scene_closed, _apply_meanwhile_digest_consumed, _apply_thread_updated, _apply_event_started/_completed/_cancelled (etc.), _apply_group_node_updated |
n/a | Idempotent: re-applying the same UPDATE produces the same row state. |
Unsafe-on-replay handlers (raw INSERT)
| Handler | File | Statement | Failure mode on replay |
|---|---|---|---|
_apply_chat_created |
chat/state/world.py:14 |
INSERT INTO chats, INSERT INTO chat_state |
chats.id is PK — second insert raises IntegrityError: UNIQUE constraint failed: chats.id. This is the 0f8bf94 bug. chat_state.chat_id is also unique; would raise too. |
_apply_container_created |
chat/state/world.py:78 |
INSERT INTO containers |
containers.id is INTEGER PRIMARY KEY AUTOINCREMENT — replay does NOT raise (a new id is assigned), but it silently creates a duplicate row, fragmenting downstream lookups by (chat_id, name). Silent corruption, not a crash. |
_apply_scene_opened |
chat/state/world.py:115 |
INSERT INTO scenes |
Same shape: autoincrement id. Replay creates a duplicate scene row and re-points chat_state.active_scene_id to the new copy. Silent corruption. |
_apply_memory_written |
chat/state/memory.py:14 |
INSERT INTO memories |
Autoincrement id. Replay duplicates the memory; FTS5 trigger then double-indexes the same pov_summary. Silent corruption + double-counting in retrieval. |
_apply_meanwhile_scene_started |
chat/state/meanwhile.py:29 |
INSERT INTO scenes (with explicit scene_id) |
Caller supplies scene_id (deterministic). Replay raises IntegrityError: UNIQUE constraint failed: scenes.id. Hard crash, like chat_created. |
_apply_meanwhile_digest_created |
chat/state/meanwhile.py:67 |
INSERT INTO meanwhile_digest_pending |
Autoincrement id. Replay creates a duplicate pending digest, surfacing the same summary twice in the next you-scene's prompt. Silent corruption. |
_apply_edge_update |
chat/state/edges.py:12 |
INSERT OR IGNORE INTO edges followed by UPDATE … SET affinity = ? + delta |
The INSERT OR IGNORE is fine, but the handler is delta-shaped — each replay re-adds affinity_delta and trust_delta, and re-extends knowledge_json. Silent corruption: scores drift up; knowledge facts duplicate. Already called out in chat/eventlog/log.py:39-46 as the canonical reason append_and_apply exists. |
Trade-offs — why we are NOT switching every handler to INSERT OR REPLACE
This is the part the audit is here to nail down before someone "fixes
it" with a one-line s/INSERT INTO/INSERT OR REPLACE INTO/.
-
Autoincrement-id handlers (
containers,scenes,memories,meanwhile_digest_pending) —INSERT OR REPLACEdoesn't help. Each event's payload doesn't carry the row's eventual id — the id comes fromlastrowidat projection time. There is no key forOR REPLACEto match on. The fix here is either (a) make the event carry a deterministic id derived from the event's own id (large refactor — payload schemas, downstream FK lookups, FTS rowid alignment), or (b) keep the handler raw-INSERT and ensure every live path usesappend_and_apply(the path we're on). We are on path (b), and this audit makes it explicit. -
chat_created—chats.idIS keyed on the natural PK, soINSERT OR REPLACE INTO chats ...would technically work for the chat row. But it would silently overwritechat_statecolumns that other events legitimately mutate later:chat_state.timeis bumped bytime_skip_elision,active_scene_idis set/cleared byscene_opened/scene_closed. On replay thechat_createdoverwrite would clobber those subsequent updates, then later events would re-set them — if the events themselves appear in order (they do today). It would work in practice, but it would erase the invariant that "each handler is responsible for one table-shape change" and make the projector's correctness depend on strict event-order replay throughchat_state. Not worth the subtle coupling; keep the raw INSERT and treat replay as an explicit "wipe + replay" operation (the rewind path does exactly that). -
meanwhile_scene_started— could be made idempotent (the payload suppliesscene_id), but it shares thescenestable with_apply_scene_opened(autoincrement) — making one half of the table writersOR REPLACEand the other half raw-INSERT is asking for a future bug. Keep both raw, lean onappend_and_apply. -
edge_update— fundamentally cannot be made idempotent under replay without either changing the event schema (carry absolute values, not deltas) or recording per-event-id "already applied" flags. Either is a multi-week project. The current contract is "edge_update is a delta event; never apply it twice"; theappend_and_applyrule enforces that contract from the call site.
Conclusion: the handler layer is correctly non-idempotent for
event-sourcing semantics. The defect class lives in the caller layer
(routes that mistakenly call project() instead of append_and_apply).
This audit fixes the two known offenders and pins the contract with a
regression test (see Step 3).
Step 3 — regression test
Added tests/test_chat_created_non_idempotent.py. The test:
- Opens a fresh DB and runs the migration chain.
- Appends one
chat_createdevent and projects — first projection succeeds. - Appends a second
chat_createdfor the same chat id and projects again — asserts that the second projection raisessqlite3.IntegrityError.
The point isn't that the test catches a future "make it idempotent" change automatically; it's that any such change MUST update this test, forcing a deliberate review of all the trade-offs documented above.
Files changed
chat/web/bots.py— swapappend_event+project→append_and_apply, drop unused imports.chat/web/settings.py— same swap.tests/test_chat_created_non_idempotent.py— new regression test.docs/audits/2026-04-27-project-callers.md— this file.