perf: scope regenerate sibling-lookup to chat_id (T83.3)

The sibling assistant_turn lookup in ``regenerate_assistant_turn``
previously scanned every non-superseded ``assistant_turn`` row across
the whole database and filtered in Python. With many chats in the log
this is O(total_assistant_turns) per regenerate.

Push the chat_id filter into SQL via ``json_extract(payload_json,
'$.chat_id') = ?`` and add ``ORDER BY id DESC LIMIT 50`` so worst-case
work is bounded even within a single chat. Mirrors the SQL pattern in
``chat.web.meanwhile._last_meanwhile_speaker``.

Test added: test_regenerate_sibling_lookup_scoped_to_chat seeds two
chats — the second has an interjection whose ``interjection_of`` value
collides with the first chat's primary speaker. Regenerating chat A
must leave chat B's rows untouched and the regenerated chat A
interjection's ``regenerated_from`` must point at chat A's original
(not chat B's). Pre-T83.3 a global query could in principle latch
onto cross-chat rows.
This commit is contained in:
Joseph Doherty
2026-04-26 22:16:23 -04:00
parent d833bbc3e7
commit a1e2d9a24d
2 changed files with 188 additions and 2 deletions
+12 -2
View File
@@ -148,6 +148,13 @@ async def regenerate_assistant_turn(
# the silent witness (the bot that wasn't the primary addressee).
# Filter on ``superseded_by IS NULL`` so prior regenerates of this
# group don't reappear as siblings.
#
# T83.3: push the chat_id filter into SQL via ``json_extract`` so
# the query doesn't scan every assistant_turn row across the whole
# database. ``LIMIT 50`` bounds worst-case work even when chat_id
# isn't selective (e.g. a single chat with many turns) — we only
# need the one matching sibling. Mirrors the SQL pattern in
# ``chat.web.meanwhile._last_meanwhile_speaker``.
original_interjection_event_id: int | None = None
original_interjection_payload: dict | None = None
if original_user_turn_id is not None:
@@ -155,8 +162,11 @@ async def regenerate_assistant_turn(
"SELECT id, payload_json FROM event_log "
"WHERE kind = 'assistant_turn' "
" AND id != ? "
" AND superseded_by IS NULL",
(original_assistant_event_id,),
" AND superseded_by IS NULL "
" AND json_extract(payload_json, '$.chat_id') = ? "
"ORDER BY id DESC "
"LIMIT 50",
(original_assistant_event_id, chat_id),
)
for sib_id, sib_payload_json in sibling_cur.fetchall():
sib_payload = json.loads(sib_payload_json)