chat/chat at a902d8643247fa2f7a853f11130800c59da88960 - chat

Joseph Doherty a902d86432 fix: workers retry-on-lock so they don't drop writes under busy_timeout=100ms

The previous commit dropped open_db's busy_timeout from 5s to 100ms
to prevent the embedding worker from GIL-blocking the asyncio event
loop and silently adding 5s to every state_update LLM call. That fixed
the chat path but broke worker durability: any worker write that
collided with the request handler's brief open transaction failed
with 'database is locked' instead of waiting.
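The failure mode above can be reproduced with a minimal sketch. It assumes Python's stdlib sqlite3 module and SQLite's default journal mode. One connection holds an open write transaction, standing in for the request handler, while a second connection, standing in for a worker, tries to insert. `open_db` here, `demo_lock_error`, and the `events` table are hypothetical illustrations; the project's real helpers are not shown in this commit.

```python
import os
import sqlite3
import tempfile
import time


def open_db(path: str, busy_timeout_ms: int = 100) -> sqlite3.Connection:
    # Hypothetical stand-in for the project's open_db; its real signature is not shown.
    # isolation_level=None leaves transaction control to explicit BEGIN/COMMIT.
    conn = sqlite3.connect(path, isolation_level=None)
    # Replaces the 5s busy handler that sqlite3.connect() installs by default.
    conn.execute(f"PRAGMA busy_timeout = {int(busy_timeout_ms)}")
    return conn


def demo_lock_error(path: str) -> tuple[str, float]:
    """Hold a write lock on one connection and try to write from another."""
    handler = open_db(path)  # plays the request handler
    handler.execute("CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, body TEXT)")
    handler.execute("BEGIN IMMEDIATE")  # take the write lock and keep the transaction open
    handler.execute("INSERT INTO events (body) VALUES ('state_update')")

    worker = open_db(path)  # plays a background worker
    start = time.monotonic()
    try:
        worker.execute("INSERT INTO events (body) VALUES ('embedding_indexed')")
        outcome = "ok"
    except sqlite3.OperationalError as exc:
        outcome = str(exc)  # the write gives up once busy_timeout expires
    waited = time.monotonic() - start

    handler.execute("COMMIT")
    handler.close()
    worker.close()
    return outcome, waited
```

With `busy_timeout_ms=100`, the worker's insert fails with `database is locked` after about 100ms instead of waiting up to 5s for the handler to commit. The exact wait depends on how SQLite's busy handler sleeps on a given build.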

Adds append_and_apply_with_retry in chat/eventlog/log.py. It has the
same contract as append_and_apply but runs through a conn_factory and
retries with exponential backoff (50ms..500ms, ~10s total budget) on
'database is locked'. If every retry fails, it logs a WARNING and
returns None; callers treat that as a no-op.
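A retry wrapper along these lines can be sketched using only the stdlib. This is not the commit's append_and_apply_with_retry, whose code is not shown here. `run_with_lock_retry`, its `op` callback, and the injectable `sleep` parameter are hypothetical names. The delays follow the message: start at 50ms, double up to a 500ms cap, and give up once about 10s of backoff has been spent.

```python
import logging
import sqlite3
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
log = logging.getLogger(__name__)


def run_with_lock_retry(
    conn_factory: Callable[[], sqlite3.Connection],
    op: Callable[[sqlite3.Connection], T],
    *,
    first_delay: float = 0.05,  # 50ms
    max_delay: float = 0.5,     # 500ms cap
    budget: float = 10.0,       # total seconds of backoff sleep allowed
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Hypothetical sketch: run op on a fresh connection, retrying on lock errors.

    Returns op's result, or None (after a WARNING) once the backoff budget is spent.
    Any other OperationalError is re-raised unchanged.
    """
    delay = first_delay
    waited = 0.0
    attempt = 0
    while True:
        attempt += 1
        conn = conn_factory()
        try:
            with conn:  # commits if op succeeds, rolls back if it raises
                return op(conn)
        except sqlite3.OperationalError as exc:
            if "database is locked" not in str(exc):
                raise
            if waited + delay > budget:
                log.warning("write dropped after %d attempts: %s", attempt, exc)
                return None
            sleep(delay)
            waited += delay
            delay = min(delay * 2, max_delay)  # 50, 100, 200, 400, 500, 500, ... ms
        finally:
            conn.close()  # a fresh connection per attempt, as with a conn_factory
```

A caller could pass something like `lambda: open_db(path)` as the factory and wrap its own append in `op`. Because a locked database yields None rather than an exception, a worker can skip the event, which matches the no-op handling described above. The budget counts only backoff sleeps, so each attempt may also spend up to busy_timeout inside SQLite before failing.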

Wires it into:
- embedding_worker._process for embedding_indexed events
- background._process for memory_significance_set events (auto-pin
  still uses a direct open_db when the score warrants it; that one
  is fast and not racy in practice)

Verified live: ran 4 back-to-back chat turns, zero worker errors,
embeddings + significance landing correctly. Suite: 464 passed in
11.5s.

2026-04-27 14:04:27 -04:00

perf: 18s/turn -> 2.5s/turn (SQLite busy_timeout, parallel state pairs, OpenRouter Cerebras-pinned classifier)

2026-04-27 13:51:27 -04:00

eventlog  fix: workers retry-on-lock so they don't drop writes under busy_timeout=100ms  2026-04-27 14:04:27 -04:00
llm       perf: 18s/turn -> 2.5s/turn (SQLite busy_timeout, parallel state pairs, OpenRouter Cerebras-pinned classifier)  2026-04-27 13:51:27 -04:00
services  fix: workers retry-on-lock so they don't drop writes under busy_timeout=100ms  2026-04-27 14:04:27 -04:00