chat/CLAUDE.md

# Roleplay Engine

Local-first roleplay chat app that treats fiction as a **simulation**, not a chat log. The LLM is a renderer for structured world state — it does not hold state.

See [rp-engine-design.md](rp-engine-design.md) for the architectural design and [docs/plans/2026-04-26-v1-requirements-design.md](docs/plans/2026-04-26-v1-requirements-design.md) for the v1 product requirements & behavioral spec. This file is the working summary.

## Why this exists

Fixes three failure modes of conventional RP chatbots:

1. **Memory loss** — old context drops as history grows
2. **Quality decay** — bots get terse and generic over long conversations
3. **Stale state pollution** — bots fixate on past props (the "picnic basket" problem)

## Hard scope constraints

- **Single user, single machine** (the user's Mac)
- **Max 3 entities per scene**: `you` + up to 2 bots (`botA`, `botB`)
- **Chat-only** — no voice, no real-time

The 3-entity cap is load-bearing: it makes the relationship graph fully enumerable (6 directed edges + 1 group node). Don't design for N entities.
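
To make the "fully enumerable" claim concrete, here is a minimal sketch that lists the graph for three entities. The constant names are illustrative, not taken from the codebase.

```python
from itertools import permutations

ENTITIES = ("you", "botA", "botB")

# Every ordered pair of distinct entities is one directed edge.
DIRECTED_EDGES = list(permutations(ENTITIES, 2))

# The single group node covers all three entities at once.
GROUP_NODE = frozenset(ENTITIES)
```

With three entities this yields exactly six ordered pairs, and `("botA", "botB")` and `("botB", "botA")` appear as separate entries.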

## Architecture

- **Mac (always-on)**: web UI, orchestrator, persistence, event queue, retrieval, prompt construction, all state.
- **Inference endpoint**: stateless `generate(prompt, params) -> text`. Swap implementations behind one interface. The orchestrator never knows which.
- Streaming required for UX.
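
One way to picture the "one interface" rule is a structural protocol that the orchestrator depends on. This is a sketch only: `InferenceBackend`, `EchoBackend`, `render`, and the `max_chars` parameter are hypothetical names, and streaming is left out for brevity.

```python
from typing import Any, Mapping, Protocol


class InferenceBackend(Protocol):
    """Hypothetical interface; the source only specifies generate(prompt, params) -> text."""

    def generate(self, prompt: str, params: Mapping[str, Any]) -> str: ...


class EchoBackend:
    """Stand-in backend for tests; holds no state between calls."""

    def generate(self, prompt: str, params: Mapping[str, Any]) -> str:
        max_chars = int(params.get("max_chars", 80))
        return prompt[:max_chars]


def render(backend: InferenceBackend, prompt: str) -> str:
    # The orchestrator depends only on the interface, never on a concrete class.
    return backend.generate(prompt, {"max_chars": 20})
```

Any object with a matching `generate` method satisfies the protocol, so swapping backends does not touch the caller.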

## Runtime stack (locked for v1)

- **Backend**: Python 3.11+ with **FastAPI**.
- **Frontend**: server-rendered HTML + **HTMX** + minimal vanilla JS/CSS. No JS build chain.
- **Live updates**: SSE per chat. Per-chat `asyncio.Queue` pub/sub. Multi-tab sync is a Phase 1 requirement — two browser tabs on the same chat must mirror each other live (streamed tokens, drawer state, edge updates).
- **Inference backend**: **Featherless** (OpenAI-compatible API).
  - `narrative_model` = `dphn/Dolphin-Mistral-24B-Venice-Edition` (32K ctx, uncensored).
  - `classifier_model` = `NousResearch/Hermes-3-Llama-3.1-8B` (128K ctx, uncensored, structured-output reliable). Fallbacks: `cognitivecomputations/dolphin-2.9.4-llama3-8b` → `mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated`.
- **Token budgets**: narrative 8K hard / 6K soft; classifier 4K hard. Trim tiers are must / should / nice; never trim must-include content.
- **OOC marker**: `((double parens))` (configurable).
- **Data layout**: everything under `<repo>/data/` — `chat.db`, `backups/`, `snapshots/`, `exports/`, `config.toml`. The whole tree is `.gitignore`d. `CHAT_DB_PATH` env var honored as override.
- **Auth**: bind to `127.0.0.1` only in v1. No auth.
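
The per-chat `asyncio.Queue` pub/sub behind the SSE channel could look roughly like the sketch below. `ChatBroker` and its methods are hypothetical names, and the real implementation may track subscribers differently.

```python
import asyncio
from collections import defaultdict


class ChatBroker:
    """Hypothetical per-chat fan-out: one asyncio.Queue per subscribed tab."""

    def __init__(self) -> None:
        self._subscribers: dict[str, set[asyncio.Queue]] = defaultdict(set)

    def subscribe(self, chat_id: str) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        self._subscribers[chat_id].add(queue)
        return queue

    def unsubscribe(self, chat_id: str, queue: asyncio.Queue) -> None:
        self._subscribers[chat_id].discard(queue)

    def publish(self, chat_id: str, event: dict) -> None:
        # Every tab subscribed to this chat receives the same event.
        for queue in self._subscribers[chat_id]:
            queue.put_nowait(event)


async def demo() -> list[dict]:
    broker = ChatBroker()
    tab_a = broker.subscribe("chat-1")
    tab_b = broker.subscribe("chat-1")
    broker.publish("chat-1", {"event": "token", "data": "Hi"})
    return [await tab_a.get(), await tab_b.get()]
```

Each SSE connection would own one queue and drain it in its response generator; two tabs on the same chat then see identical events, which is the multi-tab mirroring requirement.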

## Behavioral defaults (locked in v1 brainstorm round 2)

- **Significance scale**: 0=Routine, 1=Notable, 2=Significant, 3=Pivotal. Score-3 turns auto-pin per witness. Drives retrieval ranking, compression, JSON exports.
- **Edge updates**: per-turn deltas (`affinity_delta`, `trust_delta`, `knowledge_facts`, `last_interaction`); per-scene-close summary rewrite. Every mutation goes through the event log as `edge_update`.
- **Classifier failure handling**: Pydantic-constrained → 1 retry with stricter reminder → schema-default fallback. 10s timeout. Never block the play loop. Refusals trigger fallback-model swap for that one call. Failures logged to `classifier_failures` table.
- **Activity verbs**: open string + classifier-extracted `interruptible`, `required_attention`, `expected_duration`. Attention is optional free-form; omit from prompt when empty.
- **Containers**: parse-and-extend, scoped per chat. The kickoff parse seeds the initial containers; scene transitions create new ones.
- **Pinning**: soft cap 8 / bot. Pivotal (score 3) = auto-pin. Manual pins never auto-evicted.
- **Snapshots**: periodic every 100 events / 30 min; pre-rewind always. 5 periodic retained; pre-rewind retained 14 days.
- **Streaming**: Stop button on streaming row; mid-stream disconnect commits partial with `truncated: true`; Send disabled mid-stream; multi-tab streaming via per-chat SSE channel.
- **Display**: lightweight markdown; `*action*` italic; OOC `((parens))` shown dimmed/italic, never sent to bot.
- **Multi-entity defaults (Phase 2)**: when `chat.guest_bot_id is None`, behavior matches Phase 1 single-bot 1:1. With a guest, all 3 entities are present in the prompt, witness writes, and state-update fan-out (6 directed pairs).
- **Addressee detection**: simple substring match (whole-word, case-insensitive) over the user turn's body. If both bot names match or neither does, the host gets the floor.
- **Interjection**: classifier-driven, conservative bias (default false on classifier failure / refusal / parse error). When the classifier returns true, the addressee speaks first, then the non-addressee may interject in a follow-up turn.
- **Per-POV summaries (multi-entity)**: each present witness with a memory store gets their own per-POV summary on scene close. The summary differs per bot based on persona + their edge to "you". The group node summary is updated alongside.
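
The addressee rule above (whole-word, case-insensitive name match, with the host as the tie-break) can be sketched as follows. `detect_addressee` is a hypothetical helper, not the function in the repo.

```python
import re


def detect_addressee(body: str, host_name: str, guest_name: str) -> str:
    """Return "host" or "guest" for a user turn (hypothetical helper)."""

    def mentioned(name: str) -> bool:
        # \b anchors give whole-word matching; re.IGNORECASE handles case.
        pattern = rf"\b{re.escape(name)}\b"
        return re.search(pattern, body, flags=re.IGNORECASE) is not None

    host_hit = mentioned(host_name)
    guest_hit = mentioned(guest_name)
    # Only an unambiguous guest mention moves the floor away from the host.
    if guest_hit and not host_hit:
        return "guest"
    return "host"
```

Whole-word matching matters here: with a guest named "Mira", a turn that mentions "Miranda" should not count as addressing the guest.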

## Core concepts (vocabulary)

- **Entity**: `you | botA | botB`. Has identity (immutable), state (mood/goals/status), activity, per-POV memory.
- **Container**: anything with slots that holds entities (car, booth, room). Has properties (moving, public, audible range). Spatial grounding lives here, separate from the relationship graph.
- **Activity record**: per-entity live struct — position (container+slot), posture, current action (verb, duration, interruptible, required_attention), holding, attention, status. Always in the prompt as a small structured block.
- **Relationship graph**: 6 **directed** edges (asymmetric feelings matter — never collapse to a single shared field) + 1 group node. Edges hold affinity, trust, summary, knowledge-known-about-target, private moments, last-interaction.
- **Scene configurations**: exactly 4 — solo with botA, solo with botB, all three present, botA+botB without you ("meanwhile…"). Each has a fixed prompt-loading rule.
- **Witnessed-by flag**: every memory has a 3-bit `[you, botA, botB]` mask. A speaker only sees memories where their bit is set. This is the mechanism that prevents bots referencing things they can't know.
- **Event**: scoped lifecycle (`planned | active | completed | cancelled | expired`) with its own props, preconditions, on_start/on_complete hooks, significance. Solves the picnic-basket problem — props live and die with the event, only narrative gist promotes to memory.
- **Active threads**: unresolved plot tensions. Sticky in context until resolved/dropped. Cheap, anchor continuity across compressed scenes.
- **Scene**: closes when container changes meaningfully or significant time passes. Compression boundary.
- **Per-POV summary**: every witness gets their own record of a closed scene, written from their POV. Different details, different interpretations. This is what gives bots inner lives — never write omniscient narration into per-POV stores.
- **Time skip**: `elision` (skip the boring middle of an in-progress activity) vs `jump` (next morning, a week later). Skips run intervening events forward, compress, reset landing activity.
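
As an illustration of the witnessed-by flag, the sketch below models the 3-bit mask with an `IntFlag` and filters memories for one speaker. The bit assignment and the `Memory` shape are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import IntFlag


class Witness(IntFlag):
    # Bit positions are an assumption; the source only fixes the [you, botA, botB] order.
    YOU = 0b001
    BOT_A = 0b010
    BOT_B = 0b100


@dataclass(frozen=True)
class Memory:
    text: str
    witnessed_by: Witness


def visible_to(memories: list[Memory], speaker: Witness) -> list[Memory]:
    # Keep a memory only when the speaker's bit is set in its mask.
    return [m for m in memories if m.witnessed_by & speaker]
```

A memory witnessed by `YOU | BOT_B` is dropped when `BOT_A` speaks, so BotA cannot reference a scene it was not present for.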

## What promotes out of an event (and what doesn't)

- Object acquired → inventory
- Knowledge gained → edge `knowledge` field
- Relationship change → edge summary
- **Everything else stays in the closed event record.** The blanket, the basket, the specific sandwich do **not** become memories. This rule is the whole point — don't bypass it.

## Persistence

- **SQLite** (single file) for everything structured. WAL mode, foreign keys on, each turn in a transaction.
- **sqlite-vss** or **sqlite-vec** for embeddings (same DB file). Decide at Phase 4.
- **JSON** for snapshots, character templates, scene exports.
- **No** Postgres, Redis, Pinecone, Docker. Single-user; don't over-engineer.

Schema is event-sourced. See design doc § "Persistence Layer" for the full sketch.
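
The connection settings listed above (WAL mode, foreign keys on, one transaction per turn) map onto the standard `sqlite3` module roughly as shown. `open_db_sketch`, `record_turn`, and the `turns` table are illustrative and may not match the repo's real helpers or schema.

```python
import os
import sqlite3
import tempfile


def open_db_sketch(path: str) -> sqlite3.Connection:
    """Hypothetical helper; the repo's real open_db may differ."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode = WAL")  # persists in the database file
    conn.execute("PRAGMA foreign_keys = ON")   # must be set on every connection
    return conn


def record_turn(conn: sqlite3.Connection, body: str) -> None:
    # `with conn` commits on success and rolls back if the block raises.
    with conn:
        conn.execute("INSERT INTO turns (body) VALUES (?)", (body,))


db_path = os.path.join(tempfile.mkdtemp(), "chat.db")
conn = open_db_sketch(db_path)
conn.execute("CREATE TABLE turns (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")
record_turn(conn, "hello")
```

WAL is a property of the database file, while `foreign_keys` is per connection, which is why a shared open helper is a convenient place for both.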

## Event sourcing — non-negotiable

State is a **projection** of an append-only event log. State is **never mutated directly** — append an event, the projector applies it.

Event kinds: `user_turn`, `assistant_turn`, `time_skip`, `event_triggered`, `edge_update`, `scene_transition`, `entity_state_change`, `activity_change`.

This buys: free rewind, trivial replay-debugging, schema migrations against the same log, branching ("what if BotA had said yes").

**Determinism on replay**: LLM calls are nondeterministic. Store the *outcome* in the event payload — on replay, use the stored outcome. Never re-call the LLM during replay.

**Snapshots** every N events / M minutes so we don't replay everything on load. Log is source of truth.
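
The projection idea can be shown with a deliberately simplified sketch: an append-only table plus a pure fold over it. The functions below reuse familiar names for readability but are stand-ins that build an in-memory dict; they are not the repo's projector, and the payload fields are assumptions.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event_log ("
    "id INTEGER PRIMARY KEY, kind TEXT NOT NULL, payload_json TEXT NOT NULL)"
)


def append_event(conn: sqlite3.Connection, kind: str, payload: dict) -> int:
    # Append only: rows are never updated or deleted.
    cur = conn.execute(
        "INSERT INTO event_log (kind, payload_json) VALUES (?, ?)",
        (kind, json.dumps(payload)),
    )
    return cur.lastrowid


def project(conn: sqlite3.Connection) -> dict:
    """Rebuild state from scratch by folding over the log in id order."""
    state = {"affinity": {}, "transcript": []}
    rows = conn.execute("SELECT kind, payload_json FROM event_log ORDER BY id")
    for kind, payload_json in rows:
        payload = json.loads(payload_json)
        if kind == "edge_update":
            edge = f"{payload['src']}->{payload['dst']}"
            previous = state["affinity"].get(edge, 0)
            state["affinity"][edge] = previous + payload["affinity_delta"]
        elif kind in ("user_turn", "assistant_turn"):
            # The stored text is replayed as-is; no LLM call happens here.
            state["transcript"].append(payload["text"])
    return state


append_event(conn, "user_turn", {"text": "Hi"})
append_event(conn, "assistant_turn", {"text": "Hello there."})
append_event(conn, "edge_update", {"src": "botA", "dst": "you", "affinity_delta": 2})
append_event(conn, "edge_update", {"src": "botA", "dst": "you", "affinity_delta": 1})
state = project(conn)
```

Replaying the same log always produces the same state because the assistant's text lives in the payload rather than being regenerated.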

## Prompt construction

A speaker's prompt is assembled from **their** edges and **their** witnessed memories — never the global state. BotA and BotB are effectively two separate agents who happen to share a scene.

Order (for speaker BotA, with you and BotB present):

1. BotA identity + current state
2. BotA → You edge
3. BotA → BotB edge
4. Group node (only if all three present)
5. World state (time, weather, location)
6. Active scene description
7. Activity snapshot for **all** present entities
8. Active threads
9. Recent dialogue window
10. Retrieved memories (top-K, witness-filtered, BotA-owned)
11. Currently active events + their props

After every utterance, run a state-update pass on **every present entity**, not just the speaker. Silent witnesses still update edges.
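
The fixed section order above could be encoded as data, with the group-node rule as the only conditional. This is a sketch under stated assumptions: the section keys and `assemble_prompt` are hypothetical names, and token trimming is omitted.

```python
SECTION_ORDER = [
    "identity_and_state",
    "edge_to_you",
    "edge_to_other_bot",
    "group_node",
    "world_state",
    "scene_description",
    "activity_snapshot",
    "active_threads",
    "recent_dialogue",
    "retrieved_memories",
    "active_events",
]


def assemble_prompt(sections: dict[str, str], present: set[str]) -> str:
    """Join the speaker's sections in the fixed order (hypothetical helper)."""
    parts = []
    for name in SECTION_ORDER:
        # The group node is loaded only when all three entities are present.
        if name == "group_node" and present != {"you", "botA", "botB"}:
            continue
        text = sections.get(name, "").strip()
        if text:
            parts.append(text)
    return "\n\n".join(parts)
```

Keeping the order in one list makes it easy to review against the numbered list and hard to reorder by accident.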

## Memory retrieval

- Always-loaded: pinned, current scene, active threads, recent N scenes (no retrieval).
- Retrieved: top-K vector search over **the speaker's** memory store, filtered by witness flag, with recency + significance boosts.
- Keep K small. Bloated retrieval poisons the prompt.
- Phase 1: SQLite FTS5 is enough. Vector search comes at Phase 4.
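
To illustrate "recency + significance boosts" with a small K, here is a scoring sketch. The weights and the `Candidate` fields are invented for the example, since the source does not give the formula, and the candidates are assumed to be witness-filtered already.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    text: str
    similarity: float   # base relevance from the search backend, 0..1
    significance: int   # 0=Routine .. 3=Pivotal
    age_in_scenes: int  # 0 = current scene


def rank(candidates: list[Candidate], k: int = 3) -> list[Candidate]:
    def score(c: Candidate) -> float:
        # Illustrative weights only; the source does not specify the formula.
        recency_boost = 0.2 / (1 + c.age_in_scenes)
        significance_boost = 0.1 * c.significance
        return c.similarity + recency_boost + significance_boost

    return sorted(candidates, key=score, reverse=True)[:k]
```

With these weights, an old Pivotal memory can outrank a recent Routine one of equal similarity, which matches the intent of significance-driven ranking.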

## Implementation phases

1. **Core loop**: schema, entities + edges, single container, event log + projector, single-bot conversation, one LLM backend, streaming UI, manual rollback.
2. **Multi-entity**: second bot, group node, scene configs, witness filtering, per-POV memories, activity/containers, scene transitions with compression.
3. **Events & skips**: event queue with triggers, time skips, active threads, significance classifier.
4. **Polish**: vector retrieval, branching, surgical delete + regenerate, snapshots, backups, impact-preview UI for rewinds.

Don't jump phases. Phase 1 must work end-to-end before Phase 2 lands.

## Conventions for working in this repo

- **Don't bypass the event log.** Any state change goes through an event. If you're tempted to UPDATE a row directly, you're doing it wrong.
- **Don't collapse directed edges.** `botA → botB` and `botB → botA` are independent. Asymmetry is the point.
- **Don't promote event props to memory.** Only the promotion categories listed above survive an event closing.
- **Per-POV, not omniscient.** When writing scene summaries, write one per witness, from their angle.
- **Witness filter every memory read.** A bot must never see a memory their bit isn't set on.
- **Activity block is always in the prompt.** It's the spatial anchor that prevents "leaning on the kitchen counter while in a car" failures.
- **Streaming on the inference path; non-blocking bookkeeping.** Significance classification, embeddings, and snapshots run while the LLM streams.
- **No Docker, no extra services.** SQLite + a process. Push back on suggestions to add infrastructure.

## Open decisions (deferred — don't pre-decide)

- Token budget strategy (during Phase 1, with real prompts)
- Embedding model (Phase 4)
- `sqlite-vss` vs `sqlite-vec` (Phase 4)
- UI framework beyond v1 (Tauri / Electron / native are TBD; v1 itself is locked to server-rendered HTML + HTMX, see Runtime stack)
- Inference hosting (start with a cloud API, re-evaluate later)
- Character template format (during Phase 1)
- Multi-session / multi-character casts: **out of scope for v1**. Leave cheap schema hooks only.

## Phase 1 status

Phase 1 shipped end-to-end across **35 tasks** (T0–T35). The single-bot core loop is functional: event log + projector, schema + migrations, settings/bot authoring, kickoff confirm, streaming turns, drawer rendering, regenerate/rewind, scene close + per-POV summaries, significance classifier, snapshots/backups, first-run navigation, and friendly 404/500 pages. **168 tests passing.**

Deferred to Phase 2: second bot, group node, scene configurations, witness filtering across multi-entity scenes, activity/containers, scene-transition compression. Phase 3: event queue + triggers, time skips, active threads. Phase 4: vector retrieval, branching, surgical delete + regenerate, impact-preview UI.

### Known v1 limitations (read before extending)

- **Drawer edits scope**: only affinity, significance, and pin can be hand-edited from the drawer. Other v1 fields (knowledge, summary text, traits) are deferred to Phase 1.5.
- **Cold-load snapshot path** is wired and unit-tested but rarely exercised in dev — long-running sessions are the only realistic trigger.
- **WAL sidecar files** (`-wal`, `-shm`) are not captured in nightly backups; the nightly snapshot is a fresh `.backup()` so this is fine for restore but worth knowing if you copy the db file by hand.
- **HTMX SSE event names** may need a version check if you bump the htmx CDN URL in `base.html` — the swap targets are name-coupled.
- **"You" activity rows** can linger after `bot_reset` (the reset purges the bot's chats and the bot's own activity row but not the "you" row that was associated with those chats). Cosmetic, fixed in Phase 1.5.
- **Projector replay is non-idempotent** for plain `INSERT` events. After appending, call `apply_event(conn, event)` for the new row only. Calling `project(conn)` re-runs every handler from scratch and will trip uniqueness constraints or insert duplicate rows.
- **8-pin auto-cap eviction** is FIFO over the auto-pinned set only. Manual pins survive the eviction; this is by design (manual intent > auto-pin signal).
- **Regenerate (T29) does not broadcast `turn_html` over SSE** — the page must refresh to show the regenerated turn. Acceptable for v1 single-tab usage; Phase 1.5 should wire the SSE event.
- **First-run middleware** fires only on bare `/` and `/chats`. Sub-paths like `/chats/<id>` and `/chats/<id>/drawer` pass through (correct: HTMX partials should not page-redirect, and a deep-link to a missing chat should 404, not redirect mid-setup).
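
The pin eviction rule in the list above (FIFO over auto-pins, manual pins untouched) can be sketched as below. This assumes the soft cap counts manual and auto pins together, which the source does not state, and `add_auto_pin` is a hypothetical name.

```python
from collections import deque

SOFT_CAP = 8  # pins per bot, from the behavioral defaults


def add_auto_pin(manual: set[int], auto: deque[int], memory_id: int) -> list[int]:
    """Pin memory_id automatically; return the ids evicted to respect the cap."""
    auto.append(memory_id)
    evicted = []
    # Evict oldest auto-pins first; manual pins are never touched.
    while len(manual) + len(auto) > SOFT_CAP and auto:
        evicted.append(auto.popleft())
    return evicted
```

Under this assumption, a bot with two manual pins and six auto-pins loses its oldest auto-pin when a seventh arrives, and the manual set is unchanged.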

### Phase 1.5 cleanup backlog

All items shipped — see Phase 2.5 status below.

## Phase 2 status

Phase 2 shipped end-to-end across **13 tasks** (T36–T48 wave). The multi-entity surface is functional: chats can host a guest bot, the prompt assembly is guest-aware, post-turn fans out across all directed pairs, and scene close writes a per-POV summary per present witness plus a group_node summary.

- **Multi-entity scene support**: chats can now have a guest bot (you + host + guest). The 3-entity cap holds. New event kinds: `guest_added`, `guest_removed`, `group_node_initialized`, `group_node_updated`. New table: `group_node` (members, summary, dynamic, threads).
- **Drawer guest UX**: add/remove guest from the drawer side panel. The "have they met?" prose seed is parsed by the `relationship_seed` classifier into inter-bot directed edges (host↔guest).
- **Multi-entity turn flow**: `post_turn` assembles narrative with the guest-aware prompt; writes memories for **all** present bot witnesses; runs state updates for **all** directed pairs (6 with 3 entities); detects interjections via classifier (default false; the addressee gets the floor first).
- **Per-POV scene close summaries**: each present witness with a memory store gets their own per-POV summary on close; `group_node` summary updated alongside.
- **Bot reset cascade**: resetting a bot now also clears `chats.guest_bot_id` references in other chats (root-cause fix for stale-guest references after T47).

### Phase 2.5 / 3 backlog

All items shipped — see Phase 2.5 status below.

## Phase 2.5 status

Phase 2.5 cleanup shipped end-to-end across 8 tasks (T68–T75). Two CLAUDE.md backlogs (Phase 1.5 cleanup, Phase 2.5/3) are now empty; deferred follow-ups discovered during execution are tracked in a new "Phase 2.6 / 3 backlog" section below.

- **`open_db` with check_same_thread parameter (T68)**: refactored `chat/db/connection.py` so `chat/web/bots.py:get_conn` no longer duplicates the PRAGMA setup. Default behavior preserved.
- **`bot_reset` cross-chat cleanup (T69)**: now purges orphaned "you" activity rows. Note: this also fixed a latent FK constraint crash that was lurking in the projector — `activity.container_id` is FK-referenced and the prior code would have crashed on any reset of a bot whose chat had a non-NULL `container_id` "you" activity row. The bug was masked because no prior test seeded such a row.
- **LLM-merged group meta-summary (T70)**: replaces Phase 2 T45's naive concat with a classifier merge call. Falls back to the naive concat on classifier failure.
- **`prompt.py` polish (T71)**: witness role parametric (`host` vs `guest` derived from chat membership); single `ACTIVITIES:` block with bullet-level trim; NICE trim order kept with documented rationale (greedy cheapest-impact-first beats spec-listing order in practice).
- **Drawer polish (T72)**: deferred v1 edits (edge_trust slider, edge_summary textarea, memory pov_summary textarea, knowledge_facts add/remove) + first-meeting gate (Add-guest form disables prose textarea when host→guest edge already exists; "re-seed anyway" toggle re-enables) + witness flag inline-edit (per-memory checkboxes for [you, host, guest] flags). Two new `manual_edit` projector branches: `edge_knowledge_fact` and `memory_witness`.
- **Regenerate polish (T73)**: regenerate now broadcasts `turn_html_replace` over SSE (NEW event distinct from `turn_html` to avoid breaking the existing append-semantic consumer); regenerate covers interjection turns (re-detects + re-streams or supersedes); defensive stale-guest degrade removed.
- **Turn-flow polish + addressee service (T74)**: classifier-based addressee detection (substring helper kept as no-guest fast path); SignificanceJob enqueued for interjection memories; scene-close-on-cancel pinned with comment + regression test (close detection is genuinely user-prose-only); defensive stale-guest degrade removed.

### Phase 2.6 / 3 backlog

All items shipped — see Phase 3.5 status below.

## Phase 3 status

Phase 3 shipped end-to-end across 19 tasks (T49–T67). Events with full lifecycle, time skips, active threads, significance refinements, and meanwhile scenes are functional. Schema baseline is now version 11 (migrations 0009 events, 0010 threads, 0011 meanwhile_scenes). Test count grew from ~247 (Phase 2) to ~315 (+68 new tests across the wave).

- **Wave 1 — schema + lifecycle handlers (parallel)**:
  - **T49** `events` table + lifecycle handlers (`event_planned`, `event_started`, `event_completed`, `event_cancelled`, `event_expired`).
  - **T50** `time_skip` event handlers (elision and jump variants).
  - **T51** `threads` table + handlers (`thread_opened`, `thread_updated`, `thread_closed`).
- **Wave 2 — detection / narration services (parallel)**:
  - **T52** event-lifecycle detection service (planned→active→completed transitions inferred from narration).
  - **T53** skip narration service (elision + jump prose).
  - **T54** synthesized-memories service for jump skips (LLM-summarized intervening time).
  - **T55** thread-detection service (open/update/close inferred from recent dialogue).
- **Wave 3 — promotion + ranking (parallel)**:
  - **T56** event-completion promotion service (objects → inventory, knowledge → edge knowledge, relationship deltas → edge summary; everything else stays in the closed event).
  - **T57** significance-aware retrieval ranking — SQL-side `SIGNIFICANCE_RANK_BIAS` plus the existing Python composite re-rank.
  - **T58** scene compression keeps key quotes when significance ≥ 2; thread emission piggybacks on scene close.
- **Wave 4 — drawer UX (single)**:
  - **T59** drawer additions: events panel, threads panel, skip controls.
- **Wave 5a — prompt + turn flow integration (parallel)**:
  - **T60** prompt assembly includes active events + open threads in the speaker's prompt.
  - **T61** turn flow invokes event-detection + completion promotion alongside existing post-turn fan-out.
- **Wave 5b — natural-language skip surface (single)**:
  - **T62** classifier-driven skip command at the user-input layer; shared skip controllers extracted into `chat/web/skip.py`.
- **Wave 6a — meanwhile schema (single)**:
  - **T63** meanwhile-scene schema + state (scene config 4: host+guest, no "you").
- **Wave 6b — meanwhile turn flow (parallel)**:
  - **T64** meanwhile turn flow (host+guest, no "you" in the prompt or witness writes).
  - **T65** meanwhile summary digest surfaces to the next "you"-present scene.
- **Wave 7 — integration + docs (parallel)**:
  - **T66** cross-feature integration tests covering events × skips × threads × meanwhile.
  - **T67** documentation (this section).

### Phase 3.5 / 4 backlog

All items shipped — see Phase 3.5 status below.

## Phase 3.5 status

Phase 3.5 cleanup shipped end-to-end across 12 tasks (T76–T87). Two CLAUDE.md backlogs (Phase 2.6/3, Phase 3.5/4) are now empty; deferred follow-ups discovered during execution are tracked in a new "Phase 3.6 / 4 backlog" section below. Test count grew from 315 (Phase 3) to 343 (+28 new tests).

- **Wave 1 — trivial polish (parallel)**:
  - **T76** `narrate_skip` `timeout_s` plumbed through to `client.generate`.
  - **T77** `AddresseeDecision.confidence` typed as `Literal["high","medium","low"]`.
  - **T78** `search_memories` docstring notes SQL-side significance bias (`SIGNIFICANCE_RANK_BIAS`).
  - **T79** `_witness_role_for` defensive `host_bot_id is None` handling (returns `"host"` for Phase-1 chats).
- **Wave 2 — scene_summarize polish (single)**:
  - **T80** five T58 follow-ups: re-close suffix bloat guard, transcript scoping by scene, swallowed-exception logging in `detect_threads`, chat-clock `closed_at`, and three new tests covering T58 gaps (200-char truncation, `thread_updated`/`thread_closed` candidate paths, try/except fallback).
- **Wave 3 — typed exception (single)**:
  - **T81** `ChatNotFoundError` replaces string-prefix sniff in skip routes; mapped to 404 (vs 400 for other `ValueError` cases).
- **Wave 4 — turn-flow wiring (single)**:
  - **T82** `consume_pending_meanwhile_digests` wired into `post_turn` (closes T66 gap; meanwhile digests no longer pile up); natural-language skip dispatch now runs scene close detection first.
- **Wave 5 — regenerate polish (single)**:
  - **T83** five sub-fixes — cancel/stop hook (regenerate registers stream task in `_in_flight_tasks`); DRY extraction of `read_recent_dialogue` and `gather_prior_edges` into `chat/services/turn_common.py`; chat-scoped sibling-assistant-turn lookup; lifecycle-rollback warning log on regenerate; ordering-symmetry comment between post_turn and regenerate event-detection paths.
- **Wave 6 — final polish (parallel)**:
  - **T84** unified `record_turn_memory` API with `you_present` kwarg; `record_meanwhile_memory` becomes a thin wrapper.
  - **T85** JSON-build audit (no findings) + meanwhile cancel route-level test.
  - **T86** frontend `turn_html_replace` SSE handler + turn_id stamping on rendered HTML so the in-place swap actually works.

### Phase 3.6 / 4 backlog

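Follow-ups discovered during Phase 3.5 reviews and execution. None were blocking. Per the Phase 4.5 status below, this backlog is now closed: T90 shipped the T80 and T84 review items, and T108, T114, and T116 addressed the carry-overs.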

#### From T80 review

- **`read_recent_dialogue` chat-id pushdown**: the helper filters `chat_id` post-fetch in Python. It could push the `json_extract(payload_json, '$.chat_id') = ?` predicate into SQL (matching T83.3's pattern) for tighter LIMIT semantics. Currently, when many other chats are active, a chat's 50-row LIMIT can be consumed by rows from those other chats.
- **Lifecycle warning wording in regenerate**: T83.4's warning log lists ALL lifecycle event ids that exist after the original `assistant_turn` id, not just ones produced by the superseded turn. For the typical "regenerate the most recent" flow these are identical, but if a user regenerates an OLDER turn, the warning will list intervening-turn lifecycle events that legitimately stand. Tighten warning wording to "lifecycle transitions at-or-after turn X" (operator-friendly); a code-level fix would require a schema change to add explicit back-reference from lifecycle events to their producing turn.

#### From T84 review

- **`record_turn_memory` legacy single-bot function** still exists alongside the unified `record_turn_memory_for_present`. Could be consolidated in a follow-up.

#### From T86 fix-up

- **Test fixtures + `tests/test_phase3_integration.py`** that seed turns directly via `append_event`+`project` may need updating once any new test asserts the rendered HTML carries the new turn ids end-to-end. Existing tests pass because they don't read the stamped attribute, but they're brittle if the contract evolves.

#### Deferred items (carry-over)

- **Scene-close-on-cancel UX revisit** (Phase 2.5 carry-over): T74.3 pinned the existing behavior; revisit if real play-testing surfaces a regression.
- **Cross-feature canned-queue brittleness**: meanwhile-scene close test required a canned response for T65's digest call after T64+T65 merge. Future close-path additions will keep extending the queue. Consider a structured fixture builder rather than positional canned arrays. NOT addressed in Phase 3.5.
- **Lifecycle-transition rollback in regenerate**: T83.4 added a warning log; actual rollback (with proper schema linkage from lifecycle event back to producing turn) is Phase 4 work.

## Phase 4 status

Phase 4 polish shipped end-to-end across 15 tasks (T88–T102). Vector retrieval is functional via pure-Python cosine over a JSON-blob embeddings table (sqlite-vec deferred — host Python lacks loadable extensions). Branching is data-model + drawer UI. Surgical delete with cascade preview, hide-from-view soft delete, significance review panel, snapshot UX, and cross-chat search all surface from the drawer or top-bar. Test count grew from 343 (Phase 3.5) to ~413 (+70 new tests).
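
For orientation, pure-Python cosine over JSON-encoded vectors can be as small as the sketch below. The way the pseudo-embedding is derived from SHA-256 here is an assumption for illustration; the shipped derivation, dimensions, and function names may differ.

```python
import hashlib
import json
import math


def pseudo_embedding(text: str, dims: int = 8) -> list[float]:
    """Deterministic stand-in vector derived from SHA-256 (not semantic)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    # Map the first `dims` bytes to floats in [-1, 1].
    return [(b / 127.5) - 1.0 for b in digest[:dims]]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query: str, rows: list[tuple[int, str]], k: int = 2) -> list[int]:
    """rows holds (memory_id, vector_json) pairs, as a JSON-blob column would."""
    q = pseudo_embedding(query)
    scored = [(cosine(q, json.loads(blob)), memory_id) for memory_id, blob in rows]
    return [memory_id for _, memory_id in sorted(scored, reverse=True)[:k]]
```

A hash-derived vector only matches identical text, so it exercises the storage and ranking plumbing without claiming semantic recall.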

- **Wave 1 — schema + Phase 3.6 carry-overs (parallel)**:
  - **T88** `embeddings` table + projector handlers (pure-Python cosine, JSON-blob storage; sqlite-vec deferred).
  - **T89** `branches` table + handlers (main bootstrapped; `is_active` flag; partial unique index).
  - **T90** Phase 3.6 carry-overs trio — `read_recent_dialogue` chat-id SQL pushdown, lifecycle warning wording tightening, legacy `record_turn_memory` removed.
- **Wave 2 — services (parallel)**:
  - **T91** embedding generation service (Phase 4 ships a deterministic SHA-256-derived pseudo-embedding; real model swap is Phase 4.5+).
  - **T92** vector search service via pure-Python cosine.
  - **T93** cross-chat search service (FTS5 across all owners, no witness filter — admin-style).
- **Wave 3 — services (parallel)**:
  - **T94** branching service (`branch_from_event`, `switch_active_branch`, `list_branches_with_metadata`).
  - **T95** delete-impact computation service (cascade preview, no DB mutation).
- **Wave 4 — combined retrieval (single)**:
  - **T96** combined FTS + vector retrieval ranking via reciprocal-rank fusion (RRF, `RRF_CONST=60`); existing significance/recency boost applied as final pass.
- **Wave 5 — memory write hook + backfill (single)**:
  - **T97** `EmbeddingWorker` drains queue and emits `embedding_indexed` events; `memory_write` enqueues per `memory_written`; `backfill_embeddings` script for existing memories; ALL 4 production call sites wired (turns, regenerate, meanwhile, drawer).
- **Wave 6 — drawer Phase 4 bundle (single, 5 sub-features)**:
  - **T98.1** branching UI (Branches panel + 3 routes).
  - **T98.2** significance review panel (distribution bar chart + per-memory edit).
  - **T98.3** hide-from-view toggle + `turn_hidden` `manual_edit` branch.
  - **T98.4** surgical delete with cascade preview (reuses existing rewind path; pre-rewind snapshot preserved).
  - **T98.5** remaining v1 edits — `narrative_anchor` + weather drawer affordances + 2 new `manual_edit` branches.
- **Wave 7 — UX surfaces (parallel)**:
  - **T99** snapshot UX (manual trigger, list, restore with hard-confirm, preview).
  - **T100** cross-chat search UX (top-bar form + results page).
- **Wave 8 — polish (parallel)**:
  - **T101** cross-feature integration tests (5 multi-feature scenarios).
  - **T102** documentation (this section).
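
The reciprocal-rank fusion named in T96 follows the standard formula: each list contributes `1 / (k + rank)` per item. The sketch below uses the stated constant of 60; `rrf_fuse` and the id-based tie-break are illustrative choices, and the final significance/recency pass is not shown.

```python
RRF_CONST = 60  # value named in the T96 note


def rrf_fuse(fts_ids: list[int], vector_ids: list[int], k: int = RRF_CONST) -> list[int]:
    """Fuse two ranked id lists; ranks are 1-based, best first."""
    scores: dict[int, float] = {}
    for ranking in (fts_ids, vector_ids):
        for rank, memory_id in enumerate(ranking, start=1):
            scores[memory_id] = scores.get(memory_id, 0.0) + 1.0 / (k + rank)
    # Higher fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda mid: (-scores[mid], mid))
```

Items that appear in both lists accumulate two terms, so agreement between FTS and vector search is rewarded without having to compare their raw scores.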

### Phase 4.5 / 5 backlog

All items shipped or deferred to Phase 5 (see "Phase 5 backlog" below). Final schema version: 14.

## Phase 4.5 status

Phase 4.5 cleanup shipped 13 of 14 planned tasks (T103–T117 with T115 deferred; T118 is this docs sweep). Two CLAUDE.md backlogs (Phase 3.6/4, Phase 4.5/5) are now empty; deferred follow-ups discovered during execution are tracked in a new "Phase 5 backlog" section below. Schema baseline advanced from version 13 to **14** (migration 0014: `memories.event_id`). Test count grew from ~413 (Phase 4) to ~457 (+~44 new tests across the wave).

- **Wave 1 — trivial polish (parallel)**:
  - **T103** branches polish — global-branch (`chat_id IS NULL`) leak documented in `list_branches`; branch-switch to nonexistent name now logs a warning.
  - **T104** `memory.py` DRY — `MAX(id)` helper extracted; `fts_rank=None` contract documented for vector-only rows.
  - **T105** `snapshots.py` polish — `datetime`/`timezone` imports hoisted to module level; strict `kind` validation in restore/preview (rejects missing); `created_at` from file mtime documented.
  - **T106** `search.py` polish — `k=50` extracted to module constant; N+1 `get_bot`/`get_chat`/`get_scene` lookups batched.
  - **T107** `embeddings.py` — `timeout_s` fallback-path warning when non-default model misconfigured.
- **Wave 2 — scene-close-on-cancel (single)**:
  - **T108** strengthened the T74.3 regression test + documented rationale in `turns.py`. **Surfaced a deferred bug**: existing pin only passes because `asyncio` isn't imported in the test module (NameError caught instead of CancelledError). When CancelledError fires for real, `post_turn`'s end-of-function re-raise causes `open_db`'s dependency teardown to skip `conn.commit()`, rolling back ALL post-cancel writes. Documented and deferred to Phase 5 triage.
- **Wave 3 — schema 0014 (single)**:
  - **T109** `memories.event_id` column (foundation for T111 deep-link). FK CASCADE on `embeddings.memory_id` deferred (memories rows are never deleted today; defensive constraint can't fire — saved for broader migration cleanup in Phase 5).
- **Wave 4 — drawer Phase 4.5 bundle (single)**:
  - **T110** `event_id <= 0` guard in `delete_turn` + `html.escape()` on delete-impact modal + Jinja partial extraction + bulk significance re-rate per chat (one `manual_edit` event per memory).
- **Wave 5 — search UX (single)**:
  - **T111** FTS snippet highlighting via `snippet()` + deep-link to turn via `memories.event_id`.
- **Wave 6 — real embedding model swap (single)**:
  - **T112** `LLMClient.embed()` Protocol + Mock impl with `canned_embeddings` + `FeatherlessClient.embed()` (raises `NotImplementedError` — Featherless OAI-compat doesn't expose embeddings, gap documented) + `generate_embedding` routes non-default models through `client.embed()` with fallback + `--re-embed-all` backfill flag.
- **Wave 7 — branching read-side filter (single)**:
  - **T113** `active_branch_event_ids(conn)` helper + applied to `read_recent_dialogue`, `scene_summarize._read_recent_dialogue`, `search_memories`, and `meanwhile._read_recent_meanwhile_dialogue`. Cross-chat search and projector queries deliberately NOT filtered (cross-chat is by design; projectors must see full log). Bootstrap "main" branch (origin=0, head=0) detected as the no-clamp sentinel.
- **Wave 8 — regenerate lifecycle rollback (single)**:
  - **T114** `triggered_by_assistant_turn_id` payload back-reference on `event_started`/`event_completed`/`event_cancelled` + new `event_status_reverted` event kind + projector handler in `chat/state/events.py` + regenerate flow emits revert events for affected lifecycle transitions.
- **Wave 9 — final polish + integration (parallel)**:
  - **T115** sqlite-vec swap — **DEFERRED to Phase 5**. Pre-flight failed: host Python build doesn't expose `sqlite3.Connection.enable_load_extension` (raises `AttributeError`). Requires either Python rebuild with `--enable-loadable-sqlite-extensions` or migration to `apsw`. Phase 4 pure-Python cosine remains in production.
  - **T116** structured `CannedQueue` test fixture builder + 2–3 POC test migrations (Phase 5 to migrate the rest).
  - **T117** Phase 4.5 cross-feature integration tests (5 minimum: real embedding swap, branching read-side filter, lifecycle rollback, search deep-link, bulk significance re-rate).
  - **T118** documentation (this section).

### Phase 5 backlog

New follow-ups discovered during Phase 4.5 reviews and execution, plus carry-over deferrals. None are blocking; pick up at any time.

- **T115 sqlite-vec swap** (environmental blocker): host Python's `sqlite3.Connection` does not expose `enable_load_extension` — `python -c "import sqlite3; sqlite3.connect(':memory:').enable_load_extension(True)"` raises `AttributeError`. Fix requires either a Python rebuild with `--enable-loadable-sqlite-extensions` or migration to `apsw`. Pure-Python cosine remains in production until then.
- **T108 follow-up: cancel-path commit bug** — `post_turn`'s re-raised `CancelledError` causes `open_db` dependency teardown to skip `conn.commit()`, rolling back all post-cancel writes. The existing T74.3 regression test passes only because `asyncio` isn't imported in the test module (NameError masks the real cancel path). Triage required — either commit before re-raise, or restructure the route to never re-raise after the close-detection branch.
- **`embeddings` FK CASCADE on `memory_id`** — deferred from T109; do as part of a broader migration consolidation in Phase 5.
- **`CannedQueue` fixture migration** — T116 shipped the builder + POC migrations; remaining tests still use positional canned arrays. Migrate in Phase 5.
- **Vector index optimization (HNSW)** — currently scales to a few thousand memories on the flat-index pure-Python cosine path; revisit when counts grow past flat-index feasibility.
- **Branch-isolated `event_log`**: giving each branch its own physical `event_log` range, instead of the current shared id space + head filter, is full branch isolation and is Phase 5+ work.
- **Embedding model swap migration tooling** — T112 added `--re-embed-all`; a more orchestrated swap (drain old worker, re-seed all memories, swap config) is Phase 5+.
- **Real-time collaborative branching** (multi-user) — out of scope for v1.
- **Avatars / portraits** (multimodality) — deferred indefinitely per design §14.
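
The T115 blocker at the top of this backlog can be probed without raising. The sketch below wraps the same check in a function; `supports_loadable_extensions` is a hypothetical name.

```python
import sqlite3


def supports_loadable_extensions() -> bool:
    """True when this Python build can load SQLite extensions such as sqlite-vec."""
    conn = sqlite3.connect(":memory:")
    try:
        # Missing attribute: built without --enable-loadable-sqlite-extensions.
        enable = getattr(conn, "enable_load_extension", None)
        if enable is None:
            return False
        enable(True)
        enable(False)
        return True
    except sqlite3.Error:
        return False
    finally:
        conn.close()
```

A startup check like this would let the retrieval layer keep the pure-Python cosine path when the probe returns `False`.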