docs: add Phase 4.5 cleanup plan (all 24 backlog items)

16 tasks across 9 waves consolidating all 24 items in CLAUDE.md Phase 4.5/5 backlog. Mix of: - Wave 1 (parallel 6-way): trivial polish across 6 different files - Wave 2 (single): schema migration 0014 (FK CASCADE + memories.event_id) - Wave 3 (single): drawer bundle (event_id guard + html.escape + modal partial + bulk significance re-rate) - Wave 4 (single): search UX (FTS snippet highlight + deep-link) - Wave 5 (single): real embedding model swap (LLMClient.embed protocol) - Wave 6 (single): branching read-side filter (riskiest — cross-cutting) - Wave 7 (single): regenerate lifecycle rollback - Wave 8 (single): sqlite-vec swap [ENVIRONMENTAL — may defer to Phase 5 if Python rebuild / apsw not feasible] - Wave 9 (parallel 3-way): structured fixture builder + integration tests + docs Schema baseline 13 -> 14 (or 15 with T115). Big tasks (T112 real embed, T113 branching filter, T114 lifecycle rollback) advance the engine beyond Phase 4's metadata-only state. T115 environmental decision captured in pre-flight; the other 13 tasks ship without it. Uses task ids T103-T118 to avoid collision with prior phases.
2026-04-27 04:22:08 -04:00
parent df977fc985
commit a06f90a164
2 changed files with 747 additions and 0 deletions
@@ -0,0 +1,724 @@
+# Roleplay Engine — Phase 4.5 Cleanup Plan
+
+> **For Claude:** REQUIRED SUB-SKILL: Use `superpowers-extended-cc:executing-plans` to implement this plan task-by-task. Use the parallel-dispatch pattern documented under "Parallel-Execution Strategy" for parallel waves.
+
+**Goal:** Burn down all 24 items in `CLAUDE.md` §"Phase 4.5 / 5 backlog". Mix of small defensive cleanups (most), three big features (real embedding model swap, branching read-side filter, lifecycle rollback in regenerate), one environment-dependent feature (sqlite-vec swap), and the long-deferred carry-overs (scene-close-on-cancel revisit, structured test-fixture builder).
+
+**Architecture:** No new architecture. Two new schema migrations (0014 schema polish, 0015 sqlite-vec virtual tables). New external dependency optional (`apsw` if Python rebuild isn't possible). All other changes are polish / refactor / observability.
+
+**Tech Stack:**
+
+- Existing — same as Phase 4.
+- **OPTIONAL:** rebuild Python with `--enable-loadable-sqlite-extensions` OR install `apsw` to enable T115 sqlite-vec swap. T115 is the only task that requires this; the other 13 tasks land without it. If neither is available, T115 is deferred to Phase 5.
+
+**Source-of-truth references:**
+
+- Backlog: [`CLAUDE.md`](../../CLAUDE.md) §"Phase 4.5 / 5 backlog" (24 items grouped by review source + deferred).
+- Phase 3.5 / Phase 2.5 cleanup plans (pattern reference): [2026-04-26-v3.5-phase3.5-cleanup.md](2026-04-26-v3.5-phase3.5-cleanup.md), [2026-04-26-v2.5-phase2.5-cleanup.md](2026-04-26-v2.5-phase2.5-cleanup.md).
+- Conventions: [`CLAUDE.md`](../../CLAUDE.md) §"Behavioral defaults" + §"Phase 4 status".
+
+---
+
+## Pre-flight
+
+**Branch:** create `phase-4.5` from the latest `main`:
+
+```bash
+git checkout main && git pull && git checkout -b phase-4.5
+```
+
+**Schema baseline:** Phase 4 leaves the DB at version 13. Phase 4.5 adds two migrations: `0014_phase45_schema.sql` (T109) and `0015_vec0_virtual_tables.sql` (T115 — only lands if T115 ships). Final schema version: 14 or 15.
+
+**Optional pre-flight for T115 (sqlite-vec swap):**
+
+The host Python build needs `enable_load_extension`. Two options:
+
+1. **Rebuild Python** via pyenv with `PYTHON_CONFIGURE_OPTS="--enable-loadable-sqlite-extensions" pyenv install 3.12.0 --force` and recreate the venv.
+2. **Add `apsw`** as a dependency and migrate `chat/db/connection.py` to use `apsw.Connection` (significant refactor — the entire codebase uses stdlib `sqlite3`).
+
+If neither is acceptable, **defer T115** to Phase 5 and ship Phase 4.5 with 13 tasks instead of 14. The other tasks are unaffected.
+
+**Pinned non-negotiables (carried forward):**
+
+- State changes go through the event log. Use `append_and_apply` for the live path.
+- Witness filter every memory read at SQL level.
+- TDD: every task starts with a failing test (or a regression test pinning existing contract before refactor).
+- One commit per task minimum. Bundled tasks split internally.
+
+**Verification before claiming done:** Use `superpowers-extended-cc:verification-before-completion` — run the test command, paste actual output.
+
+---
+
+## Backlog item → task mapping
+
+24 items consolidated into 14 tasks by **file ownership**:
+
+| # | Item | Source | Task |
+|---|------|--------|------|
+| 1 | `embeddings` FK lacks `ON DELETE CASCADE` | T88 | **T109** (schema migration) |
+| 2 | `list_branches(chat_id=...)` global-branch leak — document | T89 | **T103** |
+| 3 | Branch-switch silently leaves zero active — log warning | T89 | **T103** |
+| 4 | Real embedding model swap | T91 / deferred | **T112** |
+| 5 | `timeout_s` fallback-path logging | T91 | **T107** |
+| 6 | Duplicate `MAX(id)` lookup in retrieval ranking | T96 | **T104** |
+| 7 | `fts_rank=None` for vector-only rows — document | T96 | **T104** |
+| 8 | `event_id <= 0` guard in `delete_turn` | T98 | **T110** |
+| 9 | `html.escape()` on delete-impact modal output | T98 | **T110** |
+| 10 | Extract delete-impact modal to Jinja partial | T98 | **T110** |
+| 11 | Hoist `datetime`/`timezone` imports in `snapshots.py` | T99 | **T105** |
+| 12 | Strict `kind` validation in snapshot routes | T99 | **T105** |
+| 13 | `created_at` from file mtime — document drift risk | T99 | **T105** |
+| 14 | Hardcoded `k=50` → module constant | T100 | **T106** |
+| 15 | N+1 lookups in search results | T100 | **T106** |
+| 16 | FTS highlighting via `snippet()` | T100 | **T111** |
+| 17 | Result links chat-level only — add deep-link via memories.event_id | T100 | **T109** + **T111** |
+| 18 | sqlite-vec swap when host Python supports loadable extensions | deferred | **T115** |
+| 19 | Branching read-side filter (consult `is_active`) | deferred | **T113** |
+| 20 | Bulk significance re-rate in drawer | deferred | **T110** |
+| 21 | Vector index optimization (HNSW) | deferred | **T115** (post-ship note) |
+| 22 | Scene-close-on-cancel UX revisit | Phase 2.5 carry-over | **T108** |
+| 23 | Cross-feature canned-queue brittleness fixture builder | Phase 3 carry-over | **T116** |
+| 24 | Full lifecycle-rollback in regenerate | Phase 3.5 carry-over | **T114** |
+
+---
+
+## Parallel-Execution Strategy
+
+Same pattern as Phase 3.5 / Phase 2.5 / Phase 4. Nine waves: parallel within each wave (file-disjoint), serial across waves.
+
+### How to dispatch a wave in parallel
+
+Use the **Agent tool with `isolation: "worktree"`**. (If the controlling session's working directory is **not** the chat repo, create worktrees manually with `git worktree add .worktrees/<wave>-<task> -b <wave>/<task> phase-4.5`.)
+
+### After a wave completes
+
+1. Each subagent returns its worktree path and commit SHA(s).
+2. **Run a spec + code-quality reviewer subagent on each completed task.** Combined review acceptable for trivial tasks (T103–T108); separate spec + quality reviewers for big tasks (T112, T113, T114, T115).
+3. **Merge the wave into `phase-4.5`** in any order (file-disjointness guarantees no conflict). Use `--no-ff`.
+4. **Run the full test suite** on the merged `phase-4.5`.
+5. **Push `phase-4.5`** to gitea.
+6. Optionally clean up worktrees.
+
+### Conflict prevention checklist
+
+For each parallel wave, verify the **Files** sections of all tasks have **no overlapping paths**. Hot files in this plan (each owned by exactly one task): `chat/state/memory.py`, `chat/web/drawer.py`, `chat/web/search.py`, `chat/services/regenerate.py`, `chat/services/turn_common.py`, `chat/services/embeddings.py`, `chat/db/migrations/`.
+
+### Why each wave is parallel-safe
+
+| Wave | Tasks | Hot files | Disjoint? |
+|------|-------|-----------|-----------|
+| 1 | T103, T104, T105, T106, T107, T108 | 6 different files; no overlap | ✅ |
+| 2 | T109 | new migration + minor projector update | (single task) |
+| 3 | T110 | `chat/web/drawer.py` (bundle) | (single task) |
+| 4 | T111 | `chat/services/cross_chat_search.py` + `chat/web/search.py` + template | (single task; depends on T109) |
+| 5 | T112 | `chat/services/embeddings.py` + `chat/llm/*.py` (Protocol + Featherless + Mock) | (single task) |
+| 6 | T113 | `chat/services/turn_common.py` + multiple readers (cross-cutting) | (single task) |
+| 7 | T114 | `chat/services/regenerate.py` + projector handler | (single task) |
+| 8 | T115 | new migration + `chat/services/vector_search.py` + `chat/db/connection.py` | (single task; environmental) |
+| 9 | T116, T117, T118 | new test fixture file (T116); new test file (T117); CLAUDE.md (T118) | ✅ |
+
+---
+
+## Task overview
+
+```
+Wave 1 ─┬─ T103: branches polish (global-branch doc + branch-switch warning)
+        ├─ T104: state/memory.py polish (DRY MAX(id) + fts_rank doc)
+        ├─ T105: snapshots.py polish (datetime hoist + kind validation + mtime doc)
+        ├─ T106: search.py polish (k constant + N+1 batched lookups)
+        ├─ T107: embeddings.py timeout_s fallback-path logging
+        └─ T108: scene-close-on-cancel UX revisit (pin behavior with regression test)
+
+Wave 2 ─── T109: 0014 schema migration (FK CASCADE + memories.event_id column)
+
+Wave 3 ─── T110: drawer Phase 4.5 bundle (event_id guard + html.escape + modal partial + bulk sig re-rate)
+
+Wave 4 ─── T111: search UX enhancements (FTS snippet() highlighting + deep-link via memories.event_id)
+
+Wave 5 ─── T112: real embedding model swap (LLMClient.embed protocol + Featherless impl + generate_embedding routing + backfill)
+
+Wave 6 ─── T113: branching read-side filter (event readers consult is_active branch range)
+
+Wave 7 ─── T114: regenerate lifecycle rollback (back-reference field + compensating events on supersede)
+
+Wave 8 ─── T115: sqlite-vec swap (vec0 virtual tables + MATCH-based vector_search) [ENVIRONMENTAL — see pre-flight]
+
+Wave 9 ─┬─ T116: structured test-fixture builder (canned-queue brittleness)
+        ├─ T117: Phase 4.5 cross-feature integration tests
+        └─ T118: docs sweep — Phase 4.5 status, prune backlog, capture Phase 5 residuals
+```
+
+Critical path: 9 sequential merge points. Total tasks: 14 (or 13 if T115 deferred). Parallelism: Waves 1 (6-way) and 9 (3-way) dispatch concurrently. Waves 2–8 are single-task by hot-file constraint.
+
+---
+
+## Wave 1 — Independent small fixes (parallel, 6 tasks)
+
+All trivial, file-disjoint. Each is 1-line + 1-test or similar.
+
+### Task 103: branches polish
+
+**Files:**
+- Modify: `chat/state/branches.py`
+- Modify: `tests/test_branches_state.py`
+
+**Spec (2 sub-fixes, single commit):**
+
+1. **Document global-branch leak**: `list_branches(chat_id=...)` filter `chat_id = ? OR chat_id IS NULL` returns global/null-chat branches (like "main") in every chat scope. Add a docstring note explaining this is intentional ("main" is global by design; per-chat branches are scoped).
+
+2. **Warn on branch-switch to nonexistent name**: in `_apply_branch_switched`, before the SQL UPDATE, check if a branch with the given name exists. If not, emit `logging.getLogger(__name__).warning(...)` rather than silently leaving zero active branches.
+
+**Test:** `test_branch_switched_unknown_name_warns` — capture log via `caplog`, append `branch_switched` for nonexistent name, assert warning message + no active branch (existing behavior preserved, just observable).
+
+**Commit:** `chore: branches polish — global-leak docs + unknown-name warning (T103)`.
+
+---
+
+### Task 104: state/memory.py polish
+
+**Files:**
+- Modify: `chat/state/memory.py`
+- Modify: `tests/test_memory_search.py` (no new tests; just add docstring assertions if needed)
+
+**Spec (2 sub-fixes):**
+
+1. **DRY `MAX(id)` lookup**: `_composite_rerank` (Phase 3.5 T57) and `_rrf_fuse_and_rerank` (Phase 4 T96) both query `SELECT MAX(id) FROM event_log` for the recency boost. Extract a `_max_event_id(conn)` helper.
+
+2. **`fts_rank=None` documentation**: search_memories docstring should note that vector-only rows have `fts_rank=None`. Downstream consumers must accept None (they currently do, but contract is implicit).
+
+**Test:** existing tests cover both via the public API; no new test needed unless docstring assertion is desired.
+
+**Commit:** `chore: memory.py DRY MAX(id) helper + document fts_rank=None contract (T104)`.
+
+---
+
+### Task 105: snapshots.py polish
+
+**Files:**
+- Modify: `chat/web/snapshots.py`
+- Modify: `tests/test_snapshot_ux.py` (1 new test)
+
+**Spec (3 sub-fixes):**
+
+1. **Hoist `datetime`/`timezone` imports** to module level (currently inside `_list_all_snapshots`).
+
+2. **Strict `kind` validation in restore/preview routes**: currently `kind` defaults to `"periodic"`. If a rewind snapshot is requested without explicit `kind`, the lookup silently 404s. Reject missing `kind` with a 400 instead of silently defaulting.
+
+3. **Document `created_at` mtime drift risk** in module docstring: snapshot timestamps come from file mtime, not the encoded filename timestamp. Files copied via `cp -p` preserve mtime; `cp` without `-p` resets it. Add a one-line note.
+
+**Test:** `test_restore_without_kind_returns_400` — POST `/snapshots/restore/<id>` without `kind`; assert 400.
+
+**Commit:** `chore: snapshots.py polish — hoisted imports + strict kind + mtime doc (T105)`.
+
+---
+
+### Task 106: search.py polish
+
+**Files:**
+- Modify: `chat/web/search.py`
+- Modify: `tests/test_search_ux.py` (1 new test)
+
+**Spec (2 sub-fixes):**
+
+1. **Hardcoded `k=50` → module constant**: extract `DEFAULT_SEARCH_K = 50` at module level. Tunable without code change at the call site.
+
+2. **N+1 lookup batching**: GET `/search?q=...` currently calls `get_bot(conn, owner_id)`, `get_chat(conn, chat_id)`, `get_scene(conn, scene_id)` per result row (worst case 50×3 = 150 individual queries). Batch via `WHERE id IN (...)` queries: collect distinct ids first, fetch in 3 batched queries, then map back per row.
+
+**Test:** `test_search_results_use_batched_lookups` — mock `get_bot`/`get_chat`/`get_scene` and assert each is called once (not per row). OR easier: time the search with 50 results and assert it doesn't degrade linearly with `k`.
+
+**Commit:** `perf: search.py N+1 batching + k constant extraction (T106)`.
+
+---
+
+### Task 107: embeddings.py timeout_s fallback-path logging
+
+**Files:**
+- Modify: `chat/services/embeddings.py`
+- Modify: `tests/test_embeddings.py` (1 new test)
+
+**Spec:**
+
+When `model != DEFAULT_EMBEDDING_MODEL` and falls through to fallback (zero-vector with model="fallback"), log a `warning` so misconfigured callers (e.g., a Phase 4.5+ caller pointing at a real model that doesn't exist) don't silently degrade.
+
+```python
+if model != DEFAULT_EMBEDDING_MODEL:
+    _log.warning(
+        "generate_embedding: non-default model %r returned fallback "
+        "(model client.embed() not yet implemented in Phase 4.5+); "
+        "downstream search will degrade silently. Configure a supported model.",
+        model,
+    )
+    return EmbeddingResult(...)  # fallback
+```
+
+The Phase 4 default path (`model == DEFAULT_EMBEDDING_MODEL` → pseudo-embedding) is silent; only non-default models trigger the warning.
+
+**Test:** `test_generate_embedding_non_default_model_logs_warning` — call with `model="real-model"`; capture log via `caplog`; assert the warning message appears.
+
+**Commit:** `chore: embeddings.py warns on fallback for non-default models (T107)`.
+
+---
+
+### Task 108: scene-close-on-cancel UX revisit
+
+**Files:**
+- Modify: `tests/test_turn_flow.py` (extend the existing pin test added in Phase 2.5 T74.3 OR add a new one)
+- Optionally modify: `chat/web/turns.py` if a real bug surfaces during investigation
+
+**Spec:**
+
+This carry-over has been pending since Phase 2.5 T74.3. The pinned behavior: scene close fires even when the primary turn is cancelled mid-stream, because `detect_scene_close` consults user prose (fully present at cancel time), not bot output.
+
+**Action:**
+
+1. **Re-investigate** by reading the post_turn cancellation path. Confirm the rationale still holds (it should — nothing about the close-detection logic changed in Phase 3 or 4).
+2. **Strengthen the regression test** in `tests/test_turn_flow.py` (the existing `test_cancelled_turn_still_closes_scene_when_user_prose_signals_close`). Add an assertion that the user prose IS present at the moment scene_close_decision fires (even though the bot output isn't).
+3. If investigation surfaces an actual UX issue (e.g., the close fires too eagerly on prose like "fade out... actually wait"), this becomes a real fix — but default action is documentation-only.
+
+**Default outcome:** add a docstring comment to the post_turn close-detection branch explaining the rationale. No behavioral change.
+
+**Test (extend existing):** assert ordering — `scene_closed` event lands AFTER the user_turn event but BEFORE any potential assistant_turn (which is cancelled). Pin the contract.
+
+**Commit:** `chore: scene-close-on-cancel — strengthen regression test + document rationale (T108)`.
+
+---
+
+## Wave 2 — Schema migration (single)
+
+### Task 109: 0014 schema migration
+
+**Files:**
+- Create: `chat/db/migrations/0014_phase45_schema.sql`
+- Modify: `chat/state/memory.py` or `chat/services/memory_write.py` (populate the new `event_id` column on memory_written)
+- Modify: `tests/test_world.py` (bump schema_version assertion to 14)
+- Modify: `tests/test_memory_write.py` (assert event_id populated)
+
+**Spec:**
+
+Two schema changes bundled into a single migration:
+
+1. **`embeddings.memory_id` FK gets `ON DELETE CASCADE`** (T88 review nit). SQLite doesn't support `ALTER TABLE ... ALTER COLUMN`, so the standard pattern is: rename old table, create new, copy data, drop old, recreate indices. Alternatively, since this is a new-ish table (Phase 4 added it) and the change is purely defensive, document as "WONTFIX in 4.5; deindex events remain the only deletion path; ON DELETE CASCADE remains a Phase 5 candidate when we do a broader migration cleanup". Choose pragmatically.
+
+2. **Add `memories.event_id INTEGER` column** (NULL allowed for backward compat) referencing `event_log.id`. This is the foundation for T111's deep-linking from cross-chat search results to specific turns. Migration adds the column; the projector for `memory_written` populates it from the event id when projecting.
+
+**Production code change:** in the `memory_written` projector handler (in `chat/state/memory.py` or wherever it lives), populate the new `event_id` column with the projecting event's `id`. The `Event` object has `id` available in the projector context.
+
+**Tests:**
+
+1. `test_schema_version_after_migration_is_14` (rename + bump from 13).
+2. `test_memory_written_populates_event_id` — append memory_written; project; query memories table; assert `event_id` is the projecting event's id.
+3. (Backward compat) older memories from existing seed data have NULL `event_id` — the column is nullable.
+
+**Commit:** `feat: 0014 schema — embeddings FK CASCADE (deferred or applied) + memories.event_id column (T109)`.
+
+---
+
+## Wave 3 — Drawer Phase 4.5 bundle (single)
+
+### Task 110: drawer polish + bulk significance re-rate
+
+**Files:**
+- Modify: `chat/web/drawer.py`
+- Modify: `chat/templates/_drawer.html`
+- Create: `chat/templates/_delete_impact_modal.html` (extracted partial)
+- Modify: `chat/state/manual_edit.py` (potentially — if bulk re-rate emits a new manual_edit kind)
+- Modify: `tests/test_drawer_phase4.py` (extend with 4-5 new tests)
+
+**Spec (4 sub-fixes, 4 commits):**
+
+1. **`event_id <= 0` guard in `delete_turn`** (T98 nit): currently silently rewinds everything if `event_id` is 0. Add `if event_id <= 0: raise HTTPException(400, "...")`.
+
+2. **`html.escape()` on delete-impact modal** (T98 nit): the rendered HTML in `compute_delete_impact` output is built via raw f-strings from model-controlled strings. Wrap user-controllable fields with `html.escape()`. Defense-in-depth — currently safe, but if event payload fields ever appear in descriptions, autoescape would prevent XSS.
+
+3. **Extract delete-impact modal HTML to a Jinja partial**: create `chat/templates/_delete_impact_modal.html`; render via `templates.TemplateResponse(...)` instead of f-string concatenation. Inherits Jinja2 autoescape automatically. Tests use the existing TestClient pattern.
+
+4. **Bulk significance re-rate** (T98.2 deferral): drawer panel showing memory significance distribution per chat. New POST route `/chats/{chat_id}/drawer/memory/significance/bulk` accepting `{level_from, level_to}` form fields. Updates ALL memories in the chat at `level_from` to `level_to` via a sequence of `manual_edit` events (one per memory — preserves the audit trail).
+
+**Tests:**
+
+1. `test_delete_turn_with_event_id_zero_returns_400`.
+2. `test_delete_impact_modal_uses_jinja_partial` (assert response renders the partial template; verify with `assert b"<div class=\"delete-impact-modal\">" in response.content` or similar).
+3. `test_delete_impact_modal_escapes_user_controllable_strings` — seed an event with a payload containing `<script>` in a description-bound field; render preview; assert it appears HTML-escaped.
+4. `test_bulk_significance_re_rate_emits_manual_edit_per_memory` — seed 5 memories at significance 0; bulk re-rate to 2; assert 5 `manual_edit` events landed.
+
+**Commits (4):**
+- `fix: drawer delete_turn guards event_id <= 0 (T110.1)`
+- `fix: drawer delete-impact modal HTML escapes user-controllable fields (T110.2)`
+- `refactor: drawer delete-impact modal extracted to Jinja partial (T110.3)`
+- `feat: drawer bulk significance re-rate per chat (T110.4)`
+
+---
+
+## Wave 4 — Search UX enhancements (single)
+
+### Task 111: FTS highlighting + deep-link to turn
+
+**Files:**
+- Modify: `chat/services/cross_chat_search.py`
+- Modify: `chat/web/search.py`
+- Modify: `chat/templates/search.html`
+- Modify: `tests/test_search_ux.py`
+
+**Spec (2 sub-fixes, 2 commits):**
+
+1. **FTS highlighting via `snippet()`** (T100 nit): replace the `pov_summary` column in `search_all_memories`'s SELECT with `snippet(memories_fts, 0, '<mark>', '</mark>', '…', 32)` to return a highlighted snippet around the match. The template renders this raw via `|safe` (the snippet is built by SQLite from indexed content; the `<mark>` tags are the only HTML, and SQLite escapes any HTML special chars in the source content).
+
+2. **Deep-link to turn via memories.event_id** (T100 nit + T109 dependency): now that `memories.event_id` exists (from T109), each search result row knows the originating event id. The chat page uses turn-id stamping (Phase 3.5 T86 added `id="turn-{event_id}"`). Build result links as `/chats/{chat_id}#turn-{event_id}`. The chat page DOM scrolls to the anchor on load (browser default).
+
+**Tests:**
+
+1. `test_search_results_include_fts_snippet_with_highlight` — seed memory with text containing "rabbit"; search for "rabbit"; assert response body contains `<mark>rabbit</mark>` (or whatever marker the snippet uses).
+2. `test_search_result_link_includes_turn_anchor` — seed memory with known event_id; search; assert link href contains `#turn-{event_id}`.
+
+**Commits (2):**
+- `feat: cross-chat search FTS snippet highlighting (T111.1)`
+- `feat: cross-chat search deep-links to turn via memories.event_id (T111.2)`
+
+---
+
+## Wave 5 — Real embedding model (single)
+
+### Task 112: Real embedding model swap
+
+**Files:**
+- Modify: `chat/llm/client.py` (Protocol — add `embed(text, model) -> list[float]` method)
+- Modify: `chat/llm/featherless.py` (FeatherlessClient — implement `embed` against Featherless `/v1/embeddings` endpoint OR equivalent)
+- Modify: `chat/llm/mock.py` (MockLLMClient — accept canned embedding vectors)
+- Modify: `chat/services/embeddings.py` (route non-default model through `client.embed()`)
+- Modify: `chat/config.py` (add `embedding_model: str` setting; default to current pseudo)
+- Modify: `scripts/backfill_embeddings.py` (re-embed-all option for model swaps)
+- Modify: `tests/test_embeddings.py` + `tests/test_llm_mock.py` + `tests/test_featherless.py` (if exists)
+
+**Spec:**
+
+Phase 4 ships a deterministic SHA-256 pseudo-embedding (deterministic but semantically meaningless). T112 wires the path for a real embedding model.
+
+**Steps:**
+
+1. **Extend `LLMClient` Protocol** with `async def embed(self, text: str, *, model: str) -> list[float]`.
+
+2. **Implement on FeatherlessClient**: call the Featherless OpenAI-compatible `/v1/embeddings` endpoint:
+   ```python
+   response = await self._http.post(
+       "/v1/embeddings",
+       json={"model": model, "input": text},
+       headers={"Authorization": f"Bearer {self._api_key}"},
+   )
+   data = response.json()
+   return data["data"][0]["embedding"]
+   ```
+   Handle rate limits (existing 2-conn semaphore covers this).
+
+3. **Implement on MockLLMClient**: `embed` pops a canned vector from a new `canned_embeddings` queue. Tests configure this queue.
+
+4. **Update `generate_embedding`**: when `model != DEFAULT_EMBEDDING_MODEL`, call `client.embed(text, model=model)` instead of falling through to fallback. Wrap in try/except — failures fall back to zero vector (existing fallback path).
+
+5. **Settings**: add `embedding_model: str = "pseudo-sha256-384"` to `Settings`. App reads this at startup; the embedding worker (`chat/services/embedding_worker.py`) passes it through.
+
+6. **Backfill script**: add `--re-embed-all` flag that walks ALL memories (regardless of existing `embeddings_meta` rows) and re-embeds with the configured model. Useful for swapping models.
+
+**Tests:**
+
+1. `test_embed_routes_to_client_when_non_default_model` — mock client with canned vector; call `generate_embedding(model="bge-small-en-v1.5")`; assert vector matches the canned response.
+2. `test_embed_falls_back_on_client_failure` — mock client to raise; assert returns zero vector with model="fallback".
+3. `test_mock_llm_client_embed_pops_canned`.
+4. `test_featherless_embed_calls_correct_endpoint` (if there's an existing featherless test pattern; otherwise mock the HTTP layer).
+
+**Commits:**
+- `feat: LLMClient Protocol gains embed() method (T112.1)`
+- `feat: FeatherlessClient.embed() against /v1/embeddings (T112.2)`
+- `feat: generate_embedding routes non-default models through client.embed (T112.3)`
+- `feat: backfill_embeddings --re-embed-all flag for model swaps (T112.4)`
+
+---
+
+## Wave 6 — Branching read-side filter (single, BIG)
+
+### Task 113: Branching read-side filter
+
+**Files (cross-cutting):**
+- Modify: `chat/services/turn_common.py::read_recent_dialogue` — filter events to active branch's range
+- Modify: `chat/services/scene_summarize.py::_read_recent_dialogue` (similar)
+- Modify: `chat/state/memory.py::search_memories` — memories should be filtered to active branch (memories.event_id from T109 enables this)
+- Modify: `chat/state/branches.py` — add helper `active_branch_event_ids(conn) -> tuple[int, int]` returning (origin, head)
+- Add tests across multiple files
+- Modify: `tests/test_branching.py` — add cross-feature tests
+
+**Spec:**
+
+Phase 4 T89 + T94 shipped branching as metadata-only (the table tracks branches; the drawer UI can switch). But event readers DON'T consult `is_active` — they read the entire event_log. So switching branches has no functional effect.
+
+T113 wires the filter:
+
+1. **Helper** `active_branch_event_ids(conn) -> tuple[int, int]`: returns `(origin_event_id, head_event_id)` for the currently active branch. For "main" with origin=0 + head=N, returns `(0, N)` meaning "all events visible".
+
+2. **Apply filter** in every event reader that returns historical state:
+   - `read_recent_dialogue`: WHERE clause adds `id BETWEEN ? AND ?` (the active branch's range).
+   - `search_memories`: WHERE clause adds `m.event_id BETWEEN ? AND ?` (uses T109's column).
+   - `scene_summarize._read_recent_dialogue`: same as turn_common.
+   - Other readers TBD — grep for `event_log` SELECT patterns and audit each one.
+
+3. **Branches that diverge**: when branch B is created from event 10 and then accumulates events 11-15 (which only exist on B's timeline), but main also accumulates 11-12, the events overlap by id range. This is OK because event reads filter by `id <= active_branch.head_event_id`. The simpler model: branches share event_log ids globally, but each branch's "head" defines which ids are visible.
+
+4. **Events written under branch B** carry an implicit branch tag — but the event_log table has no `branch_id` column today. T113 punts on cross-branch event writes (they all land in the global log) and relies on the `head_event_id` filter to scope reads. This is a Phase 4.5+ first cut; full branch-isolated event_log is Phase 5+.
+
+**Edge cases:**
+
+- Active branch has `head_event_id = 0` (just created): readers return empty.
+- No active branch: readers fall through to "all events visible" (defensive).
+- Switching branches mid-flight: each `read_recent_dialogue` call re-queries `active_branch`, so it's always current. No caching.
+
+**Tests:** 5+ minimum.
+
+1. `test_read_recent_dialogue_respects_active_branch_head` — seed 10 events; active branch head = 5; assert only first 5 returned.
+2. `test_search_memories_respects_active_branch_head` — same.
+3. `test_branch_switch_changes_visible_events` — switch branches; immediately read; assert different result sets.
+4. `test_main_branch_with_head_zero_returns_empty` — defensive.
+5. `test_no_active_branch_falls_through_to_all_events` — defensive.
+
+**Commit:** `feat: branching read-side filter — event readers consult active branch range (T113)`.
+
+**This is the largest task in Phase 4.5.** Estimate 200-400 lines across multiple files. Implementer should split commits if it helps clarity (one per affected reader).
+
+---
+
+## Wave 7 — Lifecycle rollback in regenerate (single)
+
+### Task 114: Lifecycle rollback
+
+**Files:**
+- Modify: `chat/services/regenerate.py`
+- Modify: `chat/db/migrations/0014_phase45_schema.sql` (T109's migration) — add column? OR
+- Add new migration — see decision below
+- Modify: tests in `tests/test_regenerate.py`
+
+**Spec:**
+
+Phase 3.5 T83.4 shipped a warning log when regenerate detects un-rolled-back lifecycle transitions. T114 implements actual rollback.
+
+**Schema decision:**
+
+Option A: extend lifecycle event payloads with `triggered_by_assistant_turn_id` (no schema change needed — just a payload convention). Production code (T61 turn flow) populates it when emitting `event_started`/`event_completed`/`event_cancelled`. Existing rows have NULL — rollback skips them with a debug log.
+
+Option B: add a column to `event_log` for stronger invariants. Significant migration cost.
+
+**Recommended:** Option A. Safer, no migration, backward compatible (older events skip rollback). Document in commit body.
+
+**Rollback semantics:**
+
+When regenerate detects lifecycle events triggered by the superseded turn:
+- `event_started` → emit `event_cancelled` (or a NEW `event_started_undone` event kind that reverts status to "planned") with the same event_id.
+- `event_completed` → emit `event_uncompleted` (NEW event kind that reverts status from "completed" to "active").
+- `event_cancelled` → emit `event_uncancelled` (reverts to prior status — which we'd need to track; or simpler: emit `event_started` again to restore "active").
+
+**Simpler approach (recommended):** add ONE new event kind `event_status_reverted` with payload `{event_id, prior_status}`. The projector sets `events.status = prior_status` for the event_id. Rollback emits this event for each affected lifecycle transition, looking up the prior status from the row's history (via event_log scan) or accepting it as a payload field.
+
+**Production code change:** in `chat/web/turns.py::post_turn` (and `chat/services/regenerate.py`), when emitting `event_started`/`event_completed`/`event_cancelled`, populate `triggered_by_assistant_turn_id: <id>` in the payload. Forward-only — older code doesn't need updating.
+
+**Tests:** 3 minimum.
+
+1. `test_regenerate_rolls_back_event_started_from_superseded_turn` — seed an event; play a turn that starts it; regenerate; assert `event_status_reverted` event landed with `prior_status="planned"` and the events row is back to "planned".
+2. `test_regenerate_rolls_back_event_completed_to_active` — same but completed → active rollback.
+3. `test_regenerate_skips_events_without_back_reference` — older events without `triggered_by_assistant_turn_id` are not rolled back (debug log). Pin the backward-compat behavior.
+
+**Commits:**
+- `feat: lifecycle events carry triggered_by_assistant_turn_id back-reference (T114.1)`
+- `feat: event_status_reverted event kind + projector handler (T114.2)`
+- `feat: regenerate rolls back lifecycle transitions on supersede (T114.3)`
+
+---
+
+## Wave 8 — sqlite-vec swap (single, ENVIRONMENTAL)
+
+### Task 115: sqlite-vec swap (optional)
+
+**Files:**
+- Create: `chat/db/migrations/0015_vec0_virtual_tables.sql`
+- Modify: `chat/db/connection.py` (load extension on every connection)
+- Modify: `chat/services/vector_search.py` (rewrite to use vec0 MATCH instead of pure-Python cosine)
+- Modify: `chat/state/embeddings.py` (writer needs to populate vec0 table)
+- Modify: `pyproject.toml` (add `sqlite-vec` dependency)
+
+**Pre-flight:**
+
+This task REQUIRES one of:
+- Python rebuilt with `--enable-loadable-sqlite-extensions` (pyenv reinstall).
+- `apsw` migration of `chat/db/connection.py`.
+
+If neither is feasible at the time of execution: SKIP THIS TASK and document the deferral in T118 docs sweep. The other 13 Phase 4.5 tasks ship without it.
+
+**Spec:**
+
+1. **Migration** `0015_vec0_virtual_tables.sql`:
+   ```sql
+   CREATE VIRTUAL TABLE embeddings_vec USING vec0(
+       memory_id INTEGER PRIMARY KEY,
+       embedding FLOAT[384]
+   );
+   -- Backfill from existing JSON embeddings table.
+   INSERT INTO embeddings_vec (memory_id, embedding)
+   SELECT memory_id, vec_f32(vector_json) FROM embeddings;
+   ```
+
+2. **`chat/db/connection.py`** loads `sqlite_vec` extension on every connection:
+   ```python
+   import sqlite_vec
+   def open_db(...):
+       conn = sqlite3.connect(...)
+       conn.enable_load_extension(True)
+       sqlite_vec.load(conn)
+       conn.enable_load_extension(False)
+       ...
+   ```
+
+3. **Rewrite `vector_search.py`** to use `embeddings_vec MATCH ?` syntax with `k=?` clause:
+   ```sql
+   SELECT m.id, m.pov_summary, m.significance, e.distance
+   FROM embeddings_vec e
+   JOIN memories m ON m.id = e.memory_id
+   WHERE e.embedding MATCH ? AND k = ?
+     AND m.owner_id = ?
+     AND m.witness_<role> = 1
+   ORDER BY e.distance ASC
+   LIMIT ?
+   ```
+
+4. **HNSW note**: vec0 supports both flat (default) and HNSW indexes. T115 ships flat (sufficient for < few thousand memories). Document HNSW upgrade path in CLAUDE.md if memory counts ever grow past pure-Python feasibility.
+
+5. **Old `embeddings` JSON table**: keep alongside `embeddings_vec` (data redundancy is fine; the JSON table is the source of truth and `embeddings_vec` is the index). Backfill on migration. Keep the `embedding_indexed` projector populating both.
+
+**Tests:** rewrite `tests/test_vector_search.py` to expect new behavior. Same observable contract — only implementation changes. All 5 existing tests should pass post-swap.
+
+**Commit:** `feat: sqlite-vec swap (vec0 virtual tables + MATCH-based search) (T115)`.
+
+---
+
+## Wave 9 — Polish (parallel, 3 tasks)
+
+### Task 116: Structured test-fixture builder
+
+**Files:**
+- Create: `tests/fixtures.py` (or extend `tests/conftest.py`)
+- Modify: existing test files that use brittle canned-queue arrays (selectively)
+
+**Spec:**
+
+Phase 3 carry-over. Tests across `test_turn_flow.py`, `test_meanwhile_turn_flow.py`, `test_phase3_integration.py`, `test_phase4_integration.py` use positional canned-response arrays for `MockLLMClient`. Adding a new classifier call to a code path requires updating canned arrays in many tests.
+
+**Solution:** structured fixture builder that lets tests declare their classifier expectations by name, not position:
+
+```python
+# tests/fixtures.py
+class CannedQueue:
+    def __init__(self):
+        self._queue = []
+    def parse_turn(self, **fields): ...
+    def state_update(self, **fields): ...
+    def detect_scene_close(self, should_close: bool): ...
+    def detect_event_transitions(self, transitions: list[dict]): ...
+    def summarize_scene(self, summary: str, **fields): ...
+    def detect_threads(self, candidates: list[dict]): ...
+    # ... one method per classifier service
+    def build(self) -> list[str]:
+        return [json.dumps(item) for item in self._queue]
+```
+
+Usage:
+
+```python
+def test_post_turn_with_event_transition(...):
+    canned = (
+        CannedQueue()
+            .parse_turn(intent="narrative")
+            .narrative("BotA speaks.")  # narrative is a stream, but for simplicity treat it like a canned response
+            .state_update(affinity_delta=0, trust_delta=0)
+            .state_update(affinity_delta=0, trust_delta=0)
+            .detect_event_transitions([{"event_id": "evt_1", "new_status": "completed"}])
+            .detect_scene_close(should_close=False)
+            .build()
+    )
+    mock = MockLLMClient(canned=canned)
+    # ...
+```
+
+**Migration scope:** don't migrate ALL existing tests at once — that's a separate massive refactor. Instead, ship the fixture builder + migrate 2-3 representative tests as proof of concept. Document the migration path in the fixture's docstring.
+
+**Tests:** the fixture builder itself doesn't need extensive testing — it's just a builder. Add 1-2 sanity tests that the JSON output matches expected shapes.
+
+**Commit:** `test: structured CannedQueue fixture builder for classifier mocks (T116)`.
+
+---
+
+### Task 117: Phase 4.5 cross-feature integration tests
+
+**Files:**
+- Create: `tests/test_phase45_integration.py`
+
+**Spec:**
+
+End-to-end multi-feature flows specific to Phase 4.5 changes. 5 tests minimum.
+
+1. **Real embedding swap + retrieval** — configure `embedding_model="bge-small-en-v1.5"` (mocked); write a memory; backfill or wait for worker; assert vector search returns the memory via `client.embed`-derived vector (not pseudo).
+
+2. **Branching read-side filter end-to-end** — create a branch from turn 5; switch; play 3 turns on the branch; switch back to main; assert main's recent dialogue is missing the branch turns (read filter respects active branch's head).
+
+3. **Lifecycle rollback** — start an event via a turn; regenerate that turn; assert lifecycle reverted (event back to "planned").
+
+4. **Search deep-link** — write memories; search; click a result; verify the chat page renders with the right turn anchored (assert via TestClient response — either the browser anchor OR a server-side scroll-to-anchor mechanism).
+
+5. **Bulk significance re-rate end-to-end** — seed 5 memories at significance 0; bulk re-rate via drawer; verify significance histogram updates.
+
+**Commit:** `test: phase 4.5 cross-feature integration coverage (T117)`.
+
+---
+
+### Task 118: Phase 4.5 documentation update
+
+**Files:**
+- Modify: `CLAUDE.md`
+- Modify: `docs/plans/2026-04-26-v1-requirements-design.md` (annotate §13 Phase 4 entries — though they're already shipped per Phase 4 T102)
+
+**Spec:**
+
+Mirror the Phase 3.5 / 2.5 status sections. Document:
+
+- All shipped items per task (T103–T117).
+- Empty out the Phase 4.5 / 5 backlog (replace with single "All items shipped" line).
+- Add new "Phase 5 backlog" section if any Phase 4.5 reviews surfaced new follow-ups.
+
+**Phase 5 backlog candidates** (default, if no new follow-ups discovered):
+
+- Vector index optimization (HNSW) when memory counts grow past flat-index feasibility.
+- Branch-isolated event_log (each branch has its own physical event_log range vs the current shared id space + head filter).
+- Embedding model swap migration tooling — when changing models, need to re-embed everything; T112 added `--re-embed-all` but a more orchestrated swap (drain old worker, re-seed all memories, swap config) is Phase 5+.
+- Real-time collaborative branching (multi-user) — out of scope for v1.
+- Avatars / portraits (multimodality) — deferred indefinitely per design §14.
+
+**Commit:** `docs: phase 4.5 status, prune backlog, capture phase 5 candidates (T118)`.
+
+---
+
+## Wrap-up
+
+After Wave 9 lands:
+
+1. **Run full suite** on `phase-4.5`: should be ~430+ tests passing (413 from Phase 4 + ~20 new across Phase 4.5).
+2. **Manual smoke** (recommended before opening the PR):
+   - Configure `embedding_model="bge-small-en-v1.5"` (or whatever real model is chosen); restart server; play a turn; verify `embedding_indexed` events use the real model and search returns semantically-relevant memories.
+   - Create a branch, switch, play turns, switch back — verify main's history is unaffected.
+   - Plan an event, complete it via a turn, regenerate that turn — verify event reverts to "planned".
+   - Use cross-chat search; click a result; verify it lands on the right turn in the chat page.
+   - Bulk re-rate a chat's significance distribution.
+3. **Push `phase-4.5`** to gitea.
+4. **Open PR** `phase-4.5 → main`.
+
+---
+
+## Notes for the controller running this plan
+
+- **T115 (sqlite-vec swap)** is environmental. If pre-flight fails (no rebuilt Python, no apsw), defer to Phase 5 and ship Phase 4.5 with 13 tasks. T118 docs sweep should note the deferral.
+- **T112 (real embedding swap)** assumes Featherless or similar exposes an `/v1/embeddings` endpoint. If not available, document the gap and ship the Protocol + Mock impl only (Featherless impl deferred). The pseudo path remains the default in that case — same as Phase 4.
+- **T113 (branching read-side filter)** is the riskiest task. Cross-cutting. Land it on a quiet branch, test thoroughly. If integration tests break in unexpected ways, bisect the affected reader and add coverage.
+- **After each parallel wave**, run a code-review subagent. Combined spec+quality acceptable for trivial tasks (T103–T108); separate spec + quality reviewers for big tasks (T112, T113, T114, T115).
+- **Token-spend rough estimate**: Phase 4.5 should be ~50% the size of Phase 4 (similar number of tasks, mostly smaller). Big tasks (T112, T113, T114) bring the per-task spend up but parallelism in Wave 1 + Wave 9 brings the wall-clock down.
+- **DO NOT break existing v1/v2/v3/v3.5/v4 surface contracts.** Every test file that was green at the start of Phase 4.5 must stay green at the end. The cross-feature integration tests (`tests/test_phase4_integration.py`, `tests/test_phase3_integration.py`) are particularly load-bearing.
@@ -0,0 +1,23 @@
+{
+  "planPath": "docs/plans/2026-04-27-v4.5-phase4.5-cleanup.md",
+  "tasks": [
+    {"id": 103, "subject": "T103: branches polish (global-leak doc + branch-switch warning)", "status": "pending", "wave": 1, "parallelGroup": "wave-1"},
+    {"id": 104, "subject": "T104: state/memory.py polish (DRY MAX(id) + fts_rank doc)", "status": "pending", "wave": 1, "parallelGroup": "wave-1"},
+    {"id": 105, "subject": "T105: snapshots.py polish (datetime hoist + kind validation + mtime doc)", "status": "pending", "wave": 1, "parallelGroup": "wave-1"},
+    {"id": 106, "subject": "T106: search.py polish (k constant + N+1 batched lookups)", "status": "pending", "wave": 1, "parallelGroup": "wave-1"},
+    {"id": 107, "subject": "T107: embeddings.py timeout_s fallback-path logging", "status": "pending", "wave": 1, "parallelGroup": "wave-1"},
+    {"id": 108, "subject": "T108: scene-close-on-cancel UX revisit (regression test pin + rationale doc)", "status": "pending", "wave": 1, "parallelGroup": "wave-1"},
+    {"id": 109, "subject": "T109: 0014 schema migration (FK CASCADE + memories.event_id column)", "status": "pending", "wave": 2, "parallelGroup": null},
+    {"id": 110, "subject": "T110: drawer Phase 4.5 bundle (event_id guard + html.escape + modal partial + bulk sig re-rate)", "status": "pending", "wave": 3, "parallelGroup": null, "blockedBy": [109]},
+    {"id": 111, "subject": "T111: search UX (FTS snippet highlighting + deep-link via memories.event_id)", "status": "pending", "wave": 4, "parallelGroup": null, "blockedBy": [109]},
+    {"id": 112, "subject": "T112: real embedding model swap (LLMClient.embed protocol + Featherless impl + routing)", "status": "pending", "wave": 5, "parallelGroup": null},
+    {"id": 113, "subject": "T113: branching read-side filter (event readers consult is_active branch range)", "status": "pending", "wave": 6, "parallelGroup": null, "blockedBy": [109]},
+    {"id": 114, "subject": "T114: regenerate lifecycle rollback (back-reference + event_status_reverted)", "status": "pending", "wave": 7, "parallelGroup": null},
+    {"id": 115, "subject": "T115: sqlite-vec swap (vec0 virtual tables + MATCH search) [ENVIRONMENTAL — may defer]", "status": "pending", "wave": 8, "parallelGroup": null},
+    {"id": 116, "subject": "T116: structured CannedQueue test fixture builder", "status": "pending", "wave": 9, "parallelGroup": "wave-9"},
+    {"id": 117, "subject": "T117: phase 4.5 cross-feature integration tests", "status": "pending", "wave": 9, "parallelGroup": "wave-9", "blockedBy": [110, 111, 112, 113, 114]},
+    {"id": 118, "subject": "T118: phase 4.5 docs sweep — prune backlog, capture phase 5 candidates", "status": "pending", "wave": 9, "parallelGroup": "wave-9", "blockedBy": [110, 111, 112, 113, 114]}
+  ],
+  "lastUpdated": "2026-04-27T00:00:00Z",
+  "notes": "16 tasks across 9 waves consolidating all 24 items in CLAUDE.md Phase 4.5/5 backlog. Wave 1 (6-way parallel) and Wave 9 (3-way parallel) maximize parallelism. Waves 2-8 are single-task by hot-file constraint. T115 (sqlite-vec swap) requires Python rebuild OR apsw migration — environmental; may defer to Phase 5. Schema baseline 13 -> 14 (T109's 0014) -> optionally 15 (T115's 0015). Big tasks: T112 (real embedding swap), T113 (branching read-side filter — riskiest), T114 (lifecycle rollback). Uses task ids T103-T118."
+}