# Roleplay Engine — Phase 4.5 Cleanup Plan > **For Claude:** REQUIRED SUB-SKILL: Use `superpowers-extended-cc:executing-plans` to implement this plan task-by-task. Use the parallel-dispatch pattern documented under "Parallel-Execution Strategy" for parallel waves. **Goal:** Burn down all 24 items in `CLAUDE.md` §"Phase 4.5 / 5 backlog". Mix of small defensive cleanups (most), three big features (real embedding model swap, branching read-side filter, lifecycle rollback in regenerate), one environment-dependent feature (sqlite-vec swap), and the long-deferred carry-overs (scene-close-on-cancel revisit, structured test-fixture builder). **Architecture:** No new architecture. Two new schema migrations (0014 schema polish, 0015 sqlite-vec virtual tables). New external dependency optional (`apsw` if Python rebuild isn't possible). All other changes are polish / refactor / observability. **Tech Stack:** - Existing — same as Phase 4. - **OPTIONAL:** rebuild Python with `--enable-loadable-sqlite-extensions` OR install `apsw` to enable T115 sqlite-vec swap. T115 is the only task that requires this; the other 13 tasks land without it. If neither is available, T115 is deferred to Phase 5. **Source-of-truth references:** - Backlog: [`CLAUDE.md`](../../CLAUDE.md) §"Phase 4.5 / 5 backlog" (24 items grouped by review source + deferred). - Phase 3.5 / Phase 2.5 cleanup plans (pattern reference): [2026-04-26-v3.5-phase3.5-cleanup.md](2026-04-26-v3.5-phase3.5-cleanup.md), [2026-04-26-v2.5-phase2.5-cleanup.md](2026-04-26-v2.5-phase2.5-cleanup.md). - Conventions: [`CLAUDE.md`](../../CLAUDE.md) §"Behavioral defaults" + §"Phase 4 status". --- ## Pre-flight **Branch:** create `phase-4.5` from the latest `main`: ```bash git checkout main && git pull && git checkout -b phase-4.5 ``` **Schema baseline:** Phase 4 leaves the DB at version 13. Phase 4.5 adds two migrations: `0014_phase45_schema.sql` (T109) and `0015_vec0_virtual_tables.sql` (T115 — only lands if T115 ships). Final schema version: 14 or 15. **Optional pre-flight for T115 (sqlite-vec swap):** The host Python build needs `enable_load_extension`. Two options: 1. **Rebuild Python** via pyenv with `PYTHON_CONFIGURE_OPTS="--enable-loadable-sqlite-extensions" pyenv install 3.12.0 --force` and recreate the venv. 2. **Add `apsw`** as a dependency and migrate `chat/db/connection.py` to use `apsw.Connection` (significant refactor — the entire codebase uses stdlib `sqlite3`). If neither is acceptable, **defer T115** to Phase 5 and ship Phase 4.5 with 13 tasks instead of 14. The other tasks are unaffected. **Pinned non-negotiables (carried forward):** - State changes go through the event log. Use `append_and_apply` for the live path. - Witness filter every memory read at SQL level. - TDD: every task starts with a failing test (or a regression test pinning existing contract before refactor). - One commit per task minimum. Bundled tasks split internally. **Verification before claiming done:** Use `superpowers-extended-cc:verification-before-completion` — run the test command, paste actual output. --- ## Backlog item → task mapping 24 items consolidated into 14 tasks by **file ownership**: | # | Item | Source | Task | |---|------|--------|------| | 1 | `embeddings` FK lacks `ON DELETE CASCADE` | T88 | **T109** (schema migration) | | 2 | `list_branches(chat_id=...)` global-branch leak — document | T89 | **T103** | | 3 | Branch-switch silently leaves zero active — log warning | T89 | **T103** | | 4 | Real embedding model swap | T91 / deferred | **T112** | | 5 | `timeout_s` fallback-path logging | T91 | **T107** | | 6 | Duplicate `MAX(id)` lookup in retrieval ranking | T96 | **T104** | | 7 | `fts_rank=None` for vector-only rows — document | T96 | **T104** | | 8 | `event_id <= 0` guard in `delete_turn` | T98 | **T110** | | 9 | `html.escape()` on delete-impact modal output | T98 | **T110** | | 10 | Extract delete-impact modal to Jinja partial | T98 | **T110** | | 11 | Hoist `datetime`/`timezone` imports in `snapshots.py` | T99 | **T105** | | 12 | Strict `kind` validation in snapshot routes | T99 | **T105** | | 13 | `created_at` from file mtime — document drift risk | T99 | **T105** | | 14 | Hardcoded `k=50` → module constant | T100 | **T106** | | 15 | N+1 lookups in search results | T100 | **T106** | | 16 | FTS highlighting via `snippet()` | T100 | **T111** | | 17 | Result links chat-level only — add deep-link via memories.event_id | T100 | **T109** + **T111** | | 18 | sqlite-vec swap when host Python supports loadable extensions | deferred | **T115** | | 19 | Branching read-side filter (consult `is_active`) | deferred | **T113** | | 20 | Bulk significance re-rate in drawer | deferred | **T110** | | 21 | Vector index optimization (HNSW) | deferred | **T115** (post-ship note) | | 22 | Scene-close-on-cancel UX revisit | Phase 2.5 carry-over | **T108** | | 23 | Cross-feature canned-queue brittleness fixture builder | Phase 3 carry-over | **T116** | | 24 | Full lifecycle-rollback in regenerate | Phase 3.5 carry-over | **T114** | --- ## Parallel-Execution Strategy Same pattern as Phase 3.5 / Phase 2.5 / Phase 4. Nine waves: parallel within each wave (file-disjoint), serial across waves. ### How to dispatch a wave in parallel Use the **Agent tool with `isolation: "worktree"`**. (If the controlling session's working directory is **not** the chat repo, create worktrees manually with `git worktree add .worktrees/- -b / phase-4.5`.) ### After a wave completes 1. Each subagent returns its worktree path and commit SHA(s). 2. **Run a spec + code-quality reviewer subagent on each completed task.** Combined review acceptable for trivial tasks (T103–T108); separate spec + quality reviewers for big tasks (T112, T113, T114, T115). 3. **Merge the wave into `phase-4.5`** in any order (file-disjointness guarantees no conflict). Use `--no-ff`. 4. **Run the full test suite** on the merged `phase-4.5`. 5. **Push `phase-4.5`** to gitea. 6. Optionally clean up worktrees. ### Conflict prevention checklist For each parallel wave, verify the **Files** sections of all tasks have **no overlapping paths**. Hot files in this plan (each owned by exactly one task): `chat/state/memory.py`, `chat/web/drawer.py`, `chat/web/search.py`, `chat/services/regenerate.py`, `chat/services/turn_common.py`, `chat/services/embeddings.py`, `chat/db/migrations/`. ### Why each wave is parallel-safe | Wave | Tasks | Hot files | Disjoint? | |------|-------|-----------|-----------| | 1 | T103, T104, T105, T106, T107, T108 | 6 different files; no overlap | ✅ | | 2 | T109 | new migration + minor projector update | (single task) | | 3 | T110 | `chat/web/drawer.py` (bundle) | (single task) | | 4 | T111 | `chat/services/cross_chat_search.py` + `chat/web/search.py` + template | (single task; depends on T109) | | 5 | T112 | `chat/services/embeddings.py` + `chat/llm/*.py` (Protocol + Featherless + Mock) | (single task) | | 6 | T113 | `chat/services/turn_common.py` + multiple readers (cross-cutting) | (single task) | | 7 | T114 | `chat/services/regenerate.py` + projector handler | (single task) | | 8 | T115 | new migration + `chat/services/vector_search.py` + `chat/db/connection.py` | (single task; environmental) | | 9 | T116, T117, T118 | new test fixture file (T116); new test file (T117); CLAUDE.md (T118) | ✅ | --- ## Task overview ``` Wave 1 ─┬─ T103: branches polish (global-branch doc + branch-switch warning) ├─ T104: state/memory.py polish (DRY MAX(id) + fts_rank doc) ├─ T105: snapshots.py polish (datetime hoist + kind validation + mtime doc) ├─ T106: search.py polish (k constant + N+1 batched lookups) ├─ T107: embeddings.py timeout_s fallback-path logging └─ T108: scene-close-on-cancel UX revisit (pin behavior with regression test) Wave 2 ─── T109: 0014 schema migration (FK CASCADE + memories.event_id column) Wave 3 ─── T110: drawer Phase 4.5 bundle (event_id guard + html.escape + modal partial + bulk sig re-rate) Wave 4 ─── T111: search UX enhancements (FTS snippet() highlighting + deep-link via memories.event_id) Wave 5 ─── T112: real embedding model swap (LLMClient.embed protocol + Featherless impl + generate_embedding routing + backfill) Wave 6 ─── T113: branching read-side filter (event readers consult is_active branch range) Wave 7 ─── T114: regenerate lifecycle rollback (back-reference field + compensating events on supersede) Wave 8 ─── T115: sqlite-vec swap (vec0 virtual tables + MATCH-based vector_search) [ENVIRONMENTAL — see pre-flight] Wave 9 ─┬─ T116: structured test-fixture builder (canned-queue brittleness) ├─ T117: Phase 4.5 cross-feature integration tests └─ T118: docs sweep — Phase 4.5 status, prune backlog, capture Phase 5 residuals ``` Critical path: 9 sequential merge points. Total tasks: 14 (or 13 if T115 deferred). Parallelism: Waves 1 (6-way) and 9 (3-way) dispatch concurrently. Waves 2–8 are single-task by hot-file constraint. --- ## Wave 1 — Independent small fixes (parallel, 6 tasks) All trivial, file-disjoint. Each is 1-line + 1-test or similar. ### Task 103: branches polish **Files:** - Modify: `chat/state/branches.py` - Modify: `tests/test_branches_state.py` **Spec (2 sub-fixes, single commit):** 1. **Document global-branch leak**: `list_branches(chat_id=...)` filter `chat_id = ? OR chat_id IS NULL` returns global/null-chat branches (like "main") in every chat scope. Add a docstring note explaining this is intentional ("main" is global by design; per-chat branches are scoped). 2. **Warn on branch-switch to nonexistent name**: in `_apply_branch_switched`, before the SQL UPDATE, check if a branch with the given name exists. If not, emit `logging.getLogger(__name__).warning(...)` rather than silently leaving zero active branches. **Test:** `test_branch_switched_unknown_name_warns` — capture log via `caplog`, append `branch_switched` for nonexistent name, assert warning message + no active branch (existing behavior preserved, just observable). **Commit:** `chore: branches polish — global-leak docs + unknown-name warning (T103)`. --- ### Task 104: state/memory.py polish **Files:** - Modify: `chat/state/memory.py` - Modify: `tests/test_memory_search.py` (no new tests; just add docstring assertions if needed) **Spec (2 sub-fixes):** 1. **DRY `MAX(id)` lookup**: `_composite_rerank` (Phase 3.5 T57) and `_rrf_fuse_and_rerank` (Phase 4 T96) both query `SELECT MAX(id) FROM event_log` for the recency boost. Extract a `_max_event_id(conn)` helper. 2. **`fts_rank=None` documentation**: search_memories docstring should note that vector-only rows have `fts_rank=None`. Downstream consumers must accept None (they currently do, but contract is implicit). **Test:** existing tests cover both via the public API; no new test needed unless docstring assertion is desired. **Commit:** `chore: memory.py DRY MAX(id) helper + document fts_rank=None contract (T104)`. --- ### Task 105: snapshots.py polish **Files:** - Modify: `chat/web/snapshots.py` - Modify: `tests/test_snapshot_ux.py` (1 new test) **Spec (3 sub-fixes):** 1. **Hoist `datetime`/`timezone` imports** to module level (currently inside `_list_all_snapshots`). 2. **Strict `kind` validation in restore/preview routes**: currently `kind` defaults to `"periodic"`. If a rewind snapshot is requested without explicit `kind`, the lookup silently 404s. Reject missing `kind` with a 400 instead of silently defaulting. 3. **Document `created_at` mtime drift risk** in module docstring: snapshot timestamps come from file mtime, not the encoded filename timestamp. Files copied via `cp -p` preserve mtime; `cp` without `-p` resets it. Add a one-line note. **Test:** `test_restore_without_kind_returns_400` — POST `/snapshots/restore/` without `kind`; assert 400. **Commit:** `chore: snapshots.py polish — hoisted imports + strict kind + mtime doc (T105)`. --- ### Task 106: search.py polish **Files:** - Modify: `chat/web/search.py` - Modify: `tests/test_search_ux.py` (1 new test) **Spec (2 sub-fixes):** 1. **Hardcoded `k=50` → module constant**: extract `DEFAULT_SEARCH_K = 50` at module level. Tunable without code change at the call site. 2. **N+1 lookup batching**: GET `/search?q=...` currently calls `get_bot(conn, owner_id)`, `get_chat(conn, chat_id)`, `get_scene(conn, scene_id)` per result row (worst case 50×3 = 150 individual queries). Batch via `WHERE id IN (...)` queries: collect distinct ids first, fetch in 3 batched queries, then map back per row. **Test:** `test_search_results_use_batched_lookups` — mock `get_bot`/`get_chat`/`get_scene` and assert each is called once (not per row). OR easier: time the search with 50 results and assert it doesn't degrade linearly with `k`. **Commit:** `perf: search.py N+1 batching + k constant extraction (T106)`. --- ### Task 107: embeddings.py timeout_s fallback-path logging **Files:** - Modify: `chat/services/embeddings.py` - Modify: `tests/test_embeddings.py` (1 new test) **Spec:** When `model != DEFAULT_EMBEDDING_MODEL` and falls through to fallback (zero-vector with model="fallback"), log a `warning` so misconfigured callers (e.g., a Phase 4.5+ caller pointing at a real model that doesn't exist) don't silently degrade. ```python if model != DEFAULT_EMBEDDING_MODEL: _log.warning( "generate_embedding: non-default model %r returned fallback " "(model client.embed() not yet implemented in Phase 4.5+); " "downstream search will degrade silently. Configure a supported model.", model, ) return EmbeddingResult(...) # fallback ``` The Phase 4 default path (`model == DEFAULT_EMBEDDING_MODEL` → pseudo-embedding) is silent; only non-default models trigger the warning. **Test:** `test_generate_embedding_non_default_model_logs_warning` — call with `model="real-model"`; capture log via `caplog`; assert the warning message appears. **Commit:** `chore: embeddings.py warns on fallback for non-default models (T107)`. --- ### Task 108: scene-close-on-cancel UX revisit **Files:** - Modify: `tests/test_turn_flow.py` (extend the existing pin test added in Phase 2.5 T74.3 OR add a new one) - Optionally modify: `chat/web/turns.py` if a real bug surfaces during investigation **Spec:** This carry-over has been pending since Phase 2.5 T74.3. The pinned behavior: scene close fires even when the primary turn is cancelled mid-stream, because `detect_scene_close` consults user prose (fully present at cancel time), not bot output. **Action:** 1. **Re-investigate** by reading the post_turn cancellation path. Confirm the rationale still holds (it should — nothing about the close-detection logic changed in Phase 3 or 4). 2. **Strengthen the regression test** in `tests/test_turn_flow.py` (the existing `test_cancelled_turn_still_closes_scene_when_user_prose_signals_close`). Add an assertion that the user prose IS present at the moment scene_close_decision fires (even though the bot output isn't). 3. If investigation surfaces an actual UX issue (e.g., the close fires too eagerly on prose like "fade out... actually wait"), this becomes a real fix — but default action is documentation-only. **Default outcome:** add a docstring comment to the post_turn close-detection branch explaining the rationale. No behavioral change. **Test (extend existing):** assert ordering — `scene_closed` event lands AFTER the user_turn event but BEFORE any potential assistant_turn (which is cancelled). Pin the contract. **Commit:** `chore: scene-close-on-cancel — strengthen regression test + document rationale (T108)`. --- ## Wave 2 — Schema migration (single) ### Task 109: 0014 schema migration **Files:** - Create: `chat/db/migrations/0014_phase45_schema.sql` - Modify: `chat/state/memory.py` or `chat/services/memory_write.py` (populate the new `event_id` column on memory_written) - Modify: `tests/test_world.py` (bump schema_version assertion to 14) - Modify: `tests/test_memory_write.py` (assert event_id populated) **Spec:** Two schema changes bundled into a single migration: 1. **`embeddings.memory_id` FK gets `ON DELETE CASCADE`** (T88 review nit). SQLite doesn't support `ALTER TABLE ... ALTER COLUMN`, so the standard pattern is: rename old table, create new, copy data, drop old, recreate indices. Alternatively, since this is a new-ish table (Phase 4 added it) and the change is purely defensive, document as "WONTFIX in 4.5; deindex events remain the only deletion path; ON DELETE CASCADE remains a Phase 5 candidate when we do a broader migration cleanup". Choose pragmatically. 2. **Add `memories.event_id INTEGER` column** (NULL allowed for backward compat) referencing `event_log.id`. This is the foundation for T111's deep-linking from cross-chat search results to specific turns. Migration adds the column; the projector for `memory_written` populates it from the event id when projecting. **Production code change:** in the `memory_written` projector handler (in `chat/state/memory.py` or wherever it lives), populate the new `event_id` column with the projecting event's `id`. The `Event` object has `id` available in the projector context. **Tests:** 1. `test_schema_version_after_migration_is_14` (rename + bump from 13). 2. `test_memory_written_populates_event_id` — append memory_written; project; query memories table; assert `event_id` is the projecting event's id. 3. (Backward compat) older memories from existing seed data have NULL `event_id` — the column is nullable. **Commit:** `feat: 0014 schema — embeddings FK CASCADE (deferred or applied) + memories.event_id column (T109)`. --- ## Wave 3 — Drawer Phase 4.5 bundle (single) ### Task 110: drawer polish + bulk significance re-rate **Files:** - Modify: `chat/web/drawer.py` - Modify: `chat/templates/_drawer.html` - Create: `chat/templates/_delete_impact_modal.html` (extracted partial) - Modify: `chat/state/manual_edit.py` (potentially — if bulk re-rate emits a new manual_edit kind) - Modify: `tests/test_drawer_phase4.py` (extend with 4-5 new tests) **Spec (4 sub-fixes, 4 commits):** 1. **`event_id <= 0` guard in `delete_turn`** (T98 nit): currently silently rewinds everything if `event_id` is 0. Add `if event_id <= 0: raise HTTPException(400, "...")`. 2. **`html.escape()` on delete-impact modal** (T98 nit): the rendered HTML in `compute_delete_impact` output is built via raw f-strings from model-controlled strings. Wrap user-controllable fields with `html.escape()`. Defense-in-depth — currently safe, but if event payload fields ever appear in descriptions, autoescape would prevent XSS. 3. **Extract delete-impact modal HTML to a Jinja partial**: create `chat/templates/_delete_impact_modal.html`; render via `templates.TemplateResponse(...)` instead of f-string concatenation. Inherits Jinja2 autoescape automatically. Tests use the existing TestClient pattern. 4. **Bulk significance re-rate** (T98.2 deferral): drawer panel showing memory significance distribution per chat. New POST route `/chats/{chat_id}/drawer/memory/significance/bulk` accepting `{level_from, level_to}` form fields. Updates ALL memories in the chat at `level_from` to `level_to` via a sequence of `manual_edit` events (one per memory — preserves the audit trail). **Tests:** 1. `test_delete_turn_with_event_id_zero_returns_400`. 2. `test_delete_impact_modal_uses_jinja_partial` (assert response renders the partial template; verify with `assert b"
" in response.content` or similar). 3. `test_delete_impact_modal_escapes_user_controllable_strings` — seed an event with a payload containing `