From a06f90a164bcd23a9ea57c728befc5cce08feb22 Mon Sep 17 00:00:00 2001 From: Joseph Doherty Date: Mon, 27 Apr 2026 04:22:08 -0400 Subject: [PATCH] docs: add Phase 4.5 cleanup plan (all 24 backlog items) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit 16 tasks across 9 waves consolidating all 24 items in CLAUDE.md Phase 4.5/5 backlog. Mix of: - Wave 1 (parallel 6-way): trivial polish across 6 different files - Wave 2 (single): schema migration 0014 (FK CASCADE + memories.event_id) - Wave 3 (single): drawer bundle (event_id guard + html.escape + modal partial + bulk significance re-rate) - Wave 4 (single): search UX (FTS snippet highlight + deep-link) - Wave 5 (single): real embedding model swap (LLMClient.embed protocol) - Wave 6 (single): branching read-side filter (riskiest — cross-cutting) - Wave 7 (single): regenerate lifecycle rollback - Wave 8 (single): sqlite-vec swap [ENVIRONMENTAL — may defer to Phase 5 if Python rebuild / apsw not feasible] - Wave 9 (parallel 3-way): structured fixture builder + integration tests + docs Schema baseline 13 -> 14 (or 15 with T115). Big tasks (T112 real embed, T113 branching filter, T114 lifecycle rollback) advance the engine beyond Phase 4's metadata-only state. T115 environmental decision captured in pre-flight; the other 13 tasks ship without it. Uses task ids T103-T118 to avoid collision with prior phases. --- .../plans/2026-04-27-v4.5-phase4.5-cleanup.md | 724 ++++++++++++++++++ ...-04-27-v4.5-phase4.5-cleanup.md.tasks.json | 23 + 2 files changed, 747 insertions(+) create mode 100644 docs/plans/2026-04-27-v4.5-phase4.5-cleanup.md create mode 100644 docs/plans/2026-04-27-v4.5-phase4.5-cleanup.md.tasks.json diff --git a/docs/plans/2026-04-27-v4.5-phase4.5-cleanup.md b/docs/plans/2026-04-27-v4.5-phase4.5-cleanup.md new file mode 100644 index 0000000..5944ffd --- /dev/null +++ b/docs/plans/2026-04-27-v4.5-phase4.5-cleanup.md @@ -0,0 +1,724 @@ +# Roleplay Engine — Phase 4.5 Cleanup Plan + +> **For Claude:** REQUIRED SUB-SKILL: Use `superpowers-extended-cc:executing-plans` to implement this plan task-by-task. Use the parallel-dispatch pattern documented under "Parallel-Execution Strategy" for parallel waves. + +**Goal:** Burn down all 24 items in `CLAUDE.md` §"Phase 4.5 / 5 backlog". Mix of small defensive cleanups (most), three big features (real embedding model swap, branching read-side filter, lifecycle rollback in regenerate), one environment-dependent feature (sqlite-vec swap), and the long-deferred carry-overs (scene-close-on-cancel revisit, structured test-fixture builder). + +**Architecture:** No new architecture. Two new schema migrations (0014 schema polish, 0015 sqlite-vec virtual tables). New external dependency optional (`apsw` if Python rebuild isn't possible). All other changes are polish / refactor / observability. + +**Tech Stack:** + +- Existing — same as Phase 4. +- **OPTIONAL:** rebuild Python with `--enable-loadable-sqlite-extensions` OR install `apsw` to enable T115 sqlite-vec swap. T115 is the only task that requires this; the other 13 tasks land without it. If neither is available, T115 is deferred to Phase 5. + +**Source-of-truth references:** + +- Backlog: [`CLAUDE.md`](../../CLAUDE.md) §"Phase 4.5 / 5 backlog" (24 items grouped by review source + deferred). +- Phase 3.5 / Phase 2.5 cleanup plans (pattern reference): [2026-04-26-v3.5-phase3.5-cleanup.md](2026-04-26-v3.5-phase3.5-cleanup.md), [2026-04-26-v2.5-phase2.5-cleanup.md](2026-04-26-v2.5-phase2.5-cleanup.md). +- Conventions: [`CLAUDE.md`](../../CLAUDE.md) §"Behavioral defaults" + §"Phase 4 status". + +--- + +## Pre-flight + +**Branch:** create `phase-4.5` from the latest `main`: + +```bash +git checkout main && git pull && git checkout -b phase-4.5 +``` + +**Schema baseline:** Phase 4 leaves the DB at version 13. Phase 4.5 adds two migrations: `0014_phase45_schema.sql` (T109) and `0015_vec0_virtual_tables.sql` (T115 — only lands if T115 ships). Final schema version: 14 or 15. + +**Optional pre-flight for T115 (sqlite-vec swap):** + +The host Python build needs `enable_load_extension`. Two options: + +1. **Rebuild Python** via pyenv with `PYTHON_CONFIGURE_OPTS="--enable-loadable-sqlite-extensions" pyenv install 3.12.0 --force` and recreate the venv. +2. **Add `apsw`** as a dependency and migrate `chat/db/connection.py` to use `apsw.Connection` (significant refactor — the entire codebase uses stdlib `sqlite3`). + +If neither is acceptable, **defer T115** to Phase 5 and ship Phase 4.5 with 13 tasks instead of 14. The other tasks are unaffected. + +**Pinned non-negotiables (carried forward):** + +- State changes go through the event log. Use `append_and_apply` for the live path. +- Witness filter every memory read at SQL level. +- TDD: every task starts with a failing test (or a regression test pinning existing contract before refactor). +- One commit per task minimum. Bundled tasks split internally. + +**Verification before claiming done:** Use `superpowers-extended-cc:verification-before-completion` — run the test command, paste actual output. + +--- + +## Backlog item → task mapping + +24 items consolidated into 14 tasks by **file ownership**: + +| # | Item | Source | Task | +|---|------|--------|------| +| 1 | `embeddings` FK lacks `ON DELETE CASCADE` | T88 | **T109** (schema migration) | +| 2 | `list_branches(chat_id=...)` global-branch leak — document | T89 | **T103** | +| 3 | Branch-switch silently leaves zero active — log warning | T89 | **T103** | +| 4 | Real embedding model swap | T91 / deferred | **T112** | +| 5 | `timeout_s` fallback-path logging | T91 | **T107** | +| 6 | Duplicate `MAX(id)` lookup in retrieval ranking | T96 | **T104** | +| 7 | `fts_rank=None` for vector-only rows — document | T96 | **T104** | +| 8 | `event_id <= 0` guard in `delete_turn` | T98 | **T110** | +| 9 | `html.escape()` on delete-impact modal output | T98 | **T110** | +| 10 | Extract delete-impact modal to Jinja partial | T98 | **T110** | +| 11 | Hoist `datetime`/`timezone` imports in `snapshots.py` | T99 | **T105** | +| 12 | Strict `kind` validation in snapshot routes | T99 | **T105** | +| 13 | `created_at` from file mtime — document drift risk | T99 | **T105** | +| 14 | Hardcoded `k=50` → module constant | T100 | **T106** | +| 15 | N+1 lookups in search results | T100 | **T106** | +| 16 | FTS highlighting via `snippet()` | T100 | **T111** | +| 17 | Result links chat-level only — add deep-link via memories.event_id | T100 | **T109** + **T111** | +| 18 | sqlite-vec swap when host Python supports loadable extensions | deferred | **T115** | +| 19 | Branching read-side filter (consult `is_active`) | deferred | **T113** | +| 20 | Bulk significance re-rate in drawer | deferred | **T110** | +| 21 | Vector index optimization (HNSW) | deferred | **T115** (post-ship note) | +| 22 | Scene-close-on-cancel UX revisit | Phase 2.5 carry-over | **T108** | +| 23 | Cross-feature canned-queue brittleness fixture builder | Phase 3 carry-over | **T116** | +| 24 | Full lifecycle-rollback in regenerate | Phase 3.5 carry-over | **T114** | + +--- + +## Parallel-Execution Strategy + +Same pattern as Phase 3.5 / Phase 2.5 / Phase 4. Nine waves: parallel within each wave (file-disjoint), serial across waves. + +### How to dispatch a wave in parallel + +Use the **Agent tool with `isolation: "worktree"`**. (If the controlling session's working directory is **not** the chat repo, create worktrees manually with `git worktree add .worktrees/- -b / phase-4.5`.) + +### After a wave completes + +1. Each subagent returns its worktree path and commit SHA(s). +2. **Run a spec + code-quality reviewer subagent on each completed task.** Combined review acceptable for trivial tasks (T103–T108); separate spec + quality reviewers for big tasks (T112, T113, T114, T115). +3. **Merge the wave into `phase-4.5`** in any order (file-disjointness guarantees no conflict). Use `--no-ff`. +4. **Run the full test suite** on the merged `phase-4.5`. +5. **Push `phase-4.5`** to gitea. +6. Optionally clean up worktrees. + +### Conflict prevention checklist + +For each parallel wave, verify the **Files** sections of all tasks have **no overlapping paths**. Hot files in this plan (each owned by exactly one task): `chat/state/memory.py`, `chat/web/drawer.py`, `chat/web/search.py`, `chat/services/regenerate.py`, `chat/services/turn_common.py`, `chat/services/embeddings.py`, `chat/db/migrations/`. + +### Why each wave is parallel-safe + +| Wave | Tasks | Hot files | Disjoint? | +|------|-------|-----------|-----------| +| 1 | T103, T104, T105, T106, T107, T108 | 6 different files; no overlap | ✅ | +| 2 | T109 | new migration + minor projector update | (single task) | +| 3 | T110 | `chat/web/drawer.py` (bundle) | (single task) | +| 4 | T111 | `chat/services/cross_chat_search.py` + `chat/web/search.py` + template | (single task; depends on T109) | +| 5 | T112 | `chat/services/embeddings.py` + `chat/llm/*.py` (Protocol + Featherless + Mock) | (single task) | +| 6 | T113 | `chat/services/turn_common.py` + multiple readers (cross-cutting) | (single task) | +| 7 | T114 | `chat/services/regenerate.py` + projector handler | (single task) | +| 8 | T115 | new migration + `chat/services/vector_search.py` + `chat/db/connection.py` | (single task; environmental) | +| 9 | T116, T117, T118 | new test fixture file (T116); new test file (T117); CLAUDE.md (T118) | ✅ | + +--- + +## Task overview + +``` +Wave 1 ─┬─ T103: branches polish (global-branch doc + branch-switch warning) + ├─ T104: state/memory.py polish (DRY MAX(id) + fts_rank doc) + ├─ T105: snapshots.py polish (datetime hoist + kind validation + mtime doc) + ├─ T106: search.py polish (k constant + N+1 batched lookups) + ├─ T107: embeddings.py timeout_s fallback-path logging + └─ T108: scene-close-on-cancel UX revisit (pin behavior with regression test) + +Wave 2 ─── T109: 0014 schema migration (FK CASCADE + memories.event_id column) + +Wave 3 ─── T110: drawer Phase 4.5 bundle (event_id guard + html.escape + modal partial + bulk sig re-rate) + +Wave 4 ─── T111: search UX enhancements (FTS snippet() highlighting + deep-link via memories.event_id) + +Wave 5 ─── T112: real embedding model swap (LLMClient.embed protocol + Featherless impl + generate_embedding routing + backfill) + +Wave 6 ─── T113: branching read-side filter (event readers consult is_active branch range) + +Wave 7 ─── T114: regenerate lifecycle rollback (back-reference field + compensating events on supersede) + +Wave 8 ─── T115: sqlite-vec swap (vec0 virtual tables + MATCH-based vector_search) [ENVIRONMENTAL — see pre-flight] + +Wave 9 ─┬─ T116: structured test-fixture builder (canned-queue brittleness) + ├─ T117: Phase 4.5 cross-feature integration tests + └─ T118: docs sweep — Phase 4.5 status, prune backlog, capture Phase 5 residuals +``` + +Critical path: 9 sequential merge points. Total tasks: 14 (or 13 if T115 deferred). Parallelism: Waves 1 (6-way) and 9 (3-way) dispatch concurrently. Waves 2–8 are single-task by hot-file constraint. + +--- + +## Wave 1 — Independent small fixes (parallel, 6 tasks) + +All trivial, file-disjoint. Each is 1-line + 1-test or similar. + +### Task 103: branches polish + +**Files:** +- Modify: `chat/state/branches.py` +- Modify: `tests/test_branches_state.py` + +**Spec (2 sub-fixes, single commit):** + +1. **Document global-branch leak**: `list_branches(chat_id=...)` filter `chat_id = ? OR chat_id IS NULL` returns global/null-chat branches (like "main") in every chat scope. Add a docstring note explaining this is intentional ("main" is global by design; per-chat branches are scoped). + +2. **Warn on branch-switch to nonexistent name**: in `_apply_branch_switched`, before the SQL UPDATE, check if a branch with the given name exists. If not, emit `logging.getLogger(__name__).warning(...)` rather than silently leaving zero active branches. + +**Test:** `test_branch_switched_unknown_name_warns` — capture log via `caplog`, append `branch_switched` for nonexistent name, assert warning message + no active branch (existing behavior preserved, just observable). + +**Commit:** `chore: branches polish — global-leak docs + unknown-name warning (T103)`. + +--- + +### Task 104: state/memory.py polish + +**Files:** +- Modify: `chat/state/memory.py` +- Modify: `tests/test_memory_search.py` (no new tests; just add docstring assertions if needed) + +**Spec (2 sub-fixes):** + +1. **DRY `MAX(id)` lookup**: `_composite_rerank` (Phase 3.5 T57) and `_rrf_fuse_and_rerank` (Phase 4 T96) both query `SELECT MAX(id) FROM event_log` for the recency boost. Extract a `_max_event_id(conn)` helper. + +2. **`fts_rank=None` documentation**: search_memories docstring should note that vector-only rows have `fts_rank=None`. Downstream consumers must accept None (they currently do, but contract is implicit). + +**Test:** existing tests cover both via the public API; no new test needed unless docstring assertion is desired. + +**Commit:** `chore: memory.py DRY MAX(id) helper + document fts_rank=None contract (T104)`. + +--- + +### Task 105: snapshots.py polish + +**Files:** +- Modify: `chat/web/snapshots.py` +- Modify: `tests/test_snapshot_ux.py` (1 new test) + +**Spec (3 sub-fixes):** + +1. **Hoist `datetime`/`timezone` imports** to module level (currently inside `_list_all_snapshots`). + +2. **Strict `kind` validation in restore/preview routes**: currently `kind` defaults to `"periodic"`. If a rewind snapshot is requested without explicit `kind`, the lookup silently 404s. Reject missing `kind` with a 400 instead of silently defaulting. + +3. **Document `created_at` mtime drift risk** in module docstring: snapshot timestamps come from file mtime, not the encoded filename timestamp. Files copied via `cp -p` preserve mtime; `cp` without `-p` resets it. Add a one-line note. + +**Test:** `test_restore_without_kind_returns_400` — POST `/snapshots/restore/` without `kind`; assert 400. + +**Commit:** `chore: snapshots.py polish — hoisted imports + strict kind + mtime doc (T105)`. + +--- + +### Task 106: search.py polish + +**Files:** +- Modify: `chat/web/search.py` +- Modify: `tests/test_search_ux.py` (1 new test) + +**Spec (2 sub-fixes):** + +1. **Hardcoded `k=50` → module constant**: extract `DEFAULT_SEARCH_K = 50` at module level. Tunable without code change at the call site. + +2. **N+1 lookup batching**: GET `/search?q=...` currently calls `get_bot(conn, owner_id)`, `get_chat(conn, chat_id)`, `get_scene(conn, scene_id)` per result row (worst case 50×3 = 150 individual queries). Batch via `WHERE id IN (...)` queries: collect distinct ids first, fetch in 3 batched queries, then map back per row. + +**Test:** `test_search_results_use_batched_lookups` — mock `get_bot`/`get_chat`/`get_scene` and assert each is called once (not per row). OR easier: time the search with 50 results and assert it doesn't degrade linearly with `k`. + +**Commit:** `perf: search.py N+1 batching + k constant extraction (T106)`. + +--- + +### Task 107: embeddings.py timeout_s fallback-path logging + +**Files:** +- Modify: `chat/services/embeddings.py` +- Modify: `tests/test_embeddings.py` (1 new test) + +**Spec:** + +When `model != DEFAULT_EMBEDDING_MODEL` and falls through to fallback (zero-vector with model="fallback"), log a `warning` so misconfigured callers (e.g., a Phase 4.5+ caller pointing at a real model that doesn't exist) don't silently degrade. + +```python +if model != DEFAULT_EMBEDDING_MODEL: + _log.warning( + "generate_embedding: non-default model %r returned fallback " + "(model client.embed() not yet implemented in Phase 4.5+); " + "downstream search will degrade silently. Configure a supported model.", + model, + ) + return EmbeddingResult(...) # fallback +``` + +The Phase 4 default path (`model == DEFAULT_EMBEDDING_MODEL` → pseudo-embedding) is silent; only non-default models trigger the warning. + +**Test:** `test_generate_embedding_non_default_model_logs_warning` — call with `model="real-model"`; capture log via `caplog`; assert the warning message appears. + +**Commit:** `chore: embeddings.py warns on fallback for non-default models (T107)`. + +--- + +### Task 108: scene-close-on-cancel UX revisit + +**Files:** +- Modify: `tests/test_turn_flow.py` (extend the existing pin test added in Phase 2.5 T74.3 OR add a new one) +- Optionally modify: `chat/web/turns.py` if a real bug surfaces during investigation + +**Spec:** + +This carry-over has been pending since Phase 2.5 T74.3. The pinned behavior: scene close fires even when the primary turn is cancelled mid-stream, because `detect_scene_close` consults user prose (fully present at cancel time), not bot output. + +**Action:** + +1. **Re-investigate** by reading the post_turn cancellation path. Confirm the rationale still holds (it should — nothing about the close-detection logic changed in Phase 3 or 4). +2. **Strengthen the regression test** in `tests/test_turn_flow.py` (the existing `test_cancelled_turn_still_closes_scene_when_user_prose_signals_close`). Add an assertion that the user prose IS present at the moment scene_close_decision fires (even though the bot output isn't). +3. If investigation surfaces an actual UX issue (e.g., the close fires too eagerly on prose like "fade out... actually wait"), this becomes a real fix — but default action is documentation-only. + +**Default outcome:** add a docstring comment to the post_turn close-detection branch explaining the rationale. No behavioral change. + +**Test (extend existing):** assert ordering — `scene_closed` event lands AFTER the user_turn event but BEFORE any potential assistant_turn (which is cancelled). Pin the contract. + +**Commit:** `chore: scene-close-on-cancel — strengthen regression test + document rationale (T108)`. + +--- + +## Wave 2 — Schema migration (single) + +### Task 109: 0014 schema migration + +**Files:** +- Create: `chat/db/migrations/0014_phase45_schema.sql` +- Modify: `chat/state/memory.py` or `chat/services/memory_write.py` (populate the new `event_id` column on memory_written) +- Modify: `tests/test_world.py` (bump schema_version assertion to 14) +- Modify: `tests/test_memory_write.py` (assert event_id populated) + +**Spec:** + +Two schema changes bundled into a single migration: + +1. **`embeddings.memory_id` FK gets `ON DELETE CASCADE`** (T88 review nit). SQLite doesn't support `ALTER TABLE ... ALTER COLUMN`, so the standard pattern is: rename old table, create new, copy data, drop old, recreate indices. Alternatively, since this is a new-ish table (Phase 4 added it) and the change is purely defensive, document as "WONTFIX in 4.5; deindex events remain the only deletion path; ON DELETE CASCADE remains a Phase 5 candidate when we do a broader migration cleanup". Choose pragmatically. + +2. **Add `memories.event_id INTEGER` column** (NULL allowed for backward compat) referencing `event_log.id`. This is the foundation for T111's deep-linking from cross-chat search results to specific turns. Migration adds the column; the projector for `memory_written` populates it from the event id when projecting. + +**Production code change:** in the `memory_written` projector handler (in `chat/state/memory.py` or wherever it lives), populate the new `event_id` column with the projecting event's `id`. The `Event` object has `id` available in the projector context. + +**Tests:** + +1. `test_schema_version_after_migration_is_14` (rename + bump from 13). +2. `test_memory_written_populates_event_id` — append memory_written; project; query memories table; assert `event_id` is the projecting event's id. +3. (Backward compat) older memories from existing seed data have NULL `event_id` — the column is nullable. + +**Commit:** `feat: 0014 schema — embeddings FK CASCADE (deferred or applied) + memories.event_id column (T109)`. + +--- + +## Wave 3 — Drawer Phase 4.5 bundle (single) + +### Task 110: drawer polish + bulk significance re-rate + +**Files:** +- Modify: `chat/web/drawer.py` +- Modify: `chat/templates/_drawer.html` +- Create: `chat/templates/_delete_impact_modal.html` (extracted partial) +- Modify: `chat/state/manual_edit.py` (potentially — if bulk re-rate emits a new manual_edit kind) +- Modify: `tests/test_drawer_phase4.py` (extend with 4-5 new tests) + +**Spec (4 sub-fixes, 4 commits):** + +1. **`event_id <= 0` guard in `delete_turn`** (T98 nit): currently silently rewinds everything if `event_id` is 0. Add `if event_id <= 0: raise HTTPException(400, "...")`. + +2. **`html.escape()` on delete-impact modal** (T98 nit): the rendered HTML in `compute_delete_impact` output is built via raw f-strings from model-controlled strings. Wrap user-controllable fields with `html.escape()`. Defense-in-depth — currently safe, but if event payload fields ever appear in descriptions, autoescape would prevent XSS. + +3. **Extract delete-impact modal HTML to a Jinja partial**: create `chat/templates/_delete_impact_modal.html`; render via `templates.TemplateResponse(...)` instead of f-string concatenation. Inherits Jinja2 autoescape automatically. Tests use the existing TestClient pattern. + +4. **Bulk significance re-rate** (T98.2 deferral): drawer panel showing memory significance distribution per chat. New POST route `/chats/{chat_id}/drawer/memory/significance/bulk` accepting `{level_from, level_to}` form fields. Updates ALL memories in the chat at `level_from` to `level_to` via a sequence of `manual_edit` events (one per memory — preserves the audit trail). + +**Tests:** + +1. `test_delete_turn_with_event_id_zero_returns_400`. +2. `test_delete_impact_modal_uses_jinja_partial` (assert response renders the partial template; verify with `assert b"
" in response.content` or similar). +3. `test_delete_impact_modal_escapes_user_controllable_strings` — seed an event with a payload containing `