docs: add Phase 4.5 cleanup plan (all 24 backlog items)

16 tasks across 9 waves consolidating all 24 items in CLAUDE.md
Phase 4.5/5 backlog. Mix of:

- Wave 1 (parallel 6-way): trivial polish across 6 different files
- Wave 2 (single): schema migration 0014 (FK CASCADE + memories.event_id)
- Wave 3 (single): drawer bundle (event_id guard + html.escape + modal
  partial + bulk significance re-rate)
- Wave 4 (single): search UX (FTS snippet highlight + deep-link)
- Wave 5 (single): real embedding model swap (LLMClient.embed protocol)
- Wave 6 (single): branching read-side filter (riskiest — cross-cutting)
- Wave 7 (single): regenerate lifecycle rollback
- Wave 8 (single): sqlite-vec swap [ENVIRONMENTAL — may defer to Phase 5
  if Python rebuild / apsw not feasible]
- Wave 9 (parallel 3-way): structured fixture builder + integration tests + docs

Schema baseline 13 -> 14 (or 15 with T115). Big tasks (T112 real embed,
T113 branching filter, T114 lifecycle rollback) advance the engine
beyond Phase 4's metadata-only state. T115 environmental decision
captured in pre-flight; the other 13 tasks ship without it.

Uses task ids T103-T118 to avoid collision with prior phases.
This commit is contained in:
Joseph Doherty
2026-04-27 04:22:08 -04:00
parent df977fc985
commit a06f90a164
2 changed files with 747 additions and 0 deletions
@@ -0,0 +1,724 @@
# Roleplay Engine — Phase 4.5 Cleanup Plan
> **For Claude:** REQUIRED SUB-SKILL: Use `superpowers-extended-cc:executing-plans` to implement this plan task-by-task. Use the parallel-dispatch pattern documented under "Parallel-Execution Strategy" for parallel waves.
**Goal:** Burn down all 24 items in `CLAUDE.md` §"Phase 4.5 / 5 backlog". Mix of small defensive cleanups (most), three big features (real embedding model swap, branching read-side filter, lifecycle rollback in regenerate), one environment-dependent feature (sqlite-vec swap), and the long-deferred carry-overs (scene-close-on-cancel revisit, structured test-fixture builder).
**Architecture:** No new architecture. Two new schema migrations (0014 schema polish, 0015 sqlite-vec virtual tables). New external dependency optional (`apsw` if Python rebuild isn't possible). All other changes are polish / refactor / observability.
**Tech Stack:**
- Existing — same as Phase 4.
- **OPTIONAL:** rebuild Python with `--enable-loadable-sqlite-extensions` OR install `apsw` to enable T115 sqlite-vec swap. T115 is the only task that requires this; the other 13 tasks land without it. If neither is available, T115 is deferred to Phase 5.
**Source-of-truth references:**
- Backlog: [`CLAUDE.md`](../../CLAUDE.md) §"Phase 4.5 / 5 backlog" (24 items grouped by review source + deferred).
- Phase 3.5 / Phase 2.5 cleanup plans (pattern reference): [2026-04-26-v3.5-phase3.5-cleanup.md](2026-04-26-v3.5-phase3.5-cleanup.md), [2026-04-26-v2.5-phase2.5-cleanup.md](2026-04-26-v2.5-phase2.5-cleanup.md).
- Conventions: [`CLAUDE.md`](../../CLAUDE.md) §"Behavioral defaults" + §"Phase 4 status".
---
## Pre-flight
**Branch:** create `phase-4.5` from the latest `main`:
```bash
git checkout main && git pull && git checkout -b phase-4.5
```
**Schema baseline:** Phase 4 leaves the DB at version 13. Phase 4.5 adds two migrations: `0014_phase45_schema.sql` (T109) and `0015_vec0_virtual_tables.sql` (T115 — only lands if T115 ships). Final schema version: 14 or 15.
**Optional pre-flight for T115 (sqlite-vec swap):**
The host Python build needs `enable_load_extension`. Two options:
1. **Rebuild Python** via pyenv with `PYTHON_CONFIGURE_OPTS="--enable-loadable-sqlite-extensions" pyenv install 3.12.0 --force` and recreate the venv.
2. **Add `apsw`** as a dependency and migrate `chat/db/connection.py` to use `apsw.Connection` (significant refactor — the entire codebase uses stdlib `sqlite3`).
If neither is acceptable, **defer T115** to Phase 5 and ship Phase 4.5 with 13 tasks instead of 14. The other tasks are unaffected.
**Pinned non-negotiables (carried forward):**
- State changes go through the event log. Use `append_and_apply` for the live path.
- Witness filter every memory read at SQL level.
- TDD: every task starts with a failing test (or a regression test pinning existing contract before refactor).
- One commit per task minimum. Bundled tasks split internally.
**Verification before claiming done:** Use `superpowers-extended-cc:verification-before-completion` — run the test command, paste actual output.
---
## Backlog item → task mapping
24 items consolidated into 14 tasks by **file ownership**:
| # | Item | Source | Task |
|---|------|--------|------|
| 1 | `embeddings` FK lacks `ON DELETE CASCADE` | T88 | **T109** (schema migration) |
| 2 | `list_branches(chat_id=...)` global-branch leak — document | T89 | **T103** |
| 3 | Branch-switch silently leaves zero active — log warning | T89 | **T103** |
| 4 | Real embedding model swap | T91 / deferred | **T112** |
| 5 | `timeout_s` fallback-path logging | T91 | **T107** |
| 6 | Duplicate `MAX(id)` lookup in retrieval ranking | T96 | **T104** |
| 7 | `fts_rank=None` for vector-only rows — document | T96 | **T104** |
| 8 | `event_id <= 0` guard in `delete_turn` | T98 | **T110** |
| 9 | `html.escape()` on delete-impact modal output | T98 | **T110** |
| 10 | Extract delete-impact modal to Jinja partial | T98 | **T110** |
| 11 | Hoist `datetime`/`timezone` imports in `snapshots.py` | T99 | **T105** |
| 12 | Strict `kind` validation in snapshot routes | T99 | **T105** |
| 13 | `created_at` from file mtime — document drift risk | T99 | **T105** |
| 14 | Hardcoded `k=50` → module constant | T100 | **T106** |
| 15 | N+1 lookups in search results | T100 | **T106** |
| 16 | FTS highlighting via `snippet()` | T100 | **T111** |
| 17 | Result links chat-level only — add deep-link via memories.event_id | T100 | **T109** + **T111** |
| 18 | sqlite-vec swap when host Python supports loadable extensions | deferred | **T115** |
| 19 | Branching read-side filter (consult `is_active`) | deferred | **T113** |
| 20 | Bulk significance re-rate in drawer | deferred | **T110** |
| 21 | Vector index optimization (HNSW) | deferred | **T115** (post-ship note) |
| 22 | Scene-close-on-cancel UX revisit | Phase 2.5 carry-over | **T108** |
| 23 | Cross-feature canned-queue brittleness fixture builder | Phase 3 carry-over | **T116** |
| 24 | Full lifecycle-rollback in regenerate | Phase 3.5 carry-over | **T114** |
---
## Parallel-Execution Strategy
Same pattern as Phase 3.5 / Phase 2.5 / Phase 4. Nine waves: parallel within each wave (file-disjoint), serial across waves.
### How to dispatch a wave in parallel
Use the **Agent tool with `isolation: "worktree"`**. (If the controlling session's working directory is **not** the chat repo, create worktrees manually with `git worktree add .worktrees/<wave>-<task> -b <wave>/<task> phase-4.5`.)
### After a wave completes
1. Each subagent returns its worktree path and commit SHA(s).
2. **Run a spec + code-quality reviewer subagent on each completed task.** Combined review acceptable for trivial tasks (T103T108); separate spec + quality reviewers for big tasks (T112, T113, T114, T115).
3. **Merge the wave into `phase-4.5`** in any order (file-disjointness guarantees no conflict). Use `--no-ff`.
4. **Run the full test suite** on the merged `phase-4.5`.
5. **Push `phase-4.5`** to gitea.
6. Optionally clean up worktrees.
### Conflict prevention checklist
For each parallel wave, verify the **Files** sections of all tasks have **no overlapping paths**. Hot files in this plan (each owned by exactly one task): `chat/state/memory.py`, `chat/web/drawer.py`, `chat/web/search.py`, `chat/services/regenerate.py`, `chat/services/turn_common.py`, `chat/services/embeddings.py`, `chat/db/migrations/`.
### Why each wave is parallel-safe
| Wave | Tasks | Hot files | Disjoint? |
|------|-------|-----------|-----------|
| 1 | T103, T104, T105, T106, T107, T108 | 6 different files; no overlap | ✅ |
| 2 | T109 | new migration + minor projector update | (single task) |
| 3 | T110 | `chat/web/drawer.py` (bundle) | (single task) |
| 4 | T111 | `chat/services/cross_chat_search.py` + `chat/web/search.py` + template | (single task; depends on T109) |
| 5 | T112 | `chat/services/embeddings.py` + `chat/llm/*.py` (Protocol + Featherless + Mock) | (single task) |
| 6 | T113 | `chat/services/turn_common.py` + multiple readers (cross-cutting) | (single task) |
| 7 | T114 | `chat/services/regenerate.py` + projector handler | (single task) |
| 8 | T115 | new migration + `chat/services/vector_search.py` + `chat/db/connection.py` | (single task; environmental) |
| 9 | T116, T117, T118 | new test fixture file (T116); new test file (T117); CLAUDE.md (T118) | ✅ |
---
## Task overview
```
Wave 1 ─┬─ T103: branches polish (global-branch doc + branch-switch warning)
├─ T104: state/memory.py polish (DRY MAX(id) + fts_rank doc)
├─ T105: snapshots.py polish (datetime hoist + kind validation + mtime doc)
├─ T106: search.py polish (k constant + N+1 batched lookups)
├─ T107: embeddings.py timeout_s fallback-path logging
└─ T108: scene-close-on-cancel UX revisit (pin behavior with regression test)
Wave 2 ─── T109: 0014 schema migration (FK CASCADE + memories.event_id column)
Wave 3 ─── T110: drawer Phase 4.5 bundle (event_id guard + html.escape + modal partial + bulk sig re-rate)
Wave 4 ─── T111: search UX enhancements (FTS snippet() highlighting + deep-link via memories.event_id)
Wave 5 ─── T112: real embedding model swap (LLMClient.embed protocol + Featherless impl + generate_embedding routing + backfill)
Wave 6 ─── T113: branching read-side filter (event readers consult is_active branch range)
Wave 7 ─── T114: regenerate lifecycle rollback (back-reference field + compensating events on supersede)
Wave 8 ─── T115: sqlite-vec swap (vec0 virtual tables + MATCH-based vector_search) [ENVIRONMENTAL — see pre-flight]
Wave 9 ─┬─ T116: structured test-fixture builder (canned-queue brittleness)
├─ T117: Phase 4.5 cross-feature integration tests
└─ T118: docs sweep — Phase 4.5 status, prune backlog, capture Phase 5 residuals
```
Critical path: 9 sequential merge points. Total tasks: 14 (or 13 if T115 deferred). Parallelism: Waves 1 (6-way) and 9 (3-way) dispatch concurrently. Waves 28 are single-task by hot-file constraint.
---
## Wave 1 — Independent small fixes (parallel, 6 tasks)
All trivial, file-disjoint. Each is 1-line + 1-test or similar.
### Task 103: branches polish
**Files:**
- Modify: `chat/state/branches.py`
- Modify: `tests/test_branches_state.py`
**Spec (2 sub-fixes, single commit):**
1. **Document global-branch leak**: `list_branches(chat_id=...)` filter `chat_id = ? OR chat_id IS NULL` returns global/null-chat branches (like "main") in every chat scope. Add a docstring note explaining this is intentional ("main" is global by design; per-chat branches are scoped).
2. **Warn on branch-switch to nonexistent name**: in `_apply_branch_switched`, before the SQL UPDATE, check if a branch with the given name exists. If not, emit `logging.getLogger(__name__).warning(...)` rather than silently leaving zero active branches.
**Test:** `test_branch_switched_unknown_name_warns` — capture log via `caplog`, append `branch_switched` for nonexistent name, assert warning message + no active branch (existing behavior preserved, just observable).
**Commit:** `chore: branches polish — global-leak docs + unknown-name warning (T103)`.
---
### Task 104: state/memory.py polish
**Files:**
- Modify: `chat/state/memory.py`
- Modify: `tests/test_memory_search.py` (no new tests; just add docstring assertions if needed)
**Spec (2 sub-fixes):**
1. **DRY `MAX(id)` lookup**: `_composite_rerank` (Phase 3.5 T57) and `_rrf_fuse_and_rerank` (Phase 4 T96) both query `SELECT MAX(id) FROM event_log` for the recency boost. Extract a `_max_event_id(conn)` helper.
2. **`fts_rank=None` documentation**: search_memories docstring should note that vector-only rows have `fts_rank=None`. Downstream consumers must accept None (they currently do, but contract is implicit).
**Test:** existing tests cover both via the public API; no new test needed unless docstring assertion is desired.
**Commit:** `chore: memory.py DRY MAX(id) helper + document fts_rank=None contract (T104)`.
---
### Task 105: snapshots.py polish
**Files:**
- Modify: `chat/web/snapshots.py`
- Modify: `tests/test_snapshot_ux.py` (1 new test)
**Spec (3 sub-fixes):**
1. **Hoist `datetime`/`timezone` imports** to module level (currently inside `_list_all_snapshots`).
2. **Strict `kind` validation in restore/preview routes**: currently `kind` defaults to `"periodic"`. If a rewind snapshot is requested without explicit `kind`, the lookup silently 404s. Reject missing `kind` with a 400 instead of silently defaulting.
3. **Document `created_at` mtime drift risk** in module docstring: snapshot timestamps come from file mtime, not the encoded filename timestamp. Files copied via `cp -p` preserve mtime; `cp` without `-p` resets it. Add a one-line note.
**Test:** `test_restore_without_kind_returns_400` — POST `/snapshots/restore/<id>` without `kind`; assert 400.
**Commit:** `chore: snapshots.py polish — hoisted imports + strict kind + mtime doc (T105)`.
---
### Task 106: search.py polish
**Files:**
- Modify: `chat/web/search.py`
- Modify: `tests/test_search_ux.py` (1 new test)
**Spec (2 sub-fixes):**
1. **Hardcoded `k=50` → module constant**: extract `DEFAULT_SEARCH_K = 50` at module level. Tunable without code change at the call site.
2. **N+1 lookup batching**: GET `/search?q=...` currently calls `get_bot(conn, owner_id)`, `get_chat(conn, chat_id)`, `get_scene(conn, scene_id)` per result row (worst case 50×3 = 150 individual queries). Batch via `WHERE id IN (...)` queries: collect distinct ids first, fetch in 3 batched queries, then map back per row.
**Test:** `test_search_results_use_batched_lookups` — mock `get_bot`/`get_chat`/`get_scene` and assert each is called once (not per row). OR easier: time the search with 50 results and assert it doesn't degrade linearly with `k`.
**Commit:** `perf: search.py N+1 batching + k constant extraction (T106)`.
---
### Task 107: embeddings.py timeout_s fallback-path logging
**Files:**
- Modify: `chat/services/embeddings.py`
- Modify: `tests/test_embeddings.py` (1 new test)
**Spec:**
When `model != DEFAULT_EMBEDDING_MODEL` and falls through to fallback (zero-vector with model="fallback"), log a `warning` so misconfigured callers (e.g., a Phase 4.5+ caller pointing at a real model that doesn't exist) don't silently degrade.
```python
if model != DEFAULT_EMBEDDING_MODEL:
_log.warning(
"generate_embedding: non-default model %r returned fallback "
"(model client.embed() not yet implemented in Phase 4.5+); "
"downstream search will degrade silently. Configure a supported model.",
model,
)
return EmbeddingResult(...) # fallback
```
The Phase 4 default path (`model == DEFAULT_EMBEDDING_MODEL` → pseudo-embedding) is silent; only non-default models trigger the warning.
**Test:** `test_generate_embedding_non_default_model_logs_warning` — call with `model="real-model"`; capture log via `caplog`; assert the warning message appears.
**Commit:** `chore: embeddings.py warns on fallback for non-default models (T107)`.
---
### Task 108: scene-close-on-cancel UX revisit
**Files:**
- Modify: `tests/test_turn_flow.py` (extend the existing pin test added in Phase 2.5 T74.3 OR add a new one)
- Optionally modify: `chat/web/turns.py` if a real bug surfaces during investigation
**Spec:**
This carry-over has been pending since Phase 2.5 T74.3. The pinned behavior: scene close fires even when the primary turn is cancelled mid-stream, because `detect_scene_close` consults user prose (fully present at cancel time), not bot output.
**Action:**
1. **Re-investigate** by reading the post_turn cancellation path. Confirm the rationale still holds (it should — nothing about the close-detection logic changed in Phase 3 or 4).
2. **Strengthen the regression test** in `tests/test_turn_flow.py` (the existing `test_cancelled_turn_still_closes_scene_when_user_prose_signals_close`). Add an assertion that the user prose IS present at the moment scene_close_decision fires (even though the bot output isn't).
3. If investigation surfaces an actual UX issue (e.g., the close fires too eagerly on prose like "fade out... actually wait"), this becomes a real fix — but default action is documentation-only.
**Default outcome:** add a docstring comment to the post_turn close-detection branch explaining the rationale. No behavioral change.
**Test (extend existing):** assert ordering — `scene_closed` event lands AFTER the user_turn event but BEFORE any potential assistant_turn (which is cancelled). Pin the contract.
**Commit:** `chore: scene-close-on-cancel — strengthen regression test + document rationale (T108)`.
---
## Wave 2 — Schema migration (single)
### Task 109: 0014 schema migration
**Files:**
- Create: `chat/db/migrations/0014_phase45_schema.sql`
- Modify: `chat/state/memory.py` or `chat/services/memory_write.py` (populate the new `event_id` column on memory_written)
- Modify: `tests/test_world.py` (bump schema_version assertion to 14)
- Modify: `tests/test_memory_write.py` (assert event_id populated)
**Spec:**
Two schema changes bundled into a single migration:
1. **`embeddings.memory_id` FK gets `ON DELETE CASCADE`** (T88 review nit). SQLite doesn't support `ALTER TABLE ... ALTER COLUMN`, so the standard pattern is: rename old table, create new, copy data, drop old, recreate indices. Alternatively, since this is a new-ish table (Phase 4 added it) and the change is purely defensive, document as "WONTFIX in 4.5; deindex events remain the only deletion path; ON DELETE CASCADE remains a Phase 5 candidate when we do a broader migration cleanup". Choose pragmatically.
2. **Add `memories.event_id INTEGER` column** (NULL allowed for backward compat) referencing `event_log.id`. This is the foundation for T111's deep-linking from cross-chat search results to specific turns. Migration adds the column; the projector for `memory_written` populates it from the event id when projecting.
**Production code change:** in the `memory_written` projector handler (in `chat/state/memory.py` or wherever it lives), populate the new `event_id` column with the projecting event's `id`. The `Event` object has `id` available in the projector context.
**Tests:**
1. `test_schema_version_after_migration_is_14` (rename + bump from 13).
2. `test_memory_written_populates_event_id` — append memory_written; project; query memories table; assert `event_id` is the projecting event's id.
3. (Backward compat) older memories from existing seed data have NULL `event_id` — the column is nullable.
**Commit:** `feat: 0014 schema — embeddings FK CASCADE (deferred or applied) + memories.event_id column (T109)`.
---
## Wave 3 — Drawer Phase 4.5 bundle (single)
### Task 110: drawer polish + bulk significance re-rate
**Files:**
- Modify: `chat/web/drawer.py`
- Modify: `chat/templates/_drawer.html`
- Create: `chat/templates/_delete_impact_modal.html` (extracted partial)
- Modify: `chat/state/manual_edit.py` (potentially — if bulk re-rate emits a new manual_edit kind)
- Modify: `tests/test_drawer_phase4.py` (extend with 4-5 new tests)
**Spec (4 sub-fixes, 4 commits):**
1. **`event_id <= 0` guard in `delete_turn`** (T98 nit): currently silently rewinds everything if `event_id` is 0. Add `if event_id <= 0: raise HTTPException(400, "...")`.
2. **`html.escape()` on delete-impact modal** (T98 nit): the rendered HTML in `compute_delete_impact` output is built via raw f-strings from model-controlled strings. Wrap user-controllable fields with `html.escape()`. Defense-in-depth — currently safe, but if event payload fields ever appear in descriptions, autoescape would prevent XSS.
3. **Extract delete-impact modal HTML to a Jinja partial**: create `chat/templates/_delete_impact_modal.html`; render via `templates.TemplateResponse(...)` instead of f-string concatenation. Inherits Jinja2 autoescape automatically. Tests use the existing TestClient pattern.
4. **Bulk significance re-rate** (T98.2 deferral): drawer panel showing memory significance distribution per chat. New POST route `/chats/{chat_id}/drawer/memory/significance/bulk` accepting `{level_from, level_to}` form fields. Updates ALL memories in the chat at `level_from` to `level_to` via a sequence of `manual_edit` events (one per memory — preserves the audit trail).
**Tests:**
1. `test_delete_turn_with_event_id_zero_returns_400`.
2. `test_delete_impact_modal_uses_jinja_partial` (assert response renders the partial template; verify with `assert b"<div class=\"delete-impact-modal\">" in response.content` or similar).
3. `test_delete_impact_modal_escapes_user_controllable_strings` — seed an event with a payload containing `<script>` in a description-bound field; render preview; assert it appears HTML-escaped.
4. `test_bulk_significance_re_rate_emits_manual_edit_per_memory` — seed 5 memories at significance 0; bulk re-rate to 2; assert 5 `manual_edit` events landed.
**Commits (4):**
- `fix: drawer delete_turn guards event_id <= 0 (T110.1)`
- `fix: drawer delete-impact modal HTML escapes user-controllable fields (T110.2)`
- `refactor: drawer delete-impact modal extracted to Jinja partial (T110.3)`
- `feat: drawer bulk significance re-rate per chat (T110.4)`
---
## Wave 4 — Search UX enhancements (single)
### Task 111: FTS highlighting + deep-link to turn
**Files:**
- Modify: `chat/services/cross_chat_search.py`
- Modify: `chat/web/search.py`
- Modify: `chat/templates/search.html`
- Modify: `tests/test_search_ux.py`
**Spec (2 sub-fixes, 2 commits):**
1. **FTS highlighting via `snippet()`** (T100 nit): replace the `pov_summary` column in `search_all_memories`'s SELECT with `snippet(memories_fts, 0, '<mark>', '</mark>', '…', 32)` to return a highlighted snippet around the match. The template renders this raw via `|safe` (the snippet is built by SQLite from indexed content; the `<mark>` tags are the only HTML, and SQLite escapes any HTML special chars in the source content).
2. **Deep-link to turn via memories.event_id** (T100 nit + T109 dependency): now that `memories.event_id` exists (from T109), each search result row knows the originating event id. The chat page uses turn-id stamping (Phase 3.5 T86 added `id="turn-{event_id}"`). Build result links as `/chats/{chat_id}#turn-{event_id}`. The chat page DOM scrolls to the anchor on load (browser default).
**Tests:**
1. `test_search_results_include_fts_snippet_with_highlight` — seed memory with text containing "rabbit"; search for "rabbit"; assert response body contains `<mark>rabbit</mark>` (or whatever marker the snippet uses).
2. `test_search_result_link_includes_turn_anchor` — seed memory with known event_id; search; assert link href contains `#turn-{event_id}`.
**Commits (2):**
- `feat: cross-chat search FTS snippet highlighting (T111.1)`
- `feat: cross-chat search deep-links to turn via memories.event_id (T111.2)`
---
## Wave 5 — Real embedding model (single)
### Task 112: Real embedding model swap
**Files:**
- Modify: `chat/llm/client.py` (Protocol — add `embed(text, model) -> list[float]` method)
- Modify: `chat/llm/featherless.py` (FeatherlessClient — implement `embed` against Featherless `/v1/embeddings` endpoint OR equivalent)
- Modify: `chat/llm/mock.py` (MockLLMClient — accept canned embedding vectors)
- Modify: `chat/services/embeddings.py` (route non-default model through `client.embed()`)
- Modify: `chat/config.py` (add `embedding_model: str` setting; default to current pseudo)
- Modify: `scripts/backfill_embeddings.py` (re-embed-all option for model swaps)
- Modify: `tests/test_embeddings.py` + `tests/test_llm_mock.py` + `tests/test_featherless.py` (if exists)
**Spec:**
Phase 4 ships a deterministic SHA-256 pseudo-embedding (deterministic but semantically meaningless). T112 wires the path for a real embedding model.
**Steps:**
1. **Extend `LLMClient` Protocol** with `async def embed(self, text: str, *, model: str) -> list[float]`.
2. **Implement on FeatherlessClient**: call the Featherless OpenAI-compatible `/v1/embeddings` endpoint:
```python
response = await self._http.post(
"/v1/embeddings",
json={"model": model, "input": text},
headers={"Authorization": f"Bearer {self._api_key}"},
)
data = response.json()
return data["data"][0]["embedding"]
```
Handle rate limits (existing 2-conn semaphore covers this).
3. **Implement on MockLLMClient**: `embed` pops a canned vector from a new `canned_embeddings` queue. Tests configure this queue.
4. **Update `generate_embedding`**: when `model != DEFAULT_EMBEDDING_MODEL`, call `client.embed(text, model=model)` instead of falling through to fallback. Wrap in try/except — failures fall back to zero vector (existing fallback path).
5. **Settings**: add `embedding_model: str = "pseudo-sha256-384"` to `Settings`. App reads this at startup; the embedding worker (`chat/services/embedding_worker.py`) passes it through.
6. **Backfill script**: add `--re-embed-all` flag that walks ALL memories (regardless of existing `embeddings_meta` rows) and re-embeds with the configured model. Useful for swapping models.
**Tests:**
1. `test_embed_routes_to_client_when_non_default_model` — mock client with canned vector; call `generate_embedding(model="bge-small-en-v1.5")`; assert vector matches the canned response.
2. `test_embed_falls_back_on_client_failure` — mock client to raise; assert returns zero vector with model="fallback".
3. `test_mock_llm_client_embed_pops_canned`.
4. `test_featherless_embed_calls_correct_endpoint` (if there's an existing featherless test pattern; otherwise mock the HTTP layer).
**Commits:**
- `feat: LLMClient Protocol gains embed() method (T112.1)`
- `feat: FeatherlessClient.embed() against /v1/embeddings (T112.2)`
- `feat: generate_embedding routes non-default models through client.embed (T112.3)`
- `feat: backfill_embeddings --re-embed-all flag for model swaps (T112.4)`
---
## Wave 6 — Branching read-side filter (single, BIG)
### Task 113: Branching read-side filter
**Files (cross-cutting):**
- Modify: `chat/services/turn_common.py::read_recent_dialogue` — filter events to active branch's range
- Modify: `chat/services/scene_summarize.py::_read_recent_dialogue` (similar)
- Modify: `chat/state/memory.py::search_memories` — memories should be filtered to active branch (memories.event_id from T109 enables this)
- Modify: `chat/state/branches.py` — add helper `active_branch_event_ids(conn) -> tuple[int, int]` returning (origin, head)
- Add tests across multiple files
- Modify: `tests/test_branching.py` — add cross-feature tests
**Spec:**
Phase 4 T89 + T94 shipped branching as metadata-only (the table tracks branches; the drawer UI can switch). But event readers DON'T consult `is_active` — they read the entire event_log. So switching branches has no functional effect.
T113 wires the filter:
1. **Helper** `active_branch_event_ids(conn) -> tuple[int, int]`: returns `(origin_event_id, head_event_id)` for the currently active branch. For "main" with origin=0 + head=N, returns `(0, N)` meaning "all events visible".
2. **Apply filter** in every event reader that returns historical state:
- `read_recent_dialogue`: WHERE clause adds `id BETWEEN ? AND ?` (the active branch's range).
- `search_memories`: WHERE clause adds `m.event_id BETWEEN ? AND ?` (uses T109's column).
- `scene_summarize._read_recent_dialogue`: same as turn_common.
- Other readers TBD — grep for `event_log` SELECT patterns and audit each one.
3. **Branches that diverge**: when branch B is created from event 10 and then accumulates events 11-15 (which only exist on B's timeline), but main also accumulates 11-12, the events overlap by id range. This is OK because event reads filter by `id <= active_branch.head_event_id`. The simpler model: branches share event_log ids globally, but each branch's "head" defines which ids are visible.
4. **Events written under branch B** carry an implicit branch tag — but the event_log table has no `branch_id` column today. T113 punts on cross-branch event writes (they all land in the global log) and relies on the `head_event_id` filter to scope reads. This is a Phase 4.5+ first cut; full branch-isolated event_log is Phase 5+.
**Edge cases:**
- Active branch has `head_event_id = 0` (just created): readers return empty.
- No active branch: readers fall through to "all events visible" (defensive).
- Switching branches mid-flight: each `read_recent_dialogue` call re-queries `active_branch`, so it's always current. No caching.
**Tests:** 5+ minimum.
1. `test_read_recent_dialogue_respects_active_branch_head` — seed 10 events; active branch head = 5; assert only first 5 returned.
2. `test_search_memories_respects_active_branch_head` — same.
3. `test_branch_switch_changes_visible_events` — switch branches; immediately read; assert different result sets.
4. `test_main_branch_with_head_zero_returns_empty` — defensive.
5. `test_no_active_branch_falls_through_to_all_events` — defensive.
**Commit:** `feat: branching read-side filter — event readers consult active branch range (T113)`.
**This is the largest task in Phase 4.5.** Estimate 200-400 lines across multiple files. Implementer should split commits if it helps clarity (one per affected reader).
---
## Wave 7 — Lifecycle rollback in regenerate (single)
### Task 114: Lifecycle rollback
**Files:**
- Modify: `chat/services/regenerate.py`
- Modify: `chat/db/migrations/0014_phase45_schema.sql` (T109's migration) — add column? OR
- Add new migration — see decision below
- Modify: tests in `tests/test_regenerate.py`
**Spec:**
Phase 3.5 T83.4 shipped a warning log when regenerate detects un-rolled-back lifecycle transitions. T114 implements actual rollback.
**Schema decision:**
Option A: extend lifecycle event payloads with `triggered_by_assistant_turn_id` (no schema change needed — just a payload convention). Production code (T61 turn flow) populates it when emitting `event_started`/`event_completed`/`event_cancelled`. Existing rows have NULL — rollback skips them with a debug log.
Option B: add a column to `event_log` for stronger invariants. Significant migration cost.
**Recommended:** Option A. Safer, no migration, backward compatible (older events skip rollback). Document in commit body.
**Rollback semantics:**
When regenerate detects lifecycle events triggered by the superseded turn:
- `event_started` → emit `event_cancelled` (or a NEW `event_started_undone` event kind that reverts status to "planned") with the same event_id.
- `event_completed` → emit `event_uncompleted` (NEW event kind that reverts status from "completed" to "active").
- `event_cancelled` → emit `event_uncancelled` (reverts to prior status — which we'd need to track; or simpler: emit `event_started` again to restore "active").
**Simpler approach (recommended):** add ONE new event kind `event_status_reverted` with payload `{event_id, prior_status}`. The projector sets `events.status = prior_status` for the event_id. Rollback emits this event for each affected lifecycle transition, looking up the prior status from the row's history (via event_log scan) or accepting it as a payload field.
**Production code change:** in `chat/web/turns.py::post_turn` (and `chat/services/regenerate.py`), when emitting `event_started`/`event_completed`/`event_cancelled`, populate `triggered_by_assistant_turn_id: <id>` in the payload. Forward-only — older code doesn't need updating.
**Tests:** 3 minimum.
1. `test_regenerate_rolls_back_event_started_from_superseded_turn` — seed an event; play a turn that starts it; regenerate; assert `event_status_reverted` event landed with `prior_status="planned"` and the events row is back to "planned".
2. `test_regenerate_rolls_back_event_completed_to_active` — same but completed → active rollback.
3. `test_regenerate_skips_events_without_back_reference` — older events without `triggered_by_assistant_turn_id` are not rolled back (debug log). Pin the backward-compat behavior.
**Commits:**
- `feat: lifecycle events carry triggered_by_assistant_turn_id back-reference (T114.1)`
- `feat: event_status_reverted event kind + projector handler (T114.2)`
- `feat: regenerate rolls back lifecycle transitions on supersede (T114.3)`
---
## Wave 8 — sqlite-vec swap (single, ENVIRONMENTAL)
### Task 115: sqlite-vec swap (optional)
**Files:**
- Create: `chat/db/migrations/0015_vec0_virtual_tables.sql`
- Modify: `chat/db/connection.py` (load extension on every connection)
- Modify: `chat/services/vector_search.py` (rewrite to use vec0 MATCH instead of pure-Python cosine)
- Modify: `chat/state/embeddings.py` (writer needs to populate vec0 table)
- Modify: `pyproject.toml` (add `sqlite-vec` dependency)
**Pre-flight:**
This task REQUIRES one of:
- Python rebuilt with `--enable-loadable-sqlite-extensions` (pyenv reinstall).
- `apsw` migration of `chat/db/connection.py`.
If neither is feasible at the time of execution: SKIP THIS TASK and document the deferral in T118 docs sweep. The other 13 Phase 4.5 tasks ship without it.
**Spec:**
1. **Migration** `0015_vec0_virtual_tables.sql`:
```sql
CREATE VIRTUAL TABLE embeddings_vec USING vec0(
memory_id INTEGER PRIMARY KEY,
embedding FLOAT[384]
);
-- Backfill from existing JSON embeddings table.
INSERT INTO embeddings_vec (memory_id, embedding)
SELECT memory_id, vec_f32(vector_json) FROM embeddings;
```
2. **`chat/db/connection.py`** loads `sqlite_vec` extension on every connection:
```python
import sqlite_vec
def open_db(...):
conn = sqlite3.connect(...)
conn.enable_load_extension(True)
sqlite_vec.load(conn)
conn.enable_load_extension(False)
...
```
3. **Rewrite `vector_search.py`** to use `embeddings_vec MATCH ?` syntax with `k=?` clause:
```sql
SELECT m.id, m.pov_summary, m.significance, e.distance
FROM embeddings_vec e
JOIN memories m ON m.id = e.memory_id
WHERE e.embedding MATCH ? AND k = ?
AND m.owner_id = ?
AND m.witness_<role> = 1
ORDER BY e.distance ASC
LIMIT ?
```
4. **HNSW note**: vec0 supports both flat (default) and HNSW indexes. T115 ships flat (sufficient for < few thousand memories). Document HNSW upgrade path in CLAUDE.md if memory counts ever grow past pure-Python feasibility.
5. **Old `embeddings` JSON table**: keep alongside `embeddings_vec` (data redundancy is fine; the JSON table is the source of truth and `embeddings_vec` is the index). Backfill on migration. Keep the `embedding_indexed` projector populating both.
**Tests:** rewrite `tests/test_vector_search.py` to expect new behavior. Same observable contract — only implementation changes. All 5 existing tests should pass post-swap.
**Commit:** `feat: sqlite-vec swap (vec0 virtual tables + MATCH-based search) (T115)`.
---
## Wave 9 — Polish (parallel, 3 tasks)
### Task 116: Structured test-fixture builder
**Files:**
- Create: `tests/fixtures.py` (or extend `tests/conftest.py`)
- Modify: existing test files that use brittle canned-queue arrays (selectively)
**Spec:**
Phase 3 carry-over. Tests across `test_turn_flow.py`, `test_meanwhile_turn_flow.py`, `test_phase3_integration.py`, `test_phase4_integration.py` use positional canned-response arrays for `MockLLMClient`. Adding a new classifier call to a code path requires updating canned arrays in many tests.
**Solution:** structured fixture builder that lets tests declare their classifier expectations by name, not position:
```python
# tests/fixtures.py
class CannedQueue:
def __init__(self):
self._queue = []
def parse_turn(self, **fields): ...
def state_update(self, **fields): ...
def detect_scene_close(self, should_close: bool): ...
def detect_event_transitions(self, transitions: list[dict]): ...
def summarize_scene(self, summary: str, **fields): ...
def detect_threads(self, candidates: list[dict]): ...
# ... one method per classifier service
def build(self) -> list[str]:
return [json.dumps(item) for item in self._queue]
```
Usage:
```python
def test_post_turn_with_event_transition(...):
canned = (
CannedQueue()
.parse_turn(intent="narrative")
.narrative("BotA speaks.") # narrative is a stream, but for simplicity treat it like a canned response
.state_update(affinity_delta=0, trust_delta=0)
.state_update(affinity_delta=0, trust_delta=0)
.detect_event_transitions([{"event_id": "evt_1", "new_status": "completed"}])
.detect_scene_close(should_close=False)
.build()
)
mock = MockLLMClient(canned=canned)
# ...
```
**Migration scope:** don't migrate ALL existing tests at once — that's a separate massive refactor. Instead, ship the fixture builder + migrate 2-3 representative tests as proof of concept. Document the migration path in the fixture's docstring.
**Tests:** the fixture builder itself doesn't need extensive testing — it's just a builder. Add 1-2 sanity tests that the JSON output matches expected shapes.
**Commit:** `test: structured CannedQueue fixture builder for classifier mocks (T116)`.
---
### Task 117: Phase 4.5 cross-feature integration tests
**Files:**
- Create: `tests/test_phase45_integration.py`
**Spec:**
End-to-end multi-feature flows specific to Phase 4.5 changes. 5 tests minimum.
1. **Real embedding swap + retrieval** — configure `embedding_model="bge-small-en-v1.5"` (mocked); write a memory; backfill or wait for worker; assert vector search returns the memory via `client.embed`-derived vector (not pseudo).
2. **Branching read-side filter end-to-end** — create a branch from turn 5; switch; play 3 turns on the branch; switch back to main; assert main's recent dialogue is missing the branch turns (read filter respects active branch's head).
3. **Lifecycle rollback** — start an event via a turn; regenerate that turn; assert lifecycle reverted (event back to "planned").
4. **Search deep-link** — write memories; search; click a result; verify the chat page renders with the right turn anchored (assert via TestClient response — either the browser anchor OR a server-side scroll-to-anchor mechanism).
5. **Bulk significance re-rate end-to-end** — seed 5 memories at significance 0; bulk re-rate via drawer; verify significance histogram updates.
**Commit:** `test: phase 4.5 cross-feature integration coverage (T117)`.
---
### Task 118: Phase 4.5 documentation update
**Files:**
- Modify: `CLAUDE.md`
- Modify: `docs/plans/2026-04-26-v1-requirements-design.md` (annotate §13 Phase 4 entries — though they're already shipped per Phase 4 T102)
**Spec:**
Mirror the Phase 3.5 / 2.5 status sections. Document:
- All shipped items per task (T103T117).
- Empty out the Phase 4.5 / 5 backlog (replace with single "All items shipped" line).
- Add new "Phase 5 backlog" section if any Phase 4.5 reviews surfaced new follow-ups.
**Phase 5 backlog candidates** (default, if no new follow-ups discovered):
- Vector index optimization (HNSW) when memory counts grow past flat-index feasibility.
- Branch-isolated event_log (each branch has its own physical event_log range vs the current shared id space + head filter).
- Embedding model swap migration tooling — when changing models, need to re-embed everything; T112 added `--re-embed-all` but a more orchestrated swap (drain old worker, re-seed all memories, swap config) is Phase 5+.
- Real-time collaborative branching (multi-user) — out of scope for v1.
- Avatars / portraits (multimodality) — deferred indefinitely per design §14.
**Commit:** `docs: phase 4.5 status, prune backlog, capture phase 5 candidates (T118)`.
---
## Wrap-up
After Wave 9 lands:
1. **Run full suite** on `phase-4.5`: should be ~430+ tests passing (413 from Phase 4 + ~20 new across Phase 4.5).
2. **Manual smoke** (recommended before opening the PR):
- Configure `embedding_model="bge-small-en-v1.5"` (or whatever real model is chosen); restart server; play a turn; verify `embedding_indexed` events use the real model and search returns semantically-relevant memories.
- Create a branch, switch, play turns, switch back — verify main's history is unaffected.
- Plan an event, complete it via a turn, regenerate that turn — verify event reverts to "planned".
- Use cross-chat search; click a result; verify it lands on the right turn in the chat page.
- Bulk re-rate a chat's significance distribution.
3. **Push `phase-4.5`** to gitea.
4. **Open PR** `phase-4.5 → main`.
---
## Notes for the controller running this plan
- **T115 (sqlite-vec swap)** is environmental. If pre-flight fails (no rebuilt Python, no apsw), defer to Phase 5 and ship Phase 4.5 with 13 tasks. T118 docs sweep should note the deferral.
- **T112 (real embedding swap)** assumes Featherless or similar exposes an `/v1/embeddings` endpoint. If not available, document the gap and ship the Protocol + Mock impl only (Featherless impl deferred). The pseudo path remains the default in that case — same as Phase 4.
- **T113 (branching read-side filter)** is the riskiest task. Cross-cutting. Land it on a quiet branch, test thoroughly. If integration tests break in unexpected ways, bisect the affected reader and add coverage.
- **After each parallel wave**, run a code-review subagent. Combined spec+quality acceptable for trivial tasks (T103T108); separate spec + quality reviewers for big tasks (T112, T113, T114, T115).
- **Token-spend rough estimate**: Phase 4.5 should be ~50% the size of Phase 4 (similar number of tasks, mostly smaller). Big tasks (T112, T113, T114) bring the per-task spend up but parallelism in Wave 1 + Wave 9 brings the wall-clock down.
- **DO NOT break existing v1/v2/v3/v3.5/v4 surface contracts.** Every test file that was green at the start of Phase 4.5 must stay green at the end. The cross-feature integration tests (`tests/test_phase4_integration.py`, `tests/test_phase3_integration.py`) are particularly load-bearing.
@@ -0,0 +1,23 @@
{
"planPath": "docs/plans/2026-04-27-v4.5-phase4.5-cleanup.md",
"tasks": [
{"id": 103, "subject": "T103: branches polish (global-leak doc + branch-switch warning)", "status": "pending", "wave": 1, "parallelGroup": "wave-1"},
{"id": 104, "subject": "T104: state/memory.py polish (DRY MAX(id) + fts_rank doc)", "status": "pending", "wave": 1, "parallelGroup": "wave-1"},
{"id": 105, "subject": "T105: snapshots.py polish (datetime hoist + kind validation + mtime doc)", "status": "pending", "wave": 1, "parallelGroup": "wave-1"},
{"id": 106, "subject": "T106: search.py polish (k constant + N+1 batched lookups)", "status": "pending", "wave": 1, "parallelGroup": "wave-1"},
{"id": 107, "subject": "T107: embeddings.py timeout_s fallback-path logging", "status": "pending", "wave": 1, "parallelGroup": "wave-1"},
{"id": 108, "subject": "T108: scene-close-on-cancel UX revisit (regression test pin + rationale doc)", "status": "pending", "wave": 1, "parallelGroup": "wave-1"},
{"id": 109, "subject": "T109: 0014 schema migration (FK CASCADE + memories.event_id column)", "status": "pending", "wave": 2, "parallelGroup": null},
{"id": 110, "subject": "T110: drawer Phase 4.5 bundle (event_id guard + html.escape + modal partial + bulk sig re-rate)", "status": "pending", "wave": 3, "parallelGroup": null, "blockedBy": [109]},
{"id": 111, "subject": "T111: search UX (FTS snippet highlighting + deep-link via memories.event_id)", "status": "pending", "wave": 4, "parallelGroup": null, "blockedBy": [109]},
{"id": 112, "subject": "T112: real embedding model swap (LLMClient.embed protocol + Featherless impl + routing)", "status": "pending", "wave": 5, "parallelGroup": null},
{"id": 113, "subject": "T113: branching read-side filter (event readers consult is_active branch range)", "status": "pending", "wave": 6, "parallelGroup": null, "blockedBy": [109]},
{"id": 114, "subject": "T114: regenerate lifecycle rollback (back-reference + event_status_reverted)", "status": "pending", "wave": 7, "parallelGroup": null},
{"id": 115, "subject": "T115: sqlite-vec swap (vec0 virtual tables + MATCH search) [ENVIRONMENTAL — may defer]", "status": "pending", "wave": 8, "parallelGroup": null},
{"id": 116, "subject": "T116: structured CannedQueue test fixture builder", "status": "pending", "wave": 9, "parallelGroup": "wave-9"},
{"id": 117, "subject": "T117: phase 4.5 cross-feature integration tests", "status": "pending", "wave": 9, "parallelGroup": "wave-9", "blockedBy": [110, 111, 112, 113, 114]},
{"id": 118, "subject": "T118: phase 4.5 docs sweep — prune backlog, capture phase 5 candidates", "status": "pending", "wave": 9, "parallelGroup": "wave-9", "blockedBy": [110, 111, 112, 113, 114]}
],
"lastUpdated": "2026-04-27T00:00:00Z",
"notes": "16 tasks across 9 waves consolidating all 24 items in CLAUDE.md Phase 4.5/5 backlog. Wave 1 (6-way parallel) and Wave 9 (3-way parallel) maximize parallelism. Waves 2-8 are single-task by hot-file constraint. T115 (sqlite-vec swap) requires Python rebuild OR apsw migration — environmental; may defer to Phase 5. Schema baseline 13 -> 14 (T109's 0014) -> optionally 15 (T115's 0015). Big tasks: T112 (real embedding swap), T113 (branching read-side filter — riskiest), T114 (lifecycle rollback). Uses task ids T103-T118."
}