From bffd9a2f384633e272ca2633c0938024f702a248 Mon Sep 17 00:00:00 2001
From: Joseph Doherty <dohejw01@gmail.com>
Date: Mon, 27 Apr 2026 02:03:08 -0400
Subject: [PATCH] docs: add Phase 4 implementation plan (vector retrieval +
 branching + polish)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

15 tasks across 8 waves landing the Phase 4 deliverables per
requirements doc §13 + §14:

- Vector retrieval via sqlite-vec (new external dependency)
- Branching UI (event log forks)
- Drawer-edit on every field (significance review, hide-from-view,
  surgical delete with cascade preview, branching affordances)
- Backup tooling (snapshot UX surface)
- Cross-chat search

Plus the 3 Phase 3.6 carry-over fixes (T90 bundle).

Wave structure:
- W1 (parallel 3-way): schema foundation + carry-overs
- W2 (parallel 3-way): embedding/search services
- W3 (parallel 2-way): branching + delete services
- W4 (single): combined retrieval ranking
- W5 (single): memory write hook + backfill
- W6 (single): drawer Phase 4 bundle (5 sub-features)
- W7 (parallel 2-way): snapshot UX + cross-chat search UX
- W8 (parallel 2-way): integration tests + docs

External dependency: sqlite-vec must be installed BEFORE Wave 1.
Embedding model choice (384-dim default) pinned in T91 before dispatch
since the migration hardcodes the dimension.

Schema baseline: 11 -> 13 (adds 0012_embeddings.sql + 0013_branches.sql).
Task ids T88-T102 to avoid collision with prior phases.
---
 .../2026-04-27-v4-phase4-implementation.md    | 832 ++++++++++++++++++
 ...-27-v4-phase4-implementation.md.tasks.json |  22 +
 2 files changed, 854 insertions(+)
 create mode 100644 docs/plans/2026-04-27-v4-phase4-implementation.md
 create mode 100644 docs/plans/2026-04-27-v4-phase4-implementation.md.tasks.json

diff --git a/docs/plans/2026-04-27-v4-phase4-implementation.md b/docs/plans/2026-04-27-v4-phase4-implementation.md
new file mode 100644
index 0000000..fb89991
--- /dev/null
+++ b/docs/plans/2026-04-27-v4-phase4-implementation.md
@@ -0,0 +1,832 @@
+# Roleplay Engine — Phase 4 Implementation Plan
+
+> **For Claude:** REQUIRED SUB-SKILL: Use `superpowers-extended-cc:executing-plans` to implement this plan task-by-task. Use the parallel-dispatch pattern documented under "Parallel-Execution Strategy" for parallel waves.
+
+**Goal:** Land Phase 4 polish per requirements doc §13 + §14: vector retrieval, branching UI, drawer-edit on every field, backup tooling, significance review UI, surgical delete with cascade preview, hide-from-view soft delete, plus cross-chat search and the small Phase 3.6 carry-over fixes.
+
+**Architecture:** Builds on Phase 3.5's stable base. Two new tables (`embeddings`, `branches`) and one external dependency (sqlite-vec extension). Embedding generation runs as a deferred async job — NOT inline with turns — so the play loop stays fast even when the embedding endpoint is slow. Branching is data-model-only at first (events + selectors); UI grafts on top. Surgical delete + cascade preview reuses the existing rewind-and-supersede plumbing. Cross-chat search piggybacks on the existing FTS5 + (now) vector retrieval.
+
+**Tech Stack:**
+
+- **NEW dependency: `sqlite-vec`** (or `sqlite-vss` — Phase 4 picks; recommended `sqlite-vec` for simpler load semantics and active maintenance). Add to `pyproject.toml`.
+- **Embedding model selection** is part of T91 spec. Recommended default: a small model on Featherless (e.g., `BAAI/bge-small-en-v1.5` if available) or a local CPU-friendly model via `sentence-transformers`. Document choice in CLAUDE.md.
+- Same as Phase 3 otherwise (Python 3.11+, FastAPI, HTMX, SQLite).
+
+**Source-of-truth references:**
+
+- Phase 4 scope: requirements doc §13 "Phase 4 — polish" + §14 "Open / Deferred Decisions".
+- Behavioral details: §6 (prompt assembly + retrieval), §10 (rewind / regenerate / reset), §11 (compression + significance), §12 (snapshots).
+- Conventions: [`CLAUDE.md`](../../CLAUDE.md) §"Behavioral defaults" + §"Phase 3 status" + §"Phase 3.5 status".
+- Phase 3.5 cleanup plan (style, file-bundling pattern): [2026-04-26-v3.5-phase3.5-cleanup.md](2026-04-26-v3.5-phase3.5-cleanup.md).
+
+---
+
+## Pre-flight
+
+**Branch:** create `phase-4` from the latest `main` after Phase 3.5 has merged (it has — main is at `1b66a28`):
+
+```bash
+git checkout main && git pull && git checkout -b phase-4
+```
+
+**Schema baseline:** Phase 3.5 leaves the DB at version 11. Phase 4 adds two migrations: `0012_embeddings.sql` and `0013_branches.sql`. Final schema version: 13.
+
+**External dependency setup (BEFORE T88 dispatch):**
+
+The controlling agent should add `sqlite-vec` to `pyproject.toml` and run `pip install -e .` (or equivalent) so all worktrees pick up the new dependency. Confirm `sqlite_vec` imports cleanly:
+
+```bash
+python -c "import sqlite_vec; print(sqlite_vec.__version__)"
+```
+
+If `sqlite-vec` isn't available on PyPI when this plan executes, fall back to `sqlite-vss` and adapt T88/T92 accordingly. Both expose vector-search SQL via a loadable extension.
+
+**Pinned non-negotiables (carried forward):**
+
+- State changes go through the event log. Use `append_and_apply(conn, kind, payload)` for the live path; `apply_event` only after a fresh `append_event` returning the new id.
+- Witness-filter every memory read at the SQL level (hard `WHERE` constraint; never a soft signal).
+- Per-POV scene summaries — never write omniscient narration.
+- TDD: every task starts with a failing test (or a regression test pinning existing contract before refactor).
+- One commit per task minimum. Tasks that bundle multiple sub-features SHOULD split commits internally.
+
+**Verification before claiming done:** Use `superpowers-extended-cc:verification-before-completion` — run the test command, paste actual output. Don't assume green.
+
+---
+
+## Phase 3.6 carry-overs folded in
+
+Three small items from the Phase 3.6 backlog are bundled into Phase 4's Wave 1 trivial-fixes task (T90):
+
+1. `read_recent_dialogue` chat-id pushdown into SQL (T80 review nit)
+2. Lifecycle warning wording in regenerate (T83.4 — "at-or-after turn X" tightening)
+3. Legacy single-bot `record_turn_memory` consolidation (T84 review nit)
+
+Three items remain DEFERRED beyond Phase 4 (Phase 4.5 if needed):
+
+- Scene-close-on-cancel UX revisit (no action unless real play surfaces a regression).
+- Cross-feature canned-queue brittleness (structured fixture builder for tests — not blocking).
+- Full lifecycle-rollback in regenerate (warning log already shipped in T83.4; proper rollback needs schema-level back-references, deferred indefinitely).
+
+---
+
+## Parallel-Execution Strategy
+
+Same pattern as Phase 3.5. Eight waves: parallel within each wave (file-disjoint), serial across waves.
+
+### How to dispatch a wave in parallel
+
+Use the **Agent tool with `isolation: "worktree"`** so each subagent gets its own git worktree. (If the controlling session's working directory is **not** the chat repo, create worktrees manually with `git worktree add .worktrees/<wave>-<task> -b <wave>/<task> phase-4` from inside the chat repo.)
+
+Dispatch all tasks in a wave in a single message:
+
+```
+Agent({ description: "Wave 1 — T88 embeddings table", prompt: "...", isolation: "worktree" })
+Agent({ description: "Wave 1 — T89 branches table", ... })
+Agent({ description: "Wave 1 — T90 phase 3.6 carry-overs", ... })
+```
+
+### After a wave completes
+
+1. Each subagent returns its worktree path and commit SHA(s).
+2. **Run a spec + code-quality reviewer subagent on each completed task.** Combined review acceptable for trivial tasks (T90 carry-overs); separate spec + quality reviewers for vector-retrieval tasks (T91, T92, T96, T97) since the integration surface is wider.
+3. **Merge the wave into `phase-4`** in any order (file-disjointness guarantees no conflict). Use `--no-ff`.
+4. **Run the full test suite** on the merged `phase-4`. If red, the wave's mutual-independence assumption was violated — bisect, fix, re-merge.
+5. **Push `phase-4`** to gitea.
+6. Optionally clean up worktrees.
+
+### Conflict prevention checklist
+
+For each parallel wave, verify the **Files** sections of all tasks have **no overlapping paths**. Hot files in this plan: `chat/web/drawer.py` + `chat/templates/_drawer.html` (T98 only — bundled), `chat/state/memory.py` (T96 only), `chat/services/memory_write.py` (T90 + T97 — sequential), `chat/web/turns.py` (T98 only via delete affordance — sequential after T96).
+
+### Why each wave is parallel-safe
+
+| Wave | Tasks | Hot files touched | Disjoint? |
+|------|-------|-------------------|-----------|
+| 1 | T88, T89, T90 | new migrations + new state modules; T88 also touches `chat/db/connection.py` + `pyproject.toml`; T90 touches `turn_common.py` + `regenerate.py` + `memory_write.py` (additive only) | ✅ |
+| 2 | T91, T92, T93 | new service modules (embeddings, vector_search, cross_chat_search) | ✅ |
+| 3 | T94, T95 | new service modules (branching, delete_impact) | ✅ |
+| 4 | T96 | `chat/state/memory.py` (combined retrieval ranking) | (single task) |
+| 5 | T97 | `chat/services/memory_write.py` + new backfill script | (single task) |
+| 6 | T98 | `chat/web/drawer.py` + `chat/templates/_drawer.html` (drawer Phase 4 bundle) | (single task) |
+| 7 | T99, T100 | new files: `chat/web/snapshots.py` + `chat/templates/snapshots.html` (T99); `chat/web/search.py` + `chat/templates/search.html` + small chat.html top-bar addition (T100) | ✅ (disjoint) |
+| 8 | T101, T102 | new test file (T101); CLAUDE.md + design doc (T102) | ✅ |
+
+---
+
+## Task overview
+
+```
+Wave 1 ─┬─ T88: embeddings table + projector handlers
+        ├─ T89: branches table + projector handlers
+        └─ T90: Phase 3.6 carry-overs trio (chat-id SQL pushdown + lifecycle wording + legacy-fn consolidation)
+
+Wave 2 ─┬─ T91: embedding generation service (Featherless or local)
+        ├─ T92: vector search service via sqlite-vec
+        └─ T93: cross-chat search service (FTS over all owners)
+
+Wave 3 ─┬─ T94: branch_from_event service (event-log fork, branch metadata)
+        └─ T95: delete-impact computation service (cascade preview)
+
+Wave 4 ─── T96: combined FTS + vector retrieval ranking in search_memories
+
+Wave 5 ─── T97: memory_write enqueues embedding job + backfill script for existing memories
+
+Wave 6 ─── T98: drawer Phase 4 bundle — branching UI + significance review + hide-from-view + surgical delete + remaining v1 edits
+
+Wave 7 ─┬─ T99: snapshot UX (manual trigger, retention display, restore-from-snapshot UI)
+        └─ T100: cross-chat search UX (top-bar input + search results page)
+
+Wave 8 ─┬─ T101: cross-feature integration tests (vector × branching × delete × snapshot × search)
+        └─ T102: Phase 4 documentation update
+```
+
+Critical path: 8 sequential merge points. Total tasks: 15. Parallelism: Waves 1, 2, 3, 7, 8 dispatch concurrently (3-way and 2-way). Waves 4, 5, 6 are single-task by hot-file constraint.
+
+---
+
+## Wave 1 — Schema foundation + Phase 3.6 carry-overs (parallel)
+
+### Task 88: Embeddings table + projector handlers
+
+**Files:**
+
+- Create: `chat/db/migrations/0012_embeddings.sql`
+- Create: `chat/state/embeddings.py`
+- Create: `tests/test_embeddings_state.py`
+- Modify: `pyproject.toml` (add `sqlite-vec` dependency — controlling agent should pre-install before dispatch; the worktree commits the dependency declaration)
+
+**Spec:**
+
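+Adds the `embeddings` table that stores per-memory embedding vectors for vector retrieval. Uses `sqlite-vec` virtual-table syntax for similarity search; `vec0` defaults to L2 distance, so cosine similarity requires `distance_metric=cosine` on the vector column (verify against the installed sqlite-vec version). Schema: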
+
+```sql
+-- Load sqlite-vec extension at connection time (handled in chat/db/connection.py).
+-- Embeddings are stored as blobs in a vec0 virtual table for fast similarity search.
+
+CREATE VIRTUAL TABLE embeddings USING vec0(
+    memory_id INTEGER PRIMARY KEY,
+    embedding FLOAT[384] distance_metric=cosine   -- 384-dim default (adjust per chosen model); cosine declared explicitly since vec0 defaults to L2
+);
+
+-- Sidecar table for non-vector metadata (model used, dim, indexed_at).
+CREATE TABLE embeddings_meta (
+    memory_id INTEGER PRIMARY KEY,
+    model TEXT NOT NULL,
+    dim INTEGER NOT NULL,
+    indexed_at TEXT NOT NULL DEFAULT (datetime('now')),
+    FOREIGN KEY (memory_id) REFERENCES memories(id)
+);
+```
+
+(If `sqlite-vss` is chosen instead, replace `vec0` with `vss0` and adapt the dim declaration. Both have similar Python loading semantics.)
+
+**`chat/state/embeddings.py`:**
+
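+- `@on("embedding_indexed")` payload `{memory_id, model, dim, vector: list[float]}`. Inserts into both `embeddings` and `embeddings_meta`. Idempotent via `INSERT OR REPLACE` (re-indexing a memory replaces the prior vector). Confirm that the `vec0` table honors `INSERT OR REPLACE` in the installed sqlite-vec version; if it does not, delete the existing row and insert the new one in the same transaction.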
+- `@on("embedding_deindexed")` payload `{memory_id}`. Deletes from both tables. Used when a memory is purged via reset/cascade.
+- Reader `get_embedding_meta(conn, memory_id) -> dict | None` returns the meta row.
+
+The `chat/db/connection.py` `open_db` helper needs to load the sqlite-vec extension on each connection. Add:
+
+```python
+import sqlite_vec
+# Inside open_db, after connection is opened:
+conn.enable_load_extension(True)
+sqlite_vec.load(conn)
+conn.enable_load_extension(False)
+```
+
+This is a small modification to `connection.py`. Include it in T88's diff.
+
+**Tests:** 3 minimum.
+
+1. `test_embedding_indexed_inserts_row`: append `bot_authored`, `chat_created`, `memory_written` (creates a memory), then `embedding_indexed` with `vector=[0.1] * 384`. Project. Assert `embeddings_meta` row exists for that memory_id with the right model.
+2. `test_embedding_deindexed_removes_row`: same setup; index then de-index; assert row is gone.
+3. `test_vector_similarity_search_returns_nearest`: index two memories with distinct vectors; query for nearest neighbor of one vector; assert correct memory_id returned. Uses `sqlite-vec`'s `MATCH '...'` syntax (verify against actual sqlite-vec docs; adapt if needed).
+
+If running tests requires sqlite-vec to be loaded, the test fixture may need to skip / xfail when the extension isn't installed. Use `pytest.importorskip("sqlite_vec")` at the top of the test file.
+
+**Commit:** `feat: embeddings table + projector handlers via sqlite-vec (T88)`.
+
+**Notes:**
+
+- Schema version after migration alone: 12. T89 adds 0013, taking final to 13. The schema_version assertion in `tests/test_world.py` updates to 13 in the wave-merge step.
+- The `connection.py` change is small but cross-cutting — affects every `open_db` call. Verify the existing 343 tests still pass after the change.
+
+---
+
+### Task 89: Branches table + projector handlers
+
+**Files:**
+
+- Create: `chat/db/migrations/0013_branches.sql`
+- Create: `chat/state/branches.py`
+- Create: `tests/test_branches_state.py`
+
+**Spec:**
+
+Adds the `branches` table that records named alternate event-log forks. A branch is metadata: a name, an `origin_event_id` (the event we forked from), and a `head_event_id` (the latest event in this branch). The event log itself is unchanged — the branch table just **labels** linear ranges of event ids.
+
+```sql
+CREATE TABLE branches (
+    id INTEGER PRIMARY KEY,
+    name TEXT NOT NULL UNIQUE,
+    origin_event_id INTEGER NOT NULL,
+    head_event_id INTEGER NOT NULL,
+    chat_id TEXT,
+    created_at TEXT NOT NULL DEFAULT (datetime('now')),
+    is_active INTEGER NOT NULL DEFAULT 0
+);
+
+-- At most one row may have is_active = 1 at any time (enforced by the partial unique index).
+CREATE UNIQUE INDEX branches_active_idx ON branches(is_active) WHERE is_active = 1;
+```
+
+The "main" branch is implicit and bootstrapped by the migration: `INSERT INTO branches (name, origin_event_id, head_event_id, is_active) VALUES ('main', 0, 0, 1);`. Subsequent branches reference an `origin_event_id` (the event that the branch forked from).
+
+`chat/state/branches.py`:
+
+- `@on("branch_created")` payload `{name, origin_event_id, chat_id?, head_event_id}`. Inserts a new row with `is_active=0`. Idempotent re-insertion via `INSERT OR IGNORE`.
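+- `@on("branch_switched")` payload `{name}`. Sets `is_active=1` on the named branch and `is_active=0` on all others. Atomic within one transaction: clear the currently active row first, then set the named one, because SQLite checks the partial unique index row by row and a single `UPDATE` touching both rows can fail partway through.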
+- `@on("branch_head_updated")` payload `{name, head_event_id}`. Updates `head_event_id` on the named branch. Used by the orchestrator when new events extend the branch.
+- Readers: `get_branch(conn, name)`, `list_branches(conn, chat_id=None)`, `active_branch(conn)`.
+
+**Tests:** 3 minimum.
+
+1. `test_branch_created_inserts_row`: append `branch_created` with name="experiment", origin_event_id=42; project; assert `get_branch(conn, "experiment")` returns the row.
+2. `test_branch_switched_atomic`: seed two branches; switch from one to the other; assert exactly one is active.
+3. `test_main_branch_bootstrapped_by_migration`: open a fresh DB, apply migrations; assert `active_branch(conn)["name"] == "main"`.
+
+**Commit:** `feat: branches table + projector handlers (T89)`.
+
+**Notes:**
+
+- Schema version after this migration: 13 (T88's `0012` brings the DB to 12; T89's `0013` stacks on top). Wave-merge bumps `tests/test_world.py` schema_version assertion to 13.
+- This task does NOT yet teach the orchestrator to consult `is_active` — the existing event_log queries assume a single timeline. T98 (drawer branching UI) will enable user-driven switches, but the actual "follow only the active branch" filter on event reads is a follow-up (Phase 4.5 nit; document in T102 docs sweep).
+
+---
+
+### Task 90: Phase 3.6 carry-overs trio
+
+**Files:**
+
+- Modify: `chat/services/turn_common.py` (push chat_id filter into SQL)
+- Modify: `chat/services/regenerate.py` (lifecycle warning wording tightening)
+- Modify: `chat/services/memory_write.py` (consolidate legacy `record_turn_memory` into the unified API or delete it)
+- Modify: `tests/test_turn_common.py`, `tests/test_regenerate.py`, `tests/test_memory_write.py`
+
+**Spec:** Three small Phase 3.6 carry-over fixes bundled because each is 1-line + 1-test.
+
+#### 90.1 — `read_recent_dialogue` chat-id SQL pushdown
+
+Per T80 review nit. Currently `read_recent_dialogue` filters chat_id post-fetch in Python. Push into SQL for tighter LIMIT semantics:
+
+```sql
+SELECT id, kind, payload_json
+FROM event_log
+WHERE kind IN ('user_turn', 'user_turn_edit', 'assistant_turn')
+  AND superseded_by IS NULL
+  AND hidden = 0
+  AND json_extract(payload_json, '$.chat_id') = ?
+ORDER BY id DESC
+LIMIT ?
+```
+
+Then the post-fetch loop becomes a simple reverse + slice — no chat_id check needed.
+
+**Test added:** `test_read_recent_dialogue_limit_respects_chat_scope` — seed two chats with 60 turns each; query chat_a with `limit=50`; assert returned rows are exactly 50 chat_a rows (not 50 cross-chat rows that filter down to <50 after Python).
+
+**Commit:** `perf: read_recent_dialogue pushes chat-id filter into SQL (T90.1)`.
+
+#### 90.2 — Lifecycle warning wording tightening
+
+Per T83.4 review nit. The current warning reads "lifecycle transitions from superseded turn are NOT being rolled back". When the user regenerates an OLDER turn (T29 supports this), the warning also lists transitions from intervening turns, which legitimately stand. Tighten the wording to "lifecycle transitions at-or-after turn X" so operators reading logs aren't misled.
+
+Change is one log message string. Test asserts the new wording appears.
+
+**Commit:** `chore: clarify regenerate lifecycle warning wording (T90.2)`.
+
+#### 90.3 — Legacy `record_turn_memory` consolidation
+
+Per T84 review nit. The original Phase 1 single-bot `record_turn_memory` function still exists alongside the unified `record_turn_memory_for_present`. Either:
+
+- (a) Remove the legacy function entirely; update any remaining callers to use the unified API.
+- (b) Convert it to a thin wrapper for backward compat.
+
+Pick (a) if there are zero remaining callers; (b) if any callers exist. Read the codebase to confirm. The mock-data seed scripts may still use the legacy fn.
+
+**Commit:** `refactor: consolidate legacy record_turn_memory into unified API (T90.3)`.
+
+**TDD process for T90:**
+
+1. Read all 3 affected files + their tests.
+2. Implement 90.1 with test; commit.
+3. Implement 90.2 with test; commit.
+4. Implement 90.3 with test; commit.
+5. Run full suite — should be 343 + 3 = 346 (or +2 if 90.3 had no behavioral change).
+
+---
+
+## Wave 2 — Embedding & search services (parallel)
+
+Three new service modules. Fully file-disjoint.
+
+### Task 91: Embedding generation service
+
+**Files:**
+
+- Create: `chat/services/embeddings.py`
+- Create: `tests/test_embeddings.py`
+
+**Spec:** Wraps the embedding API call. Signature:
+
+```python
+class EmbeddingResult(BaseModel):
+    vector: list[float]
+    model: str
+    dim: int
+
+async def generate_embedding(
+    client: LLMClient,    # or a separate embedding-specific client
+    *,
+    text: str,
+    model: str,
+    timeout_s: float = 30.0,
+) -> EmbeddingResult:
+    """Generate an embedding vector for the given text. Falls back to a
+    zero-vector with model='fallback' on failure (so callers get a deterministic
+    sentinel they can detect and skip indexing)."""
+```
+
+**Implementation:** call the embedding endpoint (Featherless OpenAI-compatible `/v1/embeddings`, or a local `sentence-transformers` model). Add a new method `client.embed(text, model)` to the `LLMClient` Protocol (and to both implementations, `MockLLMClient` and `FeatherlessClient`).
+
+**Embedding model choice:**
+
+Default to a small CPU-friendly model accessible through the existing Featherless setup:
+
+- If Featherless has `BAAI/bge-small-en-v1.5` or similar 384-dim model: use that.
+- If not: fall back to local `sentence-transformers/all-MiniLM-L6-v2` (384-dim, runs CPU). Add `sentence-transformers` to `pyproject.toml`.
+- Document choice in CLAUDE.md (T102 docs sweep).
+
+The 384 dim is hardcoded in T88's migration. If a different model with different dim is chosen, update T88's schema accordingly BEFORE T88 dispatches.
+
+**Tests:** 3 minimum.
+
+1. `test_generate_embedding_returns_vector_of_correct_dim`: mock embedding response with a 384-element vector; assert returned `vector` length is 384.
+2. `test_generate_embedding_returns_correct_model_metadata`: assert `result.model` matches the input.
+3. `test_generate_embedding_falls_back_on_failure`: mock the client to raise; assert the result is a 384-element zero vector with `model="fallback"`.
+
+**Commit:** `feat: embedding generation service (T91)`.
+
+---
+
+### Task 92: Vector search service via sqlite-vec
+
+**Files:**
+
+- Create: `chat/services/vector_search.py`
+- Create: `tests/test_vector_search.py`
+
+**Spec:** Wraps sqlite-vec's `MATCH` syntax for cosine-similarity search over the `embeddings` virtual table. Witness-filter aware (joins through `memories` table for the witness check).
+
+```python
+def vector_search(
+    conn,
+    *,
+    owner_id: str,
+    witness_role: str,    # "you" | "host" | "guest"
+    query_vector: list[float],
+    k: int = 4,
+) -> list[dict]:
+    """Return top-K memories by cosine similarity to query_vector,
+    witness-filtered for the requesting bot's POV. Returns same row
+    shape as state.memory.search_memories for combined-ranking
+    compatibility."""
+```
+
+SQL pattern (sqlite-vec):
+
+```sql
+SELECT m.id, m.text, m.pov_summary, m.significance, e.distance
+FROM embeddings e
+JOIN memories m ON m.id = e.memory_id
+WHERE e.embedding MATCH ?
+  AND k = ?
+  AND m.owner_id = ?
+  AND m.witness_<role> = 1
+ORDER BY e.distance ASC
+LIMIT ?
+```
+
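+(Adapt to actual sqlite-vec syntax, using `vec0` MATCH semantics. The `witness_<role>` interpolation needs the same allowlist guard pattern as Phase 2.5 T72.3. Because the `owner_id` and witness predicates on the joined `memories` table apply only after the KNN step has picked its `k` candidates, over-fetch with a `k` larger than the final `LIMIT` so filtering does not starve the result set.)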
+
+**Tests:** 3 minimum.
+
+1. `test_vector_search_returns_nearest_neighbors`: index 5 memories with synthetic vectors; query for nearest 3; assert correct order.
+2. `test_vector_search_respects_witness_filter`: index a memory with witness `[1, 1, 0]`; query with `witness_role="guest"`; assert empty result.
+3. `test_vector_search_respects_owner_filter`: index memories for two owners; assert query for owner_a doesn't return owner_b's memories.
+
+**Commit:** `feat: vector search service via sqlite-vec (T92)`.
+
+---
+
+### Task 93: Cross-chat search service
+
+**Files:**
+
+- Create: `chat/services/cross_chat_search.py`
+- Create: `tests/test_cross_chat_search.py`
+
+**Spec:** FTS5-based search across ALL chats and all owners (admin-style search; no witness filter). For "where did I last see this person mention X?" queries.
+
+```python
+def search_all_memories(
+    conn,
+    *,
+    query: str,
+    k: int = 20,
+) -> list[dict]:
+    """Search FTS across all owners and chats. Returns rows with
+    {memory_id, owner_id, chat_id, text, pov_summary, scene_id,
+    significance, ts}. Sorted by FTS rank."""
+```
+
+This is intentionally NOT witness-filtered — it's a power-user search surface. The UI (T100) prompts the user to acknowledge they're seeing memories across POVs.
+
+**Tests:** 3 minimum.
+
+1. `test_search_all_memories_returns_matches_across_owners`: seed 2 owners with an overlapping keyword; search; assert both owners' matches appear.
+2. `test_search_all_memories_orders_by_fts_rank`: seed memories with varying FTS-match strength; assert order.
+3. `test_search_all_memories_respects_k_limit`.
+
+**Commit:** `feat: cross-chat search service (FTS5 over all owners) (T93)`.
+
+---
+
+## Wave 3 — Branching + delete services (parallel)
+
+Two new service modules. Fully file-disjoint.
+
+### Task 94: branch_from_event service
+
+**Files:**
+
+- Create: `chat/services/branching.py`
+- Create: `tests/test_branching.py`
+
+**Spec:**
+
+```python
+def branch_from_event(
+    conn,
+    *,
+    name: str,
+    origin_event_id: int,
+    chat_id: str | None = None,
+) -> int:
+    """Create a new named branch forking from origin_event_id.
+    Emits a branch_created event. Returns the new branch's row id.
+    Raises ValueError if name already exists."""
+
+def switch_active_branch(conn, *, name: str) -> None:
+    """Make the named branch active. Emits branch_switched. Subsequent
+    event reads should consult is_active to filter."""
+
+def list_branches_with_metadata(conn, chat_id: str | None = None) -> list[dict]:
+    """List branches with: name, origin_event_id, head_event_id, is_active,
+    event_count (number of events between origin and head, inclusive),
+    created_at."""
+```
+
+Tests cover: basic create, duplicate-name raises, switch updates `is_active` exclusively, list returns metadata.
+
+**Commit:** `feat: branching service (T94)`.
+
+---
+
+### Task 95: Delete-impact computation service
+
+**Files:**
+
+- Create: `chat/services/delete_impact.py`
+- Create: `tests/test_delete_impact.py`
+
+**Spec:** Computes the cascade impact of deleting a single event_log row (or a turn group: user_turn + assistant_turn + interjection if any). Returns a structured `ImpactReport` for the UI to render.
+
+```python
+class DeletedItem(BaseModel):
+    kind: str           # "memory" | "edge_update" | "scene_close" | etc.
+    description: str    # human-readable
+    target_id: int | str | None
+
+class ImpactReport(BaseModel):
+    target_event_id: int
+    cascading: list[DeletedItem]
+    notes: list[str]    # warnings, e.g. "this turn opened scene_X which has 3 subsequent turns"
+
+def compute_delete_impact(conn, *, target_event_id: int) -> ImpactReport:
+    """Walk the event log forward from target_event_id and identify
+    everything that depends on this event: child memory_written events,
+    edge_update events with this turn as source, scene_closed events
+    triggered by this turn, etc. Also identify subsequent turns that
+    REFERENCE this event (regenerated_from chains, etc.).
+    
+    Does NOT mutate the database. Pure computation for preview."""
+```
+
+The actual delete (truncate + supersede) is the existing rewind path from Phase 1 T31. T95 just builds the preview.
+
+**Tests:** 4 minimum.
+
+1. `test_impact_for_simple_turn_lists_memory_and_edges`: seed a chat with a turn that wrote 1 memory + 2 edge_updates. Compute impact. Assert the 3 items appear in `cascading`.
+2. `test_impact_for_scene_opening_turn_warns_about_subsequent_turns`: seed a turn that opened a scene + 5 subsequent turns. Assert `notes` mentions the dependency.
+3. `test_impact_for_regenerated_turn_lists_supersede_chain`: seed a turn that's been regenerated (has `superseded_by`). Compute impact for the original. Assert the chain appears.
+4. `test_impact_does_not_mutate_database`: snapshot event_log before + after; assert byte-identical.
+
+**Commit:** `feat: delete-impact computation service (T95)`.
+
+---
+
+## Wave 4 — Combined retrieval ranking (single)
+
+### Task 96: Combined FTS + vector retrieval ranking
+
+**Files:**
+
+- Modify: `chat/state/memory.py` — extend `search_memories` to optionally include vector hits
+- Modify: `tests/test_memory_search.py` — add 4 tests
+
+**Spec:**
+
+`search_memories` currently does FTS5 + Python-side significance/recency re-rank. Phase 4 adds:
+
+- An optional `query_vector: list[float] | None = None` kwarg.
+- When `query_vector` is provided, run `vector_search` (T92) for top-K-vector candidates.
+- Merge with FTS top-K candidates via reciprocal-rank fusion (RRF) or a simpler sum-of-ranks scheme — implementer's choice. Document the merge formula.
+- Final result is top-K from the fused set, with the existing significance + recency boosts applied as a final pass.
+
+When `query_vector` is None: existing behavior unchanged. Phase 1/2/3 callers that don't pass `query_vector` see no change.
+
+**Implementation note:** the embedding for the query (the speaker's recent context) must be generated by the caller (Wave 5 T97 wires the prompt-assembly pipeline to call `generate_embedding` on the dialogue tail). T96 only handles the search side — assumes the vector is pre-computed.
+
+**Tests:** 4 added.
+
+1. `test_search_memories_without_query_vector_uses_fts_only`: regression — call without `query_vector`; assert the existing FTS+rerank behavior.
+2. `test_search_memories_with_query_vector_includes_vector_hits`: index 5 memories where 1 is FTS-only-matching, 1 is vector-only-matching, 3 are unrelated. Pass both `query=...` and `query_vector=...`. Assert both the FTS hit and the vector hit appear in results.
+3. `test_search_memories_fusion_significance_bias_still_applies`: confirm the existing significance bias rerank still works on top of fused results.
+4. `test_search_memories_fusion_handles_empty_vector_results`: pass a `query_vector` when the matching memories have no embeddings indexed; assert FTS-only results still come back.
+
+**Commit:** `feat: combined FTS + vector retrieval ranking (T96)`.
+
+---
+
+## Wave 5 — Memory write hook + backfill (single)
+
+### Task 97: Embedding generation hook + backfill script
+
+**Files:**
+
+- Modify: `chat/services/memory_write.py` — after each `memory_written` event, enqueue a background embedding job
+- Create: `chat/services/embedding_worker.py` — async worker that consumes the queue and emits `embedding_indexed` events
+- Create: `scripts/backfill_embeddings.py` — one-time script that walks all existing memories and embeds them
+- Modify: `chat/app.py` — wire the embedding worker into the lifespan startup
+- Modify: `tests/test_memory_write.py` — add 2 tests for the enqueue hook
+- Create: `tests/test_embedding_worker.py` — 3 tests for the worker drain logic
+
+**Spec:**
+
+After each successful `memory_written` event, enqueue an embedding job. The worker dequeues and:
+
+1. Reads the memory text (via `get_memory(conn, memory_id)`).
+2. Calls `generate_embedding(client, text=memory.text, model=settings.embedding_model)`.
+3. Appends an `embedding_indexed` event with the result. (Skip if `result.model == "fallback"`: leave the memory un-indexed so the backfill script can retry it later.)
+
+The worker pattern mirrors Phase 1's `chat/services/significance.py` SignificanceWorker. Reuse its queue + lifecycle pattern.
+
+**Backfill script:**
+
+```bash
+.venv/bin/python scripts/backfill_embeddings.py [--limit N] [--dry-run]
+```
+
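+Walks all memories where no `embeddings_meta` row exists. For each, generates an embedding and emits `embedding_indexed`. Useful for the initial migration after Phase 4 lands AND for periodic re-runs. To re-embed after an embedding model change, the selection must also include memories whose `embeddings_meta.model` differs from the configured model, since those rows already exist and would otherwise be skipped.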
+
+**Tests:**
+
+`tests/test_memory_write.py`:
+1. `test_record_turn_memory_enqueues_embedding_job`: monkeypatch the worker's enqueue method; call `record_turn_memory_for_present`; assert the worker received one job per memory.
+
+`tests/test_embedding_worker.py`:
+1. `test_worker_drains_jobs_and_emits_indexed_events`: enqueue 3 jobs with mock embeddings; run worker; assert 3 `embedding_indexed` events landed.
+2. `test_worker_skips_fallback_results`: mock the embedding service to return a fallback result; assert NO `embedding_indexed` event landed for that job.
+3. `test_worker_handles_concurrent_jobs_serially`: pin the Featherless 2-connection cap behavior (the worker calls embed sequentially under the existing semaphore).
+
+**Commit (split):**
+
+- `feat: embedding worker drains queue and emits embedding_indexed events (T97.1)`
+- `feat: memory_write enqueues embedding job after each memory_written (T97.2)`
+- `feat: backfill_embeddings script for existing memories (T97.3)`
+
+**Verification gates:**
+
+- All Phase 1/2/3/3.5 memory tests still pass (regression critical).
+- New tests pass.
+- Manual smoke: run `scripts/backfill_embeddings.py --dry-run` against a seeded DB and verify expected count.
+
+---
+
+## Wave 6 — Drawer Phase 4 bundle (single task)
+
+### Task 98: Drawer Phase 4 features
+
+**Files:**
+
+- Modify: `chat/web/drawer.py` (add many new POST routes and GET extensions)
+- Modify: `chat/templates/_drawer.html` (add 5 new sections)
+- Create: `tests/test_drawer_phase4.py`
+
+**Spec:** Drawer affordances for 5 Phase 4 features. Kept as a single task because all five sub-features modify the same hot files (`chat/web/drawer.py`, `chat/templates/_drawer.html`); split into 5 commits internally.
+
+#### 98.1 — Branching UI
+
+GET drawer extension: `list_branches_with_metadata(conn)` → render in a "Branches" section (active branch highlighted + count of events).
+
+POST routes:
+- `/drawer/branch/create` — form `{name, origin_event_id}` → `branch_from_event` service.
+- `/drawer/branch/switch` — form `{name}` → `switch_active_branch`.
+- `/drawer/branch/from-turn/{event_id}` — convenience: branch from a specific turn (used by per-turn UI affordance).
+
+#### 98.2 — Significance review panel
+
+GET extension: significance distribution per chat (`SELECT significance, COUNT(*) GROUP BY significance`) → render histogram.
+
+POST route:
+- `/drawer/memory/significance/{memory_id}` — form `{new_value}` (already supported via T22 `manual_edit` `target_kind=memory_significance`); just add the UI form.
+
+Bulk re-rate is a Phase 4.5 polish — not in scope here. Just per-memory edit + distribution display.
+
+#### 98.3 — Hide-from-view toggle
+
+POST route:
+- `/drawer/turn/hide/{event_id}` — form `{hidden: bool}` → emits a `manual_edit` with `target_kind="turn_hidden"`.
+
+NEW `manual_edit` projector branch for `turn_hidden`: sets `event_log.hidden = ?` for the target event. Reuses the existing `hidden` column.
+
+UI affordance: per-turn checkbox in the chat surface or drawer (per-turn list with hide toggle).
+
+#### 98.4 — Surgical delete with cascade preview
+
+GET extension:
+- `/drawer/turn/delete-preview/{event_id}` → returns the `ImpactReport` (T95) rendered as a modal.
+
+POST route:
+- `/drawer/turn/delete/{event_id}` — invokes the rewind-and-truncate path (Phase 1 T31's `rewind_to_turn`) restricted to the target turn group.
+
+Important: this reuses the existing pre-rewind snapshot path so the action is undoable.
+
+#### 98.5 — Remaining v1 edits
+
+Audit: are any v1 fields STILL not editable from the drawer? Phase 2.5 T72.1 added edge_trust/edge_summary/memory_pov_summary/edge_knowledge_facts. T72.3 added witness flags. Anything left?
+
+Likely candidates: scene `narrative_anchor`, scene `weather`, container `properties` JSON. Add edit forms for any that surface during the audit. If none do, this sub-task is a no-op.
+
+**Tests:** 8+ in `tests/test_drawer_phase4.py` (at minimum one happy-path test per sub-feature, plus 1 for the cascade-preview rendering; sub-features with several routes, such as 98.1, need one test per route).
+
+**Commits (5):**
+
+- `feat: drawer branching UI (T98.1)`
+- `feat: drawer significance review panel (T98.2)`
+- `feat: drawer hide-from-view toggle + manual_edit turn_hidden branch (T98.3)`
+- `feat: drawer surgical delete with cascade preview (T98.4)`
+- `feat: drawer remaining v1 field edits (T98.5)` (or "no-op audit" if nothing left)
+
+---
+
+## Wave 7 — Snapshot + cross-chat search UX (parallel)
+
+### Task 99: Snapshot UX
+
+**Files:**
+
+- Create: `chat/web/snapshots.py` (new route module)
+- Create: `chat/templates/snapshots.html` (snapshot list page)
+- Modify: `chat/templates/layout.html` (add "Snapshots" nav link)
+- Create: `tests/test_snapshot_ux.py`
+
+**Spec:** Surface the existing snapshot infrastructure (Phase 1 T20 wrote snapshots; Phase 4 makes them visible).
+
+GET `/snapshots` — list all snapshots (periodic + pre-rewind) with metadata: kind, created_at, event_log_size, file_size_bytes.
+
+POST `/snapshots/take` — manually trigger a snapshot now.
+
+POST `/snapshots/restore/{snapshot_id}` — restore from snapshot (with hard confirmation).
+
+GET `/snapshots/{snapshot_id}/preview` — show what's in the snapshot vs. current state.
+
+**Tests:** 4 minimum (list, take, restore, preview).
+
+**Commit:** `feat: snapshot UX (manual trigger, list, restore, preview) (T99)`.
+
+---
+
+### Task 100: Cross-chat search UX
+
+**Files:**
+
+- Create: `chat/web/search.py` (new route module)
+- Create: `chat/templates/search.html` (search results page)
+- Modify: `chat/templates/layout.html` (add top-bar search input)
+- Create: `tests/test_search_ux.py`
+
+**Spec:** Top-bar search box submits to `/search?q=...`. Results page shows up to 50 matches across all chats and all owners (uses T93's `search_all_memories`). Each result shows: chat name, owner bot name, scene context, memory text excerpt with FTS highlight, "Open chat at this turn" link.
+
+**Tests:** 3 minimum.
+1. Search returns results from multiple chats.
+2. Empty query returns empty result set.
+3. Result links navigate to the right chat anchor.
+
+**Commit:** `feat: cross-chat search UX (top-bar input + results page) (T100)`.
+
+---
+
+## Wave 8 — Polish (parallel)
+
+### Task 101: Cross-feature integration tests
+
+**Files:**
+
+- Create: `tests/test_phase4_integration.py`
+
+**Spec:** End-to-end multi-feature flows. 5 tests minimum.
+
+1. **Vector retrieval feedback loop**: write a memory → embedding worker indexes it → search retrieves it via vector path.
+2. **Branch + diverge**: create branch B from turn 10 → switch to B → play 3 new turns → switch back to main → assert main's turns 11+ are still intact.
+3. **Surgical delete**: compute impact for a turn → confirm → assert event log truncated correctly + pre-rewind snapshot saved.
+4. **Hide + retrieval**: hide a turn → assert it doesn't appear in `read_recent_dialogue` (existing `hidden = 0` filter) → unhide → assert it reappears.
+5. **Cross-chat search**: write memories in 3 chats → search for keyword present in all 3 → assert all 3 appear in results.
+
+**Commit:** `test: phase 4 cross-feature integration coverage (T101)`.
+
+---
+
+### Task 102: Phase 4 documentation update
+
+**Files:**
+
+- Modify: `CLAUDE.md` (add "Phase 4 status" section; update behavioral defaults; add "Phase 4.5 / 5 backlog" with carry-overs)
+- Modify: `docs/plans/2026-04-26-v1-requirements-design.md` (annotate §13 Phase 4 as **Status: shipped 2026-04-27**)
+
+**Spec:**
+
+Mirror the Phase 3 / 3.5 status sections. Document:
+
+- **Vector retrieval**: sqlite-vec virtual table, embedding worker async pipeline, combined FTS + vector ranking via RRF.
+- **Branching**: forks the event log; UI in drawer; `is_active` flag plus orchestrator filter (caveat — see backlog if filter not yet wired into all readers).
+- **Drawer-edit on every field**: branching, significance review, hide-from-view, surgical delete with preview, plus any audit findings.
+- **Backup tooling**: snapshots panel surfaces existing infra.
+- **Significance review UI**: distribution + per-memory edit.
+- **Surgical delete + cascade preview**: piggybacks on rewind path; impact report from T95.
+- **Hide-from-view soft delete**: `manual_edit` `turn_hidden` branch.
+- **Cross-chat search**: top-bar + results page over T93's service.
+
+**Phase 4.5 / 5 backlog candidates** (reflect any discovered during execution):
+
+- Branching read-side filter — if T89's `is_active` isn't yet consulted by every event reader, this is the work to do.
+- Bulk significance re-rate (per T98.2 deferral).
+- Snapshot retention policy UI controls (deferred from Phase 1 T19).
+- Auto-pin override UI (per Phase 2 design).
+- Embedding model swap migration tooling (when changing embedding model, need to re-embed everything).
+- Vector index optimization (HNSW vs flat — Phase 5 if needed).
+- Carry-overs that remained deferred from Phase 3.6: scene-close-on-cancel UX revisit, canned-queue brittleness fixture builder, full lifecycle rollback in regenerate.
+
+**Commit:** `docs: phase 4 status, behavioral defaults, deferred items (T102)`.
+
+---
+
+## Wrap-up
+
+After Wave 8 lands:
+
+1. **Run full suite** on `phase-4`: should be ~390+ tests passing (343 from Phase 3.5 + ~50 new).
+2. **Manual smoke** (recommended before opening the PR):
+   - Run `scripts/backfill_embeddings.py` against a seeded DB to verify vector indexing works.
+   - Search for a phrase that shares no keywords with a stored memory but is semantically similar to it; verify the vector path returns it (FTS alone would miss it).
+   - Create a branch from an old turn; switch; play a few turns; switch back.
+   - Trigger surgical delete on a turn; verify the impact preview matches what actually gets removed.
+   - Hide a turn; verify it disappears from the chat surface; unhide.
+   - Use top-bar search to find a phrase; verify cross-chat results appear.
+   - Click the "Snapshots" nav link; trigger a manual snapshot; verify it appears.
+3. **Push `phase-4`** to gitea.
+4. **Open PR** `phase-4 → main`.
+
+---
+
+## Notes for the controller running this plan
+
+- **External dependency**: `sqlite-vec` MUST be added to `pyproject.toml` and installed BEFORE Wave 1 dispatches. The migration in T88 expects the extension to be loadable.
+- **Embedding model choice**: pin in T91 spec before dispatch. The 384 dim is hardcoded in T88's migration; if a different dim is used, update T88 first.
+- **After each parallel wave**, run a code-review subagent. Combined spec+quality acceptable for trivial tasks (T90 carry-overs); separate spec + quality reviewers for vector-retrieval and integration tasks (T91, T96, T97, T98, T101) — surface area is larger.
+- **Don't dispatch Wave 5 until Wave 4 has merged green.** T97 (memory_write enqueue) calls into the embedding-aware worker, and the worker uses T91's `generate_embedding`. T91 (Wave 2) and T96 (Wave 4) must both be merged into `phase-4` first.
+- **Don't dispatch Wave 6 until Wave 5 has merged green.** T98 (drawer) wires UI affordances over services from earlier waves (T94, T95, T97).
+- **Token-spend rough estimate**: Phase 4 should total roughly 70-80% of Phase 3's spend. Scope is similar, but individual tasks are larger because vector retrieval and branching are non-trivial, so expect per-task spend comparable to Phase 3's larger tasks (T59, T64).
+- **DO NOT break existing v1/v2/v3/v3.5 surface contracts.** Every test file that was green at the start of Phase 4 must stay green at the end. The cross-feature integration tests from Phase 3 (`tests/test_phase3_integration.py`) are particularly load-bearing.
diff --git a/docs/plans/2026-04-27-v4-phase4-implementation.md.tasks.json b/docs/plans/2026-04-27-v4-phase4-implementation.md.tasks.json
new file mode 100644
index 0000000..7dc377e
--- /dev/null
+++ b/docs/plans/2026-04-27-v4-phase4-implementation.md.tasks.json
@@ -0,0 +1,22 @@
+{
+  "planPath": "docs/plans/2026-04-27-v4-phase4-implementation.md",
+  "tasks": [
+    {"id": 88, "subject": "T88: embeddings table + projector handlers (sqlite-vec)", "status": "pending", "wave": 1, "parallelGroup": "wave-1"},
+    {"id": 89, "subject": "T89: branches table + projector handlers", "status": "pending", "wave": 1, "parallelGroup": "wave-1"},
+    {"id": 90, "subject": "T90: phase 3.6 carry-overs (chat-id pushdown + lifecycle wording + legacy fn consolidation)", "status": "pending", "wave": 1, "parallelGroup": "wave-1"},
+    {"id": 91, "subject": "T91: embedding generation service", "status": "pending", "wave": 2, "parallelGroup": "wave-2", "blockedBy": [88]},
+    {"id": 92, "subject": "T92: vector search service via sqlite-vec", "status": "pending", "wave": 2, "parallelGroup": "wave-2", "blockedBy": [88]},
+    {"id": 93, "subject": "T93: cross-chat search service (FTS5 over all owners)", "status": "pending", "wave": 2, "parallelGroup": "wave-2"},
+    {"id": 94, "subject": "T94: branch_from_event service", "status": "pending", "wave": 3, "parallelGroup": "wave-3", "blockedBy": [89]},
+    {"id": 95, "subject": "T95: delete-impact computation service", "status": "pending", "wave": 3, "parallelGroup": "wave-3"},
+    {"id": 96, "subject": "T96: combined FTS + vector retrieval ranking in search_memories", "status": "pending", "wave": 4, "parallelGroup": null, "blockedBy": [91, 92]},
+    {"id": 97, "subject": "T97: memory_write enqueues embedding job + backfill script", "status": "pending", "wave": 5, "parallelGroup": null, "blockedBy": [91, 96]},
+    {"id": 98, "subject": "T98: drawer Phase 4 bundle (branching + sig review + hide + surgical delete + remaining edits)", "status": "pending", "wave": 6, "parallelGroup": null, "blockedBy": [94, 95, 97]},
+    {"id": 99, "subject": "T99: snapshot UX (manual trigger + list + restore + preview)", "status": "pending", "wave": 7, "parallelGroup": "wave-7"},
+    {"id": 100, "subject": "T100: cross-chat search UX (top-bar + results page)", "status": "pending", "wave": 7, "parallelGroup": "wave-7", "blockedBy": [93]},
+    {"id": 101, "subject": "T101: cross-feature integration tests (vector × branching × delete × snapshot × search)", "status": "pending", "wave": 8, "parallelGroup": "wave-8", "blockedBy": [98, 99, 100]},
+    {"id": 102, "subject": "T102: Phase 4 documentation update", "status": "pending", "wave": 8, "parallelGroup": "wave-8", "blockedBy": [98, 99, 100]}
+  ],
+  "lastUpdated": "2026-04-27T00:00:00Z",
+  "notes": "15 tasks across 8 waves. Adds vector retrieval (sqlite-vec), branching UI, drawer-edit on every field, backup tooling, significance review UI, surgical delete with cascade preview, hide-from-view, and cross-chat search. Phase 3.6 carry-overs (3 small fixes) bundled into T90. External dependency: sqlite-vec must be installed BEFORE Wave 1 dispatch. Embedding model choice (default: 384-dim small model) pinned in T91 spec before dispatch — schema 0012 hardcodes 384 dim. Two new schema migrations (0012 embeddings, 0013 branches), final schema version 13. Uses task ids T88-T102."
+}