docs: add Phase 4 implementation plan (vector retrieval + branching + polish)

15 tasks across 8 waves landing the Phase 4 deliverables per
requirements doc §13 + §14:

- Vector retrieval via sqlite-vec (new external dependency)
- Branching UI (event log forks)
- Drawer-edit on every field (significance review, hide-from-view,
  surgical delete with cascade preview, branching affordances)
- Backup tooling (snapshot UX surface)
- Cross-chat search

Plus the 3 Phase 3.6 carry-over fixes (T90 bundle).

Wave structure:
- W1 (parallel 3-way): schema foundation + carry-overs
- W2 (parallel 3-way): embedding/search services
- W3 (parallel 2-way): branching + delete services
- W4 (single): combined retrieval ranking
- W5 (single): memory write hook + backfill
- W6 (single): drawer Phase 4 bundle (5 sub-features)
- W7 (parallel 2-way): snapshot UX + cross-chat search UX
- W8 (parallel 2-way): integration tests + docs

External dependency: sqlite-vec must be installed BEFORE Wave 1.
Embedding model choice (384-dim default) must be pinned in the T91 spec
before T88 dispatches, since the 0012 migration hardcodes the dimension.

Schema baseline: 11 -> 13 (adds 0012_embeddings.sql + 0013_branches.sql).
Task ids T88-T102 to avoid collision with prior phases.
Joseph Doherty
2026-04-27 02:03:08 -04:00
parent 1b66a2821c
commit bffd9a2f38
2 changed files with 854 additions and 0 deletions
# Roleplay Engine — Phase 4 Implementation Plan
> **For Claude:** REQUIRED SUB-SKILL: Use `superpowers-extended-cc:executing-plans` to implement this plan task-by-task. Use the parallel-dispatch pattern documented under "Parallel-Execution Strategy" for parallel waves.
**Goal:** Land Phase 4 polish per requirements doc §13 + §14: vector retrieval, branching UI, drawer-edit on every field, backup tooling, significance review UI, surgical delete with cascade preview, hide-from-view soft delete, plus cross-chat search and the small Phase 3.6 carry-over fixes.
**Architecture:** Builds on Phase 3.5's stable base. Two new tables (`embeddings`, `branches`) and one external dependency (sqlite-vec extension). Embedding generation runs as a deferred async job — NOT inline with turns — so the play loop stays fast even when the embedding endpoint is slow. Branching is data-model-only at first (events + selectors); UI grafts on top. Surgical delete + cascade preview reuses the existing rewind-and-supersede plumbing. Cross-chat search piggybacks on the existing FTS5 + (now) vector retrieval.
**Tech Stack:**
- **NEW dependency: `sqlite-vec`** (or `sqlite-vss` — Phase 4 picks; recommended `sqlite-vec` for simpler load semantics and active maintenance). Add to `pyproject.toml`.
- **Embedding model selection** is part of T91 spec. Recommended default: a small model on Featherless (e.g., `BAAI/bge-small-en-v1.5` if available) or a local CPU-friendly model via `sentence-transformers`. Document choice in CLAUDE.md.
- Same as Phase 3 otherwise (Python 3.11+, FastAPI, HTMX, SQLite).
**Source-of-truth references:**
- Phase 4 scope: requirements doc §13 "Phase 4 — polish" + §14 "Open / Deferred Decisions".
- Behavioral details: §6 (prompt assembly + retrieval), §10 (rewind / regenerate / reset), §11 (compression + significance), §12 (snapshots).
- Conventions: [`CLAUDE.md`](../../CLAUDE.md) §"Behavioral defaults" + §"Phase 3 status" + §"Phase 3.5 status".
- Phase 3.5 cleanup plan (style, file-bundling pattern): [2026-04-26-v3.5-phase3.5-cleanup.md](2026-04-26-v3.5-phase3.5-cleanup.md).
---
## Pre-flight
**Branch:** create `phase-4` from the latest `main` after Phase 3.5 has merged (it has — main is at `1b66a28`):
```bash
git checkout main && git pull && git checkout -b phase-4
```
**Schema baseline:** Phase 3.5 leaves the DB at version 11. Phase 4 adds two migrations: `0012_embeddings.sql` and `0013_branches.sql`. Final schema version: 13.
**External dependency setup (BEFORE T88 dispatch):**
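The controlling agent should add `sqlite-vec` to `pyproject.toml` and run `pip install -e .` (or equivalent) so all worktrees pick up the new dependency. Confirm that `sqlite_vec` imports and that the extension actually loads into a connection:
```bash
python -c "import sqlite3, sqlite_vec; db = sqlite3.connect(':memory:'); db.enable_load_extension(True); sqlite_vec.load(db); print(db.execute('select vec_version()').fetchone()[0])"
```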
If `sqlite_vec` isn't on PyPI when this plan executes, fall back to `sqlite-vss` and adapt T88/T92 accordingly. Both expose vector-search SQL via a loadable extension.
**Pinned non-negotiables (carried forward):**
- State changes go through the event log. Use `append_and_apply(conn, kind, payload)` for the live path; `apply_event` only after a fresh `append_event` returning the new id.
- Witness filter every memory read at SQL level (hard `WHERE` constraint; never a soft signal).
- Per-POV scene summaries — never write omniscient narration.
- TDD: every task starts with a failing test (or a regression test pinning existing contract before refactor).
- One commit per task minimum. Tasks that bundle multiple sub-features SHOULD split commits internally.
**Verification before claiming done:** Use `superpowers-extended-cc:verification-before-completion` — run the test command, paste actual output. Don't assume green.
---
## Phase 3.6 carry-overs folded in
Three small items from Phase 3.6 backlog are bundled into Phase 4's Wave 1 trivial-fixes task (T90):
1. `read_recent_dialogue` chat-id pushdown into SQL (T80 review nit)
2. Lifecycle warning wording in regenerate (T83.4 — "at-or-after turn X" tightening)
3. Legacy single-bot `record_turn_memory` consolidation (T84 review nit)
Three items remain DEFERRED beyond Phase 4 (Phase 4.5 if needed):
- Scene-close-on-cancel UX revisit (no action unless real play surfaces a regression).
- Cross-feature canned-queue brittleness (structured fixture builder for tests — not blocking).
- Full lifecycle-rollback in regenerate (warning log already shipped in T83.4; proper rollback needs schema-level back-references, deferred indefinitely).
---
## Parallel-Execution Strategy
Same pattern as Phase 3.5. Eight waves: parallel within each wave (file-disjoint), serial across waves.
### How to dispatch a wave in parallel
Use the **Agent tool with `isolation: "worktree"`** so each subagent gets its own git worktree. (If the controlling session's working directory is **not** the chat repo, create worktrees manually with `git worktree add .worktrees/<wave>-<task> -b <wave>/<task> phase-4` from inside the chat repo.)
Dispatch all tasks in a wave in a single message:
```
Agent({ description: "Wave 1 — T88 embeddings table", prompt: "...", isolation: "worktree" })
Agent({ description: "Wave 1 — T89 branches table", ... })
Agent({ description: "Wave 1 — T90 phase 3.6 carry-overs", ... })
```
### After a wave completes
1. Each subagent returns its worktree path and commit SHA(s).
2. **Run a spec + code-quality reviewer subagent on each completed task.** Combined review acceptable for trivial tasks (T90 carry-overs); separate spec + quality reviewers for vector-retrieval tasks (T91, T92, T96, T97) since the integration surface is wider.
3. **Merge the wave into `phase-4`** in any order (file-disjointness guarantees no conflict). Use `--no-ff`.
4. **Run the full test suite** on the merged `phase-4`. If red, the wave's mutual-independence assumption was violated — bisect, fix, re-merge.
5. **Push `phase-4`** to gitea.
6. Optionally clean up worktrees.
### Conflict prevention checklist
For each parallel wave, verify the **Files** sections of all tasks have **no overlapping paths**. Hot files in this plan: `chat/web/drawer.py` + `chat/templates/_drawer.html` (T98 only — bundled), `chat/state/memory.py` (T96 only), `chat/services/memory_write.py` (T90 + T97 — sequential), `chat/web/turns.py` (T98 only via delete affordance — sequential after T96).
### Why each wave is parallel-safe
| Wave | Tasks | Hot files touched | Disjoint? |
|------|-------|-------------------|-----------|
| 1 | T88, T89, T90 | new migrations + new state modules; T88 also touches `chat/db/connection.py` + `pyproject.toml`; T90 touches `turn_common.py` + `regenerate.py` + `memory_write.py` (additive only) | ✅ |
| 2 | T91, T92, T93 | new service modules (embeddings, vector_search, cross_chat_search) | ✅ |
| 3 | T94, T95 | new service modules (branching, delete_impact) | ✅ |
| 4 | T96 | `chat/state/memory.py` (combined retrieval ranking) | (single task) |
| 5 | T97 | `chat/services/memory_write.py` + new backfill script | (single task) |
| 6 | T98 | `chat/web/drawer.py` + `chat/templates/_drawer.html` (drawer Phase 4 bundle) | (single task) |
| 7 | T99, T100 | new files: `chat/web/snapshots.py` + `chat/templates/snapshots.html` (T99); `chat/web/search.py` + `chat/templates/search.html` + small chat.html top-bar addition (T100) | ✅ (disjoint) |
| 8 | T101, T102 | new test file (T101); CLAUDE.md + design doc (T102) | ✅ |
---
## Task overview
```
Wave 1 ─┬─ T88: embeddings table + projector handlers
        ├─ T89: branches table + projector handlers
        └─ T90: Phase 3.6 carry-overs trio (chat-id SQL pushdown + lifecycle wording + legacy-fn consolidation)
Wave 2 ─┬─ T91: embedding generation service (Featherless or local)
        ├─ T92: vector search service via sqlite-vec
        └─ T93: cross-chat search service (FTS over all owners)
Wave 3 ─┬─ T94: branch_from_event service (event-log fork, branch metadata)
        └─ T95: delete-impact computation service (cascade preview)
Wave 4 ─── T96: combined FTS + vector retrieval ranking in search_memories
Wave 5 ─── T97: memory_write enqueues embedding job + backfill script for existing memories
Wave 6 ─── T98: drawer Phase 4 bundle — branching UI + significance review + hide-from-view + surgical delete + remaining v1 edits
Wave 7 ─┬─ T99: snapshot UX (manual trigger, retention display, restore-from-snapshot UI)
        └─ T100: cross-chat search UX (top-bar input + search results page)
Wave 8 ─┬─ T101: cross-feature integration tests (vector × branching × delete × snapshot × search)
        └─ T102: Phase 4 documentation update
```
Critical path: 8 sequential merge points. Total tasks: 15. Parallelism: Waves 1, 2, 3, 7, 8 dispatch concurrently (3-way and 2-way). Waves 4, 5, 6 are single-task by hot-file constraint.
---
## Wave 1 — Schema foundation + Phase 3.6 carry-overs (parallel)
### Task 88: Embeddings table + projector handlers
**Files:**
- Create: `chat/db/migrations/0012_embeddings.sql`
- Create: `chat/state/embeddings.py`
- Create: `tests/test_embeddings_state.py`
- Modify: `pyproject.toml` (add `sqlite-vec` dependency — controlling agent should pre-install before dispatch; the worktree commits the dependency declaration)
**Spec:**
Adds the `embeddings` table that stores per-memory embedding vectors for vector retrieval. Uses `sqlite-vec` virtual-table syntax for cosine-similarity search. Schema:
```sql
-- Load sqlite-vec extension at connection time (handled in chat/db/connection.py).
-- Embeddings are stored as blobs in a vec0 virtual table for fast similarity search.
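-- 384-dim default; adjust per chosen model. vec0 defaults to L2 distance,
-- so cosine is requested explicitly on the vector column.
CREATE VIRTUAL TABLE embeddings USING vec0(
memory_id INTEGER PRIMARY KEY,
embedding FLOAT[384] distance_metric=cosine
);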
-- Sidecar table for non-vector metadata (model used, dim, indexed_at).
CREATE TABLE embeddings_meta (
memory_id INTEGER PRIMARY KEY,
model TEXT NOT NULL,
dim INTEGER NOT NULL,
indexed_at TEXT NOT NULL DEFAULT (datetime('now')),
FOREIGN KEY (memory_id) REFERENCES memories(id)
);
```
(If `sqlite-vss` is chosen instead, replace `vec0` with `vss0` and adapt the dim declaration. Both have similar Python loading semantics.)
**`chat/state/embeddings.py`:**
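- `@on("embedding_indexed")` payload `{memory_id, model, dim, vector: list[float]}`. Inserts into both `embeddings` and `embeddings_meta`. Idempotent: re-indexing a memory replaces the prior vector (`INSERT OR REPLACE` on `embeddings_meta`; for the `vec0` table, delete then insert unless sqlite-vec is confirmed to honor `INSERT OR REPLACE` on virtual tables).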
- `@on("embedding_deindexed")` payload `{memory_id}`. Deletes from both tables. Used when a memory is purged via reset/cascade.
- Reader `get_embedding_meta(conn, memory_id) -> dict | None` returns the meta row.
The `chat/db/connection.py` `open_db` helper needs to load the sqlite-vec extension on each connection. Add:
```python
import sqlite_vec
# Inside open_db, after connection is opened:
conn.enable_load_extension(True)
sqlite_vec.load(conn)
conn.enable_load_extension(False)
```
This is a small modification to `connection.py`. Include it in T88's diff.
**Tests:** 3 minimum.
1. `test_embedding_indexed_inserts_row`: append `bot_authored`, `chat_created`, `memory_written` (creates a memory), then `embedding_indexed` with `vector=[0.1] * 384`. Project. Assert `embeddings_meta` row exists for that memory_id with the right model.
2. `test_embedding_deindexed_removes_row`: same setup; index then de-index; assert row is gone.
3. `test_vector_similarity_search_returns_nearest`: index two memories with distinct vectors; query for nearest neighbor of one vector; assert correct memory_id returned. Uses `sqlite-vec`'s `MATCH '...'` syntax (verify against actual sqlite-vec docs; adapt if needed).
If running tests requires sqlite-vec to be loaded, the test fixture may need to skip / xfail when the extension isn't installed. Use `pytest.importorskip("sqlite_vec")` at the top of the test file.
**Commit:** `feat: embeddings table + projector handlers via sqlite-vec (T88)`.
**Notes:**
- Schema version after migration alone: 12. T89 adds 0013, taking final to 13. The schema_version assertion in `tests/test_world.py` updates to 13 in the wave-merge step.
- The `connection.py` change is small but cross-cutting — affects every `open_db` call. Verify the existing 343 tests still pass after the change.
---
### Task 89: Branches table + projector handlers
**Files:**
- Create: `chat/db/migrations/0013_branches.sql`
- Create: `chat/state/branches.py`
- Create: `tests/test_branches_state.py`
**Spec:**
Adds the `branches` table that records named alternate event-log forks. A branch is metadata: a name, an `origin_event_id` (the event we forked from), and a `head_event_id` (the latest event in this branch). The event log itself is unchanged — the branch table just **labels** linear ranges of event ids.
```sql
CREATE TABLE branches (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL UNIQUE,
origin_event_id INTEGER NOT NULL,
head_event_id INTEGER NOT NULL,
chat_id TEXT,
created_at TEXT NOT NULL DEFAULT (datetime('now')),
is_active INTEGER NOT NULL DEFAULT 0
);
-- Exactly one row may have is_active = 1 at any time.
CREATE UNIQUE INDEX branches_active_idx ON branches(is_active) WHERE is_active = 1;
```
The "main" branch is implicit and bootstrapped by the migration: `INSERT INTO branches (name, origin_event_id, head_event_id, is_active) VALUES ('main', 0, 0, 1);`. Subsequent branches reference an `origin_event_id` (the event that the branch forked from).
`chat/state/branches.py`:
- `@on("branch_created")` payload `{name, origin_event_id, chat_id?, head_event_id}`. Inserts a new row with `is_active=0`. Idempotent re-insertion via `INSERT OR IGNORE`.
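- `@on("branch_switched")` payload `{name}`. Sets `is_active=1` on the named branch and `is_active=0` on all others. Atomic via two UPDATEs inside the projector's transaction (clear every active flag, then set the named branch): SQLite checks the partial unique index row by row during an UPDATE, so a single `CASE`-based UPDATE can fail depending on the order in which rows are visited.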
- `@on("branch_head_updated")` payload `{name, head_event_id}`. Updates `head_event_id` on the named branch. Used by the orchestrator when new events extend the branch.
- Readers: `get_branch(conn, name)`, `list_branches(conn, chat_id=None)`, `active_branch(conn)`.
**Tests:** 3 minimum.
1. `test_branch_created_inserts_row`: append `branch_created` with name="experiment", origin_event_id=42; project; assert `get_branch(conn, "experiment")` returns the row.
2. `test_branch_switched_atomic`: seed two branches; switch from one to the other; assert exactly one is active.
3. `test_main_branch_bootstrapped_by_migration`: open a fresh DB, apply migrations; assert `active_branch(conn)["name"] == "main"`.
**Commit:** `feat: branches table + projector handlers (T89)`.
**Notes:**
- Schema version after both Wave 1 migrations: 13 (T88's `0012` takes it to 12; T89's `0013` takes it to 13). The wave-merge step bumps the `tests/test_world.py` schema_version assertion to 13.
- This task does NOT yet teach the orchestrator to consult `is_active` — the existing event_log queries assume a single timeline. T98 (drawer branching UI) will enable user-driven switches, but the actual "follow only the active branch" filter on event reads is a follow-up (Phase 4.5 nit; document in T102 docs sweep).
---
### Task 90: Phase 3.6 carry-overs trio
**Files:**
- Modify: `chat/services/turn_common.py` (push chat_id filter into SQL)
- Modify: `chat/services/regenerate.py` (lifecycle warning wording tightening)
- Modify: `chat/services/memory_write.py` (consolidate legacy `record_turn_memory` into the unified API or delete it)
- Modify: `tests/test_turn_common.py`, `tests/test_regenerate.py`, `tests/test_memory_write.py`
**Spec:** Three small Phase 3.6 carry-over fixes, bundled because each is a small change plus one test.
#### 90.1 — `read_recent_dialogue` chat-id SQL pushdown
Per T80 review nit. Currently `read_recent_dialogue` filters chat_id post-fetch in Python. Push into SQL for tighter LIMIT semantics:
```sql
SELECT id, kind, payload_json
FROM event_log
WHERE kind IN ('user_turn', 'user_turn_edit', 'assistant_turn')
AND superseded_by IS NULL
AND hidden = 0
AND json_extract(payload_json, '$.chat_id') = ?
ORDER BY id DESC
LIMIT ?
```
Then the post-fetch loop becomes a simple reverse + slice — no chat_id check needed.
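A minimal sketch of the pushed-down read, assuming a keyword-only `(conn, *, chat_id, limit)` signature and a dict-per-row return shape (both hypothetical; keep the existing function's signature). Because `LIMIT` now applies after the chat filter, the Python side only restores chronological order; no slice is needed.

```python
import json
import sqlite3

RECENT_DIALOGUE_SQL = """
SELECT id, kind, payload_json
FROM event_log
WHERE kind IN ('user_turn', 'user_turn_edit', 'assistant_turn')
  AND superseded_by IS NULL
  AND hidden = 0
  AND json_extract(payload_json, '$.chat_id') = ?
ORDER BY id DESC
LIMIT ?
"""


def read_recent_dialogue(conn: sqlite3.Connection, *, chat_id: str, limit: int = 50) -> list[dict]:
    """Newest-first fetch scoped to one chat in SQL, returned oldest-first."""
    rows = conn.execute(RECENT_DIALOGUE_SQL, (chat_id, limit)).fetchall()
    rows.reverse()  # chronological order for prompt assembly
    return [{"id": r[0], "kind": r[1], "payload": json.loads(r[2])} for r in rows]
```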
**Test added:** `test_read_recent_dialogue_limit_respects_chat_scope` — seed two chats with 60 turns each; query chat_a with `limit=50`; assert returned rows are exactly 50 chat_a rows (not 50 cross-chat rows that filter down to <50 after Python).
**Commit:** `perf: read_recent_dialogue pushes chat-id filter into SQL (T90.1)`.
#### 90.2 — Lifecycle warning wording tightening
Per T83.4 review nit. Current warning lists "lifecycle transitions from superseded turn are NOT being rolled back". When user regenerates an OLDER turn (T29 supports this), the warning lists intervening-turn transitions that legitimately stand. Tighten wording to "lifecycle transitions at-or-after turn X" so operators reading logs aren't misled.
Change is one log message string. Test asserts the new wording appears.
**Commit:** `chore: clarify regenerate lifecycle warning wording (T90.2)`.
#### 90.3 — Legacy `record_turn_memory` consolidation
Per T84 review nit. The original Phase 1 single-bot `record_turn_memory` function still exists alongside the unified `record_turn_memory_for_present`. Either:
- (a) Remove the legacy function entirely; update any remaining callers to use the unified API.
- (b) Convert it to a thin wrapper for backward compat.
Pick (a) if there are zero remaining callers; (b) if any callers exist. Read the codebase to confirm. The mock-data seed scripts may still use the legacy fn.
**Commit:** `refactor: consolidate legacy record_turn_memory into unified API (T90.3)`.
**TDD process for T90:**
1. Read all 3 affected files + their tests.
2. Implement 90.1 with test; commit.
3. Implement 90.2 with test; commit.
4. Implement 90.3 with test; commit.
5. Run full suite — should be 343 + 3 = 346 (or +2 if 90.3 had no behavioral change).
---
## Wave 2 — Embedding & search services (parallel)
Three new service modules. Fully file-disjoint.
### Task 91: Embedding generation service
**Files:**
- Create: `chat/services/embeddings.py`
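- Create: `tests/test_embeddings.py`
- Modify: the module(s) defining `LLMClient`, `MockLLMClient`, and `FeatherlessClient` (new `embed` method, per the Implementation note); `pyproject.toml` if the local `sentence-transformers` fallback is chosen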
**Spec:** Wraps the embedding API call. Signature:
```python
from pydantic import BaseModel


class EmbeddingResult(BaseModel):
    vector: list[float]
    model: str
    dim: int


async def generate_embedding(
    client: LLMClient,  # or a separate embedding-specific client
    *,
    text: str,
    model: str,
    timeout_s: float = 30.0,
) -> EmbeddingResult:
    """Generate an embedding vector for the given text. Falls back to a
    zero-vector with model='fallback' on failure (so callers get a deterministic
    sentinel they can detect and skip indexing)."""
```
**Implementation:** call the embedding endpoint (Featherless OpenAI-compatible `/v1/embeddings`, or a local `sentence-transformers` model). Add a new method `client.embed(text, model)` to `LLMClient` Protocol (and to `MockLLMClient` and `FeatherlessClient`).
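A sketch of the fallback contract, assuming `client.embed(text, model)` is a coroutine returning a plain `list[float]` (its return type is not pinned above) and adding a hypothetical `expected_dim` parameter so the sentinel matches the migration's 384 width. A dataclass replaces the pydantic model only to keep the example self-contained.

```python
import asyncio
from dataclasses import dataclass

FALLBACK_MODEL = "fallback"


@dataclass
class EmbeddingResult:
    """Plain-dataclass stand-in for the pydantic model in the signature."""
    vector: list[float]
    model: str
    dim: int


async def generate_embedding(client, *, text: str, model: str,
                             timeout_s: float = 30.0,
                             expected_dim: int = 384) -> EmbeddingResult:
    """Call client.embed(text, model); return a zero-vector sentinel on failure."""
    try:
        vector = await asyncio.wait_for(client.embed(text, model), timeout=timeout_s)
    except Exception:
        # Timeouts, HTTP errors and bad payloads all collapse to the sentinel.
        # asyncio.CancelledError is a BaseException, so shutdown still propagates.
        vector = None
    if vector is None or len(vector) != expected_dim:
        # A wrong-sized vector would be rejected by the fixed-width vec0 column.
        return EmbeddingResult(vector=[0.0] * expected_dim, model=FALLBACK_MODEL,
                               dim=expected_dim)
    return EmbeddingResult(vector=[float(x) for x in vector], model=model, dim=expected_dim)
```

Treating a dimension mismatch as a failure keeps the caller's check to a single `result.model == "fallback"` test.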
**Embedding model choice:**
Default to a small CPU-friendly model accessible through the existing Featherless setup:
- If Featherless has `BAAI/bge-small-en-v1.5` or similar 384-dim model: use that.
- If not: fall back to local `sentence-transformers/all-MiniLM-L6-v2` (384-dim, runs CPU). Add `sentence-transformers` to `pyproject.toml`.
- Document choice in CLAUDE.md (T102 docs sweep).
The 384 dim is hardcoded in T88's migration. If a different model with different dim is chosen, update T88's schema accordingly BEFORE T88 dispatches.
**Tests:** 3 minimum.
1. `test_generate_embedding_returns_vector_of_correct_dim`: mock embedding response with a 384-element vector; assert returned `vector` length is 384.
2. `test_generate_embedding_returns_correct_model_metadata`: assert `result.model` matches the input.
3. `test_generate_embedding_falls_back_on_failure`: mock the client to raise; assert the result is a 384-element zero vector with `model="fallback"`.
**Commit:** `feat: embedding generation service (T91)`.
---
### Task 92: Vector search service via sqlite-vec
**Files:**
- Create: `chat/services/vector_search.py`
- Create: `tests/test_vector_search.py`
**Spec:** Wraps sqlite-vec's `MATCH` syntax for cosine-similarity search over the `embeddings` virtual table. Witness-filter aware (joins through `memories` table for the witness check).
```python
def vector_search(
    conn,
    *,
    owner_id: str,
    witness_role: str,  # "you" | "host" | "guest"
    query_vector: list[float],
    k: int = 4,
) -> list[dict]:
    """Return top-K memories by cosine similarity to query_vector,
    witness-filtered for the requesting bot's POV. Returns same row
    shape as state.memory.search_memories for combined-ranking
    compatibility."""
```
SQL pattern (sqlite-vec):
```sql
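WITH knn AS (
    SELECT memory_id, distance
    FROM embeddings
    WHERE embedding MATCH ?
      AND k = ?  -- over-fetch (e.g. k * 4): vec0 applies k before the filters below
)
SELECT m.id, m.text, m.pov_summary, m.significance, knn.distance
FROM knn
JOIN memories m ON m.id = knn.memory_id
WHERE m.owner_id = ?
  AND m.witness_<role> = 1
ORDER BY knn.distance ASC
LIMIT ?
```
(Verify against the installed sqlite-vec version's `vec0` MATCH semantics. The KNN step runs inside the `vec0` table before the join, so the owner and witness filters prune its output afterwards and can leave fewer than `k` rows; over-fetch in the CTE and apply the final `LIMIT` after the join. The `witness_<role>` interpolation needs the same allowlist guard pattern as Phase 2.5 T72.3.)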
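One way to keep the role interpolation safe is to map roles to column names through a fixed dict, so no caller-supplied string ever reaches the SQL text. The sketch below builds the CTE query and its parameters, assuming a hypothetical over-fetch factor of 4 and a packed float32 blob for the query vector; it only builds the query, so it runs without sqlite-vec.

```python
import struct

# Fixed role -> column map: the column name always comes from here,
# never from the caller's string.
WITNESS_COLUMNS = {"you": "witness_you", "host": "witness_host", "guest": "witness_guest"}


def build_vector_search(*, owner_id: str, witness_role: str, query_vector: list[float],
                        k: int = 4, overfetch: int = 4) -> tuple[str, tuple]:
    """Return (sql, params) for a witness-filtered KNN query over the vec0 table."""
    try:
        witness_col = WITNESS_COLUMNS[witness_role]
    except KeyError:
        raise ValueError(f"unknown witness_role: {witness_role!r}") from None
    sql = f"""
        WITH knn AS (
            SELECT memory_id, distance
            FROM embeddings
            WHERE embedding MATCH ?
              AND k = ?
        )
        SELECT m.id, m.text, m.pov_summary, m.significance, knn.distance
        FROM knn
        JOIN memories m ON m.id = knn.memory_id
        WHERE m.owner_id = ?
          AND m.{witness_col} = 1
        ORDER BY knn.distance ASC
        LIMIT ?
    """
    # The query vector is passed as a packed float32 blob.
    query_blob = struct.pack(f"{len(query_vector)}f", *query_vector)
    return sql, (query_blob, k * overfetch, owner_id, k)
```

Over-fetching narrows the short-result problem but does not eliminate it: an owner with few witnessed memories among the nearest `k * overfetch` can still come back short, which the combined ranking in T96 should tolerate.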
**Tests:** 3 minimum.
1. `test_vector_search_returns_nearest_neighbors`: index 5 memories with synthetic vectors; query for nearest 3; assert correct order.
2. `test_vector_search_respects_witness_filter`: index a memory with witness `[1, 1, 0]`; query with `witness_role="guest"`; assert empty result.
3. `test_vector_search_respects_owner_filter`: index memories for two owners; assert query for owner_a doesn't return owner_b's memories.
**Commit:** `feat: vector search service via sqlite-vec (T92)`.
---
### Task 93: Cross-chat search service
**Files:**
- Create: `chat/services/cross_chat_search.py`
- Create: `tests/test_cross_chat_search.py`
**Spec:** FTS5-based search across ALL chats and all owners (admin-style search; no witness filter). For "where did I last see this person mention X?" queries.
```python
def search_all_memories(
    conn,
    *,
    query: str,
    k: int = 20,
) -> list[dict]:
    """Search FTS across all owners and chats. Returns rows with
    {memory_id, owner_id, chat_id, text, pov_summary, scene_id,
    significance, ts}. Sorted by FTS rank."""
```
This is intentionally NOT witness-filtered — it's a power-user search surface. The UI (T100) prompts the user to acknowledge they're seeing memories across POVs.
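A sketch of the search, assuming a hypothetical FTS5 table `memories_fts` over `text` and `pov_summary` whose rowid equals `memories.id`, and the column names from the docstring. Because this surface takes free-form user input, the sketch also quotes each token as an FTS5 phrase so punctuation cannot be parsed as query syntax; that quoting is an added safeguard, not part of the spec.

```python
import sqlite3


def to_fts_query(raw: str) -> str:
    """Quote each whitespace token as an FTS5 phrase (embedded quotes doubled),
    so input such as a stray '"' or the word AND is matched literally."""
    return " ".join('"' + token.replace('"', '""') + '"' for token in raw.split())


def search_all_memories(conn: sqlite3.Connection, *, query: str, k: int = 20) -> list[dict]:
    fts_query = to_fts_query(query)
    if not fts_query:
        return []
    cur = conn.execute(
        """
        WITH hits AS (
            SELECT rowid AS memory_id, rank
            FROM memories_fts
            WHERE memories_fts MATCH ?
            ORDER BY rank
            LIMIT ?
        )
        SELECT m.id AS memory_id, m.owner_id, m.chat_id, m.text, m.pov_summary,
               m.scene_id, m.significance, m.ts
        FROM hits
        JOIN memories m ON m.id = hits.memory_id
        ORDER BY hits.rank
        """,
        (fts_query, k),
    )
    columns = [col[0] for col in cur.description]
    return [dict(zip(columns, row)) for row in cur.fetchall()]
```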
**Tests:** 3 minimum.
1. `test_search_all_memories_returns_matches_across_owners`: seed 2 owners with an overlapping keyword; search; assert both owners' matches appear.
2. `test_search_all_memories_orders_by_fts_rank`: seed memories with varying FTS-match strength; assert order.
3. `test_search_all_memories_respects_k_limit`.
**Commit:** `feat: cross-chat search service (FTS5 over all owners) (T93)`.
---
## Wave 3 — Branching + delete services (parallel)
Two new service modules. Fully file-disjoint.
### Task 94: branch_from_event service
**Files:**
- Create: `chat/services/branching.py`
- Create: `tests/test_branching.py`
**Spec:**
```python
def branch_from_event(
    conn,
    *,
    name: str,
    origin_event_id: int,
    chat_id: str | None = None,
) -> int:
    """Create a new named branch forking from origin_event_id.
    Emits a branch_created event. Returns the new branch's row id.
    Raises ValueError if name already exists."""


def switch_active_branch(conn, *, name: str) -> None:
    """Make the named branch active. Emits branch_switched. Subsequent
    event reads should consult is_active to filter."""


def list_branches_with_metadata(conn, chat_id: str | None = None) -> list[dict]:
    """List branches with: name, origin_event_id, head_event_id, is_active,
    event_count (number of events between origin and head, inclusive),
    created_at."""
```
Tests cover: basic create, duplicate-name raises, switch updates `is_active` exclusively, list returns metadata.
**Commit:** `feat: branching service (T94)`.
---
### Task 95: Delete-impact computation service
**Files:**
- Create: `chat/services/delete_impact.py`
- Create: `tests/test_delete_impact.py`
**Spec:** Computes the cascade impact of deleting a single event_log row (or a turn group: user_turn + assistant_turn + interjection if any). Returns a structured `ImpactReport` for the UI to render.
```python
from pydantic import BaseModel


class DeletedItem(BaseModel):
    kind: str  # "memory" | "edge_update" | "scene_close" | etc.
    description: str  # human-readable
    target_id: int | str | None


class ImpactReport(BaseModel):
    target_event_id: int
    cascading: list[DeletedItem]
    notes: list[str]  # warnings, e.g. "this turn opened scene_X which has 3 subsequent turns"


def compute_delete_impact(conn, *, target_event_id: int) -> ImpactReport:
    """Walk the event log forward from target_event_id and identify
    everything that depends on this event: child memory_written events,
    edge_update events with this turn as source, scene_closed events
    triggered by this turn, etc. Also identify subsequent turns that
    REFERENCE this event (regenerated_from chains, etc.).
    Does NOT mutate the database. Pure computation for preview."""
```
The actual delete (truncate + supersede) is the existing rewind path from Phase 1 T31. T95 just builds the preview.
**Tests:** 4 minimum.
1. `test_impact_for_simple_turn_lists_memory_and_edges`: seed a chat with a turn that wrote 1 memory + 2 edge_updates. Compute impact. Assert the 3 items appear in `cascading`.
2. `test_impact_for_scene_opening_turn_warns_about_subsequent_turns`: seed a turn that opened a scene + 5 subsequent turns. Assert `notes` mentions the dependency.
3. `test_impact_for_regenerated_turn_lists_supersede_chain`: seed a turn that's been regenerated (has `superseded_by`). Compute impact for the original. Assert the chain appears.
4. `test_impact_does_not_mutate_database`: snapshot event_log before + after; assert byte-identical.
**Commit:** `feat: delete-impact computation service (T95)`.
---
## Wave 4 — Combined retrieval ranking (single)
### Task 96: Combined FTS + vector retrieval ranking
**Files:**
- Modify: `chat/state/memory.py` — extend `search_memories` to optionally include vector hits
- Modify: `tests/test_memory_search.py` — add 4 tests
**Spec:**
`search_memories` currently does FTS5 + Python-side significance/recency re-rank. Phase 4 adds:
- An optional `query_vector: list[float] | None = None` kwarg.
- When `query_vector` is provided, run `vector_search` (T92) for top-K-vector candidates.
- Merge with FTS top-K candidates via reciprocal-rank fusion (RRF) or a simpler sum-of-ranks scheme — implementer's choice. Document the merge formula.
- Final result is top-K from the fused set, with the existing significance + recency boosts applied as a final pass.
When `query_vector` is None: existing behavior unchanged. Phase 1/2/3 callers that don't pass `query_vector` see no change.
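One concrete option for the merge formula is reciprocal rank fusion: each candidate scores the sum of `1 / (k + rank)` over every list it appears in, with `k = 60` as the commonly used constant. A minimal sketch over best-first lists of memory ids (the function name is hypothetical); the significance and recency boosts would run as a separate pass on its output.

```python
def reciprocal_rank_fusion(*rankings: list[int], k: int = 60) -> list[int]:
    """Fuse best-first lists of memory ids; score(d) = sum of 1 / (k + rank)."""
    scores: dict[int, float] = {}
    for ranking in rankings:
        for rank, memory_id in enumerate(ranking, start=1):
            scores[memory_id] = scores.get(memory_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so output is deterministic.
    return sorted(scores, key=lambda memory_id: (-scores[memory_id], memory_id))


# FTS ranked [1, 2, 3]; vector search ranked [3, 4]. Memory 3 appears in both.
fused = reciprocal_rank_fusion([1, 2, 3], [3, 4])
```

Because RRF uses only ranks, FTS5 rank values and `vec0` distances never need to be put on a common scale, which is the usual reason to prefer it over summing raw scores. An empty vector list leaves the FTS order unchanged.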
**Implementation note:** the embedding for the query (the speaker's recent context) must be generated by the caller (Wave 5 T97 wires the prompt-assembly pipeline to call `generate_embedding` on the dialogue tail). T96 only handles the search side — assumes the vector is pre-computed.
**Tests:** 4 added.
1. `test_search_memories_without_query_vector_uses_fts_only`: regression — call without `query_vector`; assert the existing FTS+rerank behavior.
2. `test_search_memories_with_query_vector_includes_vector_hits`: index 5 memories where 1 is FTS-only-matching, 1 is vector-only-matching, 3 are unrelated. Pass both `query=...` and `query_vector=...`. Assert both the FTS hit and the vector hit appear in results.
3. `test_search_memories_fusion_significance_bias_still_applies`: confirm the existing significance bias rerank still works on top of fused results.
4. `test_search_memories_fusion_handles_empty_vector_results`: pass a `query_vector` when no memories have embeddings indexed; assert FTS-only results still come back.
**Commit:** `feat: combined FTS + vector retrieval ranking (T96)`.
---
## Wave 5 — Memory write hook + backfill (single)
### Task 97: Embedding generation hook + backfill script
**Files:**
- Modify: `chat/services/memory_write.py` — after each `memory_written` event, enqueue a background embedding job
- Create: `chat/services/embedding_worker.py` — async worker that consumes the queue and emits `embedding_indexed` events
- Create: `scripts/backfill_embeddings.py` — one-time script that walks all existing memories and embeds them
- Modify: `chat/app.py` — wire the embedding worker into the lifespan startup
- Modify: `tests/test_memory_write.py` — add 2 tests for the enqueue hook
- Create: `tests/test_embedding_worker.py` — 3 tests for the worker drain logic
**Spec:**
After each successful `memory_written` event, enqueue an embedding job. The worker dequeues and:
1. Reads the memory text (via `get_memory(conn, memory_id)`).
2. Calls `generate_embedding(client, text=memory.text, model=settings.embedding_model)`.
3. Appends `embedding_indexed` event with the result. (Skip if `result.model == "fallback"` — leave the memory un-indexed; will retry later via backfill.)
The worker pattern mirrors Phase 1's `chat/services/significance.py` SignificanceWorker. Reuse its queue + lifecycle pattern.
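A sketch of that shape with dependencies injected as callables, since SignificanceWorker's actual interface isn't reproduced here: `load_text`, `embed`, and `emit` are hypothetical stand-ins for `get_memory`, `generate_embedding`, and `append_and_apply`. A single consumer serializes embed calls on its own, which matches the sequential-under-semaphore expectation.

```python
import asyncio
import logging
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional

logger = logging.getLogger(__name__)


@dataclass
class EmbeddingResult:
    vector: list[float]
    model: str
    dim: int


class EmbeddingWorker:
    """Single-consumer queue: jobs are embedded one at a time, in enqueue order."""

    def __init__(self, *, load_text: Callable[[int], Optional[str]],
                 embed: Callable[[str], Awaitable[EmbeddingResult]],
                 emit: Callable[[str, dict], None]) -> None:
        self._queue: asyncio.Queue = asyncio.Queue()
        self._load_text = load_text
        self._embed = embed
        self._emit = emit

    def enqueue(self, memory_id: int) -> None:
        self._queue.put_nowait(memory_id)

    async def _process(self, memory_id: int) -> None:
        text = self._load_text(memory_id)
        if text is None:  # memory purged after the job was queued
            return
        result = await self._embed(text)
        if result.model == "fallback":  # leave un-indexed; backfill retries later
            return
        self._emit("embedding_indexed", {"memory_id": memory_id, "model": result.model,
                                         "dim": result.dim, "vector": result.vector})

    async def drain(self) -> None:
        """Process everything currently queued, then return (useful in tests)."""
        while not self._queue.empty():
            memory_id = self._queue.get_nowait()
            try:
                await self._process(memory_id)
            finally:
                self._queue.task_done()

    async def run_forever(self) -> None:
        """Long-running loop for the app lifespan; cancel the task to stop it."""
        while True:
            memory_id = await self._queue.get()
            try:
                await self._process(memory_id)
            except Exception:
                logger.exception("embedding job for memory %s failed", memory_id)
            finally:
                self._queue.task_done()
```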
**Backfill script:**
```bash
.venv/bin/python scripts/backfill_embeddings.py [--limit N] [--dry-run]
```
Walks all memories where no `embeddings_meta` row exists. For each, generates an embedding and emits `embedding_indexed`. Useful for the initial migration after Phase 4 lands AND for periodic re-runs if an embedding model changes.
**Tests:**
`tests/test_memory_write.py`:
1. `test_record_turn_memory_enqueues_embedding_job`: monkeypatch the worker's enqueue method; call `record_turn_memory_for_present`; assert the worker received one job per memory.
`tests/test_embedding_worker.py`:
1. `test_worker_drains_jobs_and_emits_indexed_events`: enqueue 3 jobs with mock embeddings; run worker; assert 3 `embedding_indexed` events landed.
2. `test_worker_skips_fallback_results`: mock the embedding service to return a fallback result; assert NO `embedding_indexed` event landed for that job.
3. `test_worker_handles_concurrent_jobs_serially`: pin the Featherless 2-conn cap behavior (worker calls embed sequentially under the existing semaphore).
**Commit (split):**
- `feat: embedding worker drains queue and emits embedding_indexed events (T97.1)`
- `feat: memory_write enqueues embedding job after each memory_written (T97.2)`
- `feat: backfill_embeddings script for existing memories (T97.3)`
**Verification gates:**
- All Phase 1/2/3/3.5 memory tests still pass (regression critical).
- New tests pass.
- Manual smoke: run `scripts/backfill_embeddings.py --dry-run` against a seeded DB and verify expected count.
---
## Wave 6 — Drawer Phase 4 bundle (single task)
### Task 98: Drawer Phase 4 features
**Files:**
- Modify: `chat/web/drawer.py` (add many new POST routes and GET extensions)
- Modify: `chat/templates/_drawer.html` (add 5 new sections)
- Create: `tests/test_drawer_phase4.py`
**Spec:** Drawer affordances for 5 Phase 4 features. Single task by hot-file constraint; split into 5 commits internally.
#### 98.1 — Branching UI
GET drawer extension: `list_branches_with_metadata(conn)` → render in a "Branches" section (active branch highlighted + count of events).
POST routes:
- `/drawer/branch/create` — form `{name, origin_event_id}` → `branch_from_event` service.
- `/drawer/branch/switch` — form `{name}` → `switch_active_branch`.
- `/drawer/branch/from-turn/{event_id}` — convenience: branch from a specific turn (used by per-turn UI affordance).
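A sketch of `list_branches_with_metadata` follows. It assumes a `branches(name, origin_event_id, is_active)` table and an `event_log.branch` column; the column names and the `BranchInfo` type are hypothetical. The count covers only events tagged with each branch, not the prefix a fork shares with its origin.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BranchInfo:
    name: str
    origin_event_id: Optional[int]
    is_active: bool
    event_count: int


def list_branches_with_metadata(conn: sqlite3.Connection) -> list:
    """Active branch first, then the rest by name, each with its own event count."""
    rows = conn.execute(
        """
        SELECT b.name, b.origin_event_id, b.is_active, COUNT(e.id)
        FROM branches AS b
        LEFT JOIN event_log AS e ON e.branch = b.name
        GROUP BY b.name, b.origin_event_id, b.is_active
        ORDER BY b.is_active DESC, b.name
        """
    ).fetchall()
    return [BranchInfo(name, origin, bool(active), count) for name, origin, active, count in rows]
```

The `LEFT JOIN` keeps freshly created branches with zero events in the list.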
#### 98.2 — Significance review panel
GET extension: significance distribution per chat (`SELECT significance, COUNT(*) GROUP BY significance`) → render histogram.
POST route:
- `/drawer/memory/significance/{memory_id}` — form `{new_value}`. The backend already supports this via T22's `manual_edit` with `target_kind=memory_significance`; only the UI form is new.
Bulk re-rate is Phase 4.5 polish and out of scope here; this task covers only per-memory edits and the distribution display.
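A sketch of the distribution query follows, with a helper that scales counts into bar widths for the template. It assumes a `memories` table with `chat_id` and `significance` columns (names hypothetical).

```python
import sqlite3


def significance_distribution(conn: sqlite3.Connection, chat_id: int) -> list:
    """(significance, count) pairs for one chat, ordered by significance."""
    rows = conn.execute(
        "SELECT significance, COUNT(*) FROM memories WHERE chat_id = ? "
        "GROUP BY significance ORDER BY significance",
        (chat_id,),
    ).fetchall()
    return [(value, count) for value, count in rows]


def histogram_bars(dist: list, width: int = 20) -> list:
    """(value, count, bar_length) with the largest bucket scaled to `width`."""
    peak = max((count for _, count in dist), default=0)
    if peak == 0:
        return []
    return [(value, count, round(count * width / peak)) for value, count in dist]
```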
#### 98.3 — Hide-from-view toggle
POST route:
- `/drawer/turn/hide/{event_id}` — form `{hidden: bool}` → emits a `manual_edit` with `target_kind="turn_hidden"`.
NEW `manual_edit` projector branch for `turn_hidden`: sets `event_log.hidden = ?` for the target event. Reuses the existing `hidden` column.
UI affordance: a per-turn hide checkbox, either in the chat surface or in a drawer list of turns.
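A minimal sketch of the new projector branch and the read-side filter it relies on follows. The payload keys (`target_id`, `new_value`), the `kind` column, and this simplified `read_recent_dialogue` are hypothetical; per the spec, the real reader already applies the `hidden = 0` filter.

```python
import sqlite3


def apply_manual_edit(conn: sqlite3.Connection, payload: dict) -> None:
    """Projector for manual_edit events; only the new turn_hidden branch is shown here."""
    kind = payload["target_kind"]
    if kind == "turn_hidden":
        # Reuse the existing hidden column; store 0/1 rather than a Python bool.
        conn.execute(
            "UPDATE event_log SET hidden = ? WHERE id = ?",
            (1 if payload["new_value"] else 0, payload["target_id"]),
        )
    else:
        # Existing branches (memory_significance, edge_trust, ...) would dispatch here.
        raise ValueError(f"target_kind not handled in this sketch: {kind}")


def read_recent_dialogue(conn: sqlite3.Connection, limit: int = 20) -> list:
    """Most recent visible turns, returned oldest first."""
    rows = conn.execute(
        "SELECT id, text FROM event_log WHERE kind = 'turn' AND hidden = 0 ORDER BY id DESC LIMIT ?",
        (limit,),
    ).fetchall()
    return list(reversed(rows))
```

Because hiding only flips a flag, unhiding is the same edit with `new_value` false, and no events are lost.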
#### 98.4 — Surgical delete with cascade preview
GET extension:
- `/drawer/turn/delete-preview/{event_id}` → returns the `ImpactReport` (T95) rendered as a modal.
POST route:
- `/drawer/turn/delete/{event_id}` — invokes the rewind-and-truncate path (Phase 1 T31's `rewind_to_turn`) restricted to the target turn group.
Important: this reuses the existing pre-rewind snapshot path so the action is undoable.
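A sketch of what the T95 `ImpactReport` preview might compute follows. It assumes rewind semantics: the target event's turn group and every later group are removed. `Event`, `turn_group`, and the report fields are hypothetical. If the real cut is restricted to the target group alone, the `>=` filter becomes `==`.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Event:
    id: int
    turn_group: int
    kind: str


@dataclass
class ImpactReport:
    target_turn_group: int
    removed_event_ids: list
    removed_by_kind: dict = field(default_factory=dict)

    @property
    def total(self) -> int:
        return len(self.removed_event_ids)


def compute_delete_impact(events: list, target_event_id: int) -> ImpactReport:
    """Preview which events a delete at `target_event_id` would remove, without touching them."""
    by_id = {e.id: e for e in events}
    if target_event_id not in by_id:
        raise KeyError(target_event_id)
    group = by_id[target_event_id].turn_group
    removed = [e for e in events if e.turn_group >= group]  # rewind semantics (assumption)
    return ImpactReport(group, [e.id for e in removed], dict(Counter(e.kind for e in removed)))
```

Computing the preview from the same selection the delete uses keeps the modal and the actual removal from drifting apart, which is what the wrap-up smoke test checks.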
#### 98.5 — Remaining v1 edits
Audit: are any v1 fields still not editable from the drawer? Phase 2.5 T72.1 added `edge_trust`, `edge_summary`, `memory_pov_summary`, and `edge_knowledge_facts`; T72.3 added witness flags. Anything left?
Likely candidates: scene `narrative_anchor`, scene `weather`, container `properties` JSON. Add edit forms for any that surface during the audit. If none, this sub-fix is a no-op.
**Tests:** 8+ in `tests/test_drawer_phase4.py` (one per sub-feature × happy path; plus 1 for the cascade-preview rendering).
**Commits (5):**
- `feat: drawer branching UI (T98.1)`
- `feat: drawer significance review panel (T98.2)`
- `feat: drawer hide-from-view toggle + manual_edit turn_hidden branch (T98.3)`
- `feat: drawer surgical delete with cascade preview (T98.4)`
- `feat: drawer remaining v1 field edits (T98.5)` (or "no-op audit" if nothing left)
---
## Wave 7 — Snapshot + cross-chat search UX (parallel)
### Task 99: Snapshot UX
**Files:**
- Create: `chat/web/snapshots.py` (new route module)
- Create: `chat/templates/snapshots.html` (snapshot list page)
- Modify: `chat/templates/layout.html` (add "Snapshots" nav link)
- Create: `tests/test_snapshot_ux.py`
**Spec:** Surface the existing snapshot infrastructure. Phase 1 T20 already writes snapshots; Phase 4 makes them visible.
GET `/snapshots` — list all snapshots (periodic + pre-rewind) with metadata: kind, created_at, event_log_size, file_size_bytes.
POST `/snapshots/take` — manually trigger a snapshot now.
POST `/snapshots/restore/{snapshot_id}` — restore from a snapshot (behind an explicit confirmation step).
GET `/snapshots/{snapshot_id}/preview` — show what's in the snapshot vs. current state.
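A sketch of the take and list operations behind these routes follows. It uses the standard library's `sqlite3.Connection.backup` for an online copy. Phase 1 T20's existing snapshot writer may work differently. `SnapshotInfo`, the `kind__timestamp.db` file naming, and the snapshot directory are hypothetical.

```python
import sqlite3
import time
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class SnapshotInfo:
    path: Path
    kind: str  # e.g. "manual", "periodic", "pre_rewind"
    created_at: float
    event_log_size: int
    file_size_bytes: int


def take_snapshot(conn: sqlite3.Connection, snapshot_dir: Path, kind: str = "manual") -> SnapshotInfo:
    snapshot_dir.mkdir(parents=True, exist_ok=True)
    created_at = time.time()
    path = snapshot_dir / f"{kind}__{time.time_ns()}.db"
    dest = sqlite3.connect(path)
    try:
        conn.backup(dest)  # page-by-page copy of the live database
    finally:
        dest.close()
    (event_log_size,) = conn.execute("SELECT COUNT(*) FROM event_log").fetchone()
    return SnapshotInfo(path, kind, created_at, event_log_size, path.stat().st_size)


def list_snapshots(snapshot_dir: Path) -> list:
    infos = []
    for path in snapshot_dir.glob("*.db"):
        kind = path.stem.split("__", 1)[0]
        # Open read-only so listing can never modify a snapshot.
        snap = sqlite3.connect(f"{path.resolve().as_uri()}?mode=ro", uri=True)
        try:
            (size,) = snap.execute("SELECT COUNT(*) FROM event_log").fetchone()
        finally:
            snap.close()
        st = path.stat()
        infos.append(SnapshotInfo(path, kind, st.st_mtime, size, st.st_size))
    return sorted(infos, key=lambda s: s.created_at)
```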
**Tests:** 4 minimum (list, take, restore, preview).
**Commit:** `feat: snapshot UX (manual trigger, list, restore) (T99)`.
---
### Task 100: Cross-chat search UX
**Files:**
- Create: `chat/web/search.py` (new route module)
- Create: `chat/templates/search.html` (search results page)
- Modify: `chat/templates/layout.html` (add top-bar search input)
- Create: `tests/test_search_ux.py`
**Spec:** Top-bar search box submits to `/search?q=...`. Results page shows up to 50 matches across all chats and all owners (uses T93's `search_all_memories`). Each result shows: chat name, owner bot name, scene context, memory text excerpt with FTS highlight, "Open chat at this turn" link.
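A minimal sketch of the query behind the results page follows. It assumes an FTS5 table `memories_fts(text, chat_id UNINDEXED, owner UNINDEXED)`; the table, the column names, and `cross_chat_search` itself are hypothetical, and T93's `search_all_memories` may be shaped differently. It uses FTS5's built-in `snippet()` for the highlighted excerpt and quotes each user term, so stray punctuation cannot cause an FTS syntax error.

```python
import sqlite3

MAX_RESULTS = 50


def to_fts_query(raw: str) -> str:
    """Quote each whitespace-separated term; FTS5 escapes a literal quote as ""."""
    return " ".join('"' + term.replace('"', '""') + '"' for term in raw.split())


def cross_chat_search(conn: sqlite3.Connection, raw_query: str, limit: int = MAX_RESULTS) -> list:
    query = to_fts_query(raw_query)
    if not query:
        return []  # empty query: empty result set, no FTS call
    return conn.execute(
        """
        SELECT chat_id, owner, snippet(memories_fts, 0, '<mark>', '</mark>', '...', 12)
        FROM memories_fts
        WHERE memories_fts MATCH ?
        ORDER BY rank
        LIMIT ?
        """,
        (query, limit),
    ).fetchall()
```

Quoted terms are implicitly ANDed, so a multi-word query narrows results rather than widening them.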
**Tests:** 3 minimum.
1. Search returns results from multiple chats.
2. Empty query returns empty result set.
3. Result links navigate to the right chat anchor.
**Commit:** `feat: cross-chat search UX (top-bar input + results page) (T100)`.
---
## Wave 8 — Polish (parallel)
### Task 101: Cross-feature integration tests
**Files:**
- Create: `tests/test_phase4_integration.py`
**Spec:** End-to-end multi-feature flows. 5 tests minimum.
1. **Vector retrieval feedback loop**: write a memory → embedding worker indexes it → search retrieves it via vector path.
2. **Branch + diverge**: create branch B from turn 10 → switch to B → play 3 new turns → switch back to main → assert main's turn 11+ are still intact.
3. **Surgical delete**: compute impact for a turn → confirm → assert event log truncated correctly + pre-rewind snapshot saved.
4. **Hide + retrieval**: hide a turn → assert it doesn't appear in `read_recent_dialogue` (existing `hidden = 0` filter) → unhide → assert it reappears.
5. **Cross-chat search**: write memories in 3 chats → search for keyword present in all 3 → assert all 3 appear in results.
**Commit:** `test: phase 4 cross-feature integration coverage (T101)`.
---
### Task 102: Phase 4 documentation update
**Files:**
- Modify: `CLAUDE.md` (add "Phase 4 status" section; update behavioral defaults; add "Phase 4.5 / 5 backlog" with carry-overs)
- Modify: `docs/plans/2026-04-26-v1-requirements-design.md` (annotate §13 Phase 4 as **Status: shipped 2026-04-27**)
**Spec:**
Mirror the Phase 3 / 3.5 status sections. Document:
- **Vector retrieval**: sqlite-vec virtual table, embedding worker async pipeline, combined FTS + vector ranking via RRF.
- **Branching**: forks the event log; UI in the drawer; `is_active` flag plus orchestrator filter (caveat: if the filter is not yet wired into every reader, see the backlog).
- **Drawer-edit on every field**: branching, significance review, hide-from-view, surgical delete with preview, plus any audit findings.
- **Backup tooling**: snapshots panel surfaces existing infra.
- **Significance review UI**: distribution + per-memory edit.
- **Surgical delete + cascade preview**: piggybacks on rewind path; impact report from T95.
- **Hide-from-view soft delete**: `manual_edit` `turn_hidden` branch.
- **Cross-chat search**: top-bar + results page over T93's service.
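For reference when writing the vector-retrieval entry, here is a sketch of standard reciprocal rank fusion. Each memory scores the sum of 1 / (k + rank) over the ranked lists it appears in. k = 60 is the commonly used default; whether T96 uses that constant is an assumption.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list, k: int = 60) -> list:
    """Fuse ranked id lists (best first); ties keep first-seen order."""
    scores: dict = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: -scores[d])
```

For example, fusing an FTS list `[10, 20]` with a vector list `[20, 30]` yields `[20, 10, 30]`. Memory 20 appears in both lists, so it outranks 10, which tops only one.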
**Phase 4.5 / 5 backlog candidates** (reflect any discovered during execution):
- Branching read-side filter — if T89's `is_active` isn't yet consulted by every event reader, this is the work to do.
- Bulk significance re-rate (per T98.2 deferral).
- Snapshot retention policy UI controls (per Phase 1 T19 deferred).
- Auto-pin override UI (per Phase 2 design).
- Embedding model swap migration tooling (when changing embedding model, need to re-embed everything).
- Vector index optimization (HNSW vs flat — Phase 5 if needed).
- Carry-overs that remained deferred from Phase 3.6: scene-close-on-cancel UX revisit, canned-queue brittleness fixture builder, full lifecycle rollback in regenerate.
**Commit:** `docs: phase 4 status, behavioral defaults, deferred items (T102)`.
---
## Wrap-up
After Wave 8 lands:
1. **Run full suite** on `phase-4`: should be ~390+ tests passing (343 from Phase 3.5 + ~50 new).
2. **Manual smoke** (recommended before opening the PR):
- Run `scripts/backfill_embeddings.py` against a seeded DB to verify vector indexing works.
   - Search for a phrase that shares no keywords with a memory but is semantically similar to it; verify the vector path returns it (FTS alone would miss it).
- Create a branch from an old turn; switch; play a few turns; switch back.
- Trigger surgical delete on a turn; verify the impact preview matches what actually gets removed.
- Hide a turn; verify it disappears from the chat surface; unhide.
- Use top-bar search to find a phrase; verify cross-chat results appear.
- Click the "Snapshots" nav link; trigger a manual snapshot; verify it appears.
3. **Push `phase-4`** to gitea.
4. **Open PR** `phase-4 → main`.
---
## Notes for the controller running this plan
- **External dependency**: `sqlite-vec` (or `sqlite-vss`) MUST be added to `pyproject.toml` and installed BEFORE Wave 1 dispatches. The migration in T88 expects the extension to be loadable.
- **Embedding model choice**: pin in T91 spec before dispatch. The 384 dim is hardcoded in T88's migration; if a different dim is used, update T88 first.
- **After each parallel wave**, run a code-review subagent. A combined spec+quality review is acceptable for trivial tasks (the T90 carry-overs); use separate spec and quality reviewers for the larger-surface tasks (T91, T96, T97, T98, T101: vector retrieval, the drawer bundle, and integration).
- **Don't dispatch Wave 5 until Wave 4 is merged green.** T97's embedding worker calls T91's `generate_embedding`, and T97 is also blocked on T96's combined retrieval ranking; both must be merged into `phase-4` first.
- **Don't dispatch Wave 6 until Wave 5 merged green.** T98 (drawer) wires UI affordances over services from earlier waves.
- **Token-spend rough estimate**: Phase 4 should be ~70-80% the size of Phase 3 (similar scope, larger per-task because vector + branching are non-trivial). Per-task spend similar to Phase 3's larger tasks (T59, T64).
- **DO NOT break existing v1/v2/v3/v3.5 surface contracts.** Every test file that was green at the start of Phase 4 must stay green at the end. The cross-feature integration tests from Phase 3 (`tests/test_phase3_integration.py`) are particularly load-bearing.
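The controller might script two pre-dispatch checks. The first confirms that `sqlite-vec` is importable and loadable and that a 384-dim `vec0` table can be created, using the package's documented `sqlite_vec.load` entry point. The second checks that every `blockedBy` entry in the tasks JSON points at a strictly earlier wave. The function names are hypothetical, and because the tasks file path is not shown here, it is left as a parameter.

```python
import json
import sqlite3


def check_sqlite_vec(dim: int = 384) -> tuple:
    """Return (ok, message): can sqlite-vec load and host a vec0 table of `dim` floats?"""
    try:
        import sqlite_vec  # installed from the `sqlite-vec` package
    except ImportError:
        return False, "sqlite-vec is not installed"
    conn = sqlite3.connect(":memory:")
    try:
        if not hasattr(conn, "enable_load_extension"):
            return False, "this Python's sqlite3 cannot load extensions"
        conn.enable_load_extension(True)
        sqlite_vec.load(conn)
        conn.enable_load_extension(False)
        (version,) = conn.execute("SELECT vec_version()").fetchone()
        conn.execute(f"CREATE VIRTUAL TABLE probe USING vec0(embedding float[{dim}])")
        return True, f"sqlite-vec {version}: vec0 float[{dim}] OK"
    except sqlite3.Error as exc:
        return False, f"sqlite-vec failed to load: {exc}"
    finally:
        conn.close()


def check_wave_order(tasks: list) -> list:
    """List tasks whose blockers are unknown or not in a strictly earlier wave."""
    wave_of = {t["id"]: t["wave"] for t in tasks}
    problems = []
    for t in tasks:
        for dep in t.get("blockedBy", []):
            if dep not in wave_of:
                problems.append(f"T{t['id']}: unknown blocker T{dep}")
            elif wave_of[dep] >= t["wave"]:
                problems.append(f"T{t['id']} (wave {t['wave']}) blocked by T{dep} (wave {wave_of[dep]})")
    return problems


def load_tasks(path: str) -> list:
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)["tasks"]
```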
@@ -0,0 +1,22 @@
{
"planPath": "docs/plans/2026-04-27-v4-phase4-implementation.md",
"tasks": [
{"id": 88, "subject": "T88: embeddings table + projector handlers (sqlite-vec)", "status": "pending", "wave": 1, "parallelGroup": "wave-1"},
{"id": 89, "subject": "T89: branches table + projector handlers", "status": "pending", "wave": 1, "parallelGroup": "wave-1"},
{"id": 90, "subject": "T90: phase 3.6 carry-overs (chat-id pushdown + lifecycle wording + legacy fn consolidation)", "status": "pending", "wave": 1, "parallelGroup": "wave-1"},
{"id": 91, "subject": "T91: embedding generation service", "status": "pending", "wave": 2, "parallelGroup": "wave-2", "blockedBy": [88]},
{"id": 92, "subject": "T92: vector search service via sqlite-vec", "status": "pending", "wave": 2, "parallelGroup": "wave-2", "blockedBy": [88]},
{"id": 93, "subject": "T93: cross-chat search service (FTS5 over all owners)", "status": "pending", "wave": 2, "parallelGroup": "wave-2"},
{"id": 94, "subject": "T94: branch_from_event service", "status": "pending", "wave": 3, "parallelGroup": "wave-3", "blockedBy": [89]},
{"id": 95, "subject": "T95: delete-impact computation service", "status": "pending", "wave": 3, "parallelGroup": "wave-3"},
{"id": 96, "subject": "T96: combined FTS + vector retrieval ranking in search_memories", "status": "pending", "wave": 4, "parallelGroup": null, "blockedBy": [91, 92]},
{"id": 97, "subject": "T97: memory_write enqueues embedding job + backfill script", "status": "pending", "wave": 5, "parallelGroup": null, "blockedBy": [91, 96]},
{"id": 98, "subject": "T98: drawer Phase 4 bundle (branching + sig review + hide + surgical delete + remaining edits)", "status": "pending", "wave": 6, "parallelGroup": null, "blockedBy": [94, 95, 97]},
{"id": 99, "subject": "T99: snapshot UX (manual trigger + list + restore + preview)", "status": "pending", "wave": 7, "parallelGroup": "wave-7"},
{"id": 100, "subject": "T100: cross-chat search UX (top-bar + results page)", "status": "pending", "wave": 7, "parallelGroup": "wave-7", "blockedBy": [93]},
{"id": 101, "subject": "T101: cross-feature integration tests (vector × branching × delete × snapshot × search)", "status": "pending", "wave": 8, "parallelGroup": "wave-8", "blockedBy": [98, 99, 100]},
{"id": 102, "subject": "T102: Phase 4 documentation update", "status": "pending", "wave": 8, "parallelGroup": "wave-8", "blockedBy": [98, 99, 100]}
],
"lastUpdated": "2026-04-27T00:00:00Z",
"notes": "15 tasks across 8 waves. Adds vector retrieval (sqlite-vec), branching UI, drawer-edit on every field, backup tooling, significance review UI, surgical delete with cascade preview, hide-from-view, and cross-chat search. Phase 3.6 carry-overs (3 small fixes) bundled into T90. External dependency: sqlite-vec must be installed BEFORE Wave 1 dispatch. Embedding model choice (default: 384-dim small model) pinned in T91 spec before dispatch — schema 0012 hardcodes 384 dim. Two new schema migrations (0012 embeddings, 0013 branches), final schema version 13. Uses task ids T88-T102."
}