From bffd9a2f384633e272ca2633c0938024f702a248 Mon Sep 17 00:00:00 2001 From: Joseph Doherty Date: Mon, 27 Apr 2026 02:03:08 -0400 Subject: [PATCH] docs: add Phase 4 implementation plan (vector retrieval + branching + polish) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit 15 tasks across 8 waves landing the Phase 4 deliverables per requirements doc §13 + §14: - Vector retrieval via sqlite-vec (new external dependency) - Branching UI (event log forks) - Drawer-edit on every field (significance review, hide-from-view, surgical delete with cascade preview, branching affordances) - Backup tooling (snapshot UX surface) - Cross-chat search Plus the 3 Phase 3.6 carry-over fixes (T90 bundle). Wave structure: - W1 (parallel 3-way): schema foundation + carry-overs - W2 (parallel 3-way): embedding/search services - W3 (parallel 2-way): branching + delete services - W4 (single): combined retrieval ranking - W5 (single): memory write hook + backfill - W6 (single): drawer Phase 4 bundle (5 sub-features) - W7 (parallel 2-way): snapshot UX + cross-chat search UX - W8 (parallel 2-way): integration tests + docs External dependency: sqlite-vec must be installed BEFORE Wave 1. Embedding model choice (384-dim default) pinned in T91 before dispatch since the migration hardcodes the dimension. Schema baseline: 11 -> 13 (adds 0012_embeddings.sql + 0013_branches.sql). Task ids T88-T102 to avoid collision with prior phases. 
--- .../2026-04-27-v4-phase4-implementation.md | 832 ++++++++++++++++++ ...-27-v4-phase4-implementation.md.tasks.json | 22 + 2 files changed, 854 insertions(+) create mode 100644 docs/plans/2026-04-27-v4-phase4-implementation.md create mode 100644 docs/plans/2026-04-27-v4-phase4-implementation.md.tasks.json diff --git a/docs/plans/2026-04-27-v4-phase4-implementation.md b/docs/plans/2026-04-27-v4-phase4-implementation.md new file mode 100644 index 0000000..fb89991 --- /dev/null +++ b/docs/plans/2026-04-27-v4-phase4-implementation.md @@ -0,0 +1,832 @@ +# Roleplay Engine — Phase 4 Implementation Plan + +> **For Claude:** REQUIRED SUB-SKILL: Use `superpowers-extended-cc:executing-plans` to implement this plan task-by-task. Use the parallel-dispatch pattern documented under "Parallel-Execution Strategy" for parallel waves. + +**Goal:** Land Phase 4 polish per requirements doc §13 + §14: vector retrieval, branching UI, drawer-edit on every field, backup tooling, significance review UI, surgical delete with cascade preview, hide-from-view soft delete, plus cross-chat search and the small Phase 3.6 carry-over fixes. + +**Architecture:** Builds on Phase 3.5's stable base. Two new tables (`embeddings`, `branches`) and one external dependency (sqlite-vec extension). Embedding generation runs as a deferred async job — NOT inline with turns — so the play loop stays fast even when the embedding endpoint is slow. Branching is data-model-only at first (events + selectors); UI grafts on top. Surgical delete + cascade preview reuses the existing rewind-and-supersede plumbing. Cross-chat search piggybacks on the existing FTS5 + (now) vector retrieval. + +**Tech Stack:** + +- **NEW dependency: `sqlite-vec`** (or `sqlite-vss` — Phase 4 picks; recommended `sqlite-vec` for simpler load semantics and active maintenance). Add to `pyproject.toml`. +- **Embedding model selection** is part of T91 spec. 
Recommended default: a small model on Featherless (e.g., `BAAI/bge-small-en-v1.5` if available) or a local CPU-friendly model via `sentence-transformers`. Document choice in CLAUDE.md. +- Same as Phase 3 otherwise (Python 3.11+, FastAPI, HTMX, SQLite). + +**Source-of-truth references:** + +- Phase 4 scope: requirements doc §13 "Phase 4 — polish" + §14 "Open / Deferred Decisions". +- Behavioral details: §6 (prompt assembly + retrieval), §10 (rewind / regenerate / reset), §11 (compression + significance), §12 (snapshots). +- Conventions: [`CLAUDE.md`](../../CLAUDE.md) §"Behavioral defaults" + §"Phase 3 status" + §"Phase 3.5 status". +- Phase 3.5 cleanup plan (style, file-bundling pattern): [2026-04-26-v3.5-phase3.5-cleanup.md](2026-04-26-v3.5-phase3.5-cleanup.md). + +--- + +## Pre-flight + +**Branch:** create `phase-4` from the latest `main` after Phase 3.5 has merged (it has — main is at `1b66a28`): + +```bash +git checkout main && git pull && git checkout -b phase-4 +``` + +**Schema baseline:** Phase 3.5 leaves the DB at version 11. Phase 4 adds two migrations: `0012_embeddings.sql` and `0013_branches.sql`. Final schema version: 13. + +**External dependency setup (BEFORE T88 dispatch):** + +The controlling agent should add `sqlite-vec` to `pyproject.toml` and run `pip install -e .` (or equivalent) so all worktrees pick up the new dependency. Confirm `sqlite_vec` imports cleanly: + +```bash +python -c "import sqlite_vec; print(sqlite_vec.__version__)" +``` + +If `sqlite_vec` isn't on PyPI when this plan executes, fall back to `sqlite-vss` and adapt T88/T92 accordingly. Both expose vector-search SQL via a loadable extension. + +**Pinned non-negotiables (carried forward):** + +- State changes go through the event log. Use `append_and_apply(conn, kind, payload)` for the live path; `apply_event` only after a fresh `append_event` returning the new id. +- Witness filter every memory read at SQL level (hard `WHERE` constraint; never a soft signal). 
+- Per-POV scene summaries — never write omniscient narration. +- TDD: every task starts with a failing test (or a regression test pinning existing contract before refactor). +- One commit per task minimum. Tasks that bundle multiple sub-features SHOULD split commits internally. + +**Verification before claiming done:** Use `superpowers-extended-cc:verification-before-completion` — run the test command, paste actual output. Don't assume green. + +--- + +## Phase 3.6 carry-overs folded in + +Three small items from Phase 3.6 backlog are bundled into Phase 4's Wave 1 trivial-fixes task (T90): + +1. `read_recent_dialogue` chat-id pushdown into SQL (T80 review nit) +2. Lifecycle warning wording in regenerate (T83.4 — "at-or-after turn X" tightening) +3. Legacy single-bot `record_turn_memory` consolidation (T84 review nit) + +Three items remain DEFERRED beyond Phase 4 (Phase 4.5 if needed): + +- Scene-close-on-cancel UX revisit (no action unless real play surfaces a regression). +- Cross-feature canned-queue brittleness (structured fixture builder for tests — not blocking). +- Full lifecycle-rollback in regenerate (warning log already shipped in T83.4; proper rollback needs schema-level back-references, deferred indefinitely). + +--- + +## Parallel-Execution Strategy + +Same pattern as Phase 3.5. Eight waves: parallel within each wave (file-disjoint), serial across waves. + +### How to dispatch a wave in parallel + +Use the **Agent tool with `isolation: "worktree"`** so each subagent gets its own git worktree. (If the controlling session's working directory is **not** the chat repo, create worktrees manually with `git worktree add .worktrees/- -b / phase-4` from inside the chat repo.) + +Dispatch all tasks in a wave in a single message: + +``` +Agent({ description: "Wave 1 — T88 embeddings table", prompt: "...", isolation: "worktree" }) +Agent({ description: "Wave 1 — T89 branches table", ... }) +Agent({ description: "Wave 1 — T90 phase 3.6 carry-overs", ... 
}) +``` + +### After a wave completes + +1. Each subagent returns its worktree path and commit SHA(s). +2. **Run a spec + code-quality reviewer subagent on each completed task.** Combined review acceptable for trivial tasks (T90 carry-overs); separate spec + quality reviewers for vector-retrieval tasks (T91, T92, T96, T97) since the integration surface is wider. +3. **Merge the wave into `phase-4`** in any order (file-disjointness guarantees no conflict). Use `--no-ff`. +4. **Run the full test suite** on the merged `phase-4`. If red, the wave's mutual-independence assumption was violated — bisect, fix, re-merge. +5. **Push `phase-4`** to gitea. +6. Optionally clean up worktrees. + +### Conflict prevention checklist + +For each parallel wave, verify the **Files** sections of all tasks have **no overlapping paths**. Hot files in this plan: `chat/web/drawer.py` + `chat/templates/_drawer.html` (T98 only — bundled), `chat/state/memory.py` (T96 only), `chat/services/memory_write.py` (T90 + T97 — sequential), `chat/web/turns.py` (T98 only via delete affordance — sequential after T96). + +### Why each wave is parallel-safe + +| Wave | Tasks | Hot files touched | Disjoint? 
| +|------|-------|-------------------|-----------| +| 1 | T88, T89, T90 | new migrations + new state modules; T90 touches `turn_common.py` + `regenerate.py` + `memory_write.py` (additive only) | ✅ | +| 2 | T91, T92, T93 | new service modules (embeddings, vector_search, cross_chat_search) | ✅ | +| 3 | T94, T95 | new service modules (branching, delete_impact) | ✅ | +| 4 | T96 | `chat/state/memory.py` (combined retrieval ranking) | (single task) | +| 5 | T97 | `chat/services/memory_write.py` + new backfill script | (single task) | +| 6 | T98 | `chat/web/drawer.py` + `chat/templates/_drawer.html` (drawer Phase 4 bundle) | (single task) | +| 7 | T99, T100 | new files: `chat/web/snapshots.py` + `chat/templates/snapshots.html` (T99); `chat/web/search.py` + `chat/templates/search.html` + small chat.html top-bar addition (T100) | ✅ (disjoint) | +| 8 | T101, T102 | new test file (T101); CLAUDE.md + design doc (T102) | ✅ | + +--- + +## Task overview + +``` +Wave 1 ─┬─ T88: embeddings table + projector handlers + ├─ T89: branches table + projector handlers + └─ T90: Phase 3.6 carry-overs trio (chat-id SQL pushdown + lifecycle wording + legacy-fn consolidation) + +Wave 2 ─┬─ T91: embedding generation service (Featherless or local) + ├─ T92: vector search service via sqlite-vec + └─ T93: cross-chat search service (FTS over all owners) + +Wave 3 ─┬─ T94: branch_from_event service (event-log fork, branch metadata) + └─ T95: delete-impact computation service (cascade preview) + +Wave 4 ─── T96: combined FTS + vector retrieval ranking in search_memories + +Wave 5 ─── T97: memory_write enqueues embedding job + backfill script for existing memories + +Wave 6 ─── T98: drawer Phase 4 bundle — branching UI + significance review + hide-from-view + surgical delete + remaining v1 edits + +Wave 7 ─┬─ T99: snapshot UX (manual trigger, retention display, restore-from-snapshot UI) + └─ T100: cross-chat search UX (top-bar input + search results page) + +Wave 8 ─┬─ T101: cross-feature 
integration tests (vector × branching × delete × snapshot × search) + └─ T102: Phase 4 documentation update +``` + +Critical path: 8 sequential merge points. Total tasks: 15. Parallelism: Waves 1, 2, 3, 7, 8 dispatch concurrently (3-way and 2-way). Waves 4, 5, 6 are single-task by hot-file constraint. + +--- + +## Wave 1 — Schema foundation + Phase 3.6 carry-overs (parallel) + +### Task 88: Embeddings table + projector handlers + +**Files:** + +- Create: `chat/db/migrations/0012_embeddings.sql` +- Create: `chat/state/embeddings.py` +- Create: `tests/test_embeddings_state.py` +- Modify: `pyproject.toml` (add `sqlite-vec` dependency — controlling agent should pre-install before dispatch; the worktree commits the dependency declaration) + +**Spec:** + +Adds the `embeddings` table that stores per-memory embedding vectors for vector retrieval. Uses `sqlite-vec` virtual-table syntax for cosine-similarity search. Schema: + +```sql +-- Load sqlite-vec extension at connection time (handled in chat/db/connection.py). +-- Embeddings are stored as blobs in a vec0 virtual table for fast similarity search. + +CREATE VIRTUAL TABLE embeddings USING vec0( + memory_id INTEGER PRIMARY KEY, + embedding FLOAT[384] -- 384-dim default; adjust per chosen model +); + +-- Sidecar table for non-vector metadata (model used, dim, indexed_at). +CREATE TABLE embeddings_meta ( + memory_id INTEGER PRIMARY KEY, + model TEXT NOT NULL, + dim INTEGER NOT NULL, + indexed_at TEXT NOT NULL DEFAULT (datetime('now')), + FOREIGN KEY (memory_id) REFERENCES memories(id) +); +``` + +(If `sqlite-vss` is chosen instead, replace `vec0` with `vss0` and adapt the dim declaration. Both have similar Python loading semantics.) + +**`chat/state/embeddings.py`:** + +- `@on("embedding_indexed")` payload `{memory_id, model, dim, vector: list[float]}`. Inserts into both `embeddings` and `embeddings_meta`. Idempotent via `INSERT OR REPLACE` (re-indexing a memory replaces the prior vector). 
+- `@on("embedding_deindexed")` payload `{memory_id}`. Deletes from both tables. Used when a memory is purged via reset/cascade. +- Reader `get_embedding_meta(conn, memory_id) -> dict | None` returns the meta row. + +The `chat/db/connection.py` `open_db` helper needs to load the sqlite-vec extension on each connection. Add: + +```python +import sqlite_vec +# Inside open_db, after connection is opened: +conn.enable_load_extension(True) +sqlite_vec.load(conn) +conn.enable_load_extension(False) +``` + +This is a small modification to `connection.py`. Include it in T88's diff. + +**Tests:** 3 minimum. + +1. `test_embedding_indexed_inserts_row`: append `bot_authored`, `chat_created`, `memory_written` (creates a memory), then `embedding_indexed` with `vector=[0.1] * 384`. Project. Assert `embeddings_meta` row exists for that memory_id with the right model. +2. `test_embedding_deindexed_removes_row`: same setup; index then de-index; assert row is gone. +3. `test_vector_similarity_search_returns_nearest`: index two memories with distinct vectors; query for nearest neighbor of one vector; assert correct memory_id returned. Uses `sqlite-vec`'s `MATCH '...'` syntax (verify against actual sqlite-vec docs; adapt if needed). + +If running tests requires sqlite-vec to be loaded, the test fixture may need to skip / xfail when the extension isn't installed. Use `pytest.importorskip("sqlite_vec")` at the top of the test file. + +**Commit:** `feat: embeddings table + projector handlers via sqlite-vec (T88)`. + +**Notes:** + +- Schema version after migration alone: 12. T89 adds 0013, taking final to 13. The schema_version assertion in `tests/test_world.py` updates to 13 in the wave-merge step. +- The `connection.py` change is small but cross-cutting — affects every `open_db` call. Verify the existing 343 tests still pass after the change. 
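
The two handlers and the reader described in the T88 spec can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the project's code: the `@on(...)` registration is left out, the function names `apply_embedding_indexed` and `apply_embedding_deindexed` are hypothetical, and a plain `embeddings_plain` BLOB table stands in for the `vec0` virtual table so the example runs without the extension. Vectors are packed as float32 blobs, which should match the compact binary form sqlite-vec accepts (verify against its docs).

```python
import sqlite3
import struct

# Assumption: embeddings_plain is a stand-in for the vec0 virtual table.
SCHEMA = """
CREATE TABLE embeddings_plain (
    memory_id INTEGER PRIMARY KEY,
    embedding BLOB NOT NULL
);
CREATE TABLE embeddings_meta (
    memory_id  INTEGER PRIMARY KEY,
    model      TEXT NOT NULL,
    dim        INTEGER NOT NULL,
    indexed_at TEXT NOT NULL DEFAULT (datetime('now'))
);
"""


def pack_f32(vector):
    """Pack a list of floats into a little-endian float32 blob."""
    return struct.pack(f"<{len(vector)}f", *vector)


def apply_embedding_indexed(conn, payload):
    """Hypothetical handler body for the embedding_indexed event."""
    vector = payload["vector"]
    if len(vector) != payload["dim"]:
        raise ValueError("vector length does not match declared dim")
    with conn:  # one transaction covers both tables
        conn.execute(
            "INSERT OR REPLACE INTO embeddings_plain (memory_id, embedding) "
            "VALUES (?, ?)",
            (payload["memory_id"], pack_f32(vector)),
        )
        conn.execute(
            "INSERT OR REPLACE INTO embeddings_meta (memory_id, model, dim) "
            "VALUES (?, ?, ?)",
            (payload["memory_id"], payload["model"], payload["dim"]),
        )


def apply_embedding_deindexed(conn, payload):
    """Hypothetical handler body for the embedding_deindexed event."""
    with conn:
        conn.execute(
            "DELETE FROM embeddings_plain WHERE memory_id = ?",
            (payload["memory_id"],),
        )
        conn.execute(
            "DELETE FROM embeddings_meta WHERE memory_id = ?",
            (payload["memory_id"],),
        )


def get_embedding_meta(conn, memory_id):
    """Return the meta row as a dict, or None when the memory is not indexed."""
    row = conn.execute(
        "SELECT memory_id, model, dim, indexed_at "
        "FROM embeddings_meta WHERE memory_id = ?",
        (memory_id,),
    ).fetchone()
    if row is None:
        return None
    return dict(zip(("memory_id", "model", "dim", "indexed_at"), row))
```

Whether `INSERT OR REPLACE` acts as an upsert on a real `vec0` table should be checked against the installed sqlite-vec version; deleting the old row and inserting the new one inside the same transaction is a conservative alternative.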
+ +--- + +### Task 89: Branches table + projector handlers + +**Files:** + +- Create: `chat/db/migrations/0013_branches.sql` +- Create: `chat/state/branches.py` +- Create: `tests/test_branches_state.py` + +**Spec:** + +Adds the `branches` table that records named alternate event-log forks. A branch is metadata: a name, an `origin_event_id` (the event we forked from), and a `head_event_id` (the latest event in this branch). The event log itself is unchanged — the branch table just **labels** linear ranges of event ids. + +```sql +CREATE TABLE branches ( + id INTEGER PRIMARY KEY, + name TEXT NOT NULL UNIQUE, + origin_event_id INTEGER NOT NULL, + head_event_id INTEGER NOT NULL, + chat_id TEXT, + created_at TEXT NOT NULL DEFAULT (datetime('now')), + is_active INTEGER NOT NULL DEFAULT 0 +); + +-- Exactly one row may have is_active = 1 at any time. +CREATE UNIQUE INDEX branches_active_idx ON branches(is_active) WHERE is_active = 1; +``` + +The "main" branch is implicit and bootstrapped by the migration: `INSERT INTO branches (name, origin_event_id, head_event_id, is_active) VALUES ('main', 0, 0, 1);`. Subsequent branches reference an `origin_event_id` (the event that the branch forked from). + +`chat/state/branches.py`: + +- `@on("branch_created")` payload `{name, origin_event_id, chat_id?, head_event_id}`. Inserts a new row with `is_active=0`. Idempotent re-insertion via `INSERT OR IGNORE`. +- `@on("branch_switched")` payload `{name}`. Sets `is_active=1` on the named branch and `is_active=0` on all others. Atomic via a single UPDATE. +- `@on("branch_head_updated")` payload `{name, head_event_id}`. Updates `head_event_id` on the named branch. Used by the orchestrator when new events extend the branch. +- Readers: `get_branch(conn, name)`, `list_branches(conn, chat_id=None)`, `active_branch(conn)`. + +**Tests:** 3 minimum. + +1. 
`test_branch_created_inserts_row`: append `branch_created` with name="experiment", origin_event_id=42; project; assert `get_branch(conn, "experiment")` returns the row. +2. `test_branch_switched_atomic`: seed two branches; switch from one to the other; assert exactly one is active. +3. `test_main_branch_bootstrapped_by_migration`: open a fresh DB, apply migrations; assert `active_branch(conn)["name"] == "main"`. + +**Commit:** `feat: branches table + projector handlers (T89)`. + +**Notes:** + +- Schema version after this migration alone: 13. Combined with T88: 13 (since T88 was 12, T89 stacks). Wave-merge bumps `tests/test_world.py` schema_version assertion to 13. +- This task does NOT yet teach the orchestrator to consult `is_active` — the existing event_log queries assume a single timeline. T98 (drawer branching UI) will enable user-driven switches, but the actual "follow only the active branch" filter on event reads is a follow-up (Phase 4.5 nit; document in T102 docs sweep). + +--- + +### Task 90: Phase 3.6 carry-overs trio + +**Files:** + +- Modify: `chat/services/turn_common.py` (push chat_id filter into SQL) +- Modify: `chat/services/regenerate.py` (lifecycle warning wording tightening) +- Modify: `chat/services/memory_write.py` (consolidate legacy `record_turn_memory` into the unified API or delete it) +- Modify: `tests/test_turn_common.py`, `tests/test_regenerate.py`, `tests/test_memory_write.py` + +**Spec:** Three small Phase 3.6 carry-over fixes bundled because each is 1-line + 1-test. + +#### 90.1 — `read_recent_dialogue` chat-id SQL pushdown + +Per T80 review nit. Currently `read_recent_dialogue` filters chat_id post-fetch in Python. Push into SQL for tighter LIMIT semantics: + +```sql +SELECT id, kind, payload_json +FROM event_log +WHERE kind IN ('user_turn', 'user_turn_edit', 'assistant_turn') + AND superseded_by IS NULL + AND hidden = 0 + AND json_extract(payload_json, '$.chat_id') = ? +ORDER BY id DESC +LIMIT ? 
+``` + +Then the post-fetch loop becomes a simple reverse + slice — no chat_id check needed. + +**Test added:** `test_read_recent_dialogue_limit_respects_chat_scope` — seed two chats with 60 turns each; query chat_a with `limit=50`; assert returned rows are exactly 50 chat_a rows (not 50 cross-chat rows that shrink to <50 after the Python-side filter). + +**Commit:** `perf: read_recent_dialogue pushes chat-id filter into SQL (T90.1)`. + +#### 90.2 — Lifecycle warning wording tightening + +Per T83.4 review nit. The current warning reads "lifecycle transitions from superseded turn are NOT being rolled back". When the user regenerates an OLDER turn (T29 supports this), the warning also lists intervening-turn transitions that legitimately stand. Tighten the wording to "lifecycle transitions at-or-after turn X" so operators reading logs aren't misled. + +The change is a single log message string. The test asserts the new wording appears. + +**Commit:** `chore: clarify regenerate lifecycle warning wording (T90.2)`. + +#### 90.3 — Legacy `record_turn_memory` consolidation + +Per T84 review nit. The original Phase 1 single-bot `record_turn_memory` function still exists alongside the unified `record_turn_memory_for_present`. Either: + +- (a) Remove the legacy function entirely; update any remaining callers to use the unified API. +- (b) Convert it to a thin wrapper for backward compat. + +Pick (a) if there are zero remaining callers; (b) if any callers exist. Read the codebase to confirm. The mock-data seed scripts may still use the legacy fn. + +**Commit:** `refactor: consolidate legacy record_turn_memory into unified API (T90.3)`. + +**TDD process for T90:** + +1. Read all 3 affected files + their tests. +2. Implement 90.1 with test; commit. +3. Implement 90.2 with test; commit. +4. Implement 90.3 with test; commit. +5. Run full suite — should be 343 + 3 = 346 (or 343 + 2 = 345 if 90.3 needs no new test because behavior is unchanged). + +--- + +## Wave 2 — Embedding & search services (parallel) + +Three new service modules. Fully file-disjoint. 
+ +### Task 91: Embedding generation service + +**Files:** + +- Create: `chat/services/embeddings.py` +- Create: `tests/test_embeddings.py` + +**Spec:** Wraps the embedding API call. Signature: + +```python +class EmbeddingResult(BaseModel): + vector: list[float] + model: str + dim: int + +async def generate_embedding( + client: LLMClient, # or a separate embedding-specific client + *, + text: str, + model: str, + timeout_s: float = 30.0, +) -> EmbeddingResult: + """Generate an embedding vector for the given text. Falls back to a + zero-vector with model='fallback' on failure (so callers get a deterministic + sentinel they can detect and skip indexing).""" +``` + +**Implementation:** call the embedding endpoint (Featherless OpenAI-compatible `/v1/embeddings`, or a local `sentence-transformers` model). Add a new method `client.embed(text, model)` to the `LLMClient` Protocol (and to `MockLLMClient` and `FeatherlessClient`). + +**Embedding model choice:** + +Default to a small CPU-friendly model accessible through the existing Featherless setup: + +- If Featherless has `BAAI/bge-small-en-v1.5` or a similar 384-dim model: use that. +- If not: fall back to local `sentence-transformers/all-MiniLM-L6-v2` (384-dim, runs on CPU). Add `sentence-transformers` to `pyproject.toml`. +- Document the choice in CLAUDE.md (T102 docs sweep). + +The 384 dim is hardcoded in T88's migration. If a model with a different dimension is chosen, update T88's schema accordingly BEFORE T88 dispatches. + +**Tests:** 3 minimum. + +1. `test_generate_embedding_returns_vector_of_correct_dim`: mock embedding response with a 384-element vector; assert returned `vector` length is 384. +2. `test_generate_embedding_returns_correct_model_metadata`: assert `result.model` matches the input. +3. `test_generate_embedding_falls_back_on_failure`: mock the client to raise; assert the result is a 384-element zero vector with `model="fallback"`. + +**Commit:** `feat: embedding generation service (T91)`. 
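
The fallback contract in the T91 spec (zero vector plus `model="fallback"` on any failure) could look roughly like the sketch below. Several things here are assumptions for illustration: `client.embed` is treated as an async method returning a list of floats, a dataclass stands in for the pydantic `BaseModel`, the `dim` keyword is an added parameter, and treating a wrong-length response as a failure is a choice the plan does not state.

```python
import asyncio
from dataclasses import dataclass

FALLBACK_MODEL = "fallback"
EMBEDDING_DIM = 384  # must agree with the dimension in the T88 migration


@dataclass
class EmbeddingResult:
    vector: list
    model: str
    dim: int


async def generate_embedding(client, *, text, model, timeout_s=30.0,
                             dim=EMBEDDING_DIM):
    """Return an embedding, or a zero-vector sentinel when the call fails."""
    try:
        # Assumption: client.embed is a coroutine returning list[float].
        vector = await asyncio.wait_for(client.embed(text, model),
                                        timeout=timeout_s)
        if len(vector) != dim:
            # Assumption: a wrong-sized vector is treated like any other failure.
            raise ValueError(f"expected {dim} dims, got {len(vector)}")
    except Exception:
        # Covers timeouts, transport errors, and malformed responses.
        return EmbeddingResult(vector=[0.0] * dim, model=FALLBACK_MODEL, dim=dim)
    return EmbeddingResult(vector=list(vector), model=model, dim=dim)
```

Callers can then test `result.model == "fallback"` and skip indexing, which is the detection path the docstring in the spec describes.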
+ +--- + +### Task 92: Vector search service via sqlite-vec + +**Files:** + +- Create: `chat/services/vector_search.py` +- Create: `tests/test_vector_search.py` + +**Spec:** Wraps sqlite-vec's `MATCH` syntax for cosine-similarity search over the `embeddings` virtual table. Witness-filter aware (joins through `memories` table for the witness check). + +```python +def vector_search( + conn, + *, + owner_id: str, + witness_role: str, # "you" | "host" | "guest" + query_vector: list[float], + k: int = 4, +) -> list[dict]: + """Return top-K memories by cosine similarity to query_vector, + witness-filtered for the requesting bot's POV. Returns same row + shape as state.memory.search_memories for combined-ranking + compatibility.""" +``` + +SQL pattern (sqlite-vec): + +```sql +SELECT m.id, m.text, m.pov_summary, m.significance, e.distance +FROM embeddings e +JOIN memories m ON m.id = e.memory_id +WHERE e.embedding MATCH ? + AND k = ? + AND m.owner_id = ? + AND m.witness_ = 1 +ORDER BY e.distance ASC +LIMIT ? +``` + +(Adapt to actual sqlite-vec syntax — use `vec0` MATCH semantics. The `witness_` interpolation needs the same allowlist guard pattern as Phase 2.5 T72.3.) + +**Tests:** 3 minimum. + +1. `test_vector_search_returns_nearest_neighbors`: index 5 memories with synthetic vectors; query for nearest 3; assert correct order. +2. `test_vector_search_respects_witness_filter`: index a memory with witness `[1, 1, 0]`; query with `witness_role="guest"`; assert empty result. +3. `test_vector_search_respects_owner_filter`: index memories for two owners; assert query for owner_a doesn't return owner_b's memories. + +**Commit:** `feat: vector search service via sqlite-vec (T92)`. + +--- + +### Task 93: Cross-chat search service + +**Files:** + +- Create: `chat/services/cross_chat_search.py` +- Create: `tests/test_cross_chat_search.py` + +**Spec:** FTS5-based search across ALL chats and all owners (admin-style search; no witness filter). 
Intended for "where did I last see this person mention X?" queries. + +```python +def search_all_memories( + conn, + *, + query: str, + k: int = 20, +) -> list[dict]: + """Search FTS across all owners and chats. Returns rows with + {memory_id, owner_id, chat_id, text, pov_summary, scene_id, + significance, ts}. Sorted by FTS rank.""" +``` + +This is intentionally NOT witness-filtered — it's a power-user search surface. The UI (T100) prompts the user to acknowledge they're seeing memories across POVs. + +**Tests:** 3 minimum. + +1. `test_search_all_memories_returns_matches_across_owners`: seed 2 owners with an overlapping keyword; search; assert both owners' matches appear. +2. `test_search_all_memories_orders_by_fts_rank`: seed memories with varying FTS-match strength; assert order. +3. `test_search_all_memories_respects_k_limit`. + +**Commit:** `feat: cross-chat search service (FTS5 over all owners) (T93)`. + +--- + +## Wave 3 — Branching + delete services (parallel) + +Two new service modules. Fully file-disjoint. + +### Task 94: branch_from_event service + +**Files:** + +- Create: `chat/services/branching.py` +- Create: `tests/test_branching.py` + +**Spec:** + +```python +def branch_from_event( + conn, + *, + name: str, + origin_event_id: int, + chat_id: str | None = None, +) -> int: + """Create a new named branch forking from origin_event_id. + Emits a branch_created event. Returns the new branch's row id. + Raises ValueError if name already exists.""" + +def switch_active_branch(conn, *, name: str) -> None: + """Make the named branch active. Emits branch_switched. 
Subsequent + event reads should consult is_active to filter.""" + +def list_branches_with_metadata(conn, chat_id: str | None = None) -> list[dict]: + """List branches with: name, origin_event_id, head_event_id, is_active, + event_count (number of events between origin and head, inclusive), + created_at.""" +``` + +Tests cover: basic create, duplicate-name raises, switch updates `is_active` exclusively, list returns metadata. + +**Commit:** `feat: branching service (T94)`. + +--- + +### Task 95: Delete-impact computation service + +**Files:** + +- Create: `chat/services/delete_impact.py` +- Create: `tests/test_delete_impact.py` + +**Spec:** Computes the cascade impact of deleting a single event_log row (or a turn group: user_turn + assistant_turn + interjection if any). Returns a structured `ImpactReport` for the UI to render. + +```python +class DeletedItem(BaseModel): + kind: str # "memory" | "edge_update" | "scene_close" | etc. + description: str # human-readable + target_id: int | str | None + +class ImpactReport(BaseModel): + target_event_id: int + cascading: list[DeletedItem] + notes: list[str] # warnings, e.g. "this turn opened scene_X which has 3 subsequent turns" + +def compute_delete_impact(conn, *, target_event_id: int) -> ImpactReport: + """Walk the event log forward from target_event_id and identify + everything that depends on this event: child memory_written events, + edge_update events with this turn as source, scene_closed events + triggered by this turn, etc. Also identify subsequent turns that + REFERENCE this event (regenerated_from chains, etc.). + + Does NOT mutate the database. Pure computation for preview.""" +``` + +The actual delete (truncate + supersede) is the existing rewind path from Phase 1 T31. T95 just builds the preview. + +**Tests:** 4 minimum. + +1. `test_impact_for_simple_turn_lists_memory_and_edges`: seed a chat with a turn that wrote 1 memory + 2 edge_updates. Compute impact. Assert the 3 items appear in `cascading`. +2. 
`test_impact_for_scene_opening_turn_warns_about_subsequent_turns`: seed a turn that opened a scene + 5 subsequent turns. Assert `notes` mentions the dependency. +3. `test_impact_for_regenerated_turn_lists_supersede_chain`: seed a turn that's been regenerated (has `superseded_by`). Compute impact for the original. Assert the chain appears. +4. `test_impact_does_not_mutate_database`: snapshot event_log before + after; assert byte-identical. + +**Commit:** `feat: delete-impact computation service (T95)`. + +--- + +## Wave 4 — Combined retrieval ranking (single) + +### Task 96: Combined FTS + vector retrieval ranking + +**Files:** + +- Modify: `chat/state/memory.py` — extend `search_memories` to optionally include vector hits +- Modify: `tests/test_memory_search.py` — add 4 tests + +**Spec:** + +`search_memories` currently does FTS5 + Python-side significance/recency re-rank. Phase 4 adds: + +- An optional `query_vector: list[float] | None = None` kwarg. +- When `query_vector` is provided, run `vector_search` (T92) for top-K-vector candidates. +- Merge with FTS top-K candidates via reciprocal-rank fusion (RRF) or a simpler sum-of-ranks scheme — implementer's choice. Document the merge formula. +- Final result is top-K from the fused set, with the existing significance + recency boosts applied as a final pass. + +When `query_vector` is None: existing behavior unchanged. Phase 1/2/3 callers that don't pass `query_vector` see no change. + +**Implementation note:** the embedding for the query (the speaker's recent context) must be generated by the caller (Wave 5 T97 wires the prompt-assembly pipeline to call `generate_embedding` on the dialogue tail). T96 only handles the search side — assumes the vector is pre-computed. + +**Tests:** 4 added. + +1. `test_search_memories_without_query_vector_uses_fts_only`: regression — call without `query_vector`; assert the existing FTS+rerank behavior. +2. 
`test_search_memories_with_query_vector_includes_vector_hits`: index 5 memories where 1 is FTS-only-matching, 1 is vector-only-matching, 3 are unrelated. Pass both `query=...` and `query_vector=...`. Assert both the FTS hit and the vector hit appear in results. +3. `test_search_memories_fusion_significance_bias_still_applies`: confirm the existing significance bias rerank still works on top of fused results. +4. `test_search_memories_fusion_handles_empty_vector_results`: pass a vector for a memory that has no embeddings indexed; assert FTS-only results still come back. + +**Commit:** `feat: combined FTS + vector retrieval ranking (T96)`. + +--- + +## Wave 5 — Memory write hook + backfill (single) + +### Task 97: Embedding generation hook + backfill script + +**Files:** + +- Modify: `chat/services/memory_write.py` — after each `memory_written` event, enqueue a background embedding job +- Create: `chat/services/embedding_worker.py` — async worker that consumes the queue and emits `embedding_indexed` events +- Create: `scripts/backfill_embeddings.py` — one-time script that walks all existing memories and embeds them +- Modify: `chat/app.py` — wire the embedding worker into the lifespan startup +- Modify: `tests/test_memory_write.py` — add 2 tests for the enqueue hook +- Create: `tests/test_embedding_worker.py` — 3 tests for the worker drain logic + +**Spec:** + +After each successful `memory_written` event, enqueue an embedding job. The worker dequeues and: + +1. Reads the memory text (via `get_memory(conn, memory_id)`). +2. Calls `generate_embedding(client, text=memory.text, model=settings.embedding_model)`. +3. Appends `embedding_indexed` event with the result. (Skip if `result.model == "fallback"` — leave the memory un-indexed; will retry later via backfill.) + +The worker pattern mirrors Phase 1's `chat/services/significance.py` SignificanceWorker. Reuse its queue + lifecycle pattern. 
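
The three worker steps above can be sketched as a serial drain loop. This is a minimal sketch only: the SignificanceWorker pattern it is meant to mirror is not shown in this plan, so the class shape, the `embed` and `emit` callbacks, and the payload layout are all hypothetical.

```python
import asyncio

FALLBACK_MODEL = "fallback"


class EmbeddingWorker:
    """Hypothetical worker that drains queued memory ids one at a time."""

    def __init__(self, embed, emit):
        self._embed = embed  # async callable: memory_id -> (model, vector)
        self._emit = emit    # callable: (event_kind, payload) -> None
        self._queue = asyncio.Queue()

    def enqueue(self, memory_id):
        """Called by the memory-write path after each memory_written event."""
        self._queue.put_nowait(memory_id)

    async def drain(self):
        """Process every queued job serially; return how many were indexed."""
        indexed = 0
        while not self._queue.empty():
            memory_id = self._queue.get_nowait()
            try:
                model, vector = await self._embed(memory_id)
                if model == FALLBACK_MODEL:
                    # Leave the memory un-indexed; backfill can retry it later.
                    continue
                self._emit("embedding_indexed", {
                    "memory_id": memory_id,
                    "model": model,
                    "dim": len(vector),
                    "vector": vector,
                })
                indexed += 1
            finally:
                self._queue.task_done()
        return indexed
```

Awaiting each embed call before taking the next job keeps at most one request in flight, which is one simple way to stay under a small connection cap; a shared semaphore would be needed if several workers ran at once.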
+ +**Backfill script:** + +```bash +.venv/bin/python scripts/backfill_embeddings.py [--limit N] [--dry-run] +``` + +Walks all memories where no `embeddings_meta` row exists. For each, generates an embedding and emits `embedding_indexed`. Useful for the initial migration after Phase 4 lands AND for periodic re-runs if an embedding model changes. + +**Tests:** + +`tests/test_memory_write.py`: +1. `test_record_turn_memory_enqueues_embedding_job`: monkeypatch the worker's enqueue method; record_turn_memory_for_present; assert the worker received a job per memory. + +`tests/test_embedding_worker.py`: +1. `test_worker_drains_jobs_and_emits_indexed_events`: enqueue 3 jobs with mock embeddings; run worker; assert 3 `embedding_indexed` events landed. +2. `test_worker_skips_fallback_results`: mock the embedding service to return a fallback result; assert NO `embedding_indexed` event landed for that job. +3. `test_worker_handles_concurrent_jobs_serially`: pin the Featherless 2-conn cap behavior (worker calls embed sequentially under the existing semaphore). + +**Commit (split):** + +- `feat: embedding worker drains queue and emits embedding_indexed events (T97.1)` +- `feat: memory_write enqueues embedding job after each memory_written (T97.2)` +- `feat: backfill_embeddings script for existing memories (T97.3)` + +**Verification gates:** + +- All Phase 1/2/3/3.5 memory tests still pass (regression critical). +- New tests pass. +- Manual smoke: run `scripts/backfill_embeddings.py --dry-run` against a seeded DB and verify expected count. + +--- + +## Wave 6 — Drawer Phase 4 bundle (single task) + +### Task 98: Drawer Phase 4 features + +**Files:** + +- Modify: `chat/web/drawer.py` (add many new POST routes and GET extensions) +- Modify: `chat/templates/_drawer.html` (add 5 new sections) +- Create: `tests/test_drawer_phase4.py` + +**Spec:** Drawer affordances for 5 Phase 4 features. Single task by hot-file constraint; split into 5 commits internally. 
+
+#### 98.1 — Branching UI
+
+GET drawer extension: `list_branches_with_metadata(conn)` → render in a "Branches" section (active branch highlighted + count of events).
+
+POST routes:
+- `/drawer/branch/create` — form `{name, origin_event_id}` → `branch_from_event` service.
+- `/drawer/branch/switch` — form `{name}` → `switch_active_branch`.
+- `/drawer/branch/from-turn/{event_id}` — convenience: branch from a specific turn (used by per-turn UI affordance).
+
+#### 98.2 — Significance review panel
+
+GET extension: significance distribution per chat (`SELECT significance, COUNT(*) GROUP BY significance`) → render histogram.
+
+POST route:
+- `/drawer/memory/significance/{memory_id}` — form `{new_value}` (already supported via T22 `manual_edit` `target_kind=memory_significance`); just add the UI form.
+
+Bulk re-rate is a Phase 4.5 polish — not in scope here. Just per-memory edit + distribution display.
+
+#### 98.3 — Hide-from-view toggle
+
+POST route:
+- `/drawer/turn/hide/{event_id}` — form `{hidden: bool}` → emits a `manual_edit` with `target_kind="turn_hidden"`.
+
+NEW `manual_edit` projector branch for `turn_hidden`: sets `event_log.hidden = ?` for the target event. Reuses the existing `hidden` column.
+
+UI affordance: per-turn checkbox in the chat surface or drawer (per-turn list with hide toggle).
+
+#### 98.4 — Surgical delete with cascade preview
+
+GET extension:
+- `/drawer/turn/delete-preview/{event_id}` → returns the `ImpactReport` (T95) rendered as a modal.
+
+POST route:
+- `/drawer/turn/delete/{event_id}` — invokes the rewind-and-truncate path (Phase 1 T31's `rewind_to_turn`) restricted to the target turn group.
+
+Important: this reuses the existing pre-rewind snapshot path so the action is undoable.
+
+#### 98.5 — Remaining v1 edits
+
+Audit: are any v1 fields STILL not editable from the drawer? Phase 2.5 T72.1 added edge_trust/edge_summary/memory_pov_summary/edge_knowledge_facts. T72.3 added witness flags. Anything left?
+
+Likely candidates: scene `narrative_anchor`, scene `weather`, container `properties` JSON. Add edit forms for any that surface during the audit. If none, this sub-fix is a no-op.
+
+**Tests:** 8+ in `tests/test_drawer_phase4.py` (one per sub-feature × happy path; plus 1 for the cascade-preview rendering).
+
+**Commits (5):**
+
+- `feat: drawer branching UI (T98.1)`
+- `feat: drawer significance review panel (T98.2)`
+- `feat: drawer hide-from-view toggle + manual_edit turn_hidden branch (T98.3)`
+- `feat: drawer surgical delete with cascade preview (T98.4)`
+- `feat: drawer remaining v1 field edits (T98.5)` (or "no-op audit" if nothing left)
+
+---
+
+## Wave 7 — Snapshot + cross-chat search UX (parallel)
+
+### Task 99: Snapshot UX
+
+**Files:**
+
+- Create: `chat/web/snapshots.py` (new route module)
+- Create: `chat/templates/snapshots.html` (snapshot list page)
+- Modify: `chat/templates/layout.html` (add "Snapshots" nav link)
+- Create: `tests/test_snapshot_ux.py`
+
+**Spec:** Surface the existing snapshot infrastructure (Phase 1 T20 wrote snapshots; Phase 4 makes them visible).
+
+GET `/snapshots` — list all snapshots (periodic + pre-rewind) with metadata: kind, created_at, event_log_size, file_size_bytes.
+
+POST `/snapshots/take` — manually trigger a snapshot now.
+
+POST `/snapshots/restore/{snapshot_id}` — restore from snapshot (with hard confirmation).
+
+GET `/snapshots/{snapshot_id}/preview` — show what's in the snapshot vs. current state.
+
+**Tests:** 4 minimum (list, take, restore, preview).
+
+**Commit:** `feat: snapshot UX (manual trigger, list, restore) (T99)`.
+
+---
+
+### Task 100: Cross-chat search UX
+
+**Files:**
+
+- Create: `chat/web/search.py` (new route module)
+- Create: `chat/templates/search.html` (search results page)
+- Modify: `chat/templates/layout.html` (add top-bar search input)
+- Create: `tests/test_search_ux.py`
+
+**Spec:** Top-bar search box submits to `/search?q=...`. Results page shows up to 50 matches across all chats and all owners (uses T93's `search_all_memories`). Each result shows: chat name, owner bot name, scene context, memory text excerpt with FTS highlight, "Open chat at this turn" link.
+
+**Tests:** 3 minimum.
+1. Search returns results from multiple chats.
+2. Empty query returns empty result set.
+3. Result links navigate to the right chat anchor.
+
+**Commit:** `feat: cross-chat search UX (top-bar input + results page) (T100)`.
+
+---
+
+## Wave 8 — Polish (parallel)
+
+### Task 101: Cross-feature integration tests
+
+**Files:**
+
+- Create: `tests/test_phase4_integration.py`
+
+**Spec:** End-to-end multi-feature flows. 5 tests minimum.
+
+1. **Vector retrieval feedback loop**: write a memory → embedding worker indexes it → search retrieves it via vector path.
+2. **Branch + diverge**: create branch B from turn 10 → switch to B → play 3 new turns → switch back to main → assert main's turn 11+ are still intact.
+3. **Surgical delete**: compute impact for a turn → confirm → assert event log truncated correctly + pre-rewind snapshot saved.
+4. **Hide + retrieval**: hide a turn → assert it doesn't appear in `read_recent_dialogue` (existing `hidden = 0` filter) → unhide → assert it reappears.
+5. **Cross-chat search**: write memories in 3 chats → search for keyword present in all 3 → assert all 3 appear in results.
+
+**Commit:** `test: phase 4 cross-feature integration coverage (T101)`.
+
+---
+
+### Task 102: Phase 4 documentation update
+
+**Files:**
+
+- Modify: `CLAUDE.md` (add "Phase 4 status" section; update behavioral defaults; add "Phase 4.5 / 5 backlog" with carry-overs)
+- Modify: `docs/plans/2026-04-26-v1-requirements-design.md` (annotate §13 Phase 4 as **Status: shipped 2026-04-27**)
+
+**Spec:**
+
+Mirror the Phase 3 / 3.5 status sections. Document:
+
+- **Vector retrieval**: sqlite-vec virtual table, embedding worker async pipeline, combined FTS + vector ranking via RRF.
+- **Branching**: forks the event log; UI in drawer; `is_active` flag plus orchestrator filter (caveat — see backlog if filter not yet wired into all readers).
+- **Drawer-edit on every field**: branching, significance review, hide-from-view, surgical delete with preview, plus any audit findings.
+- **Backup tooling**: snapshots panel surfaces existing infra.
+- **Significance review UI**: distribution + per-memory edit.
+- **Surgical delete + cascade preview**: piggybacks on rewind path; impact report from T95.
+- **Hide-from-view soft delete**: `manual_edit` `turn_hidden` branch.
+- **Cross-chat search**: top-bar + results page over T93's service.
+
+**Phase 4.5 / 5 backlog candidates** (reflect any discovered during execution):
+
+- Branching read-side filter — if T89's `is_active` isn't yet consulted by every event reader, this is the work to do.
+- Bulk significance re-rate (per T98.2 deferral).
+- Snapshot retention policy UI controls (deferred from Phase 1 T19).
+- Auto-pin override UI (per Phase 2 design).
+- Embedding model swap migration tooling (changing the embedding model requires re-embedding everything).
+- Vector index optimization (HNSW vs flat — Phase 5 if needed).
+- Carry-overs that remained deferred from Phase 3.6: scene-close-on-cancel UX revisit, canned-queue brittleness fixture builder, full lifecycle rollback in regenerate.
+
+**Commit:** `docs: phase 4 status, behavioral defaults, deferred items (T102)`.
+
+---
+
+## Wrap-up
+
+After Wave 8 lands:
+
+1. **Run full suite** on `phase-4`: should be ~390+ tests passing (343 from Phase 3.5 + ~50 new).
+2. **Manual smoke** (recommended before opening the PR):
+   - Run `scripts/backfill_embeddings.py` against a seeded DB to verify vector indexing works.
+   - Search for a phrase that's substring-distinct but semantically similar to a memory; verify the vector path returns it (FTS would miss it).
+   - Create a branch from an old turn; switch; play a few turns; switch back.
+   - Trigger surgical delete on a turn; verify the impact preview matches what actually gets removed.
+   - Hide a turn; verify it disappears from the chat surface; unhide.
+   - Use top-bar search to find a phrase; verify cross-chat results appear.
+   - Click the "Snapshots" nav link; trigger a manual snapshot; verify it appears.
+3. **Push `phase-4`** to gitea.
+4. **Open PR** `phase-4 → main`.
+
+---
+
+## Notes for the controller running this plan
+
+- **External dependency**: `sqlite-vec` (or `sqlite-vss`) MUST be added to `pyproject.toml` and installed BEFORE Wave 1 dispatches. The migration in T88 expects the extension to be loadable.
+- **Embedding model choice**: pin in T91 spec before dispatch. The 384 dim is hardcoded in T88's migration; if a different dim is used, update T88 first.
+- **After each parallel wave**, run a code-review subagent. A combined spec+quality review is acceptable for trivial tasks (T90 carry-overs); use separate spec and quality reviewers for vector-retrieval and integration tasks (T91, T96, T97, T98, T101), because their surface area is larger.
+- **Don't dispatch Wave 5 until Wave 4 has merged green.** T97 (memory_write enqueue) calls into the embedding-aware worker; the worker uses T91's `generate_embedding`. Both must be merged into `phase-4` first.
+- **Don't dispatch Wave 6 until Wave 5 has merged green.** T98 (drawer) wires UI affordances over services from earlier waves.
+- **Token-spend rough estimate**: Phase 4 should be ~70-80% the size of Phase 3 (similar scope, larger per-task because vector + branching are non-trivial). Per-task spend similar to Phase 3's larger tasks (T59, T64).
+- **DO NOT break existing v1/v2/v3/v3.5 surface contracts.** Every test file that was green at the start of Phase 4 must stay green at the end. The cross-feature integration tests from Phase 3 (`tests/test_phase3_integration.py`) are particularly load-bearing.
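The dispatch-ordering rules in the controller notes can be checked mechanically. The sketch below assumes task records shaped like the entries in this plan's `.tasks.json` (`id`, `wave`, optional `blockedBy`); the helper names `next_dispatchable` and `wave_order_violations` are hypothetical and not part of any existing tooling.

```python
from typing import Dict, List, Set


def next_dispatchable(tasks: List[Dict], merged: Set[int]) -> List[int]:
    """Return ids in the earliest unfinished wave whose blockers have all merged."""
    pending = [t for t in tasks if t["id"] not in merged]
    if not pending:
        return []
    # Waves are dispatched strictly in order, so only the lowest open wave counts.
    wave = min(t["wave"] for t in pending)
    return sorted(
        t["id"]
        for t in pending
        if t["wave"] == wave and all(b in merged for b in t.get("blockedBy", []))
    )


def wave_order_violations(tasks: List[Dict]) -> List[str]:
    """Flag any task whose blocker sits in the same wave or a later one."""
    by_id = {t["id"]: t for t in tasks}
    problems = []
    for t in by_id.values():
        for b in t.get("blockedBy", []):
            # Blockers missing from the list are ignored rather than guessed at.
            if b in by_id and by_id[b]["wave"] >= t["wave"]:
                problems.append(
                    f"T{t['id']} (wave {t['wave']}) is blocked by "
                    f"T{b} (wave {by_id[b]['wave']})"
                )
    return problems
```

Running `wave_order_violations` over the manifest before Wave 1 is a cheap way to catch a `blockedBy` entry that contradicts the wave numbering; `next_dispatchable` then answers which tasks the controller may start once a given set of ids has merged green.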
diff --git a/docs/plans/2026-04-27-v4-phase4-implementation.md.tasks.json b/docs/plans/2026-04-27-v4-phase4-implementation.md.tasks.json
new file mode 100644
index 0000000..7dc377e
--- /dev/null
+++ b/docs/plans/2026-04-27-v4-phase4-implementation.md.tasks.json
@@ -0,0 +1,22 @@
+{
+  "planPath": "docs/plans/2026-04-27-v4-phase4-implementation.md",
+  "tasks": [
+    {"id": 88, "subject": "T88: embeddings table + projector handlers (sqlite-vec)", "status": "pending", "wave": 1, "parallelGroup": "wave-1"},
+    {"id": 89, "subject": "T89: branches table + projector handlers", "status": "pending", "wave": 1, "parallelGroup": "wave-1"},
+    {"id": 90, "subject": "T90: phase 3.6 carry-overs (chat-id pushdown + lifecycle wording + legacy fn consolidation)", "status": "pending", "wave": 1, "parallelGroup": "wave-1"},
+    {"id": 91, "subject": "T91: embedding generation service", "status": "pending", "wave": 2, "parallelGroup": "wave-2", "blockedBy": [88]},
+    {"id": 92, "subject": "T92: vector search service via sqlite-vec", "status": "pending", "wave": 2, "parallelGroup": "wave-2", "blockedBy": [88]},
+    {"id": 93, "subject": "T93: cross-chat search service (FTS5 over all owners)", "status": "pending", "wave": 2, "parallelGroup": "wave-2"},
+    {"id": 94, "subject": "T94: branch_from_event service", "status": "pending", "wave": 3, "parallelGroup": "wave-3", "blockedBy": [89]},
+    {"id": 95, "subject": "T95: delete-impact computation service", "status": "pending", "wave": 3, "parallelGroup": "wave-3"},
+    {"id": 96, "subject": "T96: combined FTS + vector retrieval ranking in search_memories", "status": "pending", "wave": 4, "parallelGroup": null, "blockedBy": [91, 92]},
+    {"id": 97, "subject": "T97: memory_write enqueues embedding job + backfill script", "status": "pending", "wave": 5, "parallelGroup": null, "blockedBy": [91, 96]},
+    {"id": 98, "subject": "T98: drawer Phase 4 bundle (branching + sig review + hide + surgical delete + remaining edits)", "status": "pending", "wave": 6, "parallelGroup": null, "blockedBy": [94, 95, 97]},
+    {"id": 99, "subject": "T99: snapshot UX (manual trigger + list + restore + preview)", "status": "pending", "wave": 7, "parallelGroup": "wave-7"},
+    {"id": 100, "subject": "T100: cross-chat search UX (top-bar + results page)", "status": "pending", "wave": 7, "parallelGroup": "wave-7", "blockedBy": [93]},
+    {"id": 101, "subject": "T101: cross-feature integration tests (vector × branching × delete × snapshot × search)", "status": "pending", "wave": 8, "parallelGroup": "wave-8", "blockedBy": [98, 99, 100]},
+    {"id": 102, "subject": "T102: Phase 4 documentation update", "status": "pending", "wave": 8, "parallelGroup": "wave-8", "blockedBy": [98, 99, 100]}
+  ],
+  "lastUpdated": "2026-04-27T00:00:00Z",
+  "notes": "15 tasks across 8 waves. Adds vector retrieval (sqlite-vec), branching UI, drawer-edit on every field, backup tooling, significance review UI, surgical delete with cascade preview, hide-from-view, and cross-chat search. Phase 3.6 carry-overs (3 small fixes) bundled into T90. External dependency: sqlite-vec must be installed BEFORE Wave 1 dispatch. Embedding model choice (default: 384-dim small model) pinned in T91 spec before dispatch — schema 0012 hardcodes 384 dim. Two new schema migrations (0012 embeddings, 0013 branches), final schema version 13. Uses task ids T88-T102."
+}