chat/docs/plans/2026-04-27-v4-phase4-implementation.md
Joseph Doherty bffd9a2f38 docs: add Phase 4 implementation plan (vector retrieval + branching + polish)
15 tasks across 8 waves landing the Phase 4 deliverables per
requirements doc §13 + §14:

- Vector retrieval via sqlite-vec (new external dependency)
- Branching UI (event log forks)
- Drawer-edit on every field (significance review, hide-from-view,
  surgical delete with cascade preview, branching affordances)
- Backup tooling (snapshot UX surface)
- Cross-chat search

Plus the 3 Phase 3.6 carry-over fixes (T90 bundle).

Wave structure:
- W1 (parallel 3-way): schema foundation + carry-overs
- W2 (parallel 3-way): embedding/search services
- W3 (parallel 2-way): branching + delete services
- W4 (single): combined retrieval ranking
- W5 (single): memory write hook + backfill
- W6 (single): drawer Phase 4 bundle (5 sub-features)
- W7 (parallel 2-way): snapshot UX + cross-chat search UX
- W8 (parallel 2-way): integration tests + docs

External dependency: sqlite-vec must be installed BEFORE Wave 1.
Embedding model choice (384-dim default) pinned in T91 before dispatch
since the migration hardcodes the dimension.

Schema baseline: 11 -> 13 (adds 0012_embeddings.sql + 0013_branches.sql).
Task ids T88-T102 to avoid collision with prior phases.
2026-04-27 02:03:08 -04:00


Roleplay Engine — Phase 4 Implementation Plan

For Claude: REQUIRED SUB-SKILL: Use superpowers-extended-cc:executing-plans to implement this plan task-by-task. Use the parallel-dispatch pattern documented under "Parallel-Execution Strategy" for parallel waves.

Goal: Land Phase 4 polish per requirements doc §13 + §14: vector retrieval, branching UI, drawer-edit on every field, backup tooling, significance review UI, surgical delete with cascade preview, hide-from-view soft delete, plus cross-chat search and the small Phase 3.6 carry-over fixes.

Architecture: Builds on Phase 3.5's stable base. Two new tables (embeddings, branches), plus an embeddings_meta sidecar, and one external dependency (the sqlite-vec extension). Embedding generation runs as a deferred async job, NOT inline with turns, so the play loop stays fast even when the embedding endpoint is slow. Branching is data-model-only at first (events + selectors); the UI grafts on top. Surgical delete + cascade preview reuses the existing rewind-and-supersede plumbing. Cross-chat search piggybacks on the existing FTS5 retrieval plus the new vector retrieval.

Tech Stack:

  • NEW dependency: sqlite-vec (or sqlite-vss — Phase 4 picks; recommended sqlite-vec for simpler load semantics and active maintenance). Add to pyproject.toml.
  • Embedding model selection is part of T91 spec. Recommended default: a small model on Featherless (e.g., BAAI/bge-small-en-v1.5 if available) or a local CPU-friendly model via sentence-transformers. Document choice in CLAUDE.md.
  • Same as Phase 3 otherwise (Python 3.11+, FastAPI, HTMX, SQLite).

Source-of-truth references:

  • Phase 4 scope: requirements doc §13 "Phase 4 — polish" + §14 "Open / Deferred Decisions".
  • Behavioral details: §6 (prompt assembly + retrieval), §10 (rewind / regenerate / reset), §11 (compression + significance), §12 (snapshots).
  • Conventions: CLAUDE.md §"Behavioral defaults" + §"Phase 3 status" + §"Phase 3.5 status".
  • Phase 3.5 cleanup plan (style, file-bundling pattern): 2026-04-26-v3.5-phase3.5-cleanup.md.

Pre-flight

Branch: create phase-4 from the latest main after Phase 3.5 has merged (it has — main is at 1b66a28):

git checkout main && git pull && git checkout -b phase-4

Schema baseline: Phase 3.5 leaves the DB at version 11. Phase 4 adds two migrations: 0012_embeddings.sql and 0013_branches.sql. Final schema version: 13.

External dependency setup (BEFORE T88 dispatch):

The controlling agent should add sqlite-vec to pyproject.toml and run pip install -e . (or equivalent) so all worktrees pick up the new dependency. Confirm sqlite_vec imports cleanly:

python -c "import sqlite_vec; print(sqlite_vec.__version__)"

If sqlite-vec isn't on PyPI when this plan executes, fall back to sqlite-vss and adapt T88/T92 accordingly. Both expose vector-search SQL via a loadable extension.

Pinned non-negotiables (carried forward):

  • State changes go through the event log. Use append_and_apply(conn, kind, payload) for the live path; apply_event only after a fresh append_event returning the new id.
  • Apply the witness filter to every memory read at the SQL level (a hard WHERE constraint, never a soft signal).
  • Per-POV scene summaries — never write omniscient narration.
  • TDD: every task starts with a failing test (or a regression test pinning existing contract before refactor).
  • One commit per task minimum. Tasks that bundle multiple sub-features SHOULD split commits internally.

Verification before claiming done: Use superpowers-extended-cc:verification-before-completion — run the test command, paste actual output. Don't assume green.


Phase 3.6 carry-overs folded in

Three small items from Phase 3.6 backlog are bundled into Phase 4's Wave 1 trivial-fixes task (T90):

  1. read_recent_dialogue chat-id pushdown into SQL (T80 review nit)
  2. Lifecycle warning wording in regenerate (T83.4 — "at-or-after turn X" tightening)
  3. Legacy single-bot record_turn_memory consolidation (T84 review nit)

Three items remain DEFERRED beyond Phase 4 (Phase 4.5 if needed):

  • Scene-close-on-cancel UX revisit (no action unless real play surfaces a regression).
  • Cross-feature canned-queue brittleness (structured fixture builder for tests — not blocking).
  • Full lifecycle-rollback in regenerate (warning log already shipped in T83.4; proper rollback needs schema-level back-references, deferred indefinitely).

Parallel-Execution Strategy

Same pattern as Phase 3.5. Eight waves: parallel within each wave (file-disjoint), serial across waves.

How to dispatch a wave in parallel

Use the Agent tool with isolation: "worktree" so each subagent gets its own git worktree. (If the controlling session's working directory is not the chat repo, create worktrees manually with git worktree add .worktrees/<wave>-<task> -b <wave>/<task> phase-4 from inside the chat repo.)

Dispatch all tasks in a wave in a single message:

Agent({ description: "Wave 1 — T88 embeddings table", prompt: "...", isolation: "worktree" })
Agent({ description: "Wave 1 — T89 branches table", ... })
Agent({ description: "Wave 1 — T90 phase 3.6 carry-overs", ... })

After a wave completes

  1. Each subagent returns its worktree path and commit SHA(s).
  2. Run a spec + code-quality reviewer subagent on each completed task. Combined review acceptable for trivial tasks (T90 carry-overs); separate spec + quality reviewers for vector-retrieval tasks (T91, T92, T96, T97) since the integration surface is wider.
  3. Merge the wave into phase-4 in any order (file-disjointness guarantees no conflict). Use --no-ff.
  4. Run the full test suite on the merged phase-4. If red, the wave's mutual-independence assumption was violated — bisect, fix, re-merge.
  5. Push phase-4 to gitea.
  6. Optionally clean up worktrees.

Conflict prevention checklist

For each parallel wave, verify the Files sections of all tasks have no overlapping paths. Hot files in this plan: chat/web/drawer.py + chat/templates/_drawer.html (T98 only — bundled), chat/state/memory.py (T96 only), chat/services/memory_write.py (T90 + T97 — sequential), chat/web/turns.py (T98 only via delete affordance — sequential after T96).
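The checklist above can be mechanized. The following is a minimal sketch, not part of the plan's tooling: `find_overlaps` and the `WAVE_1` mapping are hypothetical names, and the file lists are copied by hand from the Files sections of T88, T89, and T90.

```python
from itertools import combinations


def find_overlaps(wave):
    """Return (task_a, task_b, shared_paths) for every task pair that shares a file."""
    overlaps = []
    for task_a, task_b in combinations(sorted(wave), 2):
        shared = set(wave[task_a]) & set(wave[task_b])
        if shared:
            overlaps.append((task_a, task_b, sorted(shared)))
    return overlaps


# File lists copied from the Wave 1 task descriptions
# (connection.py comes from T88's spec text rather than its Files list).
WAVE_1 = {
    "T88": [
        "chat/db/migrations/0012_embeddings.sql",
        "chat/state/embeddings.py",
        "tests/test_embeddings_state.py",
        "pyproject.toml",
        "chat/db/connection.py",
    ],
    "T89": [
        "chat/db/migrations/0013_branches.sql",
        "chat/state/branches.py",
        "tests/test_branches_state.py",
    ],
    "T90": [
        "chat/services/turn_common.py",
        "chat/services/regenerate.py",
        "chat/services/memory_write.py",
        "tests/test_turn_common.py",
        "tests/test_regenerate.py",
        "tests/test_memory_write.py",
    ],
}

wave_1_overlaps = find_overlaps(WAVE_1)

# T90 and T97 both modify memory_write.py, which is why they sit in different waves.
cross_wave = find_overlaps({
    "T90": ["chat/services/memory_write.py"],
    "T97": ["chat/services/memory_write.py", "chat/services/embedding_worker.py"],
})
```

An empty result for a wave means its tasks can merge in any order; a non-empty result names the pair that has to be serialized.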

Why each wave is parallel-safe

| Wave | Tasks | Hot files touched | Disjoint? |
|------|-------|-------------------|-----------|
| 1 | T88, T89, T90 | new migrations + new state modules; T90 touches turn_common.py + regenerate.py + memory_write.py (additive only) | yes |
| 2 | T91, T92, T93 | new service modules (embeddings, vector_search, cross_chat_search) | yes |
| 3 | T94, T95 | new service modules (branching, delete_impact) | yes |
| 4 | T96 | chat/state/memory.py (combined retrieval ranking) | yes (single task) |
| 5 | T97 | chat/services/memory_write.py + new backfill script | yes (single task) |
| 6 | T98 | chat/web/drawer.py + chat/templates/_drawer.html (drawer Phase 4 bundle) | yes (single task) |
| 7 | T99, T100 | new files: chat/web/snapshots.py + chat/templates/snapshots.html (T99); chat/web/search.py + chat/templates/search.html + small chat.html top-bar addition (T100) | yes (disjoint) |
| 8 | T101, T102 | new test file (T101); CLAUDE.md + design doc (T102) | yes |

Task overview

Wave 1 ─┬─ T88: embeddings table + projector handlers
        ├─ T89: branches table + projector handlers
        └─ T90: Phase 3.6 carry-overs trio (chat-id SQL pushdown + lifecycle wording + legacy-fn consolidation)

Wave 2 ─┬─ T91: embedding generation service (Featherless or local)
        ├─ T92: vector search service via sqlite-vec
        └─ T93: cross-chat search service (FTS over all owners)

Wave 3 ─┬─ T94: branch_from_event service (event-log fork, branch metadata)
        └─ T95: delete-impact computation service (cascade preview)

Wave 4 ─── T96: combined FTS + vector retrieval ranking in search_memories

Wave 5 ─── T97: memory_write enqueues embedding job + backfill script for existing memories

Wave 6 ─── T98: drawer Phase 4 bundle — branching UI + significance review + hide-from-view + surgical delete + remaining v1 edits

Wave 7 ─┬─ T99: snapshot UX (manual trigger, retention display, restore-from-snapshot UI)
        └─ T100: cross-chat search UX (top-bar input + search results page)

Wave 8 ─┬─ T101: cross-feature integration tests (vector × branching × delete × snapshot × search)
        └─ T102: Phase 4 documentation update

Critical path: 8 sequential merge points. Total tasks: 15. Parallelism: tasks within Waves 1, 2, 3, 7, and 8 dispatch concurrently (3-way for Waves 1 and 2, 2-way for Waves 3, 7, and 8). Waves 4, 5, and 6 are single-task by hot-file constraint.


Wave 1 — Schema foundation + Phase 3.6 carry-overs (parallel)

Task 88: Embeddings table + projector handlers

Files:

  • Create: chat/db/migrations/0012_embeddings.sql
  • Create: chat/state/embeddings.py
  • Create: tests/test_embeddings_state.py
  • Modify: pyproject.toml (add sqlite-vec dependency — controlling agent should pre-install before dispatch; the worktree commits the dependency declaration)

Spec:

Adds the embeddings table that stores per-memory embedding vectors for vector retrieval. Uses sqlite-vec virtual-table syntax for cosine-similarity search. Schema:

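-- Load sqlite-vec extension at connection time (handled in chat/db/connection.py).
-- Embeddings are stored as blobs in a vec0 virtual table for fast similarity search.
-- vec0 defaults to L2 distance, so cosine is requested explicitly below
-- (confirm the distance_metric option against the sqlite-vec docs).
-- The dimension is 384 by default; adjust per chosen model.

CREATE VIRTUAL TABLE embeddings USING vec0(
    memory_id INTEGER PRIMARY KEY,
    embedding FLOAT[384] distance_metric=cosine
);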

-- Sidecar table for non-vector metadata (model used, dim, indexed_at).
CREATE TABLE embeddings_meta (
    memory_id INTEGER PRIMARY KEY,
    model TEXT NOT NULL,
    dim INTEGER NOT NULL,
    indexed_at TEXT NOT NULL DEFAULT (datetime('now')),
    FOREIGN KEY (memory_id) REFERENCES memories(id)
);

(If sqlite-vss is chosen instead, replace vec0 with vss0 and adapt the dim declaration. Both have similar Python loading semantics.)

chat/state/embeddings.py:

  • @on("embedding_indexed") payload {memory_id, model, dim, vector: list[float]}. Inserts into both embeddings and embeddings_meta. Idempotent via INSERT OR REPLACE (re-indexing a memory replaces the prior vector).
  • @on("embedding_deindexed") payload {memory_id}. Deletes from both tables. Used when a memory is purged via reset/cascade.
  • Reader get_embedding_meta(conn, memory_id) -> dict | None returns the meta row.
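The embedding_indexed payload carries the vector as a list of floats, so the handler has to turn it into something the virtual table accepts. A minimal sketch, assuming the vector is stored as a compact little-endian float32 blob: to my knowledge the sqlite-vec Python package ships its own `serialize_float32` helper, which would be the natural choice in the real handler; the stdlib version below only illustrates the shape of the data. `validate_payload` is a hypothetical helper name.

```python
import struct

EMBEDDING_DIM = 384  # must match the dimension hardcoded in 0012_embeddings.sql


def serialize_float32(vector):
    """Pack a list of floats into a little-endian float32 blob (4 bytes per element)."""
    return struct.pack(f"<{len(vector)}f", *vector)


def deserialize_float32(blob):
    """Inverse of serialize_float32, handy in tests."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


def validate_payload(payload, expected_dim=EMBEDDING_DIM):
    """Reject payloads whose declared or actual dimension disagrees with the schema."""
    vector = payload["vector"]
    if payload["dim"] != expected_dim or len(vector) != expected_dim:
        raise ValueError(
            f"expected {expected_dim}-dim vector, got dim={payload['dim']} len={len(vector)}"
        )
    return serialize_float32(vector)


blob = validate_payload(
    {"memory_id": 1, "model": "demo-model", "dim": 384, "vector": [0.1] * 384}
)
round_trip = deserialize_float32(serialize_float32([1.0, 0.5, -2.0]))

try:
    validate_payload({"memory_id": 2, "model": "demo-model", "dim": 768, "vector": [0.1] * 768})
    rejected = False
except ValueError:
    rejected = True
```

Checking the dimension in the handler gives a clear error at projection time instead of a failure inside the extension.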

The chat/db/connection.py open_db helper needs to load the sqlite-vec extension on each connection. Add:

import sqlite_vec
# Inside open_db, after connection is opened:
conn.enable_load_extension(True)
sqlite_vec.load(conn)
conn.enable_load_extension(False)

This is a small modification to connection.py. Include it in T88's diff.

Tests: 3 minimum.

  1. test_embedding_indexed_inserts_row: append bot_authored, chat_created, memory_written (creates a memory), then embedding_indexed with vector=[0.1] * 384. Project. Assert embeddings_meta row exists for that memory_id with the right model.
  2. test_embedding_deindexed_removes_row: same setup; index then de-index; assert row is gone.
  3. test_vector_similarity_search_returns_nearest: index two memories with distinct vectors; query for nearest neighbor of one vector; assert correct memory_id returned. Uses sqlite-vec's MATCH '...' syntax (verify against actual sqlite-vec docs; adapt if needed).

If running tests requires sqlite-vec to be loaded, the test fixture may need to skip / xfail when the extension isn't installed. Use pytest.importorskip("sqlite_vec") at the top of the test file.

Commit: feat: embeddings table + projector handlers via sqlite-vec (T88).

Notes:

  • Schema version after migration alone: 12. T89 adds 0013, taking final to 13. The schema_version assertion in tests/test_world.py updates to 13 in the wave-merge step.
  • The connection.py change is small but cross-cutting — affects every open_db call. Verify the existing 343 tests still pass after the change.

Task 89: Branches table + projector handlers

Files:

  • Create: chat/db/migrations/0013_branches.sql
  • Create: chat/state/branches.py
  • Create: tests/test_branches_state.py

Spec:

Adds the branches table that records named alternate event-log forks. A branch is metadata: a name, an origin_event_id (the event we forked from), and a head_event_id (the latest event in this branch). The event log itself is unchanged — the branch table just labels linear ranges of event ids.

CREATE TABLE branches (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,
    origin_event_id INTEGER NOT NULL,
    head_event_id INTEGER NOT NULL,
    chat_id TEXT,
    created_at TEXT NOT NULL DEFAULT (datetime('now')),
    is_active INTEGER NOT NULL DEFAULT 0
);

-- At most one row may have is_active = 1 at any time (enforced by the partial unique index).
CREATE UNIQUE INDEX branches_active_idx ON branches(is_active) WHERE is_active = 1;

The "main" branch is not created through an event; the migration bootstraps it: INSERT INTO branches (name, origin_event_id, head_event_id, is_active) VALUES ('main', 0, 0, 1);. Subsequent branches reference an origin_event_id (the event that the branch forked from).

chat/state/branches.py:

  • @on("branch_created") payload {name, origin_event_id, chat_id?, head_event_id}. Inserts a new row with is_active=0. Idempotent re-insertion via INSERT OR IGNORE.
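  • @on("branch_switched") payload {name}. Sets is_active=1 on the named branch and is_active=0 on all others. The switch must be atomic: either a single UPDATE or two UPDATEs in one transaction. Check the chosen form against branches_active_idx, since SQLite enforces uniqueness row by row during an UPDATE.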
  • @on("branch_head_updated") payload {name, head_event_id}. Updates head_event_id on the named branch. Used by the orchestrator when new events extend the branch.
  • Readers: get_branch(conn, name), list_branches(conn, chat_id=None), active_branch(conn).
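One way the branch_switched handler could keep the switch atomic while staying clear of the partial unique index is to deactivate first and activate second inside one transaction. This is a sketch under assumptions, not the plan's projector code: `switch_active` and `active_names` are hypothetical helpers, and the schema is copied from the migration above.

```python
import sqlite3

SCHEMA = """
CREATE TABLE branches (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,
    origin_event_id INTEGER NOT NULL,
    head_event_id INTEGER NOT NULL,
    chat_id TEXT,
    created_at TEXT NOT NULL DEFAULT (datetime('now')),
    is_active INTEGER NOT NULL DEFAULT 0
);
CREATE UNIQUE INDEX branches_active_idx ON branches(is_active) WHERE is_active = 1;
INSERT INTO branches (name, origin_event_id, head_event_id, is_active)
VALUES ('main', 0, 0, 1);
"""


def switch_active(conn, name):
    """Make `name` the only active branch; roll back entirely on any error."""
    with conn:  # commits on success, rolls back if an exception escapes
        exists = conn.execute("SELECT 1 FROM branches WHERE name = ?", (name,)).fetchone()
        if exists is None:
            raise ValueError(f"unknown branch: {name!r}")
        # Deactivate first so the index never sees two active rows.
        conn.execute("UPDATE branches SET is_active = 0 WHERE is_active = 1")
        conn.execute("UPDATE branches SET is_active = 1 WHERE name = ?", (name,))


def active_names(conn):
    return [row[0] for row in conn.execute("SELECT name FROM branches WHERE is_active = 1")]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO branches (name, origin_event_id, head_event_id) VALUES ('experiment', 42, 42)"
)
conn.commit()

switch_active(conn, "experiment")
after_first_switch = active_names(conn)
switch_active(conn, "main")
after_second_switch = active_names(conn)

try:
    switch_active(conn, "missing")
    unknown_rejected = False
except ValueError:
    unknown_rejected = True
after_failed_switch = active_names(conn)
```

The existence check runs before any write, so an unknown name leaves the currently active branch untouched.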

Tests: 3 minimum.

  1. test_branch_created_inserts_row: append branch_created with name="experiment", origin_event_id=42; project; assert get_branch(conn, "experiment") returns the row.
  2. test_branch_switched_atomic: seed two branches; switch from one to the other; assert exactly one is active.
  3. test_main_branch_bootstrapped_by_migration: open a fresh DB, apply migrations; assert active_branch(conn)["name"] == "main".

Commit: feat: branches table + projector handlers (T89).

Notes:

  • Schema version after this migration: 13 (T88's 0012 brings it to 12, and T89's 0013 stacks on top). The wave-merge step bumps the tests/test_world.py schema_version assertion to 13.
  • This task does NOT yet teach the orchestrator to consult is_active — the existing event_log queries assume a single timeline. T98 (drawer branching UI) will enable user-driven switches, but the actual "follow only the active branch" filter on event reads is a follow-up (Phase 4.5 nit; document in T102 docs sweep).

Task 90: Phase 3.6 carry-overs trio

Files:

  • Modify: chat/services/turn_common.py (push chat_id filter into SQL)
  • Modify: chat/services/regenerate.py (lifecycle warning wording tightening)
  • Modify: chat/services/memory_write.py (consolidate legacy record_turn_memory into the unified API or delete it)
  • Modify: tests/test_turn_common.py, tests/test_regenerate.py, tests/test_memory_write.py

Spec: Three small Phase 3.6 carry-over fixes bundled because each is 1-line + 1-test.

90.1 — read_recent_dialogue chat-id SQL pushdown

Per T80 review nit. Currently read_recent_dialogue filters chat_id post-fetch in Python. Push into SQL for tighter LIMIT semantics:

SELECT id, kind, payload_json
FROM event_log
WHERE kind IN ('user_turn', 'user_turn_edit', 'assistant_turn')
  AND superseded_by IS NULL
  AND hidden = 0
  AND json_extract(payload_json, '$.chat_id') = ?
ORDER BY id DESC
LIMIT ?

Then the post-fetch loop becomes a simple reverse + slice — no chat_id check needed.

Test added: test_read_recent_dialogue_limit_respects_chat_scope — seed two chats with 60 turns each; query chat_a with limit=50; assert returned rows are exactly 50 chat_a rows (not 50 cross-chat rows that filter down to <50 after Python).
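To make the LIMIT argument concrete, here is a small self-contained sketch of the pushed-down query against an in-memory database. The `event_log` column set shown is an assumption inferred from the query (the real table has more columns), and `read_recent_dialogue_rows` is a hypothetical stand-in for the relevant part of read_recent_dialogue.

```python
import json
import sqlite3


def read_recent_dialogue_rows(conn, *, chat_id, limit):
    """Fetch the most recent `limit` visible turns for one chat, oldest first."""
    rows = conn.execute(
        """
        SELECT id, kind, payload_json
        FROM event_log
        WHERE kind IN ('user_turn', 'user_turn_edit', 'assistant_turn')
          AND superseded_by IS NULL
          AND hidden = 0
          AND json_extract(payload_json, '$.chat_id') = ?
        ORDER BY id DESC
        LIMIT ?
        """,
        (chat_id, limit),
    ).fetchall()
    rows.reverse()  # SQL returns newest first; prompt assembly wants chronological order
    return rows


conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE event_log (
        id INTEGER PRIMARY KEY,
        kind TEXT NOT NULL,
        payload_json TEXT NOT NULL,
        superseded_by INTEGER,
        hidden INTEGER NOT NULL DEFAULT 0
    )
    """
)
# Interleave 60 turns for each of two chats, as in the test described above.
for i in range(60):
    for chat in ("chat_a", "chat_b"):
        conn.execute(
            "INSERT INTO event_log (kind, payload_json) VALUES (?, ?)",
            ("user_turn", json.dumps({"chat_id": chat, "text": f"turn {i}"})),
        )

rows = read_recent_dialogue_rows(conn, chat_id="chat_a", limit=50)
chat_ids = {json.loads(payload)["chat_id"] for _, _, payload in rows}
ids = [row_id for row_id, _, _ in rows]
```

With the filter in Python instead, the same LIMIT 50 would fetch 25 rows per chat from this interleaved log and leave only 25 for chat_a.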

Commit: perf: read_recent_dialogue pushes chat-id filter into SQL (T90.1).

90.2 — Lifecycle warning wording tightening

Per T83.4 review nit. The current warning reads "lifecycle transitions from superseded turn are NOT being rolled back". When the user regenerates an OLDER turn (T29 supports this), the warning also lists intervening-turn transitions that legitimately stand. Tighten the wording to "lifecycle transitions at-or-after turn X" so operators reading logs aren't misled.

Change is one log message string. Test asserts the new wording appears.

Commit: chore: clarify regenerate lifecycle warning wording (T90.2).

90.3 — Legacy record_turn_memory consolidation

Per T84 review nit. The original Phase 1 single-bot record_turn_memory function still exists alongside the unified record_turn_memory_for_present. Either:

  • (a) Remove the legacy function entirely; update any remaining callers to use the unified API.
  • (b) Convert it to a thin wrapper for backward compat.

Pick (a) if there are zero remaining callers; (b) if any callers exist. Read the codebase to confirm. The mock-data seed scripts may still use the legacy fn.

Commit: refactor: consolidate legacy record_turn_memory into unified API (T90.3).

TDD process for T90:

  1. Read all 3 affected files + their tests.
  2. Implement 90.1 with test; commit.
  3. Implement 90.2 with test; commit.
  4. Implement 90.3 with test; commit.
  5. Run the full suite: expect 343 + 3 = 346 tests (or 345 if 90.3 needed no new behavioral test).

Wave 2 — Embedding & search services (parallel)

Three new service modules. Fully file-disjoint.

Task 91: Embedding generation service

Files:

  • Create: chat/services/embeddings.py
  • Create: tests/test_embeddings.py

Spec: Wraps the embedding API call. Signature:

class EmbeddingResult(BaseModel):
    vector: list[float]
    model: str
    dim: int

async def generate_embedding(
    client: LLMClient,    # or a separate embedding-specific client
    *,
    text: str,
    model: str,
    timeout_s: float = 30.0,
) -> EmbeddingResult:
    """Generate an embedding vector for the given text. Falls back to a
    zero-vector with model='fallback' on failure (so callers get a deterministic
    sentinel they can detect and skip indexing)."""

Implementation: call the embedding endpoint (Featherless OpenAI-compatible /v1/embeddings, or a local sentence-transformers model). Add a new method client.embed(text, model) to LLMClient Protocol (and to MockLLMClient and FeatherlessClient).
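The fallback contract in the docstring can be sketched as follows. This is an illustration under assumptions rather than the service itself: it uses a stdlib dataclass instead of the BaseModel in the signature, the `dim` parameter and the wrong-dimension check are additions for this sketch, and `FakeClient` stands in for the real LLMClient.

```python
import asyncio
from dataclasses import dataclass

FALLBACK_MODEL = "fallback"


@dataclass
class EmbeddingResult:
    vector: list
    model: str
    dim: int


async def generate_embedding(client, *, text, model, timeout_s=30.0, dim=384):
    """Return the embedding, or a zero-vector sentinel if the call fails or times out."""
    try:
        vector = await asyncio.wait_for(client.embed(text, model), timeout=timeout_s)
        if len(vector) != dim:
            # Assumption: a wrong-sized vector is treated like a failure,
            # because the migration hardcodes the dimension.
            raise ValueError(f"expected {dim} dims, got {len(vector)}")
    except Exception:
        return EmbeddingResult(vector=[0.0] * dim, model=FALLBACK_MODEL, dim=dim)
    return EmbeddingResult(vector=list(vector), model=model, dim=dim)


class FakeClient:
    """Hypothetical test double exposing the proposed client.embed(text, model) method."""

    def __init__(self, fail=False):
        self.fail = fail

    async def embed(self, text, model):
        if self.fail:
            raise RuntimeError("endpoint down")
        return [0.5] * 384


ok = asyncio.run(
    generate_embedding(FakeClient(), text="hello", model="BAAI/bge-small-en-v1.5")
)
failed = asyncio.run(
    generate_embedding(FakeClient(fail=True), text="hello", model="BAAI/bge-small-en-v1.5")
)
```

Callers then only need to compare `result.model` with the sentinel to decide whether to index.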

Embedding model choice:

Default to a small CPU-friendly model accessible through the existing Featherless setup:

  • If Featherless has BAAI/bge-small-en-v1.5 or similar 384-dim model: use that.
  • If not: fall back to local sentence-transformers/all-MiniLM-L6-v2 (384-dim, runs CPU). Add sentence-transformers to pyproject.toml.
  • Document choice in CLAUDE.md (T102 docs sweep).

The 384 dim is hardcoded in T88's migration. If a different model with different dim is chosen, update T88's schema accordingly BEFORE T88 dispatches.

Tests: 3 minimum.

  1. test_generate_embedding_returns_vector_of_correct_dim: mock embedding response with a 384-element vector; assert returned vector length is 384.
  2. test_generate_embedding_returns_correct_model_metadata: assert result.model matches the input.
  3. test_generate_embedding_falls_back_on_failure: mock the client to raise; assert the result is a 384-element zero vector with model="fallback".

Commit: feat: embedding generation service (T91).


Task 92: Vector search service via sqlite-vec

Files:

  • Create: chat/services/vector_search.py
  • Create: tests/test_vector_search.py

Spec: Wraps sqlite-vec's MATCH syntax for cosine-similarity search over the embeddings virtual table. Witness-filter aware (joins through memories table for the witness check).

def vector_search(
    conn,
    *,
    owner_id: str,
    witness_role: str,    # "you" | "host" | "guest"
    query_vector: list[float],
    k: int = 4,
) -> list[dict]:
    """Return top-K memories by cosine similarity to query_vector,
    witness-filtered for the requesting bot's POV. Returns same row
    shape as state.memory.search_memories for combined-ranking
    compatibility."""

SQL pattern (sqlite-vec):

SELECT m.id, m.text, m.pov_summary, m.significance, e.distance
FROM embeddings e
JOIN memories m ON m.id = e.memory_id
WHERE e.embedding MATCH ?
  AND k = ?
  AND m.owner_id = ?
  AND m.witness_<role> = 1
ORDER BY e.distance ASC
LIMIT ?

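(Adapt to actual sqlite-vec syntax and use vec0 MATCH semantics. In particular, vec0 is expected to apply k to the nearest-neighbor scan before the joined owner and witness filters run, so the query may need to over-fetch to still return k rows after filtering. The witness_<role> interpolation needs the same allowlist guard pattern as Phase 2.5 T72.3.)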
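Because the witness column name is interpolated into the SQL text, it should only ever come from a fixed mapping. The sketch below shows one possible guard; the exact T72.3 pattern is not reproduced here, `witness_column` and `build_vector_search_sql` are hypothetical names, and the column names assume the witness_<role> convention with the three roles listed in the signature. The SQL body is the plan's pattern, unverified against sqlite-vec.

```python
WITNESS_COLUMNS = {
    "you": "witness_you",
    "host": "witness_host",
    "guest": "witness_guest",
}


def witness_column(role):
    """Map a role to its column name; anything outside the allowlist is rejected."""
    try:
        return WITNESS_COLUMNS[role]
    except KeyError:
        raise ValueError(f"unknown witness role: {role!r}") from None


def build_vector_search_sql(role):
    """Build the query text; all values still travel as bound parameters."""
    column = witness_column(role)
    return (
        "SELECT m.id, m.text, m.pov_summary, m.significance, e.distance\n"
        "FROM embeddings e\n"
        "JOIN memories m ON m.id = e.memory_id\n"
        "WHERE e.embedding MATCH ?\n"
        "  AND k = ?\n"
        "  AND m.owner_id = ?\n"
        f"  AND m.{column} = 1\n"
        "ORDER BY e.distance ASC\n"
        "LIMIT ?"
    )


guest_sql = build_vector_search_sql("guest")

try:
    build_vector_search_sql("guest = 1 OR 1")
    injection_rejected = False
except ValueError:
    injection_rejected = True
```

Only the column name is formatted into the string; the query vector, k, owner id, and limit remain `?` placeholders.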

Tests: 3 minimum.

  1. test_vector_search_returns_nearest_neighbors: index 5 memories with synthetic vectors; query for nearest 3; assert correct order.
  2. test_vector_search_respects_witness_filter: index a memory with witness [1, 1, 0]; query with witness_role="guest"; assert empty result.
  3. test_vector_search_respects_owner_filter: index memories for two owners; assert query for owner_a doesn't return owner_b's memories.

Commit: feat: vector search service via sqlite-vec (T92).


Task 93: Cross-chat search service

Files:

  • Create: chat/services/cross_chat_search.py
  • Create: tests/test_cross_chat_search.py

Spec: FTS5-based search across ALL chats and all owners (admin-style search; no witness filter). For "where did I last see this person mention X?" queries.

def search_all_memories(
    conn,
    *,
    query: str,
    k: int = 20,
) -> list[dict]:
    """Search FTS across all owners and chats. Returns rows with
    {memory_id, owner_id, chat_id, text, pov_summary, scene_id,
    significance, ts}. Sorted by FTS rank."""

This is intentionally NOT witness-filtered — it's a power-user search surface. The UI (T100) prompts the user to acknowledge they're seeing memories across POVs.
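A minimal sketch of the unfiltered search, assuming an FTS5 table named `memories_fts` with the owner and chat ids stored as unindexed columns. The project's real FTS layout is not shown in this plan, so the table shape is hypothetical, and the sketch returns only a subset of the row fields listed in the docstring.

```python
import sqlite3


def search_all_memories(conn, *, query, k=20):
    """FTS5 search across every owner and chat, best match first (no witness filter)."""
    rows = conn.execute(
        """
        SELECT rowid AS memory_id, owner_id, chat_id, text
        FROM memories_fts
        WHERE memories_fts MATCH ?
        ORDER BY rank
        LIMIT ?
        """,
        (query, k),
    ).fetchall()
    return [dict(row) for row in rows]


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute(
    "CREATE VIRTUAL TABLE memories_fts USING fts5(text, owner_id UNINDEXED, chat_id UNINDEXED)"
)
conn.executemany(
    "INSERT INTO memories_fts (text, owner_id, chat_id) VALUES (?, ?, ?)",
    [
        ("The lantern flickered in the cellar", "bot_a", "chat_1"),
        ("She hid the lantern under her coat", "bot_b", "chat_2"),
        ("Rain on the harbor all night", "bot_a", "chat_2"),
    ],
)

hits = search_all_memories(conn, query="lantern", k=20)
capped = search_all_memories(conn, query="lantern", k=1)
owners = {hit["owner_id"] for hit in hits}
```

Passing the user's text straight to MATCH exposes FTS5 query syntax, so the real service may want to quote or sanitize the input first.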

Tests: 3 minimum.

  1. test_search_all_memories_returns_matches_across_owners: seed 2 owners with an overlapping keyword; search; assert both owners' matches appear.
  2. test_search_all_memories_orders_by_fts_rank: seed memories with varying FTS-match strength; assert order.
  3. test_search_all_memories_respects_k_limit.

Commit: feat: cross-chat search service (FTS5 over all owners) (T93).


Wave 3 — Branching + delete services (parallel)

Two new service modules. Fully file-disjoint.

Task 94: branch_from_event service

Files:

  • Create: chat/services/branching.py
  • Create: tests/test_branching.py

Spec:

def branch_from_event(
    conn,
    *,
    name: str,
    origin_event_id: int,
    chat_id: str | None = None,
) -> int:
    """Create a new named branch forking from origin_event_id.
    Emits a branch_created event. Returns the new branch's row id.
    Raises ValueError if name already exists."""

def switch_active_branch(conn, *, name: str) -> None:
    """Make the named branch active. Emits branch_switched. Subsequent
    event reads should consult is_active to filter."""

def list_branches_with_metadata(conn, chat_id: str | None = None) -> list[dict]:
    """List branches with: name, origin_event_id, head_event_id, is_active,
    event_count (number of events between origin and head, inclusive),
    created_at."""

Tests cover: basic create, duplicate-name raises, switch updates is_active exclusively, list returns metadata.

Commit: feat: branching service (T94).


Task 95: Delete-impact computation service

Files:

  • Create: chat/services/delete_impact.py
  • Create: tests/test_delete_impact.py

Spec: Computes the cascade impact of deleting a single event_log row (or a turn group: user_turn + assistant_turn + interjection if any). Returns a structured ImpactReport for the UI to render.

class DeletedItem(BaseModel):
    kind: str           # "memory" | "edge_update" | "scene_close" | etc.
    description: str    # human-readable
    target_id: int | str | None

class ImpactReport(BaseModel):
    target_event_id: int
    cascading: list[DeletedItem]
    notes: list[str]    # warnings, e.g. "this turn opened scene_X which has 3 subsequent turns"

def compute_delete_impact(conn, *, target_event_id: int) -> ImpactReport:
    """Walk the event log forward from target_event_id and identify
    everything that depends on this event: child memory_written events,
    edge_update events with this turn as source, scene_closed events
    triggered by this turn, etc. Also identify subsequent turns that
    REFERENCE this event (regenerated_from chains, etc.).
    
    Does NOT mutate the database. Pure computation for preview."""

The actual delete (truncate + supersede) is the existing rewind path from Phase 1 T31. T95 just builds the preview.

Tests: 4 minimum.

  1. test_impact_for_simple_turn_lists_memory_and_edges: seed a chat with a turn that wrote 1 memory + 2 edge_updates. Compute impact. Assert the 3 items appear in cascading.
  2. test_impact_for_scene_opening_turn_warns_about_subsequent_turns: seed a turn that opened a scene + 5 subsequent turns. Assert notes mentions the dependency.
  3. test_impact_for_regenerated_turn_lists_supersede_chain: seed a turn that's been regenerated (has superseded_by). Compute impact for the original. Assert the chain appears.
  4. test_impact_does_not_mutate_database: snapshot event_log before + after; assert byte-identical.
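For the non-mutation test, one possible approach is to fingerprint event_log before and after the call. The sketch below is an assumption about how such a check could look: `event_log_fingerprint` and `check_read_only` are hypothetical helpers, and the three-column table is a simplified stand-in for the real schema.

```python
import hashlib
import sqlite3


def event_log_fingerprint(conn):
    """Hash every event_log row in id order; any insert, update, or delete changes it."""
    digest = hashlib.sha256()
    for row in conn.execute("SELECT * FROM event_log ORDER BY id"):
        digest.update(repr(tuple(row)).encode("utf-8"))
    return digest.hexdigest()


def check_read_only(conn, fn):
    """Run fn(conn) and raise AssertionError if event_log changed as a side effect."""
    before = event_log_fingerprint(conn)
    result = fn(conn)
    after = event_log_fingerprint(conn)
    if before != after:
        raise AssertionError("event_log was mutated by a preview computation")
    return result


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event_log (id INTEGER PRIMARY KEY, kind TEXT NOT NULL, payload_json TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO event_log (kind, payload_json) VALUES (?, ?)",
    [("user_turn", "{}"), ("memory_written", "{}")],
)


def read_only_preview(c):
    return c.execute("SELECT COUNT(*) FROM event_log").fetchone()[0]


def buggy_preview(c):
    c.execute("DELETE FROM event_log WHERE kind = 'memory_written'")
    return 0


preview_count = check_read_only(conn, read_only_preview)

try:
    check_read_only(conn, buggy_preview)
    mutation_caught = False
except AssertionError:
    mutation_caught = True
```

In the real test, `fn` would wrap compute_delete_impact, and the same fingerprint could be extended to other tables the preview reads.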

Commit: feat: delete-impact computation service (T95).


Wave 4 — Combined retrieval ranking (single)

Task 96: Combined FTS + vector retrieval ranking

Files:

  • Modify: chat/state/memory.py — extend search_memories to optionally include vector hits
  • Modify: tests/test_memory_search.py — add 4 tests

Spec:

search_memories currently does FTS5 + Python-side significance/recency re-rank. Phase 4 adds:

  • An optional query_vector: list[float] | None = None kwarg.
  • When query_vector is provided, run vector_search (T92) for top-K-vector candidates.
  • Merge with FTS top-K candidates via reciprocal-rank fusion (RRF) or a simpler sum-of-ranks scheme — implementer's choice. Document the merge formula.
  • Final result is top-K from the fused set, with the existing significance + recency boosts applied as a final pass.

When query_vector is None: existing behavior unchanged. Phase 1/2/3 callers that don't pass query_vector see no change.

Implementation note: the embedding for the query (the speaker's recent context) must be generated by the caller (Wave 5 T97 wires the prompt-assembly pipeline to call generate_embedding on the dialogue tail). T96 only handles the search side — assumes the vector is pre-computed.
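If reciprocal-rank fusion is the chosen merge, the formula is score(d) = sum over rankings of 1 / (c + rank(d)), where rank starts at 1 and c is a smoothing constant. A minimal sketch follows; `rrf_fuse` is a hypothetical name, the constant 60 is a commonly used default rather than anything this plan specifies, and ties are broken by memory id purely to keep the output deterministic.

```python
from typing import Dict, List, Optional


def rrf_fuse(rankings: List[List[int]], *, c: int = 60, top_n: Optional[int] = None) -> List[int]:
    """Fuse ranked lists of memory ids with reciprocal-rank fusion.

    Each ranking is ordered best first. An id missing from a ranking
    simply contributes nothing from that ranking.
    """
    scores: Dict[int, float] = {}
    for ranking in rankings:
        for rank, memory_id in enumerate(ranking, start=1):
            scores[memory_id] = scores.get(memory_id, 0.0) + 1.0 / (c + rank)
    # Highest fused score first; ties broken by id for deterministic output.
    fused = sorted(scores, key=lambda memory_id: (-scores[memory_id], memory_id))
    return fused if top_n is None else fused[:top_n]


fts_hits = [1, 2, 3]   # FTS5 order, best first
vector_hits = [3, 4]   # vector-search order, best first

fused = rrf_fuse([fts_hits, vector_hits])
top_two = rrf_fuse([fts_hits, vector_hits], top_n=2)
fts_only = rrf_fuse([fts_hits, []])
```

Memory 3 wins because it appears in both lists, and an empty vector list degrades to the FTS order, which matches the empty-vector-results test below. The existing significance and recency boosts would then run over the fused ids as a final pass.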

Tests: 4 added.

  1. test_search_memories_without_query_vector_uses_fts_only: regression — call without query_vector; assert the existing FTS+rerank behavior.
  2. test_search_memories_with_query_vector_includes_vector_hits: index 5 memories where 1 is FTS-only-matching, 1 is vector-only-matching, 3 are unrelated. Pass both query=... and query_vector=.... Assert both the FTS hit and the vector hit appear in results.
  3. test_search_memories_fusion_significance_bias_still_applies: confirm the existing significance bias rerank still works on top of fused results.
  4. test_search_memories_fusion_handles_empty_vector_results: pass a vector for a memory that has no embeddings indexed; assert FTS-only results still come back.

Commit: feat: combined FTS + vector retrieval ranking (T96).


Wave 5 — Memory write hook + backfill (single)

Task 97: Embedding generation hook + backfill script

Files:

  • Modify: chat/services/memory_write.py — after each memory_written event, enqueue a background embedding job
  • Create: chat/services/embedding_worker.py — async worker that consumes the queue and emits embedding_indexed events
  • Create: scripts/backfill_embeddings.py — one-time script that walks all existing memories and embeds them
  • Modify: chat/app.py — wire the embedding worker into the lifespan startup
  • Modify: tests/test_memory_write.py — add 2 tests for the enqueue hook
  • Create: tests/test_embedding_worker.py — 3 tests for the worker drain logic

Spec:

After each successful memory_written event, enqueue an embedding job. The worker dequeues and:

  1. Reads the memory text (via get_memory(conn, memory_id)).
  2. Calls generate_embedding(client, text=memory.text, model=settings.embedding_model).
  3. Appends embedding_indexed event with the result. (Skip if result.model == "fallback" — leave the memory un-indexed; will retry later via backfill.)

The worker pattern mirrors Phase 1's chat/services/significance.py SignificanceWorker. Reuse its queue + lifecycle pattern.
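The drain loop described above can be sketched with a plain asyncio queue. This does not reproduce SignificanceWorker, whose internals are not shown in this plan; `run_embedding_worker` and its three callables are hypothetical, and results are plain dicts to keep the sketch dependency-free.

```python
import asyncio
import logging

log = logging.getLogger("embedding_worker")


async def run_embedding_worker(queue, *, get_text, embed, emit_indexed):
    """Drain memory ids one at a time; skip fallback results so backfill can retry them."""
    while True:
        memory_id = await queue.get()
        try:
            result = await embed(get_text(memory_id))
            if result["model"] == "fallback":
                log.warning("embedding failed for memory %s; leaving it un-indexed", memory_id)
            else:
                emit_indexed(memory_id, result)
        except Exception:
            # One bad job must not kill the worker.
            log.exception("embedding job crashed for memory %s", memory_id)
        finally:
            queue.task_done()


async def _demo():
    queue = asyncio.Queue()
    texts = {1: "first memory", 2: "bad", 3: "third memory"}
    emitted = []

    async def fake_embed(text):
        model = "fallback" if text == "bad" else "demo-model"
        return {"model": model, "dim": 2, "vector": [0.0, 0.0]}

    worker = asyncio.create_task(
        run_embedding_worker(
            queue,
            get_text=texts.__getitem__,
            embed=fake_embed,
            emit_indexed=lambda memory_id, result: emitted.append(memory_id),
        )
    )
    for memory_id in texts:
        queue.put_nowait(memory_id)
    await queue.join()  # returns once every job has been marked done
    worker.cancel()
    try:
        await worker
    except asyncio.CancelledError:
        pass
    return emitted


emitted = asyncio.run(_demo())
```

A single consumer awaiting each embed call in turn gives the serial behavior the third worker test pins.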

Backfill script:

.venv/bin/python scripts/backfill_embeddings.py [--limit N] [--dry-run]

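Walks all memories where no embeddings_meta row exists. For each, generates an embedding and emits embedding_indexed. Useful for the initial migration after Phase 4 lands AND for periodic re-runs that retry memories skipped on a fallback result. Re-embedding after a model change would also need to select rows whose embeddings_meta.model differs, which the no-meta-row selection alone does not cover.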
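The selection step of the backfill and its two flags could look like the sketch below. It is an outline under assumptions: `memories_missing_embeddings` and `parse_args` are hypothetical names, and the two tables are reduced to the columns the query touches.

```python
import argparse
import sqlite3


def memories_missing_embeddings(conn, limit=None):
    """Return ids of memories with no embeddings_meta row, oldest first."""
    sql = (
        "SELECT m.id FROM memories m "
        "LEFT JOIN embeddings_meta e ON e.memory_id = m.id "
        "WHERE e.memory_id IS NULL "
        "ORDER BY m.id"
    )
    params = ()
    if limit is not None:
        sql += " LIMIT ?"
        params = (limit,)
    return [row[0] for row in conn.execute(sql, params)]


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Embed memories that have no embedding yet.")
    parser.add_argument("--limit", type=int, default=None, help="process at most N memories")
    parser.add_argument("--dry-run", action="store_true", help="report the count without embedding")
    return parser.parse_args(argv)


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE memories (id INTEGER PRIMARY KEY, text TEXT NOT NULL);
    CREATE TABLE embeddings_meta (
        memory_id INTEGER PRIMARY KEY,
        model TEXT NOT NULL,
        dim INTEGER NOT NULL
    );
    INSERT INTO memories (id, text) VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO embeddings_meta (memory_id, model, dim) VALUES (2, 'demo-model', 384);
    """
)

pending = memories_missing_embeddings(conn)
first_only = memories_missing_embeddings(conn, limit=1)
args = parse_args(["--limit", "10", "--dry-run"])
```

In dry-run mode the script would print `len(pending)` and exit, which is the count the manual smoke check compares against.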

Tests:

tests/test_memory_write.py:

  1. test_record_turn_memory_enqueues_embedding_job: monkeypatch the worker's enqueue method; record_turn_memory_for_present; assert the worker received a job per memory.

tests/test_embedding_worker.py:

  1. test_worker_drains_jobs_and_emits_indexed_events: enqueue 3 jobs with mock embeddings; run worker; assert 3 embedding_indexed events landed.
  2. test_worker_skips_fallback_results: mock the embedding service to return a fallback result; assert NO embedding_indexed event landed for that job.
  3. test_worker_handles_concurrent_jobs_serially: pin the Featherless 2-conn cap behavior (worker calls embed sequentially under the existing semaphore).

Commit (split):

  • feat: embedding worker drains queue and emits embedding_indexed events (T97.1)
  • feat: memory_write enqueues embedding job after each memory_written (T97.2)
  • feat: backfill_embeddings script for existing memories (T97.3)

Verification gates:

  • All Phase 1/2/3/3.5 memory tests still pass (regression critical).
  • New tests pass.
  • Manual smoke: run scripts/backfill_embeddings.py --dry-run against a seeded DB and verify expected count.

Wave 6 — Drawer Phase 4 bundle (single task)

Task 98: Drawer Phase 4 features

Files:

  • Modify: chat/web/drawer.py (add many new POST routes and GET extensions)
  • Modify: chat/templates/_drawer.html (add 5 new sections)
  • Create: tests/test_drawer_phase4.py

Spec: Drawer affordances for 5 Phase 4 features. Single task by hot-file constraint; split into 5 commits internally.

98.1 — Branching UI

GET drawer extension: list_branches_with_metadata(conn) → render in a "Branches" section (active branch highlighted + count of events).

POST routes:

  • /drawer/branch/create — form {name, origin_event_id} → branch_from_event service.
  • /drawer/branch/switch — form {name} → switch_active_branch.
  • /drawer/branch/from-turn/{event_id} — convenience: branch from a specific turn (used by per-turn UI affordance).

98.2 — Significance review panel

GET extension: significance distribution per chat (SELECT significance, COUNT(*) GROUP BY significance) → render histogram.

POST route:

  • /drawer/memory/significance/{memory_id} — form {new_value} (already supported via T22 manual_edit target_kind=memory_significance); just add the UI form.

Bulk re-rate is a Phase 4.5 polish — not in scope here. Just per-memory edit + distribution display.

98.3 — Hide-from-view toggle

POST route:

  • /drawer/turn/hide/{event_id} — form {hidden: bool} → emits a manual_edit with target_kind="turn_hidden".

NEW manual_edit projector branch for turn_hidden: sets event_log.hidden = ? for the target event. Reuses the existing hidden column.
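
A minimal sketch of that projector branch, together with the kind of `hidden = 0` read filter it is meant to feed. The function names and the `id` primary-key column are hypothetical; only `event_log.hidden` comes from the plan.

```python
import sqlite3


def apply_turn_hidden(conn: sqlite3.Connection, event_id: int, hidden: bool) -> None:
    """Project a manual_edit with target_kind="turn_hidden" onto event_log."""
    cur = conn.execute(
        "UPDATE event_log SET hidden = ? WHERE id = ?",
        (1 if hidden else 0, event_id),
    )
    if cur.rowcount != 1:
        # Fail loudly rather than silently ignoring an edit aimed at a missing turn.
        raise LookupError(f"no event_log row with id={event_id}")


def visible_event_ids(conn: sqlite3.Connection) -> list[int]:
    """Readers keep filtering on hidden = 0, so a hidden turn simply drops out."""
    rows = conn.execute("SELECT id FROM event_log WHERE hidden = 0 ORDER BY id")
    return [row[0] for row in rows]
```

Because the toggle is an ordinary column update, unhiding is the same call with `hidden=False`, which is what makes this a soft delete.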

UI affordance: per-turn checkbox in the chat surface or drawer (per-turn list with hide toggle).

98.4 — Surgical delete with cascade preview

GET extension:

  • /drawer/turn/delete-preview/{event_id} → returns the ImpactReport (T95) rendered as a modal.

POST route:

  • /drawer/turn/delete/{event_id} — invokes the rewind-and-truncate path (Phase 1 T31's rewind_to_turn) restricted to the target turn group.

Important: this reuses the existing pre-rewind snapshot path so the action is undoable.

98.5 — Remaining v1 edits

Audit: are any v1 fields STILL not editable from the drawer? Phase 2.5 T72.1 added edge_trust/edge_summary/memory_pov_summary/edge_knowledge_facts. T72.3 added witness flags. Anything left?

Likely candidates: scene narrative_anchor, scene weather, container properties JSON. Add edit forms for any that surface during the audit. If none, this sub-fix is a no-op.

Tests: 8+ in tests/test_drawer_phase4.py (one per sub-feature × happy path; plus 1 for the cascade-preview rendering).

Commits (5):

  • feat: drawer branching UI (T98.1)
  • feat: drawer significance review panel (T98.2)
  • feat: drawer hide-from-view toggle + manual_edit turn_hidden branch (T98.3)
  • feat: drawer surgical delete with cascade preview (T98.4)
  • feat: drawer remaining v1 field edits (T98.5) (or "no-op audit" if nothing left)

Wave 7 — Snapshot + cross-chat search UX (parallel)

Task 99: Snapshot UX

Files:

  • Create: chat/web/snapshots.py (new route module)
  • Create: chat/templates/snapshots.html (snapshot list page)
  • Modify: chat/templates/layout.html (add "Snapshots" nav link)
  • Create: tests/test_snapshot_ux.py

Spec: Surface the existing snapshot infrastructure (Phase 1 T20 wrote snapshots; Phase 4 makes them visible).

GET /snapshots — list all snapshots (periodic + pre-rewind) with metadata: kind, created_at, event_log_size, file_size_bytes.

POST /snapshots/take — manually trigger a snapshot now.

POST /snapshots/restore/{snapshot_id} — restore from snapshot (with hard confirmation).

GET /snapshots/{snapshot_id}/preview — show what's in the snapshot vs. current state.

Tests: 4 minimum (list, take, restore, preview).

Commit: feat: snapshot UX (manual trigger, list, restore) (T99).


Task 100: Cross-chat search UX

Files:

  • Create: chat/web/search.py (new route module)
  • Create: chat/templates/search.html (search results page)
  • Modify: chat/templates/layout.html (add top-bar search input)
  • Create: tests/test_search_ux.py

Spec: Top-bar search box submits to /search?q=.... Results page shows up to 50 matches across all chats and all owners (uses T93's search_all_memories). Each result shows: chat name, owner bot name, scene context, memory text excerpt with FTS highlight, "Open chat at this turn" link.

Tests: 3 minimum.

  1. Search returns results from multiple chats.
  2. Empty query returns empty result set.
  3. Result links navigate to the right chat anchor.

Commit: feat: cross-chat search UX (top-bar input + results page) (T100).


Wave 8 — Polish (parallel)

Task 101: Cross-feature integration tests

Files:

  • Create: tests/test_phase4_integration.py

Spec: End-to-end multi-feature flows. 5 tests minimum.

  1. Vector retrieval feedback loop: write a memory → embedding worker indexes it → search retrieves it via vector path.
  2. Branch + diverge: create branch B from turn 10 → switch to B → play 3 new turns → switch back to main → assert main's turn 11+ are still intact.
  3. Surgical delete: compute impact for a turn → confirm → assert event log truncated correctly + pre-rewind snapshot saved.
  4. Hide + retrieval: hide a turn → assert it doesn't appear in read_recent_dialogue (existing hidden = 0 filter) → unhide → assert it reappears.
  5. Cross-chat search: write memories in 3 chats → search for keyword present in all 3 → assert all 3 appear in results.

Commit: test: phase 4 cross-feature integration coverage (T101).


Task 102: Phase 4 documentation update

Files:

  • Modify: CLAUDE.md (add "Phase 4 status" section; update behavioral defaults; add "Phase 4.5 / 5 backlog" with carry-overs)
  • Modify: docs/plans/2026-04-26-v1-requirements-design.md (annotate §13 Phase 4 as Status: shipped 2026-04-27)

Spec:

Mirror the Phase 3 / 3.5 status sections. Document:

  • Vector retrieval: sqlite-vec virtual table, embedding worker async pipeline, combined FTS + vector ranking via RRF.
  • Branching: forks the event log; UI in drawer; is_active flag plus orchestrator filter (caveat — see backlog if filter not yet wired into all readers).
  • Drawer-edit on every field: branching, significance review, hide-from-view, surgical delete with preview, plus any audit findings.
  • Backup tooling: snapshots panel surfaces existing infra.
  • Significance review UI: distribution + per-memory edit.
  • Surgical delete + cascade preview: piggybacks on rewind path; impact report from T95.
  • Hide-from-view soft delete: manual_edit turn_hidden branch.
  • Cross-chat search: top-bar + results page over T93's service.
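
For the documentation of combined ranking, Reciprocal Rank Fusion (RRF) can be illustrated in a few lines. This is a generic sketch of the technique, not T96's code: `rrf_fuse` is a hypothetical name, and `k=60` is the constant commonly used with RRF, which the actual implementation may tune differently.

```python
from collections.abc import Hashable, Sequence


def rrf_fuse(rankings: Sequence[Sequence[Hashable]], k: int = 60) -> list[Hashable]:
    """Fuse several ranked lists of ids into one ordering.

    Each id scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Higher total score ranks first.
    """
    scores: dict[Hashable, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties on the id's string form for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], str(doc_id)))


fts_hits = ["m1", "m2", "m3"]     # ids ranked by the FTS path
vector_hits = ["m3", "m1", "m4"]  # ids ranked by the vector path
print(rrf_fuse([fts_hits, vector_hits]))  # ['m1', 'm3', 'm2', 'm4']
```

RRF only needs ranks, not raw scores, so FTS relevance values and vector distances never have to be normalized onto a common scale.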

Phase 4.5 / 5 backlog candidates (reflect any discovered during execution):

  • Branching read-side filter — if T89's is_active isn't yet consulted by every event reader, this is the work to do.
  • Bulk significance re-rate (per T98.2 deferral).
  • Snapshot retention policy UI controls (deferred from Phase 1 T19).
  • Auto-pin override UI (per Phase 2 design).
  • Embedding model swap migration tooling (when changing embedding model, need to re-embed everything).
  • Vector index optimization (HNSW vs flat — Phase 5 if needed).
  • Carry-overs that remained deferred from Phase 3.6: scene-close-on-cancel UX revisit, canned-queue brittleness fixture builder, full lifecycle rollback in regenerate.

Commit: docs: phase 4 status, behavioral defaults, deferred items (T102).


Wrap-up

After Wave 8 lands:

  1. Run full suite on phase-4: should be ~390+ tests passing (343 from Phase 3.5 + ~50 new).
  2. Manual smoke (recommended before opening the PR):
    • Run scripts/backfill_embeddings.py against a seeded DB to verify vector indexing works.
    • Search for a phrase that shares no substring with a stored memory but is semantically similar to it; verify the vector path returns it (FTS alone would miss it).
    • Create a branch from an old turn; switch; play a few turns; switch back.
    • Trigger surgical delete on a turn; verify the impact preview matches what actually gets removed.
    • Hide a turn; verify it disappears from the chat surface; unhide.
    • Use top-bar search to find a phrase; verify cross-chat results appear.
    • Click the "Snapshots" nav link; trigger a manual snapshot; verify it appears.
  3. Push phase-4 to gitea.
  4. Open PR phase-4 → main.

Notes for the controller running this plan

  • External dependency: sqlite-vec (or sqlite-vss) MUST be added to pyproject.toml and installed BEFORE Wave 1 dispatches. The migration in T88 expects the extension to be loadable.
  • Embedding model choice: pin in T91 spec before dispatch. The 384 dim is hardcoded in T88's migration; if a different dim is used, update T88 first.
  • After each parallel wave, run a code-review subagent. Combined spec+quality acceptable for trivial tasks (T90 carry-overs); separate spec + quality reviewers for vector-retrieval and integration tasks (T91, T96, T97, T98, T101) — surface area is larger.
  • Don't dispatch Wave 5 until Wave 4 has merged green. T97 (memory_write enqueue) calls into the embedding-aware worker; the worker uses T91's generate_embedding. Both must be merged into phase-4 first.
  • Don't dispatch Wave 6 until Wave 5 has merged green. T98 (drawer) wires UI affordances over services from earlier waves.
  • Token-spend rough estimate: Phase 4 should be ~70-80% the size of Phase 3 (similar scope, larger per-task because vector + branching are non-trivial). Per-task spend similar to Phase 3's larger tasks (T59, T64).
  • DO NOT break existing v1/v2/v3/v3.5 surface contracts. Every test file that was green at the start of Phase 4 must stay green at the end. The cross-feature integration tests from Phase 3 (tests/test_phase3_integration.py) are particularly load-bearing.
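
The external-dependency note above can be checked with a small preflight before Wave 1 dispatches. This is a sketch that assumes the `sqlite-vec` Python package is the chosen distribution (the plan also allows sqlite-vss, which this does not cover); `sqlite_vec_loadable` is a hypothetical helper name.

```python
import sqlite3


def sqlite_vec_loadable() -> tuple[bool, str]:
    """Report whether the sqlite-vec extension loads into an in-memory database."""
    try:
        import sqlite_vec  # provided by the sqlite-vec package on PyPI
    except ImportError as exc:
        return False, f"sqlite-vec is not installed: {exc}"

    conn = sqlite3.connect(":memory:")
    try:
        # enable_load_extension is missing on Python builds compiled without
        # extension support, hence the AttributeError handling below.
        conn.enable_load_extension(True)
        sqlite_vec.load(conn)
        conn.enable_load_extension(False)
        (version,) = conn.execute("SELECT vec_version()").fetchone()
        return True, f"sqlite-vec {version} loaded"
    except (AttributeError, sqlite3.OperationalError) as exc:
        return False, f"sqlite-vec could not be loaded: {exc}"
    finally:
        conn.close()


if __name__ == "__main__":
    ok, message = sqlite_vec_loadable()
    print(("OK: " if ok else "FAIL: ") + message)
```

Running this once in the target environment surfaces a missing package or an extension-less SQLite build before T88's migration fails on it.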