docs: add Phase 4.5 cleanup plan (all 24 backlog items)
16 tasks across 9 waves consolidating all 24 items in CLAUDE.md Phase 4.5/5 backlog. Mix of: - Wave 1 (parallel 6-way): trivial polish across 6 different files - Wave 2 (single): schema migration 0014 (FK CASCADE + memories.event_id) - Wave 3 (single): drawer bundle (event_id guard + html.escape + modal partial + bulk significance re-rate) - Wave 4 (single): search UX (FTS snippet highlight + deep-link) - Wave 5 (single): real embedding model swap (LLMClient.embed protocol) - Wave 6 (single): branching read-side filter (riskiest — cross-cutting) - Wave 7 (single): regenerate lifecycle rollback - Wave 8 (single): sqlite-vec swap [ENVIRONMENTAL — may defer to Phase 5 if Python rebuild / apsw not feasible] - Wave 9 (parallel 3-way): structured fixture builder + integration tests + docs Schema baseline 13 -> 14 (or 15 with T115). Big tasks (T112 real embed, T113 branching filter, T114 lifecycle rollback) advance the engine beyond Phase 4's metadata-only state. T115 environmental decision captured in pre-flight; the other 13 tasks ship without it. Uses task ids T103-T118 to avoid collision with prior phases.
This commit is contained in:
@@ -0,0 +1,724 @@
|
||||
# Roleplay Engine — Phase 4.5 Cleanup Plan
|
||||
|
||||
> **For Claude:** REQUIRED SUB-SKILL: Use `superpowers-extended-cc:executing-plans` to implement this plan task-by-task. Use the parallel-dispatch pattern documented under "Parallel-Execution Strategy" for parallel waves.
|
||||
|
||||
**Goal:** Burn down all 24 items in `CLAUDE.md` §"Phase 4.5 / 5 backlog". Mix of small defensive cleanups (most), three big features (real embedding model swap, branching read-side filter, lifecycle rollback in regenerate), one environment-dependent feature (sqlite-vec swap), and the long-deferred carry-overs (scene-close-on-cancel revisit, structured test-fixture builder).
|
||||
|
||||
**Architecture:** No new architecture. Two new schema migrations (0014 schema polish, 0015 sqlite-vec virtual tables). New external dependency optional (`apsw` if Python rebuild isn't possible). All other changes are polish / refactor / observability.
|
||||
|
||||
**Tech Stack:**
|
||||
|
||||
- Existing — same as Phase 4.
|
||||
- **OPTIONAL:** rebuild Python with `--enable-loadable-sqlite-extensions` OR install `apsw` to enable T115 sqlite-vec swap. T115 is the only task that requires this; the other 13 tasks land without it. If neither is available, T115 is deferred to Phase 5.
|
||||
|
||||
**Source-of-truth references:**
|
||||
|
||||
- Backlog: [`CLAUDE.md`](../../CLAUDE.md) §"Phase 4.5 / 5 backlog" (24 items grouped by review source + deferred).
|
||||
- Phase 3.5 / Phase 2.5 cleanup plans (pattern reference): [2026-04-26-v3.5-phase3.5-cleanup.md](2026-04-26-v3.5-phase3.5-cleanup.md), [2026-04-26-v2.5-phase2.5-cleanup.md](2026-04-26-v2.5-phase2.5-cleanup.md).
|
||||
- Conventions: [`CLAUDE.md`](../../CLAUDE.md) §"Behavioral defaults" + §"Phase 4 status".
|
||||
|
||||
---
|
||||
|
||||
## Pre-flight
|
||||
|
||||
**Branch:** create `phase-4.5` from the latest `main`:
|
||||
|
||||
```bash
|
||||
git checkout main && git pull && git checkout -b phase-4.5
|
||||
```
|
||||
|
||||
**Schema baseline:** Phase 4 leaves the DB at version 13. Phase 4.5 adds two migrations: `0014_phase45_schema.sql` (T109) and `0015_vec0_virtual_tables.sql` (T115 — only lands if T115 ships). Final schema version: 14 or 15.
|
||||
|
||||
**Optional pre-flight for T115 (sqlite-vec swap):**
|
||||
|
||||
The host Python build needs `enable_load_extension`. Two options:
|
||||
|
||||
1. **Rebuild Python** via pyenv with `PYTHON_CONFIGURE_OPTS="--enable-loadable-sqlite-extensions" pyenv install 3.12.0 --force` and recreate the venv.
|
||||
2. **Add `apsw`** as a dependency and migrate `chat/db/connection.py` to use `apsw.Connection` (significant refactor — the entire codebase uses stdlib `sqlite3`).
|
||||
|
||||
If neither is acceptable, **defer T115** to Phase 5 and ship Phase 4.5 with 13 tasks instead of 14. The other tasks are unaffected.
|
||||
|
||||
**Pinned non-negotiables (carried forward):**
|
||||
|
||||
- State changes go through the event log. Use `append_and_apply` for the live path.
|
||||
- Witness filter every memory read at SQL level.
|
||||
- TDD: every task starts with a failing test (or a regression test pinning existing contract before refactor).
|
||||
- One commit per task minimum. Bundled tasks split internally.
|
||||
|
||||
**Verification before claiming done:** Use `superpowers-extended-cc:verification-before-completion` — run the test command, paste actual output.
|
||||
|
||||
---
|
||||
|
||||
## Backlog item → task mapping
|
||||
|
||||
24 items consolidated into 14 tasks by **file ownership**:
|
||||
|
||||
| # | Item | Source | Task |
|
||||
|---|------|--------|------|
|
||||
| 1 | `embeddings` FK lacks `ON DELETE CASCADE` | T88 | **T109** (schema migration) |
|
||||
| 2 | `list_branches(chat_id=...)` global-branch leak — document | T89 | **T103** |
|
||||
| 3 | Branch-switch silently leaves zero active — log warning | T89 | **T103** |
|
||||
| 4 | Real embedding model swap | T91 / deferred | **T112** |
|
||||
| 5 | `timeout_s` fallback-path logging | T91 | **T107** |
|
||||
| 6 | Duplicate `MAX(id)` lookup in retrieval ranking | T96 | **T104** |
|
||||
| 7 | `fts_rank=None` for vector-only rows — document | T96 | **T104** |
|
||||
| 8 | `event_id <= 0` guard in `delete_turn` | T98 | **T110** |
|
||||
| 9 | `html.escape()` on delete-impact modal output | T98 | **T110** |
|
||||
| 10 | Extract delete-impact modal to Jinja partial | T98 | **T110** |
|
||||
| 11 | Hoist `datetime`/`timezone` imports in `snapshots.py` | T99 | **T105** |
|
||||
| 12 | Strict `kind` validation in snapshot routes | T99 | **T105** |
|
||||
| 13 | `created_at` from file mtime — document drift risk | T99 | **T105** |
|
||||
| 14 | Hardcoded `k=50` → module constant | T100 | **T106** |
|
||||
| 15 | N+1 lookups in search results | T100 | **T106** |
|
||||
| 16 | FTS highlighting via `snippet()` | T100 | **T111** |
|
||||
| 17 | Result links chat-level only — add deep-link via memories.event_id | T100 | **T109** + **T111** |
|
||||
| 18 | sqlite-vec swap when host Python supports loadable extensions | deferred | **T115** |
|
||||
| 19 | Branching read-side filter (consult `is_active`) | deferred | **T113** |
|
||||
| 20 | Bulk significance re-rate in drawer | deferred | **T110** |
|
||||
| 21 | Vector index optimization (HNSW) | deferred | **T115** (post-ship note) |
|
||||
| 22 | Scene-close-on-cancel UX revisit | Phase 2.5 carry-over | **T108** |
|
||||
| 23 | Cross-feature canned-queue brittleness fixture builder | Phase 3 carry-over | **T116** |
|
||||
| 24 | Full lifecycle-rollback in regenerate | Phase 3.5 carry-over | **T114** |
|
||||
|
||||
---
|
||||
|
||||
## Parallel-Execution Strategy
|
||||
|
||||
Same pattern as Phase 3.5 / Phase 2.5 / Phase 4. Nine waves: parallel within each wave (file-disjoint), serial across waves.
|
||||
|
||||
### How to dispatch a wave in parallel
|
||||
|
||||
Use the **Agent tool with `isolation: "worktree"`**. (If the controlling session's working directory is **not** the chat repo, create worktrees manually with `git worktree add .worktrees/<wave>-<task> -b <wave>/<task> phase-4.5`.)
|
||||
|
||||
### After a wave completes
|
||||
|
||||
1. Each subagent returns its worktree path and commit SHA(s).
|
||||
2. **Run a spec + code-quality reviewer subagent on each completed task.** Combined review acceptable for trivial tasks (T103–T108); separate spec + quality reviewers for big tasks (T112, T113, T114, T115).
|
||||
3. **Merge the wave into `phase-4.5`** in any order (file-disjointness guarantees no conflict). Use `--no-ff`.
|
||||
4. **Run the full test suite** on the merged `phase-4.5`.
|
||||
5. **Push `phase-4.5`** to gitea.
|
||||
6. Optionally clean up worktrees.
|
||||
|
||||
### Conflict prevention checklist
|
||||
|
||||
For each parallel wave, verify the **Files** sections of all tasks have **no overlapping paths**. Hot files in this plan (each owned by exactly one task): `chat/state/memory.py`, `chat/web/drawer.py`, `chat/web/search.py`, `chat/services/regenerate.py`, `chat/services/turn_common.py`, `chat/services/embeddings.py`, `chat/db/migrations/`.
|
||||
|
||||
### Why each wave is parallel-safe
|
||||
|
||||
| Wave | Tasks | Hot files | Disjoint? |
|
||||
|------|-------|-----------|-----------|
|
||||
| 1 | T103, T104, T105, T106, T107, T108 | 6 different files; no overlap | ✅ |
|
||||
| 2 | T109 | new migration + minor projector update | (single task) |
|
||||
| 3 | T110 | `chat/web/drawer.py` (bundle) | (single task) |
|
||||
| 4 | T111 | `chat/services/cross_chat_search.py` + `chat/web/search.py` + template | (single task; depends on T109) |
|
||||
| 5 | T112 | `chat/services/embeddings.py` + `chat/llm/*.py` (Protocol + Featherless + Mock) | (single task) |
|
||||
| 6 | T113 | `chat/services/turn_common.py` + multiple readers (cross-cutting) | (single task) |
|
||||
| 7 | T114 | `chat/services/regenerate.py` + projector handler | (single task) |
|
||||
| 8 | T115 | new migration + `chat/services/vector_search.py` + `chat/db/connection.py` | (single task; environmental) |
|
||||
| 9 | T116, T117, T118 | new test fixture file (T116); new test file (T117); CLAUDE.md (T118) | ✅ |
|
||||
|
||||
---
|
||||
|
||||
## Task overview
|
||||
|
||||
```
|
||||
Wave 1 ─┬─ T103: branches polish (global-branch doc + branch-switch warning)
|
||||
├─ T104: state/memory.py polish (DRY MAX(id) + fts_rank doc)
|
||||
├─ T105: snapshots.py polish (datetime hoist + kind validation + mtime doc)
|
||||
├─ T106: search.py polish (k constant + N+1 batched lookups)
|
||||
├─ T107: embeddings.py timeout_s fallback-path logging
|
||||
└─ T108: scene-close-on-cancel UX revisit (pin behavior with regression test)
|
||||
|
||||
Wave 2 ─── T109: 0014 schema migration (FK CASCADE + memories.event_id column)
|
||||
|
||||
Wave 3 ─── T110: drawer Phase 4.5 bundle (event_id guard + html.escape + modal partial + bulk sig re-rate)
|
||||
|
||||
Wave 4 ─── T111: search UX enhancements (FTS snippet() highlighting + deep-link via memories.event_id)
|
||||
|
||||
Wave 5 ─── T112: real embedding model swap (LLMClient.embed protocol + Featherless impl + generate_embedding routing + backfill)
|
||||
|
||||
Wave 6 ─── T113: branching read-side filter (event readers consult is_active branch range)
|
||||
|
||||
Wave 7 ─── T114: regenerate lifecycle rollback (back-reference field + compensating events on supersede)
|
||||
|
||||
Wave 8 ─── T115: sqlite-vec swap (vec0 virtual tables + MATCH-based vector_search) [ENVIRONMENTAL — see pre-flight]
|
||||
|
||||
Wave 9 ─┬─ T116: structured test-fixture builder (canned-queue brittleness)
|
||||
├─ T117: Phase 4.5 cross-feature integration tests
|
||||
└─ T118: docs sweep — Phase 4.5 status, prune backlog, capture Phase 5 residuals
|
||||
```
|
||||
|
||||
Critical path: 9 sequential merge points. Total tasks: 14 (or 13 if T115 deferred). Parallelism: Waves 1 (6-way) and 9 (3-way) dispatch concurrently. Waves 2–8 are single-task by hot-file constraint.
|
||||
|
||||
---
|
||||
|
||||
## Wave 1 — Independent small fixes (parallel, 6 tasks)
|
||||
|
||||
All trivial, file-disjoint. Each is 1-line + 1-test or similar.
|
||||
|
||||
### Task 103: branches polish
|
||||
|
||||
**Files:**
|
||||
- Modify: `chat/state/branches.py`
|
||||
- Modify: `tests/test_branches_state.py`
|
||||
|
||||
**Spec (2 sub-fixes, single commit):**
|
||||
|
||||
1. **Document global-branch leak**: `list_branches(chat_id=...)` filter `chat_id = ? OR chat_id IS NULL` returns global/null-chat branches (like "main") in every chat scope. Add a docstring note explaining this is intentional ("main" is global by design; per-chat branches are scoped).
|
||||
|
||||
2. **Warn on branch-switch to nonexistent name**: in `_apply_branch_switched`, before the SQL UPDATE, check if a branch with the given name exists. If not, emit `logging.getLogger(__name__).warning(...)` rather than silently leaving zero active branches.
|
||||
|
||||
**Test:** `test_branch_switched_unknown_name_warns` — capture log via `caplog`, append `branch_switched` for nonexistent name, assert warning message + no active branch (existing behavior preserved, just observable).
|
||||
|
||||
**Commit:** `chore: branches polish — global-leak docs + unknown-name warning (T103)`.
|
||||
|
||||
---
|
||||
|
||||
### Task 104: state/memory.py polish
|
||||
|
||||
**Files:**
|
||||
- Modify: `chat/state/memory.py`
|
||||
- Modify: `tests/test_memory_search.py` (no new tests; just add docstring assertions if needed)
|
||||
|
||||
**Spec (2 sub-fixes):**
|
||||
|
||||
1. **DRY `MAX(id)` lookup**: `_composite_rerank` (Phase 3.5 T57) and `_rrf_fuse_and_rerank` (Phase 4 T96) both query `SELECT MAX(id) FROM event_log` for the recency boost. Extract a `_max_event_id(conn)` helper.
|
||||
|
||||
2. **`fts_rank=None` documentation**: search_memories docstring should note that vector-only rows have `fts_rank=None`. Downstream consumers must accept None (they currently do, but contract is implicit).
|
||||
|
||||
**Test:** existing tests cover both via the public API; no new test needed unless docstring assertion is desired.
|
||||
|
||||
**Commit:** `chore: memory.py DRY MAX(id) helper + document fts_rank=None contract (T104)`.
|
||||
|
||||
---
|
||||
|
||||
### Task 105: snapshots.py polish
|
||||
|
||||
**Files:**
|
||||
- Modify: `chat/web/snapshots.py`
|
||||
- Modify: `tests/test_snapshot_ux.py` (1 new test)
|
||||
|
||||
**Spec (3 sub-fixes):**
|
||||
|
||||
1. **Hoist `datetime`/`timezone` imports** to module level (currently inside `_list_all_snapshots`).
|
||||
|
||||
2. **Strict `kind` validation in restore/preview routes**: currently `kind` defaults to `"periodic"`. If a rewind snapshot is requested without explicit `kind`, the lookup silently 404s. Reject missing `kind` with a 400 instead of silently defaulting.
|
||||
|
||||
3. **Document `created_at` mtime drift risk** in module docstring: snapshot timestamps come from file mtime, not the encoded filename timestamp. Files copied via `cp -p` preserve mtime; `cp` without `-p` resets it. Add a one-line note.
|
||||
|
||||
**Test:** `test_restore_without_kind_returns_400` — POST `/snapshots/restore/<id>` without `kind`; assert 400.
|
||||
|
||||
**Commit:** `chore: snapshots.py polish — hoisted imports + strict kind + mtime doc (T105)`.
|
||||
|
||||
---
|
||||
|
||||
### Task 106: search.py polish
|
||||
|
||||
**Files:**
|
||||
- Modify: `chat/web/search.py`
|
||||
- Modify: `tests/test_search_ux.py` (1 new test)
|
||||
|
||||
**Spec (2 sub-fixes):**
|
||||
|
||||
1. **Hardcoded `k=50` → module constant**: extract `DEFAULT_SEARCH_K = 50` at module level. Tunable without code change at the call site.
|
||||
|
||||
2. **N+1 lookup batching**: GET `/search?q=...` currently calls `get_bot(conn, owner_id)`, `get_chat(conn, chat_id)`, `get_scene(conn, scene_id)` per result row (worst case 50×3 = 150 individual queries). Batch via `WHERE id IN (...)` queries: collect distinct ids first, fetch in 3 batched queries, then map back per row.
|
||||
|
||||
**Test:** `test_search_results_use_batched_lookups` — mock `get_bot`/`get_chat`/`get_scene` and assert each is called once (not per row). OR easier: time the search with 50 results and assert it doesn't degrade linearly with `k`.
|
||||
|
||||
**Commit:** `perf: search.py N+1 batching + k constant extraction (T106)`.
|
||||
|
||||
---
|
||||
|
||||
### Task 107: embeddings.py timeout_s fallback-path logging
|
||||
|
||||
**Files:**
|
||||
- Modify: `chat/services/embeddings.py`
|
||||
- Modify: `tests/test_embeddings.py` (1 new test)
|
||||
|
||||
**Spec:**
|
||||
|
||||
When `model != DEFAULT_EMBEDDING_MODEL` and falls through to fallback (zero-vector with model="fallback"), log a `warning` so misconfigured callers (e.g., a Phase 4.5+ caller pointing at a real model that doesn't exist) don't silently degrade.
|
||||
|
||||
```python
|
||||
if model != DEFAULT_EMBEDDING_MODEL:
|
||||
_log.warning(
|
||||
"generate_embedding: non-default model %r returned fallback "
|
||||
"(model client.embed() not yet implemented in Phase 4.5+); "
|
||||
"downstream search will degrade silently. Configure a supported model.",
|
||||
model,
|
||||
)
|
||||
return EmbeddingResult(...) # fallback
|
||||
```
|
||||
|
||||
The Phase 4 default path (`model == DEFAULT_EMBEDDING_MODEL` → pseudo-embedding) is silent; only non-default models trigger the warning.
|
||||
|
||||
**Test:** `test_generate_embedding_non_default_model_logs_warning` — call with `model="real-model"`; capture log via `caplog`; assert the warning message appears.
|
||||
|
||||
**Commit:** `chore: embeddings.py warns on fallback for non-default models (T107)`.
|
||||
|
||||
---
|
||||
|
||||
### Task 108: scene-close-on-cancel UX revisit
|
||||
|
||||
**Files:**
|
||||
- Modify: `tests/test_turn_flow.py` (extend the existing pin test added in Phase 2.5 T74.3 OR add a new one)
|
||||
- Optionally modify: `chat/web/turns.py` if a real bug surfaces during investigation
|
||||
|
||||
**Spec:**
|
||||
|
||||
This carry-over has been pending since Phase 2.5 T74.3. The pinned behavior: scene close fires even when the primary turn is cancelled mid-stream, because `detect_scene_close` consults user prose (fully present at cancel time), not bot output.
|
||||
|
||||
**Action:**
|
||||
|
||||
1. **Re-investigate** by reading the post_turn cancellation path. Confirm the rationale still holds (it should — nothing about the close-detection logic changed in Phase 3 or 4).
|
||||
2. **Strengthen the regression test** in `tests/test_turn_flow.py` (the existing `test_cancelled_turn_still_closes_scene_when_user_prose_signals_close`). Add an assertion that the user prose IS present at the moment scene_close_decision fires (even though the bot output isn't).
|
||||
3. If investigation surfaces an actual UX issue (e.g., the close fires too eagerly on prose like "fade out... actually wait"), this becomes a real fix — but default action is documentation-only.
|
||||
|
||||
**Default outcome:** add a docstring comment to the post_turn close-detection branch explaining the rationale. No behavioral change.
|
||||
|
||||
**Test (extend existing):** assert ordering — `scene_closed` event lands AFTER the user_turn event but BEFORE any potential assistant_turn (which is cancelled). Pin the contract.
|
||||
|
||||
**Commit:** `chore: scene-close-on-cancel — strengthen regression test + document rationale (T108)`.
|
||||
|
||||
---
|
||||
|
||||
## Wave 2 — Schema migration (single)
|
||||
|
||||
### Task 109: 0014 schema migration
|
||||
|
||||
**Files:**
|
||||
- Create: `chat/db/migrations/0014_phase45_schema.sql`
|
||||
- Modify: `chat/state/memory.py` or `chat/services/memory_write.py` (populate the new `event_id` column on memory_written)
|
||||
- Modify: `tests/test_world.py` (bump schema_version assertion to 14)
|
||||
- Modify: `tests/test_memory_write.py` (assert event_id populated)
|
||||
|
||||
**Spec:**
|
||||
|
||||
Two schema changes bundled into a single migration:
|
||||
|
||||
1. **`embeddings.memory_id` FK gets `ON DELETE CASCADE`** (T88 review nit). SQLite doesn't support `ALTER TABLE ... ALTER COLUMN`, so the standard pattern is: rename old table, create new, copy data, drop old, recreate indices. Alternatively, since this is a new-ish table (Phase 4 added it) and the change is purely defensive, document as "WONTFIX in 4.5; deindex events remain the only deletion path; ON DELETE CASCADE remains a Phase 5 candidate when we do a broader migration cleanup". Choose pragmatically.
|
||||
|
||||
2. **Add `memories.event_id INTEGER` column** (NULL allowed for backward compat) referencing `event_log.id`. This is the foundation for T111's deep-linking from cross-chat search results to specific turns. Migration adds the column; the projector for `memory_written` populates it from the event id when projecting.
|
||||
|
||||
**Production code change:** in the `memory_written` projector handler (in `chat/state/memory.py` or wherever it lives), populate the new `event_id` column with the projecting event's `id`. The `Event` object has `id` available in the projector context.
|
||||
|
||||
**Tests:**
|
||||
|
||||
1. `test_schema_version_after_migration_is_14` (rename + bump from 13).
|
||||
2. `test_memory_written_populates_event_id` — append memory_written; project; query memories table; assert `event_id` is the projecting event's id.
|
||||
3. (Backward compat) older memories from existing seed data have NULL `event_id` — the column is nullable.
|
||||
|
||||
**Commit:** `feat: 0014 schema — embeddings FK CASCADE (deferred or applied) + memories.event_id column (T109)`.
|
||||
|
||||
---
|
||||
|
||||
## Wave 3 — Drawer Phase 4.5 bundle (single)
|
||||
|
||||
### Task 110: drawer polish + bulk significance re-rate
|
||||
|
||||
**Files:**
|
||||
- Modify: `chat/web/drawer.py`
|
||||
- Modify: `chat/templates/_drawer.html`
|
||||
- Create: `chat/templates/_delete_impact_modal.html` (extracted partial)
|
||||
- Modify: `chat/state/manual_edit.py` (potentially — if bulk re-rate emits a new manual_edit kind)
|
||||
- Modify: `tests/test_drawer_phase4.py` (extend with 4-5 new tests)
|
||||
|
||||
**Spec (4 sub-fixes, 4 commits):**
|
||||
|
||||
1. **`event_id <= 0` guard in `delete_turn`** (T98 nit): currently silently rewinds everything if `event_id` is 0. Add `if event_id <= 0: raise HTTPException(400, "...")`.
|
||||
|
||||
2. **`html.escape()` on delete-impact modal** (T98 nit): the rendered HTML in `compute_delete_impact` output is built via raw f-strings from model-controlled strings. Wrap user-controllable fields with `html.escape()`. Defense-in-depth — currently safe, but if event payload fields ever appear in descriptions, autoescape would prevent XSS.
|
||||
|
||||
3. **Extract delete-impact modal HTML to a Jinja partial**: create `chat/templates/_delete_impact_modal.html`; render via `templates.TemplateResponse(...)` instead of f-string concatenation. Inherits Jinja2 autoescape automatically. Tests use the existing TestClient pattern.
|
||||
|
||||
4. **Bulk significance re-rate** (T98.2 deferral): drawer panel showing memory significance distribution per chat. New POST route `/chats/{chat_id}/drawer/memory/significance/bulk` accepting `{level_from, level_to}` form fields. Updates ALL memories in the chat at `level_from` to `level_to` via a sequence of `manual_edit` events (one per memory — preserves the audit trail).
|
||||
|
||||
**Tests:**
|
||||
|
||||
1. `test_delete_turn_with_event_id_zero_returns_400`.
|
||||
2. `test_delete_impact_modal_uses_jinja_partial` (assert response renders the partial template; verify with `assert b"<div class=\"delete-impact-modal\">" in response.content` or similar).
|
||||
3. `test_delete_impact_modal_escapes_user_controllable_strings` — seed an event with a payload containing `<script>` in a description-bound field; render preview; assert it appears HTML-escaped.
|
||||
4. `test_bulk_significance_re_rate_emits_manual_edit_per_memory` — seed 5 memories at significance 0; bulk re-rate to 2; assert 5 `manual_edit` events landed.
|
||||
|
||||
**Commits (4):**
|
||||
- `fix: drawer delete_turn guards event_id <= 0 (T110.1)`
|
||||
- `fix: drawer delete-impact modal HTML escapes user-controllable fields (T110.2)`
|
||||
- `refactor: drawer delete-impact modal extracted to Jinja partial (T110.3)`
|
||||
- `feat: drawer bulk significance re-rate per chat (T110.4)`
|
||||
|
||||
---
|
||||
|
||||
## Wave 4 — Search UX enhancements (single)
|
||||
|
||||
### Task 111: FTS highlighting + deep-link to turn
|
||||
|
||||
**Files:**
|
||||
- Modify: `chat/services/cross_chat_search.py`
|
||||
- Modify: `chat/web/search.py`
|
||||
- Modify: `chat/templates/search.html`
|
||||
- Modify: `tests/test_search_ux.py`
|
||||
|
||||
**Spec (2 sub-fixes, 2 commits):**
|
||||
|
||||
1. **FTS highlighting via `snippet()`** (T100 nit): replace the `pov_summary` column in `search_all_memories`'s SELECT with `snippet(memories_fts, 0, '<mark>', '</mark>', '…', 32)` to return a highlighted snippet around the match. The template renders this raw via `|safe` (the snippet is built by SQLite from indexed content; the `<mark>` tags are the only HTML, and SQLite escapes any HTML special chars in the source content).
|
||||
|
||||
2. **Deep-link to turn via memories.event_id** (T100 nit + T109 dependency): now that `memories.event_id` exists (from T109), each search result row knows the originating event id. The chat page uses turn-id stamping (Phase 3.5 T86 added `id="turn-{event_id}"`). Build result links as `/chats/{chat_id}#turn-{event_id}`. The chat page DOM scrolls to the anchor on load (browser default).
|
||||
|
||||
**Tests:**
|
||||
|
||||
1. `test_search_results_include_fts_snippet_with_highlight` — seed memory with text containing "rabbit"; search for "rabbit"; assert response body contains `<mark>rabbit</mark>` (or whatever marker the snippet uses).
|
||||
2. `test_search_result_link_includes_turn_anchor` — seed memory with known event_id; search; assert link href contains `#turn-{event_id}`.
|
||||
|
||||
**Commits (2):**
|
||||
- `feat: cross-chat search FTS snippet highlighting (T111.1)`
|
||||
- `feat: cross-chat search deep-links to turn via memories.event_id (T111.2)`
|
||||
|
||||
---
|
||||
|
||||
## Wave 5 — Real embedding model (single)
|
||||
|
||||
### Task 112: Real embedding model swap
|
||||
|
||||
**Files:**
|
||||
- Modify: `chat/llm/client.py` (Protocol — add `embed(text, model) -> list[float]` method)
|
||||
- Modify: `chat/llm/featherless.py` (FeatherlessClient — implement `embed` against Featherless `/v1/embeddings` endpoint OR equivalent)
|
||||
- Modify: `chat/llm/mock.py` (MockLLMClient — accept canned embedding vectors)
|
||||
- Modify: `chat/services/embeddings.py` (route non-default model through `client.embed()`)
|
||||
- Modify: `chat/config.py` (add `embedding_model: str` setting; default to current pseudo)
|
||||
- Modify: `scripts/backfill_embeddings.py` (re-embed-all option for model swaps)
|
||||
- Modify: `tests/test_embeddings.py` + `tests/test_llm_mock.py` + `tests/test_featherless.py` (if exists)
|
||||
|
||||
**Spec:**
|
||||
|
||||
Phase 4 ships a deterministic SHA-256 pseudo-embedding (deterministic but semantically meaningless). T112 wires the path for a real embedding model.
|
||||
|
||||
**Steps:**
|
||||
|
||||
1. **Extend `LLMClient` Protocol** with `async def embed(self, text: str, *, model: str) -> list[float]`.
|
||||
|
||||
2. **Implement on FeatherlessClient**: call the Featherless OpenAI-compatible `/v1/embeddings` endpoint:
|
||||
```python
|
||||
response = await self._http.post(
|
||||
"/v1/embeddings",
|
||||
json={"model": model, "input": text},
|
||||
headers={"Authorization": f"Bearer {self._api_key}"},
|
||||
)
|
||||
data = response.json()
|
||||
return data["data"][0]["embedding"]
|
||||
```
|
||||
Handle rate limits (existing 2-conn semaphore covers this).
|
||||
|
||||
3. **Implement on MockLLMClient**: `embed` pops a canned vector from a new `canned_embeddings` queue. Tests configure this queue.
|
||||
|
||||
4. **Update `generate_embedding`**: when `model != DEFAULT_EMBEDDING_MODEL`, call `client.embed(text, model=model)` instead of falling through to fallback. Wrap in try/except — failures fall back to zero vector (existing fallback path).
|
||||
|
||||
5. **Settings**: add `embedding_model: str = "pseudo-sha256-384"` to `Settings`. App reads this at startup; the embedding worker (`chat/services/embedding_worker.py`) passes it through.
|
||||
|
||||
6. **Backfill script**: add `--re-embed-all` flag that walks ALL memories (regardless of existing `embeddings_meta` rows) and re-embeds with the configured model. Useful for swapping models.
|
||||
|
||||
**Tests:**
|
||||
|
||||
1. `test_embed_routes_to_client_when_non_default_model` — mock client with canned vector; call `generate_embedding(model="bge-small-en-v1.5")`; assert vector matches the canned response.
|
||||
2. `test_embed_falls_back_on_client_failure` — mock client to raise; assert returns zero vector with model="fallback".
|
||||
3. `test_mock_llm_client_embed_pops_canned`.
|
||||
4. `test_featherless_embed_calls_correct_endpoint` (if there's an existing featherless test pattern; otherwise mock the HTTP layer).
|
||||
|
||||
**Commits:**
|
||||
- `feat: LLMClient Protocol gains embed() method (T112.1)`
|
||||
- `feat: FeatherlessClient.embed() against /v1/embeddings (T112.2)`
|
||||
- `feat: generate_embedding routes non-default models through client.embed (T112.3)`
|
||||
- `feat: backfill_embeddings --re-embed-all flag for model swaps (T112.4)`
|
||||
|
||||
---
|
||||
|
||||
## Wave 6 — Branching read-side filter (single, BIG)
|
||||
|
||||
### Task 113: Branching read-side filter
|
||||
|
||||
**Files (cross-cutting):**
|
||||
- Modify: `chat/services/turn_common.py::read_recent_dialogue` — filter events to active branch's range
|
||||
- Modify: `chat/services/scene_summarize.py::_read_recent_dialogue` (similar)
|
||||
- Modify: `chat/state/memory.py::search_memories` — memories should be filtered to active branch (memories.event_id from T109 enables this)
|
||||
- Modify: `chat/state/branches.py` — add helper `active_branch_event_ids(conn) -> tuple[int, int]` returning (origin, head)
|
||||
- Add tests across multiple files
|
||||
- Modify: `tests/test_branching.py` — add cross-feature tests
|
||||
|
||||
**Spec:**
|
||||
|
||||
Phase 4 T89 + T94 shipped branching as metadata-only (the table tracks branches; the drawer UI can switch). But event readers DON'T consult `is_active` — they read the entire event_log. So switching branches has no functional effect.
|
||||
|
||||
T113 wires the filter:
|
||||
|
||||
1. **Helper** `active_branch_event_ids(conn) -> tuple[int, int]`: returns `(origin_event_id, head_event_id)` for the currently active branch. For "main" with origin=0 + head=N, returns `(0, N)` meaning "all events visible".
|
||||
|
||||
2. **Apply filter** in every event reader that returns historical state:
|
||||
- `read_recent_dialogue`: WHERE clause adds `id BETWEEN ? AND ?` (the active branch's range).
|
||||
- `search_memories`: WHERE clause adds `m.event_id BETWEEN ? AND ?` (uses T109's column).
|
||||
- `scene_summarize._read_recent_dialogue`: same as turn_common.
|
||||
- Other readers TBD — grep for `event_log` SELECT patterns and audit each one.
|
||||
|
||||
3. **Branches that diverge**: when branch B is created from event 10 and then accumulates events 11-15 (which only exist on B's timeline), but main also accumulates 11-12, the events overlap by id range. This is OK because event reads filter by `id <= active_branch.head_event_id`. The simpler model: branches share event_log ids globally, but each branch's "head" defines which ids are visible.
|
||||
|
||||
4. **Events written under branch B** carry an implicit branch tag — but the event_log table has no `branch_id` column today. T113 punts on cross-branch event writes (they all land in the global log) and relies on the `head_event_id` filter to scope reads. This is a Phase 4.5+ first cut; full branch-isolated event_log is Phase 5+.
|
||||
|
||||
**Edge cases:**
|
||||
|
||||
- Active branch has `head_event_id = 0` (just created): readers return empty.
|
||||
- No active branch: readers fall through to "all events visible" (defensive).
|
||||
- Switching branches mid-flight: each `read_recent_dialogue` call re-queries `active_branch`, so it's always current. No caching.
|
||||
|
||||
**Tests:** 5+ minimum.
|
||||
|
||||
1. `test_read_recent_dialogue_respects_active_branch_head` — seed 10 events; active branch head = 5; assert only first 5 returned.
|
||||
2. `test_search_memories_respects_active_branch_head` — same.
|
||||
3. `test_branch_switch_changes_visible_events` — switch branches; immediately read; assert different result sets.
|
||||
4. `test_main_branch_with_head_zero_returns_empty` — defensive.
|
||||
5. `test_no_active_branch_falls_through_to_all_events` — defensive.
|
||||
|
||||
**Commit:** `feat: branching read-side filter — event readers consult active branch range (T113)`.
|
||||
|
||||
**This is the largest task in Phase 4.5.** Estimate 200-400 lines across multiple files. Implementer should split commits if it helps clarity (one per affected reader).
|
||||
|
||||
---
|
||||
|
||||
## Wave 7 — Lifecycle rollback in regenerate (single)
|
||||
|
||||
### Task 114: Lifecycle rollback
|
||||
|
||||
**Files:**
|
||||
- Modify: `chat/services/regenerate.py`
|
||||
- Modify: `chat/db/migrations/0014_phase45_schema.sql` (T109's migration) — add column? OR
|
||||
- Add new migration — see decision below
|
||||
- Modify: tests in `tests/test_regenerate.py`
|
||||
|
||||
**Spec:**
|
||||
|
||||
Phase 3.5 T83.4 shipped a warning log when regenerate detects un-rolled-back lifecycle transitions. T114 implements actual rollback.
|
||||
|
||||
**Schema decision:**
|
||||
|
||||
Option A: extend lifecycle event payloads with `triggered_by_assistant_turn_id` (no schema change needed — just a payload convention). Production code (T61 turn flow) populates it when emitting `event_started`/`event_completed`/`event_cancelled`. Existing rows have NULL — rollback skips them with a debug log.
|
||||
|
||||
Option B: add a column to `event_log` for stronger invariants. Significant migration cost.
|
||||
|
||||
**Recommended:** Option A. Safer, no migration, backward compatible (older events skip rollback). Document in commit body.
|
||||
|
||||
**Rollback semantics:**
|
||||
|
||||
When regenerate detects lifecycle events triggered by the superseded turn:
|
||||
- `event_started` → emit `event_cancelled` (or a NEW `event_started_undone` event kind that reverts status to "planned") with the same event_id.
|
||||
- `event_completed` → emit `event_uncompleted` (NEW event kind that reverts status from "completed" to "active").
|
||||
- `event_cancelled` → emit `event_uncancelled` (reverts to prior status — which we'd need to track; or simpler: emit `event_started` again to restore "active").
|
||||
|
||||
**Simpler approach (recommended):** add ONE new event kind `event_status_reverted` with payload `{event_id, prior_status}`. The projector sets `events.status = prior_status` for the event_id. Rollback emits this event for each affected lifecycle transition, looking up the prior status from the row's history (via event_log scan) or accepting it as a payload field.
|
||||
|
||||
**Production code change:** in `chat/web/turns.py::post_turn` (and `chat/services/regenerate.py`), when emitting `event_started`/`event_completed`/`event_cancelled`, populate `triggered_by_assistant_turn_id: <id>` in the payload. Forward-only — older code doesn't need updating.
|
||||
|
||||
**Tests:** 3 minimum.
|
||||
|
||||
1. `test_regenerate_rolls_back_event_started_from_superseded_turn` — seed an event; play a turn that starts it; regenerate; assert `event_status_reverted` event landed with `prior_status="planned"` and the events row is back to "planned".
|
||||
2. `test_regenerate_rolls_back_event_completed_to_active` — same but completed → active rollback.
|
||||
3. `test_regenerate_skips_events_without_back_reference` — older events without `triggered_by_assistant_turn_id` are not rolled back (debug log). Pin the backward-compat behavior.
|
||||
|
||||
**Commits:**
|
||||
- `feat: lifecycle events carry triggered_by_assistant_turn_id back-reference (T114.1)`
|
||||
- `feat: event_status_reverted event kind + projector handler (T114.2)`
|
||||
- `feat: regenerate rolls back lifecycle transitions on supersede (T114.3)`
|
||||
|
||||
---
|
||||
|
||||
## Wave 8 — sqlite-vec swap (single, ENVIRONMENTAL)
|
||||
|
||||
### Task 115: sqlite-vec swap (optional)
|
||||
|
||||
**Files:**
|
||||
- Create: `chat/db/migrations/0015_vec0_virtual_tables.sql`
|
||||
- Modify: `chat/db/connection.py` (load extension on every connection)
|
||||
- Modify: `chat/services/vector_search.py` (rewrite to use vec0 MATCH instead of pure-Python cosine)
|
||||
- Modify: `chat/state/embeddings.py` (writer needs to populate vec0 table)
|
||||
- Modify: `pyproject.toml` (add `sqlite-vec` dependency)
|
||||
|
||||
**Pre-flight:**
|
||||
|
||||
This task REQUIRES one of:
|
||||
- Python rebuilt with `--enable-loadable-sqlite-extensions` (pyenv reinstall).
|
||||
- `apsw` migration of `chat/db/connection.py`.
|
||||
|
||||
If neither is feasible at the time of execution: SKIP THIS TASK and document the deferral in T118 docs sweep. The other 13 Phase 4.5 tasks ship without it.
|
||||
|
||||
**Spec:**
|
||||
|
||||
1. **Migration** `0015_vec0_virtual_tables.sql`:
|
||||
```sql
|
||||
CREATE VIRTUAL TABLE embeddings_vec USING vec0(
|
||||
memory_id INTEGER PRIMARY KEY,
|
||||
embedding FLOAT[384]
|
||||
);
|
||||
-- Backfill from existing JSON embeddings table.
|
||||
INSERT INTO embeddings_vec (memory_id, embedding)
|
||||
SELECT memory_id, vec_f32(vector_json) FROM embeddings;
|
||||
```
|
||||
|
||||
2. **`chat/db/connection.py`** loads `sqlite_vec` extension on every connection:
|
||||
```python
|
||||
import sqlite_vec
|
||||
def open_db(...):
|
||||
conn = sqlite3.connect(...)
|
||||
conn.enable_load_extension(True)
|
||||
sqlite_vec.load(conn)
|
||||
conn.enable_load_extension(False)
|
||||
...
|
||||
```
|
||||
|
||||
3. **Rewrite `vector_search.py`** to use `embeddings_vec MATCH ?` syntax with `k=?` clause:
|
||||
```sql
|
||||
SELECT m.id, m.pov_summary, m.significance, e.distance
|
||||
FROM embeddings_vec e
|
||||
JOIN memories m ON m.id = e.memory_id
|
||||
WHERE e.embedding MATCH ? AND k = ?
|
||||
AND m.owner_id = ?
|
||||
AND m.witness_<role> = 1
|
||||
ORDER BY e.distance ASC
|
||||
LIMIT ?
|
||||
```
|
||||
|
||||
4. **HNSW note**: vec0 supports both flat (default) and HNSW indexes. T115 ships flat (sufficient for < few thousand memories). Document HNSW upgrade path in CLAUDE.md if memory counts ever grow past pure-Python feasibility.
|
||||
|
||||
5. **Old `embeddings` JSON table**: keep alongside `embeddings_vec` (data redundancy is fine; the JSON table is the source of truth and `embeddings_vec` is the index). Backfill on migration. Keep the `embedding_indexed` projector populating both.
|
||||
|
||||
**Tests:** rewrite `tests/test_vector_search.py` to expect new behavior. Same observable contract — only implementation changes. All 5 existing tests should pass post-swap.
|
||||
|
||||
**Commit:** `feat: sqlite-vec swap (vec0 virtual tables + MATCH-based search) (T115)`.
|
||||
|
||||
---
|
||||
|
||||
## Wave 9 — Polish (parallel, 3 tasks)
|
||||
|
||||
### Task 116: Structured test-fixture builder
|
||||
|
||||
**Files:**
|
||||
- Create: `tests/fixtures.py` (or extend `tests/conftest.py`)
|
||||
- Modify: existing test files that use brittle canned-queue arrays (selectively)
|
||||
|
||||
**Spec:**
|
||||
|
||||
Phase 3 carry-over. Tests across `test_turn_flow.py`, `test_meanwhile_turn_flow.py`, `test_phase3_integration.py`, `test_phase4_integration.py` use positional canned-response arrays for `MockLLMClient`. Adding a new classifier call to a code path requires updating canned arrays in many tests.
|
||||
|
||||
**Solution:** structured fixture builder that lets tests declare their classifier expectations by name, not position:
|
||||
|
||||
```python
|
||||
# tests/fixtures.py
|
||||
class CannedQueue:
|
||||
def __init__(self):
|
||||
self._queue = []
|
||||
def parse_turn(self, **fields): ...
|
||||
def state_update(self, **fields): ...
|
||||
def detect_scene_close(self, should_close: bool): ...
|
||||
def detect_event_transitions(self, transitions: list[dict]): ...
|
||||
def summarize_scene(self, summary: str, **fields): ...
|
||||
def detect_threads(self, candidates: list[dict]): ...
|
||||
# ... one method per classifier service
|
||||
def build(self) -> list[str]:
|
||||
return [json.dumps(item) for item in self._queue]
|
||||
```
|
||||
|
||||
Usage:
|
||||
|
||||
```python
|
||||
def test_post_turn_with_event_transition(...):
|
||||
canned = (
|
||||
CannedQueue()
|
||||
.parse_turn(intent="narrative")
|
||||
.narrative("BotA speaks.") # narrative is a stream, but for simplicity treat it like a canned response
|
||||
.state_update(affinity_delta=0, trust_delta=0)
|
||||
.state_update(affinity_delta=0, trust_delta=0)
|
||||
.detect_event_transitions([{"event_id": "evt_1", "new_status": "completed"}])
|
||||
.detect_scene_close(should_close=False)
|
||||
.build()
|
||||
)
|
||||
mock = MockLLMClient(canned=canned)
|
||||
# ...
|
||||
```
|
||||
|
||||
**Migration scope:** don't migrate ALL existing tests at once — that's a separate massive refactor. Instead, ship the fixture builder + migrate 2-3 representative tests as proof of concept. Document the migration path in the fixture's docstring.
|
||||
|
||||
**Tests:** the fixture builder itself doesn't need extensive testing — it's just a builder. Add 1-2 sanity tests that the JSON output matches expected shapes.
|
||||
|
||||
**Commit:** `test: structured CannedQueue fixture builder for classifier mocks (T116)`.
|
||||
|
||||
---
|
||||
|
||||
### Task 117: Phase 4.5 cross-feature integration tests
|
||||
|
||||
**Files:**
|
||||
- Create: `tests/test_phase45_integration.py`
|
||||
|
||||
**Spec:**
|
||||
|
||||
End-to-end multi-feature flows specific to Phase 4.5 changes. 5 tests minimum.
|
||||
|
||||
1. **Real embedding swap + retrieval** — configure `embedding_model="bge-small-en-v1.5"` (mocked); write a memory; backfill or wait for worker; assert vector search returns the memory via `client.embed`-derived vector (not pseudo).
|
||||
|
||||
2. **Branching read-side filter end-to-end** — create a branch from turn 5; switch; play 3 turns on the branch; switch back to main; assert main's recent dialogue is missing the branch turns (read filter respects active branch's head).
|
||||
|
||||
3. **Lifecycle rollback** — start an event via a turn; regenerate that turn; assert lifecycle reverted (event back to "planned").
|
||||
|
||||
4. **Search deep-link** — write memories; search; click a result; verify the chat page renders with the right turn anchored (assert via TestClient response — either the browser anchor OR a server-side scroll-to-anchor mechanism).
|
||||
|
||||
5. **Bulk significance re-rate end-to-end** — seed 5 memories at significance 0; bulk re-rate via drawer; verify significance histogram updates.
|
||||
|
||||
**Commit:** `test: phase 4.5 cross-feature integration coverage (T117)`.
|
||||
|
||||
---
|
||||
|
||||
### Task 118: Phase 4.5 documentation update
|
||||
|
||||
**Files:**
|
||||
- Modify: `CLAUDE.md`
|
||||
- Modify: `docs/plans/2026-04-26-v1-requirements-design.md` (annotate §13 Phase 4 entries — though they're already shipped per Phase 4 T102)
|
||||
|
||||
**Spec:**
|
||||
|
||||
Mirror the Phase 3.5 / 2.5 status sections. Document:
|
||||
|
||||
- All shipped items per task (T103–T117).
|
||||
- Empty out the Phase 4.5 / 5 backlog (replace with single "All items shipped" line).
|
||||
- Add new "Phase 5 backlog" section if any Phase 4.5 reviews surfaced new follow-ups.
|
||||
|
||||
**Phase 5 backlog candidates** (default, if no new follow-ups discovered):
|
||||
|
||||
- Vector index optimization (HNSW) when memory counts grow past flat-index feasibility.
|
||||
- Branch-isolated event_log (each branch has its own physical event_log range vs the current shared id space + head filter).
|
||||
- Embedding model swap migration tooling — when changing models, need to re-embed everything; T112 added `--re-embed-all` but a more orchestrated swap (drain old worker, re-seed all memories, swap config) is Phase 5+.
|
||||
- Real-time collaborative branching (multi-user) — out of scope for v1.
|
||||
- Avatars / portraits (multimodality) — deferred indefinitely per design §14.
|
||||
|
||||
**Commit:** `docs: phase 4.5 status, prune backlog, capture phase 5 candidates (T118)`.
|
||||
|
||||
---
|
||||
|
||||
## Wrap-up
|
||||
|
||||
After Wave 9 lands:
|
||||
|
||||
1. **Run full suite** on `phase-4.5`: should be ~430+ tests passing (413 from Phase 4 + ~20 new across Phase 4.5).
|
||||
2. **Manual smoke** (recommended before opening the PR):
|
||||
- Configure `embedding_model="bge-small-en-v1.5"` (or whatever real model is chosen); restart server; play a turn; verify `embedding_indexed` events use the real model and search returns semantically-relevant memories.
|
||||
- Create a branch, switch, play turns, switch back — verify main's history is unaffected.
|
||||
- Plan an event, complete it via a turn, regenerate that turn — verify event reverts to "planned".
|
||||
- Use cross-chat search; click a result; verify it lands on the right turn in the chat page.
|
||||
- Bulk re-rate a chat's significance distribution.
|
||||
3. **Push `phase-4.5`** to gitea.
|
||||
4. **Open PR** `phase-4.5 → main`.
|
||||
|
||||
---
|
||||
|
||||
## Notes for the controller running this plan
|
||||
|
||||
- **T115 (sqlite-vec swap)** is environmental. If pre-flight fails (no rebuilt Python, no apsw), defer to Phase 5 and ship Phase 4.5 with 13 tasks. T118 docs sweep should note the deferral.
|
||||
- **T112 (real embedding swap)** assumes Featherless or similar exposes an `/v1/embeddings` endpoint. If not available, document the gap and ship the Protocol + Mock impl only (Featherless impl deferred). The pseudo path remains the default in that case — same as Phase 4.
|
||||
- **T113 (branching read-side filter)** is the riskiest task. Cross-cutting. Land it on a quiet branch, test thoroughly. If integration tests break in unexpected ways, bisect the affected reader and add coverage.
|
||||
- **After each parallel wave**, run a code-review subagent. Combined spec+quality acceptable for trivial tasks (T103–T108); separate spec + quality reviewers for big tasks (T112, T113, T114, T115).
|
||||
- **Token-spend rough estimate**: Phase 4.5 should be ~50% the size of Phase 4 (similar number of tasks, mostly smaller). Big tasks (T112, T113, T114) bring the per-task spend up but parallelism in Wave 1 + Wave 9 brings the wall-clock down.
|
||||
- **DO NOT break existing v1/v2/v3/v3.5/v4 surface contracts.** Every test file that was green at the start of Phase 4.5 must stay green at the end. The cross-feature integration tests (`tests/test_phase4_integration.py`, `tests/test_phase3_integration.py`) are particularly load-bearing.
|
||||
@@ -0,0 +1,23 @@
|
||||
{
|
||||
"planPath": "docs/plans/2026-04-27-v4.5-phase4.5-cleanup.md",
|
||||
"tasks": [
|
||||
{"id": 103, "subject": "T103: branches polish (global-leak doc + branch-switch warning)", "status": "pending", "wave": 1, "parallelGroup": "wave-1"},
|
||||
{"id": 104, "subject": "T104: state/memory.py polish (DRY MAX(id) + fts_rank doc)", "status": "pending", "wave": 1, "parallelGroup": "wave-1"},
|
||||
{"id": 105, "subject": "T105: snapshots.py polish (datetime hoist + kind validation + mtime doc)", "status": "pending", "wave": 1, "parallelGroup": "wave-1"},
|
||||
{"id": 106, "subject": "T106: search.py polish (k constant + N+1 batched lookups)", "status": "pending", "wave": 1, "parallelGroup": "wave-1"},
|
||||
{"id": 107, "subject": "T107: embeddings.py timeout_s fallback-path logging", "status": "pending", "wave": 1, "parallelGroup": "wave-1"},
|
||||
{"id": 108, "subject": "T108: scene-close-on-cancel UX revisit (regression test pin + rationale doc)", "status": "pending", "wave": 1, "parallelGroup": "wave-1"},
|
||||
{"id": 109, "subject": "T109: 0014 schema migration (FK CASCADE + memories.event_id column)", "status": "pending", "wave": 2, "parallelGroup": null},
|
||||
{"id": 110, "subject": "T110: drawer Phase 4.5 bundle (event_id guard + html.escape + modal partial + bulk sig re-rate)", "status": "pending", "wave": 3, "parallelGroup": null, "blockedBy": [109]},
|
||||
{"id": 111, "subject": "T111: search UX (FTS snippet highlighting + deep-link via memories.event_id)", "status": "pending", "wave": 4, "parallelGroup": null, "blockedBy": [109]},
|
||||
{"id": 112, "subject": "T112: real embedding model swap (LLMClient.embed protocol + Featherless impl + routing)", "status": "pending", "wave": 5, "parallelGroup": null},
|
||||
{"id": 113, "subject": "T113: branching read-side filter (event readers consult is_active branch range)", "status": "pending", "wave": 6, "parallelGroup": null, "blockedBy": [109]},
|
||||
{"id": 114, "subject": "T114: regenerate lifecycle rollback (back-reference + event_status_reverted)", "status": "pending", "wave": 7, "parallelGroup": null},
|
||||
{"id": 115, "subject": "T115: sqlite-vec swap (vec0 virtual tables + MATCH search) [ENVIRONMENTAL — may defer]", "status": "pending", "wave": 8, "parallelGroup": null},
|
||||
{"id": 116, "subject": "T116: structured CannedQueue test fixture builder", "status": "pending", "wave": 9, "parallelGroup": "wave-9"},
|
||||
{"id": 117, "subject": "T117: phase 4.5 cross-feature integration tests", "status": "pending", "wave": 9, "parallelGroup": "wave-9", "blockedBy": [110, 111, 112, 113, 114]},
|
||||
{"id": 118, "subject": "T118: phase 4.5 docs sweep — prune backlog, capture phase 5 candidates", "status": "pending", "wave": 9, "parallelGroup": "wave-9", "blockedBy": [110, 111, 112, 113, 114]}
|
||||
],
|
||||
"lastUpdated": "2026-04-27T00:00:00Z",
|
||||
"notes": "16 tasks across 9 waves consolidating all 24 items in CLAUDE.md Phase 4.5/5 backlog. Wave 1 (6-way parallel) and Wave 9 (3-way parallel) maximize parallelism. Waves 2-8 are single-task by hot-file constraint. T115 (sqlite-vec swap) requires Python rebuild OR apsw migration — environmental; may defer to Phase 5. Schema baseline 13 -> 14 (T109's 0014) -> optionally 15 (T115's 0015). Big tasks: T112 (real embedding swap), T113 (branching read-side filter — riskiest), T114 (lifecycle rollback). Uses task ids T103-T118."
|
||||
}
|
||||
Reference in New Issue
Block a user