Phase 4: vector retrieval, branching, drawer polish #6

Merged
dohertj2 merged 41 commits from phase-4 into main 2026-04-27 04:10:26 -04:00

Summary

Phase 4 polish per requirements doc §13: vector retrieval, branching UI, drawer-edit on every field, backup tooling, significance review, surgical delete with cascade preview, hide-from-view soft delete, and cross-chat search. Plus the 3 Phase 3.6 carry-over fixes.

Notable pivot: sqlite-vec was deferred because the host Python build does not support enable_load_extension. Phase 4 instead ships pure-Python cosine similarity over a JSON-blob embeddings table. At single-user scale (no more than a few thousand memories per chat), Python iteration is sub-millisecond, and the API stays stable for a Phase 4.5+ swap.
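As a rough illustration of the pure-Python approach described above, the following sketch ranks JSON-encoded embeddings by cosine similarity. The function names and the `(memory_id, embedding_json)` row shape are assumptions made for this example; they are not taken from the merged code.

```python
import json
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], rows: list[tuple[int, str]], k: int = 5) -> list[tuple[int, float]]:
    """Rank hypothetical (memory_id, embedding_json) rows by similarity to the query."""
    scored = [
        (memory_id, cosine_similarity(query, json.loads(blob)))
        for memory_id, blob in rows
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Three stored embeddings, as they might look when read back from a JSON column.
rows = [(1, "[1.0, 0.0]"), (2, "[0.0, 1.0]"), (3, "[1.0, 1.0]")]
best = top_k([1.0, 0.0], rows, k=2)
print(best)
```

Because callers only see a ranked list of ids and scores, a later swap to an indexed vector store could keep the same function signature.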

What shipped (15 tasks, 8 waves)

  • Wave 1 (parallel) — schema + carry-overs: T88 embeddings table; T89 branches table; T90 phase 3.6 fixes (read_recent_dialogue SQL pushdown + lifecycle warning + legacy fn removal)
  • Wave 2 (parallel) — services: T91 embedding generation (deterministic SHA-256 pseudo-embedding for Phase 4 first cut); T92 vector search; T93 cross-chat FTS search
  • Wave 3 (parallel) — services: T94 branching service; T95 delete-impact computation (no DB mutation)
  • Wave 4 (single) — T96 combined FTS+vector retrieval via reciprocal-rank fusion (RRF, RRF_CONST=60)
  • Wave 5 (single) — T97 EmbeddingWorker + memory_write hook + backfill script + ALL 4 production call sites wired (turns, regenerate, meanwhile, drawer)
  • Wave 6 (single) — T98 drawer Phase 4 bundle (5 sub-features): branching UI, significance review, hide-from-view + turn_hidden manual_edit branch, surgical delete with cascade preview, narrative_anchor + weather edits
  • Wave 7 (parallel) — T99 snapshot UX (manual trigger + list + restore + preview); T100 cross-chat search UX (top-bar + results page)
  • Wave 8 (parallel) — T101 cross-feature integration tests; T102 docs
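The reciprocal-rank fusion used in Wave 4 can be sketched as follows. Only the constant `RRF_CONST = 60` comes from the description above; the function name, the input shape (one ranked id list per retriever), and the tie-breaking rule are assumptions for illustration.

```python
RRF_CONST = 60  # value named in the PR description


def rrf_fuse(rankings: list[list[int]], k: int = RRF_CONST) -> list[tuple[int, float]]:
    """Fuse several ranked id lists: each list contributes 1 / (k + rank) per item."""
    scores: dict[int, float] = {}
    for ranking in rankings:
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] = scores.get(item_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))


fts_ranking = [10, 20, 30]      # hypothetical FTS hits, best first
vector_ranking = [20, 40, 10]   # hypothetical vector hits, best first
fused = rrf_fuse([fts_ranking, vector_ranking])
print([item_id for item_id, _ in fused])  # [20, 10, 40, 30]
```

Items that appear in both lists (10 and 20 here) outrank items found by only one retriever, which is the property that makes rank fusion useful when the two scoring scales are not comparable.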

Architecture notes

  • Schema migrations: 0012_embeddings, 0013_branches. Schema baseline now version 13.
  • Embeddings stored as JSON arrays (pure-Python cosine); sqlite-vec deferred to Phase 4.5+.
  • Branching is data-model + UI; read-side filter (is_active consultation by event readers) deferred to Phase 4.5+.
  • Embedding pipeline is async via worker queue mirroring SignificanceWorker pattern.
  • Backward compatible: 343 of 413 tests are unchanged from Phase 3.5 baseline; 70 new tests added.
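The worker-queue pattern mentioned above could look roughly like the sketch below. It assumes a thread plus `queue.Queue`; the PR does not say whether the real EmbeddingWorker uses threads or asyncio, and the class name, method names, and `embed_and_store` callback are all hypothetical.

```python
import queue
import threading
from typing import Callable


class EmbeddingWorkerSketch:
    """Hypothetical background worker that embeds memories off the request path."""

    _STOP = object()  # sentinel that tells the worker loop to exit

    def __init__(self, embed_and_store: Callable[[int], None]) -> None:
        self._embed_and_store = embed_and_store
        self._queue: queue.Queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def enqueue(self, memory_id: int) -> None:
        """Called from the write path; returns immediately."""
        self._queue.put(memory_id)

    def stop(self) -> None:
        """Drain everything queued so far, then shut the thread down."""
        self._queue.put(self._STOP)
        self._thread.join()

    def _run(self) -> None:
        while True:
            item = self._queue.get()
            if item is self._STOP:
                break
            try:
                self._embed_and_store(item)
            except Exception:
                # One failed embedding should not kill the worker;
                # a real worker would log the error here.
                pass


processed: list[int] = []
worker = EmbeddingWorkerSketch(processed.append)
worker.start()
worker.enqueue(1)
worker.enqueue(2)
worker.stop()
print(processed)  # [1, 2]
```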

Test plan

  • [x] Full pytest suite: 413 passing (343 Phase 3.5 + 70 Phase 4). 0 failures.
  • [x] Each task verified in its own worktree before merge.
  • [x] Each task reviewed for spec compliance + code quality.
  • [x] Cross-feature integration tests (T101) pin 5 multi-feature scenarios.
  • [ ] Manual smoke tests (recommended before merging):
    • [ ] Run `scripts/backfill_embeddings.py` against a seeded DB
    • [ ] Multi-tab regenerate continues to swap live (Phase 3.5 T86 still functional)
    • [ ] Trigger surgical delete → verify cascade preview matches actual removal
    • [ ] Hide a turn → verify it disappears from chat surface
    • [ ] Top-bar search for a phrase → verify cross-chat results appear
    • [ ] Manual snapshot via `/snapshots` page → restore from it

Phase 4.5 / 5 backlog (tracked in CLAUDE.md)

  • sqlite-vec swap when host Python supports loadable extensions
  • Real embedding model (currently SHA-256 pseudo-embedding for pipeline integration only)
  • Branching read-side filter — events readers don't yet consult is_active
  • Bulk significance re-rate in drawer
  • Various UX polish items per T98/T99/T100 reviews
  • Carry-overs: scene-close-on-cancel revisit, canned-queue brittleness, full lifecycle rollback
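To show what a deterministic SHA-256 pseudo-embedding might look like, here is a minimal sketch. The dimension, the byte-to-float mapping, and the normalisation step are assumptions; the PR only states that the first cut is a deterministic SHA-256 pseudo-embedding used for pipeline integration.

```python
import hashlib
import math


def pseudo_embedding(text: str, dim: int = 16) -> list[float]:
    """Deterministic stand-in vector derived from SHA-256; it carries no semantic meaning."""
    values: list[float] = []
    counter = 0
    while len(values) < dim:
        # Hash the text with a counter prefix so dim can exceed one 32-byte digest.
        digest = hashlib.sha256(f"{counter}:{text}".encode("utf-8")).digest()
        values.extend(byte / 127.5 - 1.0 for byte in digest)  # map 0..255 to -1..1
        counter += 1
    values = values[:dim]
    norm = math.sqrt(sum(v * v for v in values))
    return [v / norm for v in values] if norm else values


vec = pseudo_embedding("the lighthouse keeper")
print(len(vec), vec == pseudo_embedding("the lighthouse keeper"))
```

The same text always maps to the same vector, which is enough to exercise storage, the worker, and retrieval plumbing. Similar texts do not get similar vectors, so ranking quality only becomes meaningful once a real embedding model is swapped in.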

Plan

`docs/plans/2026-04-27-v4-phase4-implementation.md` (committed in bffd9a2).

dohertj2 added 41 commits 2026-04-27 04:09:57 -04:00
The previous implementation pulled the last N rows in SQL across all
chats and dropped foreign-chat rows in Python. With LIMIT N this could
return far fewer than N relevant rows when other chats had recent
activity. Push the chat_id filter into SQL via json_extract so LIMIT N
always returns N rows scoped to the requested chat.

Test: seeds two chats with 60 turns each interleaved; queries chat_a
with limit=50; asserts exactly 50 chat_a rows returned (was 0 prior to
the fix because chat_b's rows dominated the global tail).
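The difference between the two query shapes can be reproduced with a small in-memory database. This is a sketch only: the `events` table, the `payload_json` column, and both function names are hypothetical, and the seeding order (all of chat_a, then all of chat_b) is chosen here so that chat_b owns the global tail.

```python
import json
import sqlite3


def read_recent_global_tail(conn: sqlite3.Connection, chat_id: str, limit: int) -> list[tuple[int, str]]:
    """Pre-fix shape: LIMIT applies across all chats, then Python drops foreign rows."""
    rows = conn.execute(
        "SELECT id, payload_json FROM events ORDER BY id DESC LIMIT ?", (limit,)
    ).fetchall()
    return [row for row in rows if json.loads(row[1])["chat_id"] == chat_id]


def read_recent_pushdown(conn: sqlite3.Connection, chat_id: str, limit: int) -> list[tuple[int, str]]:
    """Post-fix shape: the chat filter runs in SQL, so LIMIT counts only matching rows."""
    return conn.execute(
        """
        SELECT id, payload_json
        FROM events
        WHERE json_extract(payload_json, '$.chat_id') = ?
        ORDER BY id DESC
        LIMIT ?
        """,
        (chat_id, limit),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload_json TEXT NOT NULL)")
for chat in ("chat_a", "chat_b"):
    for turn in range(60):
        conn.execute(
            "INSERT INTO events (payload_json) VALUES (?)",
            (json.dumps({"chat_id": chat, "turn": turn}),),
        )

old_rows = read_recent_global_tail(conn, "chat_a", 50)
new_rows = read_recent_pushdown(conn, "chat_a", 50)
print(len(old_rows), len(new_rows))  # 0 50
```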
The warning said "lifecycle transitions from superseded turn ARE NOT
being rolled back". When regenerating an OLDER turn, the listed
transitions can include intervening-turn ones that legitimately stand
on their own — they weren't authored by the superseded turn itself.

Reword to "lifecycle transitions at-or-after turn <id>" so operators
reading logs aren't misled into thinking every listed event id was
emitted by the target turn. Cosmetic change to a single log message.

Test: extends test_regenerate_with_prior_lifecycle_logs_warning to
assert the new phrasing is present and the old phrasing is gone.
The Phase 1 single-bot ``record_turn_memory`` lingered next to the
unified ``record_turn_memory_for_present`` introduced in T84. Only test
fixtures still called the legacy entry point.

- Remove ``record_turn_memory`` from ``chat/services/memory_write.py``.
- Update the two test_memory_write.py callers to use
  ``record_turn_memory_for_present(..., guest_bot_id=None)``, which
  produces the same ``[you=1, host=1, guest=0]`` witness mask.

The unified API returns ``dict[bot_id, (event_id, memory_id)]``; tests
extract the host entry. No production callers were affected.
Audit of chat/state/manual_edit.py target_kind dispatch found two §6.4
fields without drawer affordances despite being already-projected text
columns: chat_state.narrative_anchor and chat_state.weather. Both land
via new manual_edit branches (target_kind chat_narrative_anchor and
chat_weather) plus paired drawer routes and Scene-section text inputs.

The container properties_json blob is intentionally deferred — bounded
JSON edits aren't wired through manual_edit and the drawer never
surfaces multiple containers at once, so v1 leaves it out.
Wires T93's `search_all_memories` service into a small read-only HTML
surface so users can find a memory across every chat in the database.

* `chat/web/search.py` (new): GET `/search?q=...` runs the FTS service
  with k=50, hydrates each row with bot name + scene timestamp, and
  renders `search.html`. Empty `q` short-circuits to no results so the
  top-bar form can submit even with an empty input.
* `chat/templates/search.html` (new): empty-state placeholder, results
  list with chat-level "Open chat" links (`/chats/{chat_id}` — memories
  don't carry an event_id today, so no per-turn anchor).
* `chat/templates/layout.html`: append a small `<form>` to the rail
  nav, additive only.
* `chat/app.py`: register `search_router` (additive import + include).
* `tests/test_search_ux.py`: 3 tests — multi-chat results, empty-query
  placeholder, chat link.
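The handler behaviour described in this commit (empty query short-circuit, k=50, hydration, chat-level links) might be sketched framework-free as below. Everything here is an assumption except `k=50` and the `/chats/{chat_id}` link shape: the function name, the row keys, the `bot_names` lookup, and the whitespace stripping are illustrative, and the real route renders `search.html` rather than returning a dict.

```python
from typing import Callable

SEARCH_K = 50  # the k named in the commit message

Row = dict[str, str]


def search_view_model(
    q: str,
    search_fn: Callable[[str, int], list[Row]],
    bot_names: dict[str, str],
) -> dict[str, object]:
    """Build the data a results template would need, without touching the DB on empty input."""
    query = q.strip()  # treating whitespace-only input as empty is an assumption
    if not query:
        return {"query": "", "results": []}
    results = [
        {
            "text": row["text"],
            "bot_name": bot_names.get(row["bot_id"], "unknown"),
            "chat_url": f"/chats/{row['chat_id']}",  # chat-level link, no per-turn anchor
        }
        for row in search_fn(query, SEARCH_K)
    ]
    return {"query": query, "results": results}


calls: list[tuple[str, int]] = []


def fake_search(query: str, k: int) -> list[Row]:
    """Stand-in for the FTS service; records how it was called."""
    calls.append((query, k))
    return [{"text": "met at the harbor", "bot_id": "b1", "chat_id": "c7"}]


empty = search_view_model("", fake_search, {"b1": "Mara"})
found = search_view_model("harbor", fake_search, {"b1": "Mara"})
print(empty["results"], found["results"])
```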
dohertj2 merged commit df977fc985 into main 2026-04-27 04:10:26 -04:00

Reference: dohertj2/chat#6