41 Commits

Author SHA1 Message Date
Joseph Doherty cdfacdd0c4 merge: T118 phase 4.5 docs sweep — Phase 4.5 status + Phase 5 backlog 2026-04-27 07:04:01 -04:00
Joseph Doherty 5c4356e4e2 merge: T117 phase 4.5 cross-feature integration tests 2026-04-27 07:04:01 -04:00
Joseph Doherty 969f8963bc merge: T116 CannedQueue test fixture builder 2026-04-27 07:04:01 -04:00
Joseph Doherty f71613786b test: phase 4.5 cross-feature integration coverage (T117) 2026-04-27 07:03:56 -04:00
Joseph Doherty 4afaf01de7 test: structured CannedQueue fixture builder for classifier mocks (T116)
Phase 4.5 carry-over from Phase 3. Tests across test_turn_flow.py,
test_meanwhile_turn_flow.py, and the phase3/4 integration suites built
positional canned-response arrays for MockLLMClient — adding a new
classifier call to a code path required updating the array index in
many places.

This change ships tests/fixtures.py with a fluent CannedQueue builder
that lets tests declare classifier expectations by name and call order
instead of by index. Each method appends one item to an internal queue
and returns self for chaining; build() emits the flat list[str] queue
that MockLLMClient(canned=...) already consumes. The mock's contract
is unchanged.

Builder methods cover: parse_turn, detect_addressee, state_update
(with zero_state alias), detect_interjection,
detect_interjection_targeted, detect_scene_close,
detect_event_transitions, summarize_scene_pov, detect_threads,
meanwhile_digest, score_significance, and a narrative() helper for
streaming bot beats. raw() is a passthrough escape hatch.

Migration scope: ship the builder + 2 sanity tests + migrate 3
representative tests in test_turn_flow.py as proof of concept
(test_single_bot_turn_no_guest_regression,
test_turn_with_event_transition_appends_started_event,
test_turn_with_no_active_events_skips_classifier). The remaining
positional-array tests stay as-is; the builder docstring documents
the migration template for Phase 5 work.
2026-04-27 07:03:20 -04:00
Joseph Doherty 5bc9a94734 docs: phase 4.5 status, prune backlog, capture phase 5 candidates (T118) 2026-04-27 06:56:20 -04:00
Joseph Doherty cfc05a140c merge: T114 regenerate lifecycle rollback (back-ref + event_status_reverted) 2026-04-27 06:51:13 -04:00
Joseph Doherty 80ce891bd8 feat: regenerate rolls back lifecycle transitions on supersede (T114.3)
Closes the T83.4 gap: when ``regenerate_assistant_turn`` supersedes an
assistant_turn that already produced lifecycle transitions, it now
emits an ``event_status_reverted`` (T114.2) for each transition tagged
with ``triggered_by_assistant_turn_id == original_assistant_event_id``
(T114.1 back-reference) before the regenerated narrative is
reclassified.

Mapping from forward kind to ``prior_status`` lives in
``_PRIOR_STATUS_MAP``:
  - event_started   → planned
  - event_completed → active
  - event_cancelled → active (best-effort default; cancellation can fire
    from either planned or active, but detect_event_transitions only
    surfaces currently-active rows so 'active' is the realistic prior)

Backward compatibility: lifecycle rows authored before T114.1 lack the
back-reference field. Those are skipped (DEBUG log per row) and
collected into a legacy WARNING that preserves the T83.4
observability contract — operators still see un-rolled-back
transitions, just from older logs.

The classify-and-emit pass below the rollback now operates against an
events projection that has already been reverted, so re-firing
``event_started``/``event_completed``/``event_cancelled`` for the
regenerated narrative is safe — no double-emit of promotion artifacts.

Spec tests:
- ``test_regenerate_rolls_back_event_started_from_superseded_turn``
- ``test_regenerate_rolls_back_event_completed_to_active`` (also
  exercises the multi-rollback loop: a turn that fired both a start
  and a completion gets two event_status_reverted rows in id order,
  with active as the final projection — matching the per-row replay
  semantics of the projector)
- ``test_regenerate_skips_events_without_back_reference`` (pins the
  legacy compatibility path with both DEBUG and WARNING expectations)
2026-04-27 06:45:43 -04:00
Joseph Doherty 6d4ad86e33 feat: event_status_reverted event kind + projector handler (T114.2)
Adds the inverse projection used by T114.3's regenerate rollback. The
new ``event_status_reverted`` event kind carries
``{event_id, prior_status}`` and the handler unconditionally sets
``events.status = prior_status`` for the row.

Unlike the forward transitions (event_started / event_completed /
event_cancelled), this handler does NOT guard against terminal
statuses — its entire purpose is to reverse a transition, including
walking back from a terminal status to a non-terminal one. Without
that, rolling back an event_completed (status='completed' is terminal
for the forward handlers) would silently no-op and leave the row in
the post-superseded state.

The handler registers via the existing ``@on(kind)`` decorator pattern
in chat/eventlog/projector.py, so future replays of an event_log that
contains event_status_reverted rows pick it up automatically.

Test exercises completed→active, active→planned, and cancelled→active
round-trips.
2026-04-27 06:39:03 -04:00
Joseph Doherty 7370f68bdf feat: lifecycle events carry triggered_by_assistant_turn_id back-reference (T114.1)
Phase 3.5 T83.4 surfaced un-rolled-back lifecycle transitions on
regenerate; T114 wires up the actual rollback. Step 1 is the back-
reference: every event_started / event_completed / event_cancelled
emitted by post_turn (chat/web/turns.py) and regenerate
(chat/services/regenerate.py) now carries
``triggered_by_assistant_turn_id`` in its payload, set to the id of
the assistant_turn event that produced the transition.

Schema decision (Option A from the plan): no migration. The field is
a payload convention only — older event_log rows lack it and rollback
will skip them with a debug log when T114.3 lands. Forward-only.

The post_turn lifecycle block already runs AFTER the assistant_turn
event is appended (step 8a vs step 7), so ``primary_assistant_event_id``
is in scope. Same for regenerate: the lifecycle classification (step 9a)
runs after step 6's append. **No emission-order reorder was needed**
in either flow.

Updates ``test_turn_with_event_transition_appends_started_event`` to
assert the new field is present in the emitted event_started payload
and points at the assistant_turn id.
2026-04-27 06:38:48 -04:00
Joseph Doherty f863cf0158 merge: T113 branching read-side filter (active branch range respected by readers) 2026-04-27 06:25:50 -04:00
Joseph Doherty 456f50d334 feat: branching read-side filter — event readers consult active branch range (T113)
Wire the active branch's [origin_event_id, head_event_id] window into
every user-facing event/memory reader so switching branches actually
changes what dialogue and memories the user sees. Phase 4 T89/T94
shipped branches as metadata-only — this closes the loop.

Helper:
- chat/state/branches.py: add `active_branch_event_ids(conn)` returning
  the active branch's id range, with two defensive fall-throughs to
  `(0, BIG_INT)`: (a) no active branch row at all, and (b) the
  bootstrap "main" sentinel (name="main", origin=0, head=0). Production
  never bumps main's head_event_id today, so this preserves existing
  reader behaviour for every test that doesn't explicitly switch.

Readers updated (all user-facing dialogue / retrieval surfaces):
- chat/services/turn_common.py::read_recent_dialogue — chat-history
  prompt context + the chat-view template path (via web/turns.py +
  web/chat.py).
- chat/services/scene_summarize.py::_read_recent_dialogue — scene-close
  per-POV summary input.
- chat/state/memory.py::search_memories — FTS leg filters via
  m.event_id (T109's column); legacy NULL event_id rows are *included*
  unconditionally so the filter doesn't break pre-0014 retrieval. The
  fused (FTS + RRF + vector) path also drops vector hits whose
  event_id falls outside the branch window.
- chat/web/meanwhile.py::_read_recent_meanwhile_dialogue — meanwhile
  prompt context.

Projector queries (chat/state/world.py et al.) and admin/management
surfaces (drawer hide-panel, cross-chat search, regenerate's row
lookups by id) are intentionally NOT branch-filtered: projection must
see the full log to build state correctly, and the admin surfaces
operate across branches by design.

Tests (10 new, 446 total):
- tests/test_branches_state.py: 3 tests for `active_branch_event_ids`
  itself (bootstrap-main, no-active-branch, non-main literal range).
- tests/test_branching.py: 7 cross-feature tests covering the spec's
  five required scenarios plus scene_summarize and meanwhile readers.
2026-04-27 06:25:22 -04:00
Joseph Doherty 757abf24f8 merge: T112 real embedding model swap (Protocol + Mock + routing + backfill) 2026-04-27 06:08:13 -04:00
Joseph Doherty 9b7a6d459f feat: backfill_embeddings --re-embed-all flag for model swaps (T112.4)
Adds two new flags to the backfill script:

* --re-embed-all walks **every** memory (not just those without
  an existing embeddings row) and re-emits embedding_indexed
  events. The projector is INSERT OR REPLACE, so re-emitting an event
  for an existing memory replaces the prior vector. Use this when
  swapping embedding models — the default mode still keeps the Phase
  4 gap-fill behavior.
* --model M overrides Settings.embedding_model for this run.

The script also gains a small _build_client helper that returns
None for the pseudo path (no client needed) and a FeatherlessClient
otherwise; tests monkeypatch this to inject a Mock with canned
embeddings.

Adds tests/test_backfill_embeddings.py with three integration
tests: re-embed-all walks every memory, default mode skips existing
rows, and --model overrides the configured model end-to-end.
2026-04-27 06:02:23 -04:00
Joseph Doherty e0a28abbcd feat: generate_embedding routes non-default models through client.embed (T112.3)
When model != DEFAULT_EMBEDDING_MODEL, generate_embedding now
calls client.embed(text, model=model) and wraps the returned
vector in an EmbeddingResult tagged with the requested model.
On any exception (NotImplementedError from providers without an
embeddings endpoint, transient network errors, etc.), the existing
T107 warning fires and the function falls back to the zero-vector
sentinel — callers detect model == 'fallback' and skip indexing.

Adds:
- MockLLMClient accepts a canned_embeddings queue mirroring
  the existing canned pattern. embed() pops from the front;
  empty queue raises IndexError so misconfigured tests fail
  loudly.
- Settings.embedding_model defaults to "pseudo-sha256-384"
  so existing zero-config installs keep Phase 4 behavior. The app
  lifespan now passes this through to EmbeddingWorker.model.

The public signature of generate_embedding is unchanged:
(client, *, text, model=DEFAULT_EMBEDDING_MODEL, dim=..., timeout_s=...).
2026-04-27 05:50:29 -04:00
Joseph Doherty ac6e74ab4c feat: FeatherlessClient.embed() against /v1/embeddings (T112.2)
Implements embed() on FeatherlessClient. Featherless's OpenAI-
compatible surface does NOT expose /v1/embeddings at the time of
writing, so this implementation raises NotImplementedError rather
than issuing a request that would 404. The
chat.services.embeddings.generate_embedding wrapper (T112.3)
catches the exception and degrades to the zero-vector fallback path
(plus the existing T107 warning) — misconfigured callers fail loudly
in logs while the request path keeps working.

If/when Featherless ships embeddings, swap the body for
self._client.embeddings.create(model=..., input=...) guarded by
the existing 2-conn semaphore (mirrors generate/stream). The Protocol
seam in T112.1 is already wired so no other code needs to change.

Adds tests/test_featherless.py pinning the NotImplementedError
contract.
2026-04-27 05:48:34 -04:00
Joseph Doherty 5f16bb575a feat: LLMClient Protocol gains embed() method (T112.1)
Adds async def embed(self, text: str, *, model: str) -> list[float]
to the LLMClient Protocol so Phase 4.5 can wire a real-embedding swap
without changing call sites. Protocol is structural — existing
implementations that don't use it remain compatible; downstream
implementations (FeatherlessClient, MockLLMClient) ship in T112.2 and
T112.3.
2026-04-27 05:47:55 -04:00
Joseph Doherty f05d1e0f21 merge: T111 search UX (FTS snippet + turn deep-link) 2026-04-27 05:42:48 -04:00
Joseph Doherty 9987da2c07 feat: cross-chat search deep-links to turn via memories.event_id (T111.2)
Add ``m.event_id`` (T109's nullable column from migration 0014) to
``search_all_memories``'s SELECT, propagate it through the route's
template context, and have ``search.html`` build result links as
``/chats/{chat_id}#turn-{event_id}`` — matching the ``id="turn-{event_id}"``
anchor that Phase 3.5 T86 stamps on each turn DOM node so the chat page
scrolls to the originating turn on load. Memory rows projected before
the 0014 migration ran read NULL ``event_id``; the template falls back
to a chat-level link in that case so we never emit ``#turn-None``.

Pre-existing tests that asserted on the bare ``href="/chats/{chat_id}"``
contract are updated to assert on the ``href="/chats/{chat_id}#turn-``
prefix to reflect the new deep-link.
2026-04-27 05:42:17 -04:00
Joseph Doherty fa87ab8c55 feat: cross-chat search FTS snippet highlighting (T111.1)
Replace the ``pov_summary`` column in ``search_all_memories``'s SELECT
with ``snippet(memories_fts, 0, '<mark>', '</mark>', '…', 32)`` so each
match in a result row is wrapped in ``<mark>`` for the search-results
UI. The original ``pov_summary`` is still returned alongside as a
non-highlighted fallback. Template renders ``r.snippet|safe`` — the only
HTML in the snippet output is the configured ``<mark>`` markers, so it
is safe to bypass Jinja's auto-escape.
2026-04-27 05:30:32 -04:00
Joseph Doherty fae6edef6b merge: T110 drawer Phase 4.5 bundle (event_id guard + html.escape + Jinja partial + bulk re-rate) 2026-04-27 05:26:03 -04:00
Joseph Doherty 2ab8fcbdf0 feat: drawer bulk significance re-rate per chat (T110.4)
The drawer's Significance review panel previously only supported
per-memory edits. Adds a bulk control: pick ``level_from`` and
``level_to``, and every memory in the chat at ``level_from`` is moved
to ``level_to``.

Implementation emits one ``manual_edit`` event per matching memory
(not a single bulk event) so the §6.4 per-row audit trail stays
intact — each affected memory carries its own ``prior_value -> new_value``
snapshot, so an inverse edit can restore an individual row without
needing to inspect a bulk payload's member list. Reuses the existing
``memory_significance`` ``manual_edit`` projector branch (T25), so no
state-layer changes are required.

The route rejects no-op submissions (``level_from == level_to``) with
400 to avoid padding the event log with empty edits, and clamps both
levels to 0..3 (matching ``edit_memory_significance``).

UI: a small ``<details>`` block in the Significance review section
with two number inputs and a submit button.

Test: tests/test_drawer_phase4.py::test_bulk_significance_re_rate_emits_manual_edit_per_memory.
2026-04-27 05:14:59 -04:00
Joseph Doherty 5d5c888acf refactor: drawer delete-impact modal extracted to Jinja partial (T110.3)
The modal HTML was assembled via raw f-string concatenation in
``delete_preview``. Move it to a dedicated Jinja2 partial
(``chat/templates/_delete_impact_modal.html``) and render via
``TEMPLATES.TemplateResponse``. Jinja2 autoescape now handles HTML
safety automatically — the explicit ``html.escape()`` calls added in
T110.2 (and the ``import html``) become redundant and are removed in
this commit.

Net behavioural change: attribute quoting style flips from single to
double quotes (Jinja default) — the existing T98.4 substring-based
assertions are unaffected, and the new T110.3 test pins the
double-quoted shape so future regressions surface.

Test: tests/test_drawer_phase4.py::test_delete_impact_modal_uses_jinja_partial.
2026-04-27 05:13:36 -04:00
Joseph Doherty a45a33534f fix: drawer delete-impact modal HTML escapes user-controllable fields (T110.2)
The delete-impact modal is built via raw f-string concatenation from the
ImpactReport — item.kind / item.description / report.notes ultimately
embed user-controllable content (turn prose, scene timestamps). A turn
with prose like "<script>alert(1)</script>" would reach the rendered
HTML verbatim. Currently safe (the fields embedded today are bounded
strings) but defense-in-depth — wrap with html.escape() so future
description changes can't smuggle markup through.

Test: tests/test_drawer_phase4.py::test_delete_impact_modal_escapes_user_controllable_strings.
2026-04-27 05:12:28 -04:00
Joseph Doherty f3827706df fix: drawer delete_turn guards event_id <= 0 (T110.1)
A stale tab or hand-crafted request posting event_id=0 to the surgical
delete route would compute after_event_id=-1 and silently truncate the
entire log. Now rejected with 400.

SQLite assigns event_log ids starting at 1, so any legitimate id is
always >= 1 — non-positive values can only indicate a client bug.

Test: tests/test_drawer_phase4.py::test_delete_turn_with_event_id_zero_returns_400.
2026-04-27 05:11:39 -04:00
Joseph Doherty 2afbb9fefc merge: T109 schema 0014 — memories.event_id column 2026-04-27 05:01:17 -04:00
Joseph Doherty 1f8b4d2078 feat: 0014 schema — embeddings FK CASCADE (deferred or applied) + memories.event_id column (T109) 2026-04-27 05:00:57 -04:00
Joseph Doherty 3f1a284acb merge: T108 scene-close-on-cancel strengthen test + rationale 2026-04-27 04:48:00 -04:00
Joseph Doherty 87f93f00b5 merge: T107 embeddings.py fallback warning 2026-04-27 04:48:00 -04:00
Joseph Doherty d1e2902655 merge: T106 search.py N+1 batching + k constant 2026-04-27 04:48:00 -04:00
Joseph Doherty 54dfa8d611 merge: T105 snapshots.py polish 2026-04-27 04:48:00 -04:00
Joseph Doherty 5d36d3456f merge: T104 memory.py DRY MAX(id) + fts_rank doc 2026-04-27 04:48:00 -04:00
Joseph Doherty 0e9421dcf7 merge: T103 branches polish (global-leak doc + unknown-name warning) 2026-04-27 04:48:00 -04:00
Joseph Doherty baffeb3a44 chore: scene-close-on-cancel — strengthen regression test + document rationale (T108)
Investigation surfaced a transactional bug in the cancel path: when the
primary stream raises asyncio.CancelledError mid-stream, post_turn
re-raises at end-of-function, and open_db's dependency teardown skips
conn.commit() — rolling back ALL post-cancel writes including the
scene_closed event. The existing T74.3 regression test only passes
because asyncio is not imported at module scope, so CancelledError
becomes NameError (caught by except Exception, leaves cancelled=False).
Documented in turns.py + test docstring; deferred for triage.
2026-04-27 04:47:26 -04:00
Joseph Doherty 29b7c90b29 chore: embeddings.py warns on fallback for non-default models (T107) 2026-04-27 04:47:17 -04:00
Joseph Doherty 64c9ca634a chore: snapshots.py polish — hoisted imports + strict kind + mtime doc (T105) 2026-04-27 04:47:14 -04:00
Joseph Doherty 374a76c867 chore: branches polish — global-leak docs + unknown-name warning (T103) 2026-04-27 04:34:32 -04:00
Joseph Doherty b65e1e1098 chore: memory.py DRY MAX(id) helper + document fts_rank=None contract (T104) 2026-04-27 04:34:28 -04:00
Joseph Doherty 996a16cfb5 perf: search.py N+1 batching + k constant extraction (T106) 2026-04-27 04:34:18 -04:00
Joseph Doherty a06f90a164 docs: add Phase 4.5 cleanup plan (all 24 backlog items)
16 tasks across 9 waves consolidating all 24 items in CLAUDE.md
Phase 4.5/5 backlog. Mix of:

- Wave 1 (parallel 6-way): trivial polish across 6 different files
- Wave 2 (single): schema migration 0014 (FK CASCADE + memories.event_id)
- Wave 3 (single): drawer bundle (event_id guard + html.escape + modal
  partial + bulk significance re-rate)
- Wave 4 (single): search UX (FTS snippet highlight + deep-link)
- Wave 5 (single): real embedding model swap (LLMClient.embed protocol)
- Wave 6 (single): branching read-side filter (riskiest — cross-cutting)
- Wave 7 (single): regenerate lifecycle rollback
- Wave 8 (single): sqlite-vec swap [ENVIRONMENTAL — may defer to Phase 5
  if Python rebuild / apsw not feasible]
- Wave 9 (parallel 3-way): structured fixture builder + integration tests + docs

Schema baseline 13 -> 14 (or 15 with T115). Big tasks (T112 real embed,
T113 branching filter, T114 lifecycle rollback) advance the engine
beyond Phase 4's metadata-only state. T115 environmental decision
captured in pre-flight; the other 13 tasks ship without it.

Uses task ids T103-T118 to avoid collision with prior phases.
2026-04-27 04:22:08 -04:00
dohertj2 df977fc985 Merge pull request 'Phase 4: vector retrieval, branching, drawer polish' (#6) from phase-4 into main 2026-04-27 04:10:25 -04:00
46 changed files with 4645 additions and 207 deletions
+39 -44
View File
@@ -322,53 +322,48 @@ Phase 4 polish shipped end-to-end across 15 tasks (T88T102). Vector retrieval
### Phase 4.5 / 5 backlog
New follow-ups discovered during Phase 4 reviews and execution. None are blocking; pick up at any time.
All items shipped or deferred to Phase 5 (see "Phase 5 backlog" below). Final schema version: 14.
#### From T88 review
## Phase 4.5 status
- **`embeddings` FK lacks `ON DELETE CASCADE`**: deindex events are the only deletion path; if memories ever get deleted directly (raw SQL), embedding rows orphan. Defensible since projector model uses explicit deindex events, but worth a comment or `ON DELETE CASCADE` addition.
Phase 4.5 cleanup shipped 13 of 14 planned tasks (T103T117 with T115 deferred; T118 is this docs sweep). Two CLAUDE.md backlogs (Phase 3.6/4, Phase 4.5/5) are now empty; deferred follow-ups discovered during execution are tracked in a new "Phase 5 backlog" section below. Schema baseline advanced from version 13 to **14** (migration 0014: `memories.event_id`). Test count grew from ~413 (Phase 4) to ~457 (+~44 new tests across the wave).
#### From T89 review
- **Wave 1 — trivial polish (parallel)**:
- **T103** branches polish — global-branch (`chat_id IS NULL`) leak documented in `list_branches`; branch-switch to nonexistent name now logs a warning.
- **T104** `memory.py` DRY — `MAX(id)` helper extracted; `fts_rank=None` contract documented for vector-only rows.
- **T105** `snapshots.py` polish — `datetime`/`timezone` imports hoisted to module level; strict `kind` validation in restore/preview (rejects missing); `created_at` from file mtime documented.
- **T106** `search.py` polish — `k=50` extracted to module constant; N+1 `get_bot`/`get_chat`/`get_scene` lookups batched.
- **T107** `embeddings.py``timeout_s` fallback-path warning when non-default model misconfigured.
- **Wave 2 — scene-close-on-cancel (single)**:
- **T108** strengthened the T74.3 regression test + documented rationale in `turns.py`. **Surfaced a deferred bug**: existing pin only passes because `asyncio` isn't imported in the test module (NameError caught instead of CancelledError). When CancelledError fires for real, `post_turn`'s end-of-function re-raise causes `open_db`'s dependency teardown to skip `conn.commit()`, rolling back ALL post-cancel writes. Documented and deferred to Phase 5 triage.
- **Wave 3 — schema 0014 (single)**:
- **T109** `memories.event_id` column (foundation for T111 deep-link). FK CASCADE on `embeddings.memory_id` deferred (memories rows are never deleted today; defensive constraint can't fire — saved for broader migration cleanup in Phase 5).
- **Wave 4 — drawer Phase 4.5 bundle (single)**:
- **T110** `event_id <= 0` guard in `delete_turn` + `html.escape()` on delete-impact modal + Jinja partial extraction + bulk significance re-rate per chat (one `manual_edit` event per memory).
- **Wave 5 — search UX (single)**:
- **T111** FTS snippet highlighting via `snippet()` + deep-link to turn via `memories.event_id`.
- **Wave 6 — real embedding model swap (single)**:
- **T112** `LLMClient.embed()` Protocol + Mock impl with `canned_embeddings` + `FeatherlessClient.embed()` (raises `NotImplementedError` — Featherless OAI-compat doesn't expose embeddings, gap documented) + `generate_embedding` routes non-default models through `client.embed()` with fallback + `--re-embed-all` backfill flag.
- **Wave 7 — branching read-side filter (single)**:
- **T113** `active_branch_event_ids(conn)` helper + applied to `read_recent_dialogue`, `scene_summarize._read_recent_dialogue`, `search_memories`, and `meanwhile._read_recent_meanwhile_dialogue`. Cross-chat search and projector queries deliberately NOT filtered (cross-chat is by design; projectors must see full log). Bootstrap "main" branch (origin=0, head=0) detected as the no-clamp sentinel.
- **Wave 8 — regenerate lifecycle rollback (single)**:
- **T114** `triggered_by_assistant_turn_id` payload back-reference on `event_started`/`event_completed`/`event_cancelled` + new `event_status_reverted` event kind + projector handler in `chat/state/events.py` + regenerate flow emits revert events for affected lifecycle transitions.
- **Wave 9 — final polish + integration (parallel)**:
- **T115** sqlite-vec swap — **DEFERRED to Phase 5**. Pre-flight failed: host Python build doesn't expose `sqlite3.Connection.enable_load_extension` (raises `AttributeError`). Requires either Python rebuild with `--enable-loadable-sqlite-extensions` or migration to `apsw`. Phase 4 pure-Python cosine remains in production.
- **T116** structured `CannedQueue` test fixture builder + 23 POC test migrations (Phase 5 to migrate the rest).
- **T117** Phase 4.5 cross-feature integration tests (5 minimum: real embedding swap, branching read-side filter, lifecycle rollback, search deep-link, bulk significance re-rate).
- **T118** documentation (this section).
- **`list_branches(chat_id=...)` filter leaks global branches** (`chat_id IS NULL`) into every chat scope. Intentional? Document.
- **Branch-switch to nonexistent silently leaves zero active branches** — log a warning when this would happen.
### Phase 5 backlog
#### From T91 review
New follow-ups discovered during Phase 4.5 reviews and execution, plus carry-over deferrals. None are blocking; pick up at any time.
- **Real embedding model swap**: Phase 4 ships pseudo-embedding (deterministic SHA-256 hash). Phase 4.5+ should swap to a real model (Featherless `bge-small-en-v1.5` if available; or local `sentence-transformers/all-MiniLM-L6-v2`). The 384-dim is hardcoded in `0012_embeddings.sql`; if dim changes, migrate first.
- **`timeout_s` unused on pseudo path** — fine, but log when non-default model falls through to fallback so misconfigured callers don't silently degrade.
#### From T96 review
- **Duplicate `MAX(id)` lookup** between `_composite_rerank` and the fused-path tail — DRY follow-up.
- **`fts_rank=None` for vector-only rows** — document downstream contract.
#### From T98 review
- **`event_id <= 0` guard in `delete_turn`** — currently silently rewinds everything if `event_id` is 0. Add `if event_id <= 0: 400`.
- **`html.escape()` on `compute_delete_impact` output rendered into the modal** — defense in depth (currently model-controlled strings, but if event payload fields ever appear in descriptions, autoescape needed).
- **Extract delete-impact modal HTML to a Jinja partial** — testability + autoescape inheritance.
#### From T99 review
- **Hoist `datetime`/`timezone` imports to module level** in `chat/web/snapshots.py`.
- **`kind` defaulting in restore/preview** — reject missing `kind` rather than silent 404.
- **`created_at` from file mtime** vs filename-encoded timestamp — small drift if files copied; document.
#### From T100 review
- **Hardcoded `k=50`** — extract to module constant.
- **N+1 lookups (`get_bot`/`get_chat`/`get_scene` per row)** — fine at `k=50`, revisit if `k` grows.
- **FTS highlighting via `snippet()`** — Phase 4 skipped this; UX nice-to-have.
- **Result links chat-level only** — `memories` table has no `event_id` column; deep-linking to specific turn requires schema addition.
#### Deferred items
- **sqlite-vec swap** when host Python supports `enable_load_extension`.
- **Real embedding model** with proper semantic similarity.
- **Branching read-side filter**: T89 ships data-model + UI but event readers don't yet consult `is_active`. Each branch is metadata-only labeled ranges. Consult-on-read is Phase 4.5+ work.
- **Bulk significance re-rate** in drawer (T98.2 deferred — only per-memory edit shipped).
- **Vector index optimization** (HNSW) — only relevant if memory counts grow past pure-Python feasibility.
- **`scene-close-on-cancel` UX revisit** (Phase 2.5 carry-over).
- **Cross-feature canned-queue brittleness fixture builder** (Phase 3 carry-over).
- **Full lifecycle-rollback in regenerate** — Phase 3.5 T83.4 shipped a warning log; proper rollback needs schema-level back-references (`triggered_by_assistant_turn_id` payload field).
- **T115 sqlite-vec swap** (environmental blocker): host Python's `sqlite3.Connection` does not expose `enable_load_extension``python -c "import sqlite3; sqlite3.connect(':memory:').enable_load_extension(True)"` raises `AttributeError`. Fix requires either a Python rebuild with `--enable-loadable-sqlite-extensions` or migration to `apsw`. Pure-Python cosine remains in production until then.
- **T108 follow-up: cancel-path commit bug** — `post_turn`'s re-raised `CancelledError` causes `open_db` dependency teardown to skip `conn.commit()`, rolling back all post-cancel writes. The existing T74.3 regression test passes only because `asyncio` isn't imported in the test module (NameError masks the real cancel path). Triage required — either commit before re-raise, or restructure the route to never re-raise after the close-detection branch.
- **`embeddings` FK CASCADE on `memory_id`** — deferred from T109; do as part of a broader migration consolidation in Phase 5.
- **`CannedQueue` fixture migration** — T116 shipped the builder + POC migrations; remaining tests still use positional canned arrays. Migrate in Phase 5.
- **Vector index optimization (HNSW)** — currently scales to a few thousand memories on the flat-index pure-Python cosine path; revisit when counts grow past flat-index feasibility.
- **Branch-isolated `event_log`** — each branch has its own physical `event_log` range vs the current shared id space + head filter; full branch isolation is Phase 5+.
- **Embedding model swap migration tooling** — T112 added `--re-embed-all`; a more orchestrated swap (drain old worker, re-seed all memories, swap config) is Phase 5+.
- **Real-time collaborative branching** (multi-user) — out of scope for v1.
- **Avatars / portraits** (multimodality) — deferred indefinitely per design §14.
+6
View File
@@ -94,9 +94,15 @@ async def lifespan(app: FastAPI):
# Phase 4's pseudo-embedding path is local so the worker doesn't need
# an LLM client; we still pass one so the Phase 4.5 swap to a real
# model is a one-line change.
# T112 (Phase 4.5): the embedding model is now configurable via
# ``Settings.embedding_model``. Default ``"pseudo-sha256-384"``
# keeps the local-only path; swapping to a real model routes
# through ``client.embed(...)`` and falls back to a zero vector
# plus warning if the provider doesn't support embeddings.
embedding_worker = EmbeddingWorker(
conn_factory=lambda: open_db(settings.db_path),
client=_factory(),
model=settings.embedding_model,
)
await embedding_worker.start()
app.state.embedding_worker = embedding_worker
+8
View File
@@ -39,6 +39,14 @@ class Settings(BaseModel):
data_dir: Path = REPO_ROOT / "data"
bind_host: str = "127.0.0.1"
bind_port: int = 8000
# T112 (Phase 4.5): embedding model identifier. Default is the
# deterministic local pseudo (semantically meaningless but keeps the
# vector pipeline structurally valid). Swap to a real model name
# (e.g. "bge-small-en-v1.5") once the LLMClient implementation
# supports embed() — currently FeatherlessClient does NOT, so a
# non-default value will trigger the zero-vector fallback path
# plus a T107 warning until a different provider is wired in.
embedding_model: str = "pseudo-sha256-384"
def load_settings() -> Settings:
config_path = Path(os.environ.get("CHAT_CONFIG_PATH", DEFAULT_CONFIG))
@@ -0,0 +1,25 @@
-- 0014_phase45_schema.sql — Phase 4.5 Wave 2 schema bump (T109).
--
-- Two schema concerns are bundled into this migration:
--
-- 1. ``embeddings.memory_id`` FK should ideally carry ``ON DELETE CASCADE``
-- (T88 review nit). DEFERRED to Phase 5: ``embeddings`` rows are only ever
-- deleted when the parent ``memories`` row is deleted, and ``memories``
-- rows are never deleted today (memory hide is a soft flag; the surgical
-- ``deindex_event`` path operates on ``event_log`` and does NOT cascade
-- to projection rows). The CASCADE constraint therefore can't fire under
-- current usage — adding the SQLite table-rebuild dance (rename, recreate,
-- copy, drop, reindex) for a defensive constraint is unwarranted bloat
-- in a polish wave. Revisit during the broader Phase 5 migration cleanup
-- when other table reshapes make the rebuild worthwhile.
--
-- 2. Add ``memories.event_id`` (NULLABLE INTEGER, references ``event_log.id``)
-- so cross-chat search results can deep-link back to the originating
-- turn (foundation for T111). The column is nullable so historical
-- memory rows projected before 0014 ran continue to round-trip cleanly;
-- new rows are populated by the ``memory_written`` projector handler
-- from the projecting event's id. This is a pure additive change — no
-- backfill is performed. Older rows simply read NULL until/unless a
-- later migration backfills them; T111 surfaces are coded to accept
-- NULL gracefully (no deep-link rendered).
ALTER TABLE memories ADD COLUMN event_id INTEGER REFERENCES event_log(id);
+8
View File
@@ -12,3 +12,11 @@ class Message:
class LLMClient(Protocol):
async def generate(self, messages: Sequence[Message], *, model: str, **params) -> str: ...
def stream(self, messages: Sequence[Message], *, model: str, **params) -> AsyncIterator[str]: ...
# T112 (Phase 4.5): real-embedding seam. Implementations either call a
# provider's ``/v1/embeddings`` endpoint or, when the provider doesn't
# expose embeddings (e.g. Featherless today), raise ``NotImplementedError``
# so ``generate_embedding`` can catch it and degrade to the zero-vector
# fallback. The Protocol is structural, so this method only needs to
# exist on implementations; existing callers that don't use it are
# unaffected.
async def embed(self, text: str, *, model: str) -> list[float]: ...
+23
View File
@@ -53,3 +53,26 @@ class FeatherlessClient:
delta = chunk.choices[0].delta.content or ""
if delta:
yield delta
async def embed(self, text: str, *, model: str) -> list[float]:
"""Embeddings via Featherless — currently unsupported.
T112 (Phase 4.5) extends the LLMClient Protocol with ``embed()``
for a future real-embedding swap. Featherless's OpenAI-compatible
surface does NOT expose ``/v1/embeddings`` at the time of writing,
so this implementation raises ``NotImplementedError`` rather than
attempting a request that would 404. The
:func:`chat.services.embeddings.generate_embedding` wrapper
catches this and degrades to the existing zero-vector fallback
(with the T107 warning), so misconfigured callers fail loudly in
logs but the request path keeps working.
If Featherless ships embeddings, swap the body for an
``self._client.embeddings.create(model=..., input=...)`` call
guarded by ``self._sem()`` (mirrors ``generate``/``stream``).
"""
raise NotImplementedError(
"Featherless does not expose /v1/embeddings; "
"configure a different embedding provider or stick with "
"the default pseudo-sha256-384 model."
)
+21 -1
View File
@@ -4,8 +4,23 @@ from .client import Message
class MockLLMClient:
def __init__(self, canned: list[str]):
"""In-memory LLMClient for tests.
``canned`` feeds ``generate``/``stream`` (one entry per call, popped
from the front). ``canned_embeddings`` (T112, Phase 4.5) feeds
``embed`` the same way — each call pops the next vector. An empty
queue raises ``IndexError`` so misconfigured tests fail loudly
rather than returning ``None`` or hanging.
"""
def __init__(
self,
canned: list[str],
*,
canned_embeddings: list[list[float]] | None = None,
):
self._canned = list(canned)
self._canned_embeddings: list[list[float]] = list(canned_embeddings or [])
async def generate(self, messages: Sequence[Message], *, model: str, **params) -> str:
return self._canned.pop(0)
@@ -14,3 +29,8 @@ class MockLLMClient:
text = self._canned.pop(0)
for ch in text:
yield ch
async def embed(self, text: str, *, model: str) -> list[float]:
# Mirrors the canned-queue pattern; empty queue raises so
# misconfigured tests surface clearly instead of returning None.
return self._canned_embeddings.pop(0)
+36 -8
View File
@@ -26,13 +26,28 @@ def search_all_memories(
"""Search FTS5 across all owners and chats.
Returns rows with ``{memory_id, owner_id, chat_id, scene_id,
pov_summary, significance, ts, fts_rank}``, sorted by FTS5 BM25
rank ascending (lower rank = stronger match, surfaced first).
event_id, pov_summary, snippet, significance, ts, fts_rank}``,
sorted by FTS5 BM25 rank ascending (lower rank = stronger match,
surfaced first).
``event_id`` (T111.2 / T109) is the id of the ``event_log`` row that
drove the projecting ``memory_written`` event. May be ``None`` for
memory rows projected before the 0014 schema migration ran (the
column is nullable on purpose; T109 did not backfill historical
rows). The search-results UI uses it to deep-link to the originating
turn anchor (Phase 3.5 T86 stamps ``id="turn-{event_id}"`` on each
turn DOM node) and falls back to a chat-level link when ``None``.
The ``memories`` table has no ``ts`` column; we expose ``created_at``
(the projector-side row insertion timestamp) under that key so the
UI does not have to know the storage name.
``snippet`` (T111.1) is the FTS5 ``snippet()`` output for the
matched ``pov_summary`` column: a windowed excerpt with each match
token wrapped in ``<mark>...</mark>`` for the search-results UI to
render verbatim. The full ``pov_summary`` is also returned so
non-highlighted callers (or fallbacks) keep the original string.
An empty / whitespace-only ``query`` short-circuits to ``[]`` to
avoid an FTS5 ``MATCH ''`` syntax error and to keep the top-bar
"no input yet" state from triggering a full-table scan.
@@ -45,9 +60,20 @@ def search_all_memories(
# from the content table because the FTS index only stores
# ``pov_summary``. ORDER BY rank ASC because BM25 in FTS5 returns
# negative scores where lower is better.
#
# ``snippet(memories_fts, 0, ...)`` (T111.1) targets column 0 of the
# FTS virtual table, which is ``pov_summary`` (the only column
# indexed by ``CREATE VIRTUAL TABLE memories_fts USING fts5(
# pov_summary, ...)`` in migration 0006). SQLite passes the raw
# column text through verbatim aside from inserting the configured
# before/after match markers, so the only HTML in the output is the
# ``<mark>`` we injected — safe to render with ``|safe`` server-side.
rows = conn.execute(
"SELECT m.id, m.owner_id, m.chat_id, m.scene_id, "
" m.pov_summary, m.significance, m.created_at, "
"SELECT m.id, m.owner_id, m.chat_id, m.scene_id, m.event_id, "
" m.pov_summary, "
" snippet(memories_fts, 0, '<mark>', '</mark>', '', 32) "
" AS snippet, "
" m.significance, m.created_at, "
" memories_fts.rank "
"FROM memories_fts "
"JOIN memories m ON m.id = memories_fts.rowid "
@@ -63,10 +89,12 @@ def search_all_memories(
"owner_id": r[1],
"chat_id": r[2],
"scene_id": r[3],
"pov_summary": r[4],
"significance": r[5],
"ts": r[6],
"fts_rank": r[7],
"event_id": r[4],
"pov_summary": r[5],
"snippet": r[6],
"significance": r[7],
"ts": r[8],
"fts_rank": r[9],
}
for r in rows
]
+24 -5
View File
@@ -10,6 +10,7 @@ EmbeddingResult shape stays the same, only the generator changes.
from __future__ import annotations
import hashlib
import logging
import math
import struct
@@ -18,6 +19,8 @@ from pydantic import BaseModel
from chat.llm.client import LLMClient
_log = logging.getLogger(__name__)
DEFAULT_EMBEDDING_DIM = 384
DEFAULT_EMBEDDING_MODEL = "pseudo-sha256-384"
FALLBACK_EMBEDDING_MODEL = "fallback"
@@ -92,11 +95,27 @@ async def generate_embedding(
# Pure-local pseudo path — no LLMClient call.
return EmbeddingResult(vector=_pseudo_embed(text, dim), model=model, dim=dim)
# Future: real embedding via client.embed(...). Phase 4.5 work.
# For Phase 4, any non-default model falls through to fallback.
return EmbeddingResult(
vector=[0.0] * dim, model=FALLBACK_EMBEDDING_MODEL, dim=dim
)
# T112 (Phase 4.5): non-default model — route through the client's
# ``embed()`` method. On any failure (including ``NotImplementedError``
# from providers that don't expose embeddings, e.g. Featherless today),
# fall back to the zero vector and re-fire the T107 warning so
# misconfigured callers see the issue in logs rather than silently
# producing useless cosine results.
try:
vector = await client.embed(text, model=model)
return EmbeddingResult(vector=list(vector), model=model, dim=len(vector))
except Exception as exc: # noqa: BLE001 — any failure must degrade gracefully
_log.warning(
"generate_embedding: non-default model %r returned fallback "
"(client.embed() raised %s: %s); "
"downstream search will degrade silently. Configure a supported model.",
model,
type(exc).__name__,
exc,
)
return EmbeddingResult(
vector=[0.0] * dim, model=FALLBACK_EMBEDDING_MODEL, dim=dim
)
__all__ = [
+130 -36
View File
@@ -95,6 +95,27 @@ from chat.web.render import render_turn_html
_log = logging.getLogger(__name__)
# T114.3: map a lifecycle-transition event kind to the events-table
# status it implicitly transitioned *from*. Regenerate uses this to pick
# the ``prior_status`` value for the ``event_status_reverted`` rollback
# event so the projector sets the row back to where it was before the
# superseded turn fired the transition.
#
# - ``event_started`` was emitted when the row was 'planned' → revert to
# 'planned'.
# - ``event_completed`` was emitted when the row was 'active' → revert
# to 'active'.
# - ``event_cancelled`` could have fired from either 'planned' or
# 'active'. Best-effort default: 'active'. The forward transitions
# below only fire detect_event_transitions for currently-active rows,
# so 'active' is the realistic prior in practice.
_PRIOR_STATUS_MAP: dict[str, str] = {
"event_started": "planned",
"event_completed": "active",
"event_cancelled": "active",
}
async def regenerate_assistant_turn(
conn: Connection,
client,
@@ -115,17 +136,18 @@ async def regenerate_assistant_turn(
cannot be found — the FastAPI route translates this to 404.
.. note::
**Lifecycle-rollback limitation (T83.4, Phase 4 follow-up).**
**Lifecycle rollback (T114, Phase 4.5).**
When the superseded turn already produced lifecycle transitions
(``event_started`` / ``event_completed`` / ``event_cancelled``),
this function does NOT roll those rows back before re-running
``detect_event_transitions`` against the regenerated text. A
regenerate-after-completion can therefore double-emit promotion
artifacts if the new text re-completes the same event. Phase 3.5
only documents the gap and emits a WARNING log naming the
affected event_log ids; the actual undo pass is invasive
(re-projection / inverse-handler dispatch) and is deferred to
Phase 4. See the ``# T83.4`` block below for the warning emit.
this function emits an ``event_status_reverted`` event for each
so the events row's status returns to its prior value before the
regenerated narrative is reclassified. Backward compatibility:
lifecycle events authored before T114.1 lack the
``triggered_by_assistant_turn_id`` payload field; rollback skips
those (logged at DEBUG) so historic rows are not retroactively
reverted. A WARNING about un-rolled-back transitions is still
emitted when stragglers are found — the rollback handles the
common case while older logs continue to need manual review.
"""
chat = get_chat(conn, chat_id)
if chat is None:
@@ -158,20 +180,21 @@ async def regenerate_assistant_turn(
original_assistant_payload = json.loads(row[0])
original_user_turn_id = original_assistant_payload.get("user_turn_id")
# T83.4: scan for downstream lifecycle transitions emitted by the
# superseded turn — they're not being rolled back (see method
# docstring). Heuristic: any ``event_started`` / ``event_completed``
# / ``event_cancelled`` event_log row with id strictly greater than
# the original assistant_turn's id was emitted as part of (or after)
# that turn's processing. Lifecycle events don't carry ``chat_id``
# in their payload (their payload references an ``event_id`` FK to
# the ``events`` table, which holds chat_id), so we join through
# ``events`` to scope to this chat.
#
# A WARNING log surfaces the affected event ids so operators can
# spot double-emit cases until the Phase 4 rollback pass lands.
# T114.3: roll back lifecycle transitions emitted by the superseded
# turn. The scan uses the same id-greater-than-superseded-turn
# heuristic as the legacy T83.4 warning, joined to ``events`` for
# chat scoping (lifecycle events don't carry chat_id in their
# payload — they reference an ``event_id`` FK to the ``events``
# table, which holds chat_id). For each row whose payload carries
# ``triggered_by_assistant_turn_id == original_assistant_event_id``
# (T114.1 back-reference), emit an ``event_status_reverted`` event
# so the events-row status returns to the pre-transition value.
# Lifecycle rows authored before T114.1 lack the back-reference;
# those are skipped (DEBUG log) and a WARNING tracks their count so
# operators still see legacy stragglers — preserves the T83.4
# observability contract for un-rolled-back transitions.
unrolled_lifecycle = conn.execute(
"SELECT el.id, el.kind FROM event_log AS el "
"SELECT el.id, el.kind, el.payload_json FROM event_log AS el "
"JOIN events AS ev "
" ON ev.event_id = json_extract(el.payload_json, '$.event_id') "
"WHERE el.kind IN ("
@@ -182,18 +205,73 @@ async def regenerate_assistant_turn(
"ORDER BY el.id ASC",
(chat_id, original_assistant_event_id),
).fetchall()
if unrolled_lifecycle:
# T90.2: phrased as "at-or-after turn <id>" rather than "from
# superseded turn" because regenerating an OLDER turn lists
# intervening-turn transitions that legitimately stand on their
# own — those weren't authored by the superseded turn itself.
rolled_back_ids: list[int] = []
skipped_no_backref: list[int] = []
for el_id, el_kind, el_payload_json in unrolled_lifecycle:
try:
lifecycle_payload = json.loads(el_payload_json)
except (TypeError, ValueError):
skipped_no_backref.append(el_id)
continue
triggered_by = lifecycle_payload.get("triggered_by_assistant_turn_id")
if triggered_by != original_assistant_event_id:
# Either a legacy row (no field) or a transition triggered
# by a *different* turn — leave it alone. DEBUG so the
# message is available under verbose logging without
# spamming the default WARNING channel.
_log.debug(
"regenerate_assistant_turn: skipping rollback for "
"lifecycle event_log id=%d (kind=%s) — no back-reference "
"or different turn (triggered_by=%r vs superseded=%d)",
el_id,
el_kind,
triggered_by,
original_assistant_event_id,
)
if triggered_by is None:
skipped_no_backref.append(el_id)
continue
prior_status = _PRIOR_STATUS_MAP.get(el_kind)
if prior_status is None:
# Defensive: the SQL filter already restricts to the three
# known kinds, but a future schema addition shouldn't crash
# the rollback path.
continue
target_event_id = lifecycle_payload.get("event_id")
if target_event_id is None:
continue
append_and_apply(
conn,
kind="event_status_reverted",
payload={
"event_id": target_event_id,
"prior_status": prior_status,
},
)
rolled_back_ids.append(el_id)
if rolled_back_ids:
_log.info(
"regenerate_assistant_turn: rolled back %d lifecycle "
"transition(s) triggered by superseded turn %s "
"(event_log ids: %s)",
len(rolled_back_ids),
original_assistant_event_id,
rolled_back_ids,
)
if skipped_no_backref:
# T83.4 (legacy) compatibility: still warn about stragglers
# without the back-reference so operators can spot pre-T114
# double-emit risks. Phrased as "at-or-after turn <id>" per
# T90.2 — older transitions may legitimately belong to other
# turns.
_log.warning(
"regenerate_assistant_turn: %d lifecycle transition(s) "
"at-or-after turn %s are NOT being rolled back (Phase 4 "
"follow-up). Affected event ids: %s",
len(unrolled_lifecycle),
"at-or-after turn %s are NOT being rolled back (no "
"triggered_by_assistant_turn_id back-reference). "
"Affected event ids: %s",
len(skipped_no_backref),
original_assistant_event_id,
[r[0] for r in unrolled_lifecycle],
skipped_no_backref,
)
# 1a. Look up any sibling interjection beat in the same turn group
@@ -716,11 +794,13 @@ async def regenerate_assistant_turn(
# runs inline after a completion so promotion artifacts land in the
# same regenerate path.
#
# T83.4 follow-up: when a regenerate replaces a turn that had
# already produced event transitions, those original transitions
# are NOT undone here (Phase 4 work). A WARNING log earlier in this
# function names the affected event_log ids — see the T83.4 block
# near the function entry.
# T114.3: original-turn transitions emitted before this regenerate
# ran were rolled back at the top of the function (see the
# ``# T114.3`` block) by appending ``event_status_reverted`` for
# each. The classify-and-emit pass below now operates against an
# ``events`` projection that has already been reverted, so it can
# safely re-fire transitions for the regenerated narrative without
# double-emitting promotion artifacts.
new_active_events = list_active_events(conn, chat_id)
if new_active_events:
lifecycle_decision = await detect_event_transitions(
@@ -738,6 +818,12 @@ async def regenerate_assistant_turn(
payload={
"event_id": transition.event_id,
"started_at": chat.get("time"),
# T114.1: back-reference to the assistant_turn
# that triggered this transition (see turns.py
# for rationale).
"triggered_by_assistant_turn_id": (
new_assistant_event_id
),
},
)
elif transition.new_status == "completed":
@@ -747,6 +833,10 @@ async def regenerate_assistant_turn(
payload={
"event_id": transition.event_id,
"completed_at": chat.get("time"),
# T114.1: back-reference (see above).
"triggered_by_assistant_turn_id": (
new_assistant_event_id
),
},
)
promote_completed_event(
@@ -762,6 +852,10 @@ async def regenerate_assistant_turn(
payload={
"event_id": transition.event_id,
"completed_at": chat.get("time"),
# T114.1: back-reference (see above).
"triggered_by_assistant_turn_id": (
new_assistant_event_id
),
},
)
+16 -3
View File
@@ -144,23 +144,36 @@ def _read_recent_dialogue(
``id >= since_event_id`` so callers needing a scene-scoped view (e.g.
thread detection on close) don't pull turns that landed before the
closing scene's ``scene_opened`` event.
T113: also clamps by the active branch's ``[origin, head]`` event-id
range so scene-summary inputs respect the user's current branch.
Bootstrap-main and "no active branch" fall through to ``(0, BIG_INT)``
so existing flows are unchanged.
"""
from chat.state.branches import active_branch_event_ids
origin, head = active_branch_event_ids(conn)
if since_event_id is None:
cur = conn.execute(
"SELECT kind, payload_json FROM event_log "
"WHERE kind IN ('user_turn', 'assistant_turn') "
" AND superseded_by IS NULL AND hidden = 0 "
" AND id BETWEEN ? AND ? "
"ORDER BY id DESC LIMIT ?",
(limit,),
(origin, head, limit),
)
else:
# Compose ``since_event_id`` with the branch lower bound — readers
# want the tightest ``id >= max(since, origin)`` clamp without an
# extra Python pass.
lower = max(origin, since_event_id)
cur = conn.execute(
"SELECT kind, payload_json FROM event_log "
"WHERE kind IN ('user_turn', 'assistant_turn') "
" AND superseded_by IS NULL AND hidden = 0 "
" AND id >= ? "
" AND id BETWEEN ? AND ? "
"ORDER BY id DESC LIMIT ?",
(since_event_id, limit),
(lower, head, limit),
)
rows = list(reversed(cur.fetchall()))
out: list[dict] = []
+11 -2
View File
@@ -30,6 +30,7 @@ from __future__ import annotations
import json
from sqlite3 import Connection
from chat.state.branches import active_branch_event_ids
from chat.state.edges import get_edge
@@ -60,15 +61,22 @@ def read_recent_dialogue(
previous implementation filtered chat_id post-fetch in Python, which
let foreign-chat rows fill the LIMIT and yield fewer than N relevant
rows in busy multi-chat databases.
T113: clamp by the active branch's ``[origin, head]`` event-id range so
switching branches actually changes what dialogue this read sees.
Bootstrap-main and "no active branch" both fall through to ``(0,
BIG_INT)`` no functional change for the metadata-only Phase 4 era.
"""
origin, head = active_branch_event_ids(conn)
if exclude_event_id is None:
cur = conn.execute(
"SELECT id, kind, payload_json FROM event_log "
"WHERE kind IN ('user_turn', 'user_turn_edit', 'assistant_turn') "
" AND superseded_by IS NULL AND hidden = 0 "
" AND id BETWEEN ? AND ? "
" AND json_extract(payload_json, '$.chat_id') = ? "
"ORDER BY id DESC LIMIT ?",
(chat_id, limit),
(origin, head, chat_id, limit),
)
else:
cur = conn.execute(
@@ -76,9 +84,10 @@ def read_recent_dialogue(
"WHERE kind IN ('user_turn', 'user_turn_edit', 'assistant_turn') "
" AND id != ? "
" AND superseded_by IS NULL AND hidden = 0 "
" AND id BETWEEN ? AND ? "
" AND json_extract(payload_json, '$.chat_id') = ? "
"ORDER BY id DESC LIMIT ?",
(exclude_event_id, chat_id, limit),
(exclude_event_id, origin, head, chat_id, limit),
)
rows = list(reversed(cur.fetchall()))
out: list[dict] = []
+81
View File
@@ -9,11 +9,15 @@ existing event readers remain branch-agnostic.
"""
from __future__ import annotations
import logging
from sqlite3 import Connection
from chat.eventlog.projector import on
from chat.eventlog.log import Event
logger = logging.getLogger(__name__)
@on("branch_created")
def _apply_branch_created(conn: Connection, e: Event) -> None:
@@ -37,9 +41,26 @@ def _apply_branch_switched(conn: Connection, e: Event) -> None:
"""Set is_active=1 on the named branch and is_active=0 on all others.
Atomic via two UPDATEs ordered to avoid the unique-active-index race.
If the named branch does not exist, a warning is emitted and the
is_active flags are still cleared (preserving prior behavior the
second UPDATE simply matches no rows). Callers should validate the
name upstream; this guard surfaces accidental mismatches in the log.
"""
p = e.payload
name = p["name"]
# Warn (don't raise) if the target branch is missing. The existing
# outcome — zero active branches — is preserved; this just makes the
# condition observable instead of silent.
exists = conn.execute(
"SELECT 1 FROM branches WHERE name = ? LIMIT 1",
(name,),
).fetchone()
if exists is None:
logger.warning(
"branch_switched to unknown branch name %r; no branch will be active",
name,
)
# Clear ALL is_active flags first (avoids the unique-index trip).
conn.execute("UPDATE branches SET is_active = 0 WHERE is_active = 1")
conn.execute(
@@ -79,6 +100,16 @@ def get_branch(conn: Connection, name: str) -> dict | None:
def list_branches(conn: Connection, chat_id: str | None = None) -> list[dict]:
"""Return branch rows, optionally scoped to a chat.
When ``chat_id`` is provided the filter is ``chat_id = ? OR chat_id IS NULL``,
so global (null-chat) branches are returned in *every* per-chat scope. This
is intentional: the bootstrapped ``"main"`` branch (and any future
null-chat branches) are global by design they belong to no single chat
and should appear alongside per-chat branches in any chat-scoped listing.
Callers that want only per-chat branches should filter the result on
``chat_id is not None``.
"""
if chat_id is None:
rows = conn.execute(
"SELECT id, name, origin_event_id, head_event_id, chat_id, "
@@ -126,8 +157,58 @@ def active_branch(conn: Connection) -> dict | None:
}
# T113: sentinel "no upper bound" used by ``active_branch_event_ids`` when the
# active branch's head is unset (the bootstrap "main" branch with origin=0 +
# head=0). Readers compose ``id BETWEEN origin AND head`` so a value larger
# than any possible row id behaves as "no clamp" without needing a separate
# code path. ``2**63 - 1`` is SQLite's max signed-int — safe forever.
_NO_HEAD_CLAMP = 2**63 - 1
def active_branch_event_ids(conn: Connection) -> tuple[int, int]:
"""Return ``(origin_event_id, head_event_id)`` for the currently active
branch, suitable as bounds for an ``event_log.id BETWEEN ? AND ?`` clamp
on user-facing reads (T113).
Defensive defaults:
* **No active branch row** (``active_branch`` returns ``None``) return
``(0, _NO_HEAD_CLAMP)`` so readers see all events. This preserves the
Phase 4 "branches are metadata-only" contract for any code path that
somehow runs without the migration-0013 bootstrap.
* **Bootstrap "main"** the canonical ``name="main", origin=0, head=0``
row inserted by migration 0013. Production today never emits
``branch_head_updated`` for main, so head stays at 0 even as events
accumulate. We treat this exact bootstrap state as "no clamp" and
return ``(0, _NO_HEAD_CLAMP)`` so all events remain visible. This is
what every existing test (which never configures branches) relies on.
* **Any other branch** return the literal ``(origin, head)`` from the
branch row. A branch created at origin=N has head=N initially (per
``branch_from_event``), so ``BETWEEN N AND N`` returns just that one
seed event until the head is bumped via ``branch_head_updated``.
Note on the schema mismatch with the T113 spec: the spec describes
``head_event_id`` as nullable, but migration 0013 declared it
``NOT NULL DEFAULT 0``. We read head=0 on bootstrap main as the
"unset" sentinel; non-main branches never reach head=0 in normal
flow (creation sets head=origin, and origin=0 only for main).
"""
branch = active_branch(conn)
if branch is None:
return (0, _NO_HEAD_CLAMP)
origin = int(branch.get("origin_event_id") or 0)
head = int(branch.get("head_event_id") or 0)
# Bootstrap "main" sentinel — see docstring above. Detect by name +
# both ids being 0 to avoid mis-firing on a hypothetical future
# branch that legitimately starts at origin=0.
if branch.get("name") == "main" and origin == 0 and head == 0:
return (0, _NO_HEAD_CLAMP)
return (origin, head)
__all__ = [
"get_branch",
"list_branches",
"active_branch",
"active_branch_event_ids",
]
+23
View File
@@ -67,6 +67,29 @@ def _apply_event_expired(conn: Connection, e: Event) -> None:
)
@on("event_status_reverted")
def _apply_event_status_reverted(conn: Connection, e: Event) -> None:
"""T114.2: Revert an event row's status to ``prior_status``.
Emitted by ``regenerate_assistant_turn`` when a superseded turn had
triggered a lifecycle transition (event_started / event_completed /
event_cancelled). The rollback step needs an inverse projection that
sets the row's status back to whatever it was *before* the now-
superseded transition fired.
Unlike the forward transitions (which guard against terminal-status
overwrites) this handler is unconditional the entire purpose is to
reverse a transition, including reverting from a terminal status
(completed/cancelled) back to a non-terminal one.
"""
p = e.payload
conn.execute(
"UPDATE events SET status = ?, updated_at = datetime('now') "
"WHERE event_id = ?",
(p["prior_status"], p["event_id"]),
)
def get_event(conn: Connection, event_id: str) -> dict | None:
row = conn.execute(
"SELECT event_id, chat_id, kind, status, props_json, planned_for, "
+71 -11
View File
@@ -13,13 +13,18 @@ def _row_to_dict(conn: Connection, row: tuple) -> dict:
@on("memory_written")
def _apply_memory_written(conn: Connection, e: Event) -> None:
# T109 (schema 0014): persist the projecting event's id on the memory
# row so cross-chat search results can deep-link back to the
# originating turn (T111). Older memory rows projected before 0014
# ran read NULL here — the column is nullable for that reason.
p = e.payload
conn.execute(
"INSERT INTO memories ("
"owner_id, chat_id, scene_id, pov_summary, "
"witness_you, witness_host, witness_guest, "
"chat_clock_at, source, reliability, significance, pinned, auto_pinned"
") VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
"chat_clock_at, source, reliability, significance, pinned, auto_pinned, "
"event_id"
") VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
(
p["owner_id"],
p["chat_id"],
@@ -34,6 +39,7 @@ def _apply_memory_written(conn: Connection, e: Event) -> None:
int(p.get("significance", 1)),
int(p.get("pinned", 0)),
int(p.get("auto_pinned", 0)),
e.id,
),
)
@@ -112,6 +118,25 @@ SIGNIFICANCE_RANK_BIAS = 0.5
RRF_CONST = 60
def _max_event_id(conn: Connection, owner_id: str) -> int:
"""Return the largest ``memories.id`` for ``owner_id`` (1 if none exist).
Used as the recency-boost denominator by both ``_composite_rerank`` and
``_rrf_fuse_and_rerank`` (T104). The row id is a monotonic recency proxy
newer memories have larger ids so dividing by the per-owner max keeps
the boost in [0, 1] regardless of how many memories the owner has.
Returns 1 (not 0) when the owner has no rows so callers can divide by
the result without a guard. The "no memories" case never actually hits
this helper because the FTS query above would have returned no rows,
but the safe default keeps the helper trivially reusable.
"""
row = conn.execute(
"SELECT MAX(id) FROM memories WHERE owner_id = ?", (owner_id,)
).fetchone()
return row[0] if row and row[0] else 1
def search_memories(
conn: Connection,
owner_id: str,
@@ -163,6 +188,14 @@ def search_memories(
When ``query_vector`` is None: FTS-only behaviour unchanged all
Phase 1-3.5 callers see the same row shape and ordering as before.
**Row-shape contract (T104):** every returned dict carries an
``fts_rank`` key. For FTS hits this is the BM25 score (a negative float,
lower-is-better). For *vector-only* hits surfaced by the fused path
rows that matched the query embedding but did NOT match FTS the
``fts_rank`` value is ``None``. Downstream consumers must accept
``None`` here; do not assume ``fts_rank`` is always numeric. The
``composite_score`` is always a float on every returned row.
"""
if witness_role not in _VALID_WITNESS_ROLES:
raise ValueError(
@@ -180,12 +213,20 @@ def search_memories(
# channel) so memories that are weak in FTS but strong in vector — and
# vice versa — make it into the merge pool.
over_fetch = max(k * 2, 20) if query_vector is not None else max(k * 4, 20)
# T113: branch-scope filter on ``m.event_id`` (T109's column). Memories
# whose ``event_id`` is NULL — projected before the 0014 schema migration
# ran — are *included* unconditionally so the branch filter never breaks
# legacy retrieval. Newer rows respect the active branch's bounds.
from chat.state.branches import active_branch_event_ids
origin, head = active_branch_event_ids(conn)
sql = (
f"SELECT {select_list}, memories_fts.rank AS fts_rank "
"FROM memories_fts "
"JOIN memories m ON m.id = memories_fts.rowid "
f"WHERE m.owner_id = ? AND m.{witness_col} = 1 "
"AND memories_fts MATCH ? "
"AND (m.event_id IS NULL OR m.event_id BETWEEN ? AND ?) "
# T57: significance multiplier biases the FTS over-fetch order. BM25
# ``rank`` is lower-is-better, so subtracting ``significance * BIAS``
# surfaces higher-significance rows above lower-significance rows with
@@ -194,7 +235,10 @@ def search_memories(
"ORDER BY (memories_fts.rank - m.significance * ?) ASC "
"LIMIT ?"
)
cur = conn.execute(sql, (owner_id, query, SIGNIFICANCE_RANK_BIAS, over_fetch))
cur = conn.execute(
sql,
(owner_id, query, origin, head, SIGNIFICANCE_RANK_BIAS, over_fetch),
)
rows = cur.fetchall()
# FTS-only path: preserve pre-T96 behaviour exactly.
@@ -227,10 +271,7 @@ def _composite_rerank(
Extracted from ``search_memories`` so the no-vector path stays a single
call and the fused path can re-use the same boost formulae after RRF.
"""
max_id_row = conn.execute(
"SELECT MAX(id) FROM memories WHERE owner_id = ?", (owner_id,)
).fetchone()
max_id = max_id_row[0] if max_id_row and max_id_row[0] else 1
max_id = _max_event_id(conn, owner_id)
result_cols = cols + ["fts_rank"]
enriched: list[dict] = []
@@ -301,6 +342,28 @@ def _rrf_fuse_and_rerank(
query_vector=query_vector,
k=vec_over_fetch,
)
# T113: drop vector hits that fall outside the active branch's event-id
# range. ``vector_search`` is a generic service used elsewhere; the
# branch filter applied to the FTS leg also has to apply here so the
# fused result respects the same scope. Memories with NULL event_id
# (legacy rows projected before T109's 0014 schema migration) are
# included unconditionally — same policy as the FTS leg.
from chat.state.branches import _NO_HEAD_CLAMP, active_branch_event_ids
vec_origin, vec_head = active_branch_event_ids(conn)
if vec_hits and (vec_origin > 0 or vec_head < _NO_HEAD_CLAMP):
vec_ids = [h["memory_id"] for h in vec_hits]
placeholders_v = ",".join("?" * len(vec_ids))
in_range = {
row[0]
for row in conn.execute(
f"SELECT id FROM memories "
f"WHERE id IN ({placeholders_v}) "
f" AND (event_id IS NULL OR event_id BETWEEN ? AND ?)",
(*vec_ids, vec_origin, vec_head),
).fetchall()
}
vec_hits = [h for h in vec_hits if h["memory_id"] in in_range]
vec_rank_by_id: dict[int, int] = {
hit["memory_id"]: rank for rank, hit in enumerate(vec_hits)
}
@@ -343,10 +406,7 @@ def _rrf_fuse_and_rerank(
# Final composite re-rank: significance + recency boosts on top of the
# negated fusion score so the sort direction matches the FTS-only path.
max_id_row = conn.execute(
"SELECT MAX(id) FROM memories WHERE owner_id = ?", (owner_id,)
).fetchone()
max_id = max_id_row[0] if max_id_row and max_id_row[0] else 1
max_id = _max_event_id(conn, owner_id)
result_cols = cols + ["fts_rank"]
enriched: list[dict] = []
+34
View File
@@ -0,0 +1,34 @@
{# T110.3: delete-impact modal partial.
Rendered from :func:`chat.web.drawer.delete_preview` via a Jinja2
TemplateResponse so HTML autoescape covers user-controllable fields
(item.kind, item.description, notes) automatically — the prior
f-string assembly required explicit html.escape() calls (T110.2)
which become redundant under autoescape.
Inputs:
``chat_id`` — the URL chat id (used to build the confirm form action).
``impact`` — an :class:`~chat.services.delete_impact.ImpactReport`.
#}
<div class="delete-impact-modal">
<h3>Delete event {{ impact.target_event_id }}?</h3>
<p>This will discard {{ impact.cascading|length }} events. Cascade:</p>
<ul class="delete-impact-cascade">
{% if impact.cascading %}
{% for item in impact.cascading %}
<li><strong>{{ item.kind }}</strong>: {{ item.description }}</li>
{% endfor %}
{% else %}
<li>none</li>
{% endif %}
</ul>
<ul class="delete-impact-notes">
{% for note in impact.notes %}
<li>{{ note }}</li>
{% endfor %}
</ul>
<form hx-post="/chats/{{ chat_id }}/drawer/turn/delete/{{ impact.target_event_id }}"
hx-target="#drawer" hx-swap="innerHTML">
<button type="submit">Confirm delete</button>
</form>
</div>
+19
View File
@@ -547,6 +547,25 @@
</ul>
</details>
{% endif %}
{# T110.4: bulk significance re-rate. Move every memory in this chat
at level_from to level_to with one manual_edit event per row, so
the audit trail stays per-memory. #}
<details class="bulk-significance">
<summary>Bulk re-rate significance</summary>
<form class="inline-edit"
hx-post="/chats/{{ chat.id }}/drawer/memory/significance/bulk"
hx-target="#drawer" hx-swap="innerHTML">
<label>
From:
<input type="number" name="level_from" min="0" max="3" value="0" required>
</label>
<label>
To:
<input type="number" name="level_to" min="0" max="3" value="1" required>
</label>
<button type="submit">Re-rate all</button>
</form>
</details>
</section>
<section class="drawer-section">
+17 -2
View File
@@ -21,14 +21,29 @@
<ul class="search-results">
{% for r in results %}
<li class="search-result">
<a class="search-result-link" href="/chats/{{ r.chat_id }}">
{# T111.2: deep-link to the originating turn via the
``id="turn-{event_id}"`` anchor stamped by Phase 3.5 T86.
``event_id`` may be NULL for memory rows projected before the
0014 migration ran (T109 did not backfill historical rows); in
that case fall back to a chat-level link with no anchor so we
never emit ``#turn-None``. #}
<a class="search-result-link"
href="/chats/{{ r.chat_id }}{% if r.event_id %}#turn-{{ r.event_id }}{% endif %}">
<div class="search-result-meta muted">
<strong>{{ r.owner_name }}</strong>
<span>&middot; {{ r.chat_id }}</span>
{% if r.chat_name %}<span>&middot; {{ r.chat_name }}</span>{% endif %}
{% if r.scene_label %}<span>&middot; scene {{ r.scene_label }}</span>{% endif %}
</div>
<div class="search-result-summary">{{ r.pov_summary }}</div>
{# T111.1: ``r.snippet`` is the FTS5 ``snippet()`` excerpt with
each match wrapped in ``<mark>...</mark>``. ``|safe`` is
required so the marker tags survive Jinja's auto-escape; the
snippet is built by SQLite from indexed text, so the only
HTML in the string is the ``<mark>`` we configured (any
special chars from the source content are passed through as
literal text, NOT as HTML). This is the only ``|safe`` filter
on the page — chat_id, owner_name, etc. remain auto-escaped. #}
<div class="search-result-summary">{{ r.snippet|safe }}</div>
</a>
</li>
{% endfor %}
+81 -21
View File
@@ -411,6 +411,64 @@ async def edit_memory_significance(
return await drawer(chat_id, request, conn)
@router.post(
"/chats/{chat_id}/drawer/memory/significance/bulk",
response_class=HTMLResponse,
)
async def bulk_re_rate_significance(
chat_id: str,
request: Request,
level_from: int = Form(...),
level_to: int = Form(...),
conn=Depends(get_conn),
):
"""T110.4: bulk re-rate every memory in this chat at ``level_from``
to ``level_to``.
Fans out into one ``manual_edit`` event per matching memory rather
than a single bulk event so the §6.4 audit trail stays per-row
each affected memory carries its own ``prior_value -> new_value``
snapshot, so an inverse edit can restore an individual row without
needing to inspect a bulk payload's member list. The drawer's
significance-distribution panel surfaces the new buckets on the
refreshed partial.
Both levels are clamped to 0..3 (matching ``edit_memory_significance``)
and a no-op (``level_from == level_to``) is rejected with 400 so a
misclick can't pad the event log with empty edits.
"""
chat = get_chat(conn, chat_id)
if chat is None:
raise HTTPException(status_code=404, detail=f"chat not found: {chat_id}")
lf = max(0, min(3, int(level_from)))
lt = max(0, min(3, int(level_to)))
if lf == lt:
raise HTTPException(
status_code=400,
detail=f"level_from and level_to must differ (both = {lf})",
)
rows = conn.execute(
"SELECT id FROM memories WHERE chat_id = ? AND significance = ? "
"ORDER BY id ASC",
(chat_id, lf),
).fetchall()
for row in rows:
memory_id = int(row[0])
append_and_apply(
conn,
kind="manual_edit",
payload={
"target_kind": "memory_significance",
"target_id": memory_id,
"prior_value": lf,
"new_value": lt,
},
)
return await drawer(chat_id, request, conn)
@router.post(
"/chats/{chat_id}/drawer/memory/{memory_id}/pin",
response_class=HTMLResponse,
@@ -1234,28 +1292,18 @@ async def delete_preview(
report = compute_delete_impact(conn, target_event_id=int(event_id))
# Build the modal HTML directly — the impact report is small and
# reusing the drawer template would require a fragment include just
# for this surface. Mirrors the rewind-preview style in
# :func:`chat.web.turns.rewind_preview`.
items_html = "".join(
f"<li><strong>{item.kind}</strong>: {item.description}</li>"
for item in report.cascading
# T110.3: render via the ``_delete_impact_modal.html`` Jinja partial
# so HTML autoescape covers user-controllable fields (item.kind,
# item.description, notes) automatically. The prior implementation
# built the modal HTML via raw f-string concatenation and required
# explicit ``html.escape()`` calls (T110.2) on each interpolated
# field; under autoescape those calls become redundant. Mirrors the
# rewind-preview style in :func:`chat.web.turns.rewind_preview`.
return TEMPLATES.TemplateResponse(
request,
"_delete_impact_modal.html",
{"chat_id": chat_id, "impact": report},
)
notes_html = "".join(f"<li>{note}</li>" for note in report.notes)
body = (
"<div class='delete-impact-modal'>"
f"<h3>Delete event {report.target_event_id}?</h3>"
f"<p>This will discard {len(report.cascading)} events. Cascade:</p>"
f"<ul class='delete-impact-cascade'>{items_html or '<li>none</li>'}</ul>"
f"<ul class='delete-impact-notes'>{notes_html}</ul>"
f"<form hx-post='/chats/{chat_id}/drawer/turn/delete/{report.target_event_id}' "
"hx-target='#drawer' hx-swap='innerHTML'>"
"<button type='submit'>Confirm delete</button>"
"</form>"
"</div>"
)
return HTMLResponse(body)
@router.post(
@@ -1278,7 +1326,19 @@ async def delete_turn(
A snapshot is taken before truncation (inside ``execute_rewind``)
so the user can recover via the snapshot index.
T110.1 guards ``event_id <= 0``: a stale tab or hand-crafted request
posting ``event_id=0`` would otherwise compute ``after_event_id=-1``
and silently truncate the entire log. ``id`` is auto-assigned by
SQLite starting at 1 so any caller's "real" id is always >= 1; a
zero or negative value can only mean a client bug, surfaced as 400.
"""
if int(event_id) <= 0:
raise HTTPException(
status_code=400,
detail=f"event_id must be a positive integer, got {event_id}",
)
chat = get_chat(conn, chat_id)
if chat is None:
raise HTTPException(status_code=404, detail=f"chat not found: {chat_id}")
+10 -1
View File
@@ -71,18 +71,27 @@ def _read_recent_meanwhile_dialogue(
that already match avoids an unbounded scan as ``event_log``
grows. The user-side rows match on chat_id only since they aren't
tagged with a scene id (they ride the chat-wide log).
T113: clamp by the active branch's ``[origin, head]`` event-id range
so meanwhile prompt context respects the user's current branch.
Bootstrap-main and "no active branch" both fall through to ``(0,
BIG_INT)`` no functional change for the metadata-only Phase 4 era.
"""
from chat.state.branches import active_branch_event_ids
origin, head = active_branch_event_ids(conn)
cur = conn.execute(
"SELECT id, kind, payload_json FROM event_log "
"WHERE kind IN ('user_turn', 'user_turn_edit', 'assistant_turn') "
" AND superseded_by IS NULL AND hidden = 0 "
" AND id BETWEEN ? AND ? "
" AND json_extract(payload_json, '$.chat_id') = ? "
" AND ("
" kind IN ('user_turn', 'user_turn_edit') "
" OR json_extract(payload_json, '$.meanwhile_scene_id') = ?"
" ) "
"ORDER BY id DESC LIMIT ?",
(chat_id, scene_id, limit),
(origin, head, chat_id, scene_id, limit),
)
rows = cur.fetchall()
rows.reverse()
+148 -9
View File
@@ -14,6 +14,12 @@ For each match we hydrate just enough metadata to render a row:
* the originating scene title when one exists,
* and the ``pov_summary`` itself.
T106 (Phase 4.5): hydration is batched. Pre-T106 the route called
``get_bot``/``get_chat``/``get_scene`` once per result row N+1 with
``DEFAULT_SEARCH_K=50`` meaning up to 150 individual SELECTs per page
load. We now collect distinct ids first and fan-in via three
``WHERE id IN (...)`` queries, then map back per row.
We deliberately keep this module synchronous and template-only no
HTMX swaps, no JSON API because the search box is a "leave the
current chat to look something up" surface, not an inline drawer.
@@ -21,7 +27,9 @@ current chat to look something up" surface, not an inline drawer.
from __future__ import annotations
import json
from pathlib import Path
from sqlite3 import Connection
from fastapi import APIRouter, Depends, Request
from fastapi.responses import HTMLResponse
@@ -36,29 +44,145 @@ TEMPLATES = Jinja2Templates(
directory=str(Path(__file__).resolve().parent.parent / "templates")
)
#: Maximum cross-chat FTS matches surfaced per ``/search`` page load.
#: Extracted as a module-level constant (T106) so the cap is tunable
#: without touching the route body. ``search_all_memories`` itself
#: defaults to a smaller ``k=20``; we override here because the
#: top-bar search is a "scan everything I've seen" surface, not an
#: inline drawer.
DEFAULT_SEARCH_K = 50
router = APIRouter()
def _fetch_bots_by_ids(conn: Connection, ids: set[str]) -> dict[str, dict]:
"""Batched sibling of :func:`chat.state.entities.get_bot`.
Inlined here (not exported from ``state.entities``) to keep T106's
scope confined to ``search.py`` per the Phase 4.5 plan. Returns
``{bot_id: bot_dict}`` for every id present in ``ids``; ids with
no matching row are simply absent from the map (the caller falls
back to the raw id string the same way it did pre-T106).
Empty ``ids`` short-circuits to ``{}`` because SQLite rejects
``WHERE id IN ()`` as a syntax error.
"""
if not ids:
return {}
placeholders = ",".join("?" * len(ids))
cols = [c[1] for c in conn.execute("PRAGMA table_info(bots)").fetchall()]
rows = conn.execute(
f"SELECT * FROM bots WHERE id IN ({placeholders})",
tuple(ids),
).fetchall()
out: dict[str, dict] = {}
for row in rows:
d = dict(zip(cols, row))
d["voice_samples"] = json.loads(d.pop("voice_samples_json"))
d["traits"] = json.loads(d.pop("traits_json"))
out[d["id"]] = d
return out
def _fetch_chats_by_ids(conn: Connection, ids: set[str]) -> dict[str, dict]:
"""Batched sibling of :func:`chat.state.world.get_chat`.
Mirrors that helper's ``chats``/``chat_state`` JOIN so the returned
dicts have the same shape (``narrative_anchor``, ``time``,
``weather``, ``active_scene_id``, etc.). Empty ``ids`` returns
``{}`` to dodge the ``IN ()`` syntax error.
"""
if not ids:
return {}
placeholders = ",".join("?" * len(ids))
rows = conn.execute(
"SELECT c.id, c.host_bot_id, c.guest_bot_id, c.created_at, "
" s.time, s.weather, s.active_scene_id, s.narrative_anchor "
f"FROM chats c JOIN chat_state s ON s.chat_id = c.id "
f"WHERE c.id IN ({placeholders})",
tuple(ids),
).fetchall()
return {
row[0]: {
"id": row[0],
"host_bot_id": row[1],
"guest_bot_id": row[2],
"created_at": row[3],
"time": row[4],
"weather": row[5],
"active_scene_id": row[6],
"narrative_anchor": row[7],
}
for row in rows
}
def _fetch_scenes_by_ids(conn: Connection, ids: set[int]) -> dict[int, dict]:
"""Batched sibling of :func:`chat.state.world.get_scene`.
Returns ``{scene_id: scene_dict}`` with ``participants`` already
JSON-decoded so callers see the same shape as the per-row helper.
Empty ``ids`` returns ``{}``.
"""
if not ids:
return {}
placeholders = ",".join("?" * len(ids))
cols = [c[1] for c in conn.execute("PRAGMA table_info(scenes)").fetchall()]
rows = conn.execute(
f"SELECT * FROM scenes WHERE id IN ({placeholders})",
tuple(ids),
).fetchall()
out: dict[int, dict] = {}
for row in rows:
d = dict(zip(cols, row))
d["participants"] = json.loads(d.pop("participants_json"))
out[d["id"]] = d
return out
@router.get("/search", response_class=HTMLResponse)
async def search(request: Request, q: str = "", conn=Depends(get_conn)):
"""Render ``search.html`` with up to 50 cross-chat FTS matches.
"""Render ``search.html`` with up to :data:`DEFAULT_SEARCH_K` matches.
``q`` is intentionally allowed to be empty that path renders the
page's "enter a query" placeholder rather than a 400, because the
top-bar form submits to this URL even with an empty input. T93's
service short-circuits whitespace-only queries to ``[]`` so there
is no FTS5 ``MATCH ''`` syntax error to guard against here.
"""
raw_results = search_all_memories(conn, query=q, k=50) if q else []
# Hydrate display fields per row. We do this in the route (not the
# service) so the service stays a pure FTS shim that other UIs
# can reuse.
Hydration (T106) is batched: rather than calling ``get_bot`` /
``get_chat`` / ``get_scene`` per row (worst case 3 * k individual
SELECTs), we collect distinct ids and issue one ``IN (...)`` query
per entity kind, then map back during the row build. ``get_bot``
et al. remain imported for test-time monkeypatching but are no
longer invoked on the hot path.
"""
raw_results = (
search_all_memories(conn, query=q, k=DEFAULT_SEARCH_K) if q else []
)
# Collect distinct ids up front so the IN-list queries dedupe (a
# popular bot or scene shows up many times across the result set).
bot_ids: set[str] = {r["owner_id"] for r in raw_results if r["owner_id"]}
chat_ids: set[str] = {r["chat_id"] for r in raw_results if r["chat_id"]}
scene_ids: set[int] = {
r["scene_id"] for r in raw_results if r["scene_id"]
}
bots_by_id = _fetch_bots_by_ids(conn, bot_ids)
chats_by_id = _fetch_chats_by_ids(conn, chat_ids)
scenes_by_id = _fetch_scenes_by_ids(conn, scene_ids)
# Hydrate display fields per row from the batched maps. We do this
# in the route (not the service) so the service stays a pure FTS
# shim that other UIs can reuse.
results = []
for row in raw_results:
bot = get_bot(conn, row["owner_id"])
chat = get_chat(conn, row["chat_id"])
scene = get_scene(conn, row["scene_id"]) if row["scene_id"] else None
bot = bots_by_id.get(row["owner_id"])
chat = chats_by_id.get(row["chat_id"])
scene = (
scenes_by_id.get(row["scene_id"]) if row["scene_id"] else None
)
results.append(
{
"memory_id": row["memory_id"],
@@ -69,6 +193,13 @@ async def search(request: Request, q: str = "", conn=Depends(get_conn)):
chat.get("narrative_anchor") if chat else None
),
"scene_id": row["scene_id"],
# T111.2: event_id deep-links to the originating turn
# via the ``id="turn-{event_id}"`` anchor that Phase 3.5
# T86 stamps on each turn DOM node. May be ``None`` for
# memory rows projected before the 0014 migration ran
# (T109 did not backfill historical rows); the template
# falls back to a chat-level link in that case.
"event_id": row["event_id"],
# Scenes have no ``title`` column today; surface the
# ``started_at`` timestamp as a human-friendly label
# when a scene is set, otherwise leave it blank.
@@ -76,6 +207,14 @@ async def search(request: Request, q: str = "", conn=Depends(get_conn)):
scene.get("started_at") if scene else None
),
"pov_summary": row["pov_summary"],
# T111.1: ``snippet`` is the FTS5 windowed excerpt with
# ``<mark>`` tags around each match. Falls back to the
# full ``pov_summary`` if the row lacks a snippet (which
# shouldn't happen on this code path because every
# ``raw_results`` row came from a MATCH query, but we
# guard defensively so the template never renders
# ``None``).
"snippet": row.get("snippet") or row["pov_summary"],
"significance": row["significance"],
"ts": row["ts"],
}
+35 -9
View File
@@ -8,20 +8,27 @@ Routes:
* ``GET /snapshots`` list all snapshots (both kinds)
* ``POST /snapshots/take`` take a periodic snapshot now
* ``POST /snapshots/restore/{id}`` restore (requires matching ``confirm_id``)
* ``POST /snapshots/restore/{id}`` restore (requires matching ``confirm_id`` and ``kind``)
* ``GET /snapshots/{id}/preview`` show metadata + delta vs current
The ``snapshot_id`` is the filename stem (the UTC timestamp written by
:func:`chat.services.snapshot.take_snapshot`) there's no separate UUID,
and the timestamp filename is already unique per snapshot kind. Both
periodic and rewind snapshots share the same id space lookup-wise, so
the restore + preview routes accept ``kind`` as a form/query param to
disambiguate.
the restore + preview routes require ``kind`` as a form/query param to
disambiguate (a missing/empty ``kind`` is a 400, not a silent default).
Note on ``created_at`` mtime drift: the listing's ``created_at`` comes
from the file's mtime, not the encoded filename timestamp. ``cp -p``
preserves mtime, but plain ``cp`` resets it to "now" so a copied
snapshot can show a misleading ``created_at`` while its filename still
reflects the original UTC capture time.
"""
from __future__ import annotations
import json
from datetime import datetime, timezone
from pathlib import Path
from fastapi import APIRouter, Depends, Form, HTTPException, Request
@@ -52,8 +59,6 @@ def _list_all_snapshots(data_dir: Path) -> list[dict]:
``last_event_id`` (parsed from the JSON body small enough that
listing isn't a performance concern for the handful of files we keep).
"""
from datetime import datetime, timezone
rows: list[dict] = []
for kind in SNAPSHOT_KINDS:
snap_dir = data_dir / "snapshots" / kind
@@ -85,12 +90,26 @@ def _list_all_snapshots(data_dir: Path) -> list[dict]:
return rows
def _require_kind(kind: str) -> str:
"""Reject missing/empty/unknown ``kind`` with 400.
Defaulting silently to ``"periodic"`` made rewind-snapshot lookups
appear as 404s, which is confusing make the client always state
the kind explicitly.
"""
if not kind or kind not in SNAPSHOT_KINDS:
raise HTTPException(
status_code=400,
detail=f"kind must be one of {SNAPSHOT_KINDS}",
)
return kind
def _resolve_snapshot_path(
data_dir: Path, snapshot_id: str, kind: str
) -> Path:
"""Map an ``(id, kind)`` pair to the on-disk file, or 404."""
if kind not in SNAPSHOT_KINDS:
raise HTTPException(status_code=400, detail=f"unknown kind: {kind}")
_require_kind(kind)
path = data_dir / "snapshots" / kind / f"{snapshot_id}.json"
if not path.exists():
raise HTTPException(status_code=404, detail="snapshot not found")
@@ -127,7 +146,7 @@ async def snapshots_restore(
snapshot_id: str,
request: Request,
confirm_id: str = Form(""),
kind: str = Form("periodic"),
kind: str = Form(""),
conn=Depends(get_conn),
):
"""Hard-confirm restore: ``confirm_id`` must equal the path id.
@@ -135,7 +154,11 @@ async def snapshots_restore(
Mismatched confirm 400 (without touching the DB). On match, the
existing :func:`restore_from_snapshot` clears projected tables and
re-loads them from the dump.
``kind`` is required (must be ``"periodic"`` or ``"rewind"``) a
missing or empty value 400s rather than silently defaulting.
"""
_require_kind(kind)
if confirm_id != snapshot_id:
raise HTTPException(
status_code=400,
@@ -151,7 +174,7 @@ async def snapshots_restore(
async def snapshots_preview(
snapshot_id: str,
request: Request,
kind: str = "periodic",
kind: str = "",
conn=Depends(get_conn),
):
"""Show snapshot metadata + a basic delta against the current event log.
@@ -159,7 +182,10 @@ async def snapshots_preview(
Phase 4 keeps this simple: the snapshot's ``last_event_id`` plus the
current ``MAX(event_log.id)`` is enough to tell the user how far the
log has moved on. A richer per-table diff is a Phase 4.5+ concern.
``kind`` is required see :func:`snapshots_restore`.
"""
_require_kind(kind)
settings = request.app.state.settings
path = _resolve_snapshot_path(settings.data_dir, snapshot_id, kind)
dump = json.loads(path.read_text())
+30
View File
@@ -812,6 +812,14 @@ async def post_turn(
payload={
"event_id": transition.event_id,
"started_at": chat.get("time"),
# T114.1: back-reference to the assistant_turn that
# triggered this transition. Regenerate uses this
# to roll back lifecycle transitions when the turn
# is superseded. Forward-only — older events
# without this field are skipped by rollback.
"triggered_by_assistant_turn_id": (
primary_assistant_event_id
),
},
)
elif transition.new_status == "completed":
@@ -821,6 +829,10 @@ async def post_turn(
payload={
"event_id": transition.event_id,
"completed_at": chat.get("time"),
# T114.1: back-reference (see above).
"triggered_by_assistant_turn_id": (
primary_assistant_event_id
),
},
)
# Run promotion inline so the artifact-emitting events
@@ -842,6 +854,10 @@ async def post_turn(
payload={
"event_id": transition.event_id,
"completed_at": chat.get("time"),
# T114.1: back-reference (see above).
"triggered_by_assistant_turn_id": (
primary_assistant_event_id
),
},
)
# Any other ``new_status`` value falls through silently —
@@ -873,6 +889,20 @@ async def post_turn(
# mid-stream still meant to close the scene — the cancelled bot
# beat doesn't invalidate that intent. Pinned by
# test_cancelled_turn_still_closes_scene_when_user_prose_signals_close.
#
# T108 NOTE — the in-memory append order is correct, but the cancel
# path re-raises ``CancelledError`` at the end of ``post_turn``
# (see step 11 below). The ``open_db`` dependency teardown skips
# ``conn.commit()`` when the consumer raises, which means in
# production a genuine cancel currently rolls back ALL post-cancel
# writes — including this scene_closed event, the truncated
# assistant_turn record, edge updates, and per-POV summaries. The
# T74.3 regression test passes only because of a missing
# ``import asyncio`` in the test module: the inline mock raises
# ``NameError`` instead of ``CancelledError``, which is caught by
# the ``except Exception:`` branch and leaves ``cancelled=False``,
# so the function returns 204 normally and the commit fires. This
# is a transactional bug deferred for triage (T108 report).
if scene is not None and prose.strip():
container = None
if scene.get("container_id") is not None:
@@ -522,6 +522,8 @@ Written per witness when a scene closes. Different details, different interpreta
**Status: shipped 2026-04-27** (T88T102, 15 tasks across 8 waves; +70 tests). See "Phase 4 status" in CLAUDE.md for the per-task breakdown. Vector retrieval shipped via pure-Python cosine over a JSON-blob embeddings table (sqlite-vec deferred — host Python lacks loadable extensions); branching is data-model + drawer UI; significance review, hide-from-view soft delete, surgical delete with cascade preview, snapshot UX, and cross-chat search all surface from the drawer or top-bar.
**Phase 4.5 cleanup: shipped 2026-04-27** (T103T118, 13 of 14 planned tasks; T115 sqlite-vec swap deferred to Phase 5 due to host Python lacking `enable_load_extension`; +~44 tests; schema baseline now 14). See "Phase 4.5 status" in CLAUDE.md for the per-task breakdown — notable shipped: real embedding model swap path (`LLMClient.embed()` + `--re-embed-all`), branching read-side filter (`active_branch_event_ids`), regenerate lifecycle rollback (`event_status_reverted`), FTS snippet highlighting + deep-link to turn (`memories.event_id`), bulk significance re-rate.
- Vector retrieval (sqlite-vss or sqlite-vec).
- Branching UI.
- Drawer-edit on every field.
@@ -0,0 +1,724 @@
# Roleplay Engine — Phase 4.5 Cleanup Plan
> **For Claude:** REQUIRED SUB-SKILL: Use `superpowers-extended-cc:executing-plans` to implement this plan task-by-task. Use the parallel-dispatch pattern documented under "Parallel-Execution Strategy" for parallel waves.
**Goal:** Burn down all 24 items in `CLAUDE.md` §"Phase 4.5 / 5 backlog". Mix of small defensive cleanups (most), three big features (real embedding model swap, branching read-side filter, lifecycle rollback in regenerate), one environment-dependent feature (sqlite-vec swap), and the long-deferred carry-overs (scene-close-on-cancel revisit, structured test-fixture builder).
**Architecture:** No new architecture. Two new schema migrations (0014 schema polish, 0015 sqlite-vec virtual tables). New external dependency optional (`apsw` if Python rebuild isn't possible). All other changes are polish / refactor / observability.
**Tech Stack:**
- Existing — same as Phase 4.
- **OPTIONAL:** rebuild Python with `--enable-loadable-sqlite-extensions` OR install `apsw` to enable T115 sqlite-vec swap. T115 is the only task that requires this; the other 13 tasks land without it. If neither is available, T115 is deferred to Phase 5.
**Source-of-truth references:**
- Backlog: [`CLAUDE.md`](../../CLAUDE.md) §"Phase 4.5 / 5 backlog" (24 items grouped by review source + deferred).
- Phase 3.5 / Phase 2.5 cleanup plans (pattern reference): [2026-04-26-v3.5-phase3.5-cleanup.md](2026-04-26-v3.5-phase3.5-cleanup.md), [2026-04-26-v2.5-phase2.5-cleanup.md](2026-04-26-v2.5-phase2.5-cleanup.md).
- Conventions: [`CLAUDE.md`](../../CLAUDE.md) §"Behavioral defaults" + §"Phase 4 status".
---
## Pre-flight
**Branch:** create `phase-4.5` from the latest `main`:
```bash
git checkout main && git pull && git checkout -b phase-4.5
```
**Schema baseline:** Phase 4 leaves the DB at version 13. Phase 4.5 adds two migrations: `0014_phase45_schema.sql` (T109) and `0015_vec0_virtual_tables.sql` (T115 — only lands if T115 ships). Final schema version: 14 or 15.
**Optional pre-flight for T115 (sqlite-vec swap):**
The host Python build needs `enable_load_extension`. Two options:
1. **Rebuild Python** via pyenv with `PYTHON_CONFIGURE_OPTS="--enable-loadable-sqlite-extensions" pyenv install 3.12.0 --force` and recreate the venv.
2. **Add `apsw`** as a dependency and migrate `chat/db/connection.py` to use `apsw.Connection` (significant refactor — the entire codebase uses stdlib `sqlite3`).
If neither is acceptable, **defer T115** to Phase 5 and ship Phase 4.5 with 13 tasks instead of 14. The other tasks are unaffected.
**Pinned non-negotiables (carried forward):**
- State changes go through the event log. Use `append_and_apply` for the live path.
- Witness filter every memory read at SQL level.
- TDD: every task starts with a failing test (or a regression test pinning existing contract before refactor).
- One commit per task minimum. Bundled tasks split internally.
**Verification before claiming done:** Use `superpowers-extended-cc:verification-before-completion` — run the test command, paste actual output.
---
## Backlog item → task mapping
24 items consolidated into 14 tasks by **file ownership**:
| # | Item | Source | Task |
|---|------|--------|------|
| 1 | `embeddings` FK lacks `ON DELETE CASCADE` | T88 | **T109** (schema migration) |
| 2 | `list_branches(chat_id=...)` global-branch leak — document | T89 | **T103** |
| 3 | Branch-switch silently leaves zero active — log warning | T89 | **T103** |
| 4 | Real embedding model swap | T91 / deferred | **T112** |
| 5 | `timeout_s` fallback-path logging | T91 | **T107** |
| 6 | Duplicate `MAX(id)` lookup in retrieval ranking | T96 | **T104** |
| 7 | `fts_rank=None` for vector-only rows — document | T96 | **T104** |
| 8 | `event_id <= 0` guard in `delete_turn` | T98 | **T110** |
| 9 | `html.escape()` on delete-impact modal output | T98 | **T110** |
| 10 | Extract delete-impact modal to Jinja partial | T98 | **T110** |
| 11 | Hoist `datetime`/`timezone` imports in `snapshots.py` | T99 | **T105** |
| 12 | Strict `kind` validation in snapshot routes | T99 | **T105** |
| 13 | `created_at` from file mtime — document drift risk | T99 | **T105** |
| 14 | Hardcoded `k=50` → module constant | T100 | **T106** |
| 15 | N+1 lookups in search results | T100 | **T106** |
| 16 | FTS highlighting via `snippet()` | T100 | **T111** |
| 17 | Result links chat-level only — add deep-link via memories.event_id | T100 | **T109** + **T111** |
| 18 | sqlite-vec swap when host Python supports loadable extensions | deferred | **T115** |
| 19 | Branching read-side filter (consult `is_active`) | deferred | **T113** |
| 20 | Bulk significance re-rate in drawer | deferred | **T110** |
| 21 | Vector index optimization (HNSW) | deferred | **T115** (post-ship note) |
| 22 | Scene-close-on-cancel UX revisit | Phase 2.5 carry-over | **T108** |
| 23 | Cross-feature canned-queue brittleness fixture builder | Phase 3 carry-over | **T116** |
| 24 | Full lifecycle-rollback in regenerate | Phase 3.5 carry-over | **T114** |
---
## Parallel-Execution Strategy
Same pattern as Phase 3.5 / Phase 2.5 / Phase 4. Nine waves: parallel within each wave (file-disjoint), serial across waves.
### How to dispatch a wave in parallel
Use the **Agent tool with `isolation: "worktree"`**. (If the controlling session's working directory is **not** the chat repo, create worktrees manually with `git worktree add .worktrees/<wave>-<task> -b <wave>/<task> phase-4.5`.)
### After a wave completes
1. Each subagent returns its worktree path and commit SHA(s).
2. **Run a spec + code-quality reviewer subagent on each completed task.** Combined review acceptable for trivial tasks (T103T108); separate spec + quality reviewers for big tasks (T112, T113, T114, T115).
3. **Merge the wave into `phase-4.5`** in any order (file-disjointness guarantees no conflict). Use `--no-ff`.
4. **Run the full test suite** on the merged `phase-4.5`.
5. **Push `phase-4.5`** to gitea.
6. Optionally clean up worktrees.
### Conflict prevention checklist
For each parallel wave, verify the **Files** sections of all tasks have **no overlapping paths**. Hot files in this plan (each owned by exactly one task): `chat/state/memory.py`, `chat/web/drawer.py`, `chat/web/search.py`, `chat/services/regenerate.py`, `chat/services/turn_common.py`, `chat/services/embeddings.py`, `chat/db/migrations/`.
### Why each wave is parallel-safe
| Wave | Tasks | Hot files | Disjoint? |
|------|-------|-----------|-----------|
| 1 | T103, T104, T105, T106, T107, T108 | 6 different files; no overlap | ✅ |
| 2 | T109 | new migration + minor projector update | (single task) |
| 3 | T110 | `chat/web/drawer.py` (bundle) | (single task) |
| 4 | T111 | `chat/services/cross_chat_search.py` + `chat/web/search.py` + template | (single task; depends on T109) |
| 5 | T112 | `chat/services/embeddings.py` + `chat/llm/*.py` (Protocol + Featherless + Mock) | (single task) |
| 6 | T113 | `chat/services/turn_common.py` + multiple readers (cross-cutting) | (single task) |
| 7 | T114 | `chat/services/regenerate.py` + projector handler | (single task) |
| 8 | T115 | new migration + `chat/services/vector_search.py` + `chat/db/connection.py` | (single task; environmental) |
| 9 | T116, T117, T118 | new test fixture file (T116); new test file (T117); CLAUDE.md (T118) | ✅ |
---
## Task overview
```
Wave 1 ─┬─ T103: branches polish (global-branch doc + branch-switch warning)
├─ T104: state/memory.py polish (DRY MAX(id) + fts_rank doc)
├─ T105: snapshots.py polish (datetime hoist + kind validation + mtime doc)
├─ T106: search.py polish (k constant + N+1 batched lookups)
├─ T107: embeddings.py timeout_s fallback-path logging
└─ T108: scene-close-on-cancel UX revisit (pin behavior with regression test)
Wave 2 ─── T109: 0014 schema migration (FK CASCADE + memories.event_id column)
Wave 3 ─── T110: drawer Phase 4.5 bundle (event_id guard + html.escape + modal partial + bulk sig re-rate)
Wave 4 ─── T111: search UX enhancements (FTS snippet() highlighting + deep-link via memories.event_id)
Wave 5 ─── T112: real embedding model swap (LLMClient.embed protocol + Featherless impl + generate_embedding routing + backfill)
Wave 6 ─── T113: branching read-side filter (event readers consult is_active branch range)
Wave 7 ─── T114: regenerate lifecycle rollback (back-reference field + compensating events on supersede)
Wave 8 ─── T115: sqlite-vec swap (vec0 virtual tables + MATCH-based vector_search) [ENVIRONMENTAL — see pre-flight]
Wave 9 ─┬─ T116: structured test-fixture builder (canned-queue brittleness)
├─ T117: Phase 4.5 cross-feature integration tests
└─ T118: docs sweep — Phase 4.5 status, prune backlog, capture Phase 5 residuals
```
Critical path: 9 sequential merge points. Total tasks: 14 (or 13 if T115 deferred). Parallelism: Waves 1 (6-way) and 9 (3-way) dispatch concurrently. Waves 28 are single-task by hot-file constraint.
---
## Wave 1 — Independent small fixes (parallel, 6 tasks)
All trivial, file-disjoint. Each is 1-line + 1-test or similar.
### Task 103: branches polish
**Files:**
- Modify: `chat/state/branches.py`
- Modify: `tests/test_branches_state.py`
**Spec (2 sub-fixes, single commit):**
1. **Document global-branch leak**: `list_branches(chat_id=...)` filter `chat_id = ? OR chat_id IS NULL` returns global/null-chat branches (like "main") in every chat scope. Add a docstring note explaining this is intentional ("main" is global by design; per-chat branches are scoped).
2. **Warn on branch-switch to nonexistent name**: in `_apply_branch_switched`, before the SQL UPDATE, check if a branch with the given name exists. If not, emit `logging.getLogger(__name__).warning(...)` rather than silently leaving zero active branches.
**Test:** `test_branch_switched_unknown_name_warns` — capture log via `caplog`, append `branch_switched` for nonexistent name, assert warning message + no active branch (existing behavior preserved, just observable).
**Commit:** `chore: branches polish — global-leak docs + unknown-name warning (T103)`.
---
### Task 104: state/memory.py polish
**Files:**
- Modify: `chat/state/memory.py`
- Modify: `tests/test_memory_search.py` (no new tests; just add docstring assertions if needed)
**Spec (2 sub-fixes):**
1. **DRY `MAX(id)` lookup**: `_composite_rerank` (Phase 3.5 T57) and `_rrf_fuse_and_rerank` (Phase 4 T96) both query `SELECT MAX(id) FROM event_log` for the recency boost. Extract a `_max_event_id(conn)` helper.
2. **`fts_rank=None` documentation**: search_memories docstring should note that vector-only rows have `fts_rank=None`. Downstream consumers must accept None (they currently do, but contract is implicit).
**Test:** existing tests cover both via the public API; no new test needed unless docstring assertion is desired.
**Commit:** `chore: memory.py DRY MAX(id) helper + document fts_rank=None contract (T104)`.
---
### Task 105: snapshots.py polish
**Files:**
- Modify: `chat/web/snapshots.py`
- Modify: `tests/test_snapshot_ux.py` (1 new test)
**Spec (3 sub-fixes):**
1. **Hoist `datetime`/`timezone` imports** to module level (currently inside `_list_all_snapshots`).
2. **Strict `kind` validation in restore/preview routes**: currently `kind` defaults to `"periodic"`. If a rewind snapshot is requested without explicit `kind`, the lookup silently 404s. Reject missing `kind` with a 400 instead of silently defaulting.
3. **Document `created_at` mtime drift risk** in module docstring: snapshot timestamps come from file mtime, not the encoded filename timestamp. Files copied via `cp -p` preserve mtime; `cp` without `-p` resets it. Add a one-line note.
**Test:** `test_restore_without_kind_returns_400` — POST `/snapshots/restore/<id>` without `kind`; assert 400.
**Commit:** `chore: snapshots.py polish — hoisted imports + strict kind + mtime doc (T105)`.
---
### Task 106: search.py polish
**Files:**
- Modify: `chat/web/search.py`
- Modify: `tests/test_search_ux.py` (1 new test)
**Spec (2 sub-fixes):**
1. **Hardcoded `k=50` → module constant**: extract `DEFAULT_SEARCH_K = 50` at module level. Tunable without code change at the call site.
2. **N+1 lookup batching**: GET `/search?q=...` currently calls `get_bot(conn, owner_id)`, `get_chat(conn, chat_id)`, `get_scene(conn, scene_id)` per result row (worst case 50×3 = 150 individual queries). Batch via `WHERE id IN (...)` queries: collect distinct ids first, fetch in 3 batched queries, then map back per row.
**Test:** `test_search_results_use_batched_lookups` — mock `get_bot`/`get_chat`/`get_scene` and assert each is called once (not per row). OR easier: time the search with 50 results and assert it doesn't degrade linearly with `k`.
**Commit:** `perf: search.py N+1 batching + k constant extraction (T106)`.
---
### Task 107: embeddings.py timeout_s fallback-path logging
**Files:**
- Modify: `chat/services/embeddings.py`
- Modify: `tests/test_embeddings.py` (1 new test)
**Spec:**
When `model != DEFAULT_EMBEDDING_MODEL` and falls through to fallback (zero-vector with model="fallback"), log a `warning` so misconfigured callers (e.g., a Phase 4.5+ caller pointing at a real model that doesn't exist) don't silently degrade.
```python
if model != DEFAULT_EMBEDDING_MODEL:
_log.warning(
"generate_embedding: non-default model %r returned fallback "
"(model client.embed() not yet implemented in Phase 4.5+); "
"downstream search will degrade silently. Configure a supported model.",
model,
)
return EmbeddingResult(...) # fallback
```
The Phase 4 default path (`model == DEFAULT_EMBEDDING_MODEL` → pseudo-embedding) is silent; only non-default models trigger the warning.
**Test:** `test_generate_embedding_non_default_model_logs_warning` — call with `model="real-model"`; capture log via `caplog`; assert the warning message appears.
**Commit:** `chore: embeddings.py warns on fallback for non-default models (T107)`.
---
### Task 108: scene-close-on-cancel UX revisit
**Files:**
- Modify: `tests/test_turn_flow.py` (extend the existing pin test added in Phase 2.5 T74.3 OR add a new one)
- Optionally modify: `chat/web/turns.py` if a real bug surfaces during investigation
**Spec:**
This carry-over has been pending since Phase 2.5 T74.3. The pinned behavior: scene close fires even when the primary turn is cancelled mid-stream, because `detect_scene_close` consults user prose (fully present at cancel time), not bot output.
**Action:**
1. **Re-investigate** by reading the post_turn cancellation path. Confirm the rationale still holds (it should — nothing about the close-detection logic changed in Phase 3 or 4).
2. **Strengthen the regression test** in `tests/test_turn_flow.py` (the existing `test_cancelled_turn_still_closes_scene_when_user_prose_signals_close`). Add an assertion that the user prose IS present at the moment scene_close_decision fires (even though the bot output isn't).
3. If investigation surfaces an actual UX issue (e.g., the close fires too eagerly on prose like "fade out... actually wait"), this becomes a real fix — but default action is documentation-only.
**Default outcome:** add a docstring comment to the post_turn close-detection branch explaining the rationale. No behavioral change.
**Test (extend existing):** assert ordering — `scene_closed` event lands AFTER the user_turn event but BEFORE any potential assistant_turn (which is cancelled). Pin the contract.
**Commit:** `chore: scene-close-on-cancel — strengthen regression test + document rationale (T108)`.
---
## Wave 2 — Schema migration (single)
### Task 109: 0014 schema migration
**Files:**
- Create: `chat/db/migrations/0014_phase45_schema.sql`
- Modify: `chat/state/memory.py` or `chat/services/memory_write.py` (populate the new `event_id` column on memory_written)
- Modify: `tests/test_world.py` (bump schema_version assertion to 14)
- Modify: `tests/test_memory_write.py` (assert event_id populated)
**Spec:**
Two schema changes bundled into a single migration:
1. **`embeddings.memory_id` FK gets `ON DELETE CASCADE`** (T88 review nit). SQLite doesn't support `ALTER TABLE ... ALTER COLUMN`, so the standard pattern is: rename old table, create new, copy data, drop old, recreate indices. Alternatively, since this is a new-ish table (Phase 4 added it) and the change is purely defensive, document as "WONTFIX in 4.5; deindex events remain the only deletion path; ON DELETE CASCADE remains a Phase 5 candidate when we do a broader migration cleanup". Choose pragmatically.
2. **Add `memories.event_id INTEGER` column** (NULL allowed for backward compat) referencing `event_log.id`. This is the foundation for T111's deep-linking from cross-chat search results to specific turns. Migration adds the column; the projector for `memory_written` populates it from the event id when projecting.
**Production code change:** in the `memory_written` projector handler (in `chat/state/memory.py` or wherever it lives), populate the new `event_id` column with the projecting event's `id`. The `Event` object has `id` available in the projector context.
**Tests:**
1. `test_schema_version_after_migration_is_14` (rename + bump from 13).
2. `test_memory_written_populates_event_id` — append memory_written; project; query memories table; assert `event_id` is the projecting event's id.
3. (Backward compat) older memories from existing seed data have NULL `event_id` — the column is nullable.
**Commit:** `feat: 0014 schema — embeddings FK CASCADE (deferred or applied) + memories.event_id column (T109)`.
---
## Wave 3 — Drawer Phase 4.5 bundle (single)
### Task 110: drawer polish + bulk significance re-rate
**Files:**
- Modify: `chat/web/drawer.py`
- Modify: `chat/templates/_drawer.html`
- Create: `chat/templates/_delete_impact_modal.html` (extracted partial)
- Modify: `chat/state/manual_edit.py` (potentially — if bulk re-rate emits a new manual_edit kind)
- Modify: `tests/test_drawer_phase4.py` (extend with 4-5 new tests)
**Spec (4 sub-fixes, 4 commits):**
1. **`event_id <= 0` guard in `delete_turn`** (T98 nit): currently silently rewinds everything if `event_id` is 0. Add `if event_id <= 0: raise HTTPException(400, "...")`.
2. **`html.escape()` on delete-impact modal** (T98 nit): the rendered HTML in `compute_delete_impact` output is built via raw f-strings from model-controlled strings. Wrap user-controllable fields with `html.escape()`. Defense-in-depth — currently safe, but if event payload fields ever appear in descriptions, autoescape would prevent XSS.
3. **Extract delete-impact modal HTML to a Jinja partial**: create `chat/templates/_delete_impact_modal.html`; render via `templates.TemplateResponse(...)` instead of f-string concatenation. Inherits Jinja2 autoescape automatically. Tests use the existing TestClient pattern.
4. **Bulk significance re-rate** (T98.2 deferral): drawer panel showing memory significance distribution per chat. New POST route `/chats/{chat_id}/drawer/memory/significance/bulk` accepting `{level_from, level_to}` form fields. Updates ALL memories in the chat at `level_from` to `level_to` via a sequence of `manual_edit` events (one per memory — preserves the audit trail).
**Tests:**
1. `test_delete_turn_with_event_id_zero_returns_400`.
2. `test_delete_impact_modal_uses_jinja_partial` (assert response renders the partial template; verify with `assert b"<div class=\"delete-impact-modal\">" in response.content` or similar).
3. `test_delete_impact_modal_escapes_user_controllable_strings` — seed an event with a payload containing `<script>` in a description-bound field; render preview; assert it appears HTML-escaped.
4. `test_bulk_significance_re_rate_emits_manual_edit_per_memory` — seed 5 memories at significance 0; bulk re-rate to 2; assert 5 `manual_edit` events landed.
**Commits (4):**
- `fix: drawer delete_turn guards event_id <= 0 (T110.1)`
- `fix: drawer delete-impact modal HTML escapes user-controllable fields (T110.2)`
- `refactor: drawer delete-impact modal extracted to Jinja partial (T110.3)`
- `feat: drawer bulk significance re-rate per chat (T110.4)`
---
## Wave 4 — Search UX enhancements (single)
### Task 111: FTS highlighting + deep-link to turn
**Files:**
- Modify: `chat/services/cross_chat_search.py`
- Modify: `chat/web/search.py`
- Modify: `chat/templates/search.html`
- Modify: `tests/test_search_ux.py`
**Spec (2 sub-fixes, 2 commits):**
1. **FTS highlighting via `snippet()`** (T100 nit): replace the `pov_summary` column in `search_all_memories`'s SELECT with `snippet(memories_fts, 0, '<mark>', '</mark>', '…', 32)` to return a highlighted snippet around the match. The template renders this raw via `|safe` (the snippet is built by SQLite from indexed content; the `<mark>` tags are the only HTML, and SQLite escapes any HTML special chars in the source content).
2. **Deep-link to turn via memories.event_id** (T100 nit + T109 dependency): now that `memories.event_id` exists (from T109), each search result row knows the originating event id. The chat page uses turn-id stamping (Phase 3.5 T86 added `id="turn-{event_id}"`). Build result links as `/chats/{chat_id}#turn-{event_id}`. The chat page DOM scrolls to the anchor on load (browser default).
**Tests:**
1. `test_search_results_include_fts_snippet_with_highlight` — seed memory with text containing "rabbit"; search for "rabbit"; assert response body contains `<mark>rabbit</mark>` (or whatever marker the snippet uses).
2. `test_search_result_link_includes_turn_anchor` — seed memory with known event_id; search; assert link href contains `#turn-{event_id}`.
**Commits (2):**
- `feat: cross-chat search FTS snippet highlighting (T111.1)`
- `feat: cross-chat search deep-links to turn via memories.event_id (T111.2)`
---
## Wave 5 — Real embedding model (single)
### Task 112: Real embedding model swap
**Files:**
- Modify: `chat/llm/client.py` (Protocol — add `embed(text, model) -> list[float]` method)
- Modify: `chat/llm/featherless.py` (FeatherlessClient — implement `embed` against Featherless `/v1/embeddings` endpoint OR equivalent)
- Modify: `chat/llm/mock.py` (MockLLMClient — accept canned embedding vectors)
- Modify: `chat/services/embeddings.py` (route non-default model through `client.embed()`)
- Modify: `chat/config.py` (add `embedding_model: str` setting; default to current pseudo)
- Modify: `scripts/backfill_embeddings.py` (re-embed-all option for model swaps)
- Modify: `tests/test_embeddings.py` + `tests/test_llm_mock.py` + `tests/test_featherless.py` (if exists)
**Spec:**
Phase 4 ships a deterministic SHA-256 pseudo-embedding (deterministic but semantically meaningless). T112 wires the path for a real embedding model.
**Steps:**
1. **Extend `LLMClient` Protocol** with `async def embed(self, text: str, *, model: str) -> list[float]`.
2. **Implement on FeatherlessClient**: call the Featherless OpenAI-compatible `/v1/embeddings` endpoint:
```python
response = await self._http.post(
"/v1/embeddings",
json={"model": model, "input": text},
headers={"Authorization": f"Bearer {self._api_key}"},
)
data = response.json()
return data["data"][0]["embedding"]
```
Handle rate limits (existing 2-conn semaphore covers this).
3. **Implement on MockLLMClient**: `embed` pops a canned vector from a new `canned_embeddings` queue. Tests configure this queue.
4. **Update `generate_embedding`**: when `model != DEFAULT_EMBEDDING_MODEL`, call `client.embed(text, model=model)` instead of falling through to fallback. Wrap in try/except — failures fall back to zero vector (existing fallback path).
5. **Settings**: add `embedding_model: str = "pseudo-sha256-384"` to `Settings`. App reads this at startup; the embedding worker (`chat/services/embedding_worker.py`) passes it through.
6. **Backfill script**: add `--re-embed-all` flag that walks ALL memories (regardless of existing `embeddings_meta` rows) and re-embeds with the configured model. Useful for swapping models.
**Tests:**
1. `test_embed_routes_to_client_when_non_default_model` — mock client with canned vector; call `generate_embedding(model="bge-small-en-v1.5")`; assert vector matches the canned response.
2. `test_embed_falls_back_on_client_failure` — mock client to raise; assert returns zero vector with model="fallback".
3. `test_mock_llm_client_embed_pops_canned`.
4. `test_featherless_embed_calls_correct_endpoint` (if there's an existing featherless test pattern; otherwise mock the HTTP layer).
**Commits:**
- `feat: LLMClient Protocol gains embed() method (T112.1)`
- `feat: FeatherlessClient.embed() against /v1/embeddings (T112.2)`
- `feat: generate_embedding routes non-default models through client.embed (T112.3)`
- `feat: backfill_embeddings --re-embed-all flag for model swaps (T112.4)`
---
## Wave 6 — Branching read-side filter (single, BIG)
### Task 113: Branching read-side filter
**Files (cross-cutting):**
- Modify: `chat/services/turn_common.py::read_recent_dialogue` — filter events to active branch's range
- Modify: `chat/services/scene_summarize.py::_read_recent_dialogue` (similar)
- Modify: `chat/state/memory.py::search_memories` — memories should be filtered to active branch (memories.event_id from T109 enables this)
- Modify: `chat/state/branches.py` — add helper `active_branch_event_ids(conn) -> tuple[int, int]` returning (origin, head)
- Add tests across multiple files
- Modify: `tests/test_branching.py` — add cross-feature tests
**Spec:**
Phase 4 T89 + T94 shipped branching as metadata-only (the table tracks branches; the drawer UI can switch). But event readers DON'T consult `is_active` — they read the entire event_log. So switching branches has no functional effect.
T113 wires the filter:
1. **Helper** `active_branch_event_ids(conn) -> tuple[int, int]`: returns `(origin_event_id, head_event_id)` for the currently active branch. For "main" with origin=0 + head=N, returns `(0, N)` meaning "all events visible".
2. **Apply filter** in every event reader that returns historical state:
- `read_recent_dialogue`: WHERE clause adds `id BETWEEN ? AND ?` (the active branch's range).
- `search_memories`: WHERE clause adds `m.event_id BETWEEN ? AND ?` (uses T109's column).
- `scene_summarize._read_recent_dialogue`: same as turn_common.
- Other readers TBD — grep for `event_log` SELECT patterns and audit each one.
3. **Branches that diverge**: when branch B is created from event 10 and then accumulates events 11-15 (which only exist on B's timeline), but main also accumulates 11-12, the events overlap by id range. This is OK because event reads filter by `id <= active_branch.head_event_id`. The simpler model: branches share event_log ids globally, but each branch's "head" defines which ids are visible.
4. **Events written under branch B** carry an implicit branch tag — but the event_log table has no `branch_id` column today. T113 punts on cross-branch event writes (they all land in the global log) and relies on the `head_event_id` filter to scope reads. This is a Phase 4.5+ first cut; full branch-isolated event_log is Phase 5+.
**Edge cases:**
- Active branch has `head_event_id = 0` (just created): readers return empty.
- No active branch: readers fall through to "all events visible" (defensive).
- Switching branches mid-flight: each `read_recent_dialogue` call re-queries `active_branch`, so it's always current. No caching.
**Tests:** 5+ minimum.
1. `test_read_recent_dialogue_respects_active_branch_head` — seed 10 events; active branch head = 5; assert only first 5 returned.
2. `test_search_memories_respects_active_branch_head` — same.
3. `test_branch_switch_changes_visible_events` — switch branches; immediately read; assert different result sets.
4. `test_main_branch_with_head_zero_returns_empty` — defensive.
5. `test_no_active_branch_falls_through_to_all_events` — defensive.
**Commit:** `feat: branching read-side filter — event readers consult active branch range (T113)`.
**This is the largest task in Phase 4.5.** Estimate 200-400 lines across multiple files. Implementer should split commits if it helps clarity (one per affected reader).
---
## Wave 7 — Lifecycle rollback in regenerate (single)
### Task 114: Lifecycle rollback
**Files:**
- Modify: `chat/services/regenerate.py`
- Modify: `chat/db/migrations/0014_phase45_schema.sql` (T109's migration) — add column? OR
- Add new migration — see decision below
- Modify: tests in `tests/test_regenerate.py`
**Spec:**
Phase 3.5 T83.4 shipped a warning log when regenerate detects un-rolled-back lifecycle transitions. T114 implements actual rollback.
**Schema decision:**
Option A: extend lifecycle event payloads with `triggered_by_assistant_turn_id` (no schema change needed — just a payload convention). Production code (T61 turn flow) populates it when emitting `event_started`/`event_completed`/`event_cancelled`. Existing rows have NULL — rollback skips them with a debug log.
Option B: add a column to `event_log` for stronger invariants. Significant migration cost.
**Recommended:** Option A. Safer, no migration, backward compatible (older events skip rollback). Document in commit body.
**Rollback semantics:**
When regenerate detects lifecycle events triggered by the superseded turn:
- `event_started` → emit `event_cancelled` (or a NEW `event_started_undone` event kind that reverts status to "planned") with the same event_id.
- `event_completed` → emit `event_uncompleted` (NEW event kind that reverts status from "completed" to "active").
- `event_cancelled` → emit `event_uncancelled` (reverts to prior status — which we'd need to track; or simpler: emit `event_started` again to restore "active").
**Simpler approach (recommended):** add ONE new event kind `event_status_reverted` with payload `{event_id, prior_status}`. The projector sets `events.status = prior_status` for the event_id. Rollback emits this event for each affected lifecycle transition, looking up the prior status from the row's history (via event_log scan) or accepting it as a payload field.
**Production code change:** in `chat/web/turns.py::post_turn` (and `chat/services/regenerate.py`), when emitting `event_started`/`event_completed`/`event_cancelled`, populate `triggered_by_assistant_turn_id: <id>` in the payload. Forward-only — older code doesn't need updating.
**Tests:** 3 minimum.
1. `test_regenerate_rolls_back_event_started_from_superseded_turn` — seed an event; play a turn that starts it; regenerate; assert `event_status_reverted` event landed with `prior_status="planned"` and the events row is back to "planned".
2. `test_regenerate_rolls_back_event_completed_to_active` — same but completed → active rollback.
3. `test_regenerate_skips_events_without_back_reference` — older events without `triggered_by_assistant_turn_id` are not rolled back (debug log). Pin the backward-compat behavior.
**Commits:**
- `feat: lifecycle events carry triggered_by_assistant_turn_id back-reference (T114.1)`
- `feat: event_status_reverted event kind + projector handler (T114.2)`
- `feat: regenerate rolls back lifecycle transitions on supersede (T114.3)`
---
## Wave 8 — sqlite-vec swap (single, ENVIRONMENTAL)
### Task 115: sqlite-vec swap (optional)
**Files:**
- Create: `chat/db/migrations/0015_vec0_virtual_tables.sql`
- Modify: `chat/db/connection.py` (load extension on every connection)
- Modify: `chat/services/vector_search.py` (rewrite to use vec0 MATCH instead of pure-Python cosine)
- Modify: `chat/state/embeddings.py` (writer needs to populate vec0 table)
- Modify: `pyproject.toml` (add `sqlite-vec` dependency)
**Pre-flight:**
This task REQUIRES one of:
- Python rebuilt with `--enable-loadable-sqlite-extensions` (pyenv reinstall).
- `apsw` migration of `chat/db/connection.py`.
If neither is feasible at the time of execution: SKIP THIS TASK and document the deferral in T118 docs sweep. The other 13 Phase 4.5 tasks ship without it.
**Spec:**
1. **Migration** `0015_vec0_virtual_tables.sql`:
```sql
CREATE VIRTUAL TABLE embeddings_vec USING vec0(
memory_id INTEGER PRIMARY KEY,
embedding FLOAT[384]
);
-- Backfill from existing JSON embeddings table.
INSERT INTO embeddings_vec (memory_id, embedding)
SELECT memory_id, vec_f32(vector_json) FROM embeddings;
```
2. **`chat/db/connection.py`** loads `sqlite_vec` extension on every connection:
```python
import sqlite_vec
def open_db(...):
conn = sqlite3.connect(...)
conn.enable_load_extension(True)
sqlite_vec.load(conn)
conn.enable_load_extension(False)
...
```
3. **Rewrite `vector_search.py`** to use `embeddings_vec MATCH ?` syntax with `k=?` clause:
```sql
SELECT m.id, m.pov_summary, m.significance, e.distance
FROM embeddings_vec e
JOIN memories m ON m.id = e.memory_id
WHERE e.embedding MATCH ? AND k = ?
AND m.owner_id = ?
AND m.witness_<role> = 1
ORDER BY e.distance ASC
LIMIT ?
```
4. **HNSW note**: vec0 supports both flat (default) and HNSW indexes. T115 ships flat (sufficient for < few thousand memories). Document HNSW upgrade path in CLAUDE.md if memory counts ever grow past pure-Python feasibility.
5. **Old `embeddings` JSON table**: keep alongside `embeddings_vec` (data redundancy is fine; the JSON table is the source of truth and `embeddings_vec` is the index). Backfill on migration. Keep the `embedding_indexed` projector populating both.
**Tests:** rewrite `tests/test_vector_search.py` to expect new behavior. Same observable contract — only implementation changes. All 5 existing tests should pass post-swap.
**Commit:** `feat: sqlite-vec swap (vec0 virtual tables + MATCH-based search) (T115)`.
---
## Wave 9 — Polish (parallel, 3 tasks)
### Task 116: Structured test-fixture builder
**Files:**
- Create: `tests/fixtures.py` (or extend `tests/conftest.py`)
- Modify: existing test files that use brittle canned-queue arrays (selectively)
**Spec:**
Phase 3 carry-over. Tests across `test_turn_flow.py`, `test_meanwhile_turn_flow.py`, `test_phase3_integration.py`, `test_phase4_integration.py` use positional canned-response arrays for `MockLLMClient`. Adding a new classifier call to a code path requires updating canned arrays in many tests.
**Solution:** structured fixture builder that lets tests declare their classifier expectations by name, not position:
```python
# tests/fixtures.py
class CannedQueue:
def __init__(self):
self._queue = []
def parse_turn(self, **fields): ...
def state_update(self, **fields): ...
def detect_scene_close(self, should_close: bool): ...
def detect_event_transitions(self, transitions: list[dict]): ...
def summarize_scene(self, summary: str, **fields): ...
def detect_threads(self, candidates: list[dict]): ...
# ... one method per classifier service
def build(self) -> list[str]:
return [json.dumps(item) for item in self._queue]
```
Usage:
```python
def test_post_turn_with_event_transition(...):
canned = (
CannedQueue()
.parse_turn(intent="narrative")
.narrative("BotA speaks.") # narrative is a stream, but for simplicity treat it like a canned response
.state_update(affinity_delta=0, trust_delta=0)
.state_update(affinity_delta=0, trust_delta=0)
.detect_event_transitions([{"event_id": "evt_1", "new_status": "completed"}])
.detect_scene_close(should_close=False)
.build()
)
mock = MockLLMClient(canned=canned)
# ...
```
**Migration scope:** don't migrate ALL existing tests at once — that's a separate massive refactor. Instead, ship the fixture builder + migrate 2-3 representative tests as proof of concept. Document the migration path in the fixture's docstring.
**Tests:** the fixture builder itself doesn't need extensive testing — it's just a builder. Add 1-2 sanity tests that the JSON output matches expected shapes.
**Commit:** `test: structured CannedQueue fixture builder for classifier mocks (T116)`.
---
### Task 117: Phase 4.5 cross-feature integration tests
**Files:**
- Create: `tests/test_phase45_integration.py`
**Spec:**
End-to-end multi-feature flows specific to Phase 4.5 changes. 5 tests minimum.
1. **Real embedding swap + retrieval** — configure `embedding_model="bge-small-en-v1.5"` (mocked); write a memory; backfill or wait for worker; assert vector search returns the memory via `client.embed`-derived vector (not pseudo).
2. **Branching read-side filter end-to-end** — create a branch from turn 5; switch; play 3 turns on the branch; switch back to main; assert main's recent dialogue is missing the branch turns (read filter respects active branch's head).
3. **Lifecycle rollback** — start an event via a turn; regenerate that turn; assert lifecycle reverted (event back to "planned").
4. **Search deep-link** — write memories; search; click a result; verify the chat page renders with the right turn anchored (assert via TestClient response — either the browser anchor OR a server-side scroll-to-anchor mechanism).
5. **Bulk significance re-rate end-to-end** — seed 5 memories at significance 0; bulk re-rate via drawer; verify significance histogram updates.
**Commit:** `test: phase 4.5 cross-feature integration coverage (T117)`.
---
### Task 118: Phase 4.5 documentation update
**Files:**
- Modify: `CLAUDE.md`
- Modify: `docs/plans/2026-04-26-v1-requirements-design.md` (annotate §13 Phase 4 entries — though they're already shipped per Phase 4 T102)
**Spec:**
Mirror the Phase 3.5 / 2.5 status sections. Document:
- All shipped items per task (T103T117).
- Empty out the Phase 4.5 / 5 backlog (replace with single "All items shipped" line).
- Add new "Phase 5 backlog" section if any Phase 4.5 reviews surfaced new follow-ups.
**Phase 5 backlog candidates** (default, if no new follow-ups discovered):
- Vector index optimization (HNSW) when memory counts grow past flat-index feasibility.
- Branch-isolated event_log (each branch has its own physical event_log range vs the current shared id space + head filter).
- Embedding model swap migration tooling — when changing models, need to re-embed everything; T112 added `--re-embed-all` but a more orchestrated swap (drain old worker, re-seed all memories, swap config) is Phase 5+.
- Real-time collaborative branching (multi-user) — out of scope for v1.
- Avatars / portraits (multimodality) — deferred indefinitely per design §14.
**Commit:** `docs: phase 4.5 status, prune backlog, capture phase 5 candidates (T118)`.
---
## Wrap-up
After Wave 9 lands:
1. **Run full suite** on `phase-4.5`: should be ~430+ tests passing (413 from Phase 4 + ~20 new across Phase 4.5).
2. **Manual smoke** (recommended before opening the PR):
- Configure `embedding_model="bge-small-en-v1.5"` (or whatever real model is chosen); restart server; play a turn; verify `embedding_indexed` events use the real model and search returns semantically-relevant memories.
- Create a branch, switch, play turns, switch back — verify main's history is unaffected.
- Plan an event, complete it via a turn, regenerate that turn — verify event reverts to "planned".
- Use cross-chat search; click a result; verify it lands on the right turn in the chat page.
- Bulk re-rate a chat's significance distribution.
3. **Push `phase-4.5`** to gitea.
4. **Open PR** `phase-4.5 → main`.
---
## Notes for the controller running this plan
- **T115 (sqlite-vec swap)** is environmental. If pre-flight fails (no rebuilt Python, no apsw), defer to Phase 5 and ship Phase 4.5 with 13 tasks. T118 docs sweep should note the deferral.
- **T112 (real embedding swap)** assumes Featherless or similar exposes an `/v1/embeddings` endpoint. If not available, document the gap and ship the Protocol + Mock impl only (Featherless impl deferred). The pseudo path remains the default in that case — same as Phase 4.
- **T113 (branching read-side filter)** is the riskiest task. Cross-cutting. Land it on a quiet branch, test thoroughly. If integration tests break in unexpected ways, bisect the affected reader and add coverage.
- **After each parallel wave**, run a code-review subagent. Combined spec+quality acceptable for trivial tasks (T103T108); separate spec + quality reviewers for big tasks (T112, T113, T114, T115).
- **Token-spend rough estimate**: Phase 4.5 should be ~50% the size of Phase 4 (similar number of tasks, mostly smaller). Big tasks (T112, T113, T114) bring the per-task spend up but parallelism in Wave 1 + Wave 9 brings the wall-clock down.
- **DO NOT break existing v1/v2/v3/v3.5/v4 surface contracts.** Every test file that was green at the start of Phase 4.5 must stay green at the end. The cross-feature integration tests (`tests/test_phase4_integration.py`, `tests/test_phase3_integration.py`) are particularly load-bearing.
@@ -0,0 +1,23 @@
{
"planPath": "docs/plans/2026-04-27-v4.5-phase4.5-cleanup.md",
"tasks": [
{"id": 103, "subject": "T103: branches polish (global-leak doc + branch-switch warning)", "status": "pending", "wave": 1, "parallelGroup": "wave-1"},
{"id": 104, "subject": "T104: state/memory.py polish (DRY MAX(id) + fts_rank doc)", "status": "pending", "wave": 1, "parallelGroup": "wave-1"},
{"id": 105, "subject": "T105: snapshots.py polish (datetime hoist + kind validation + mtime doc)", "status": "pending", "wave": 1, "parallelGroup": "wave-1"},
{"id": 106, "subject": "T106: search.py polish (k constant + N+1 batched lookups)", "status": "pending", "wave": 1, "parallelGroup": "wave-1"},
{"id": 107, "subject": "T107: embeddings.py timeout_s fallback-path logging", "status": "pending", "wave": 1, "parallelGroup": "wave-1"},
{"id": 108, "subject": "T108: scene-close-on-cancel UX revisit (regression test pin + rationale doc)", "status": "pending", "wave": 1, "parallelGroup": "wave-1"},
{"id": 109, "subject": "T109: 0014 schema migration (FK CASCADE + memories.event_id column)", "status": "pending", "wave": 2, "parallelGroup": null},
{"id": 110, "subject": "T110: drawer Phase 4.5 bundle (event_id guard + html.escape + modal partial + bulk sig re-rate)", "status": "pending", "wave": 3, "parallelGroup": null, "blockedBy": [109]},
{"id": 111, "subject": "T111: search UX (FTS snippet highlighting + deep-link via memories.event_id)", "status": "pending", "wave": 4, "parallelGroup": null, "blockedBy": [109]},
{"id": 112, "subject": "T112: real embedding model swap (LLMClient.embed protocol + Featherless impl + routing)", "status": "pending", "wave": 5, "parallelGroup": null},
{"id": 113, "subject": "T113: branching read-side filter (event readers consult is_active branch range)", "status": "pending", "wave": 6, "parallelGroup": null, "blockedBy": [109]},
{"id": 114, "subject": "T114: regenerate lifecycle rollback (back-reference + event_status_reverted)", "status": "pending", "wave": 7, "parallelGroup": null},
{"id": 115, "subject": "T115: sqlite-vec swap (vec0 virtual tables + MATCH search) [ENVIRONMENTAL — may defer]", "status": "pending", "wave": 8, "parallelGroup": null},
{"id": 116, "subject": "T116: structured CannedQueue test fixture builder", "status": "pending", "wave": 9, "parallelGroup": "wave-9"},
{"id": 117, "subject": "T117: phase 4.5 cross-feature integration tests", "status": "pending", "wave": 9, "parallelGroup": "wave-9", "blockedBy": [110, 111, 112, 113, 114]},
{"id": 118, "subject": "T118: phase 4.5 docs sweep — prune backlog, capture phase 5 candidates", "status": "pending", "wave": 9, "parallelGroup": "wave-9", "blockedBy": [110, 111, 112, 113, 114]}
],
"lastUpdated": "2026-04-27T00:00:00Z",
"notes": "16 tasks across 9 waves consolidating all 24 items in CLAUDE.md Phase 4.5/5 backlog. Wave 1 (6-way parallel) and Wave 9 (3-way parallel) maximize parallelism. Waves 2-8 are single-task by hot-file constraint. T115 (sqlite-vec swap) requires Python rebuild OR apsw migration — environmental; may defer to Phase 5. Schema baseline 13 -> 14 (T109's 0014) -> optionally 15 (T115's 0015). Big tasks: T112 (real embedding swap), T113 (branching read-side filter — riskiest), T114 (lifecycle rollback). Uses task ids T103-T118."
}
+71 -10
View File
@@ -8,8 +8,21 @@ Phase 4 ships the deterministic local pseudo-embedding so this script
runs synchronously without a network round-trip the LLMClient argument
is not needed on the pseudo path. Phase 4.5+ will need a real client.
T112 (Phase 4.5) adds two flags:
* ``--re-embed-all`` walks **every** memory regardless of whether it
already has an ``embeddings`` row. Useful when swapping embedding
models the projector is INSERT OR REPLACE, so re-emitting an event
for an existing memory replaces the prior vector. Without this flag,
the script keeps the Phase 4 behavior of only filling in gaps.
* ``--model M`` overrides ``Settings.embedding_model`` for this run.
Defaults to the configured model (which itself defaults to
``"pseudo-sha256-384"``).
Run from the repo root:
.venv/bin/python scripts/backfill_embeddings.py [--limit N] [--dry-run]
.venv/bin/python scripts/backfill_embeddings.py --re-embed-all
.venv/bin/python scripts/backfill_embeddings.py --re-embed-all --model bge-small-en-v1.5
"""
from __future__ import annotations
@@ -17,11 +30,12 @@ from __future__ import annotations
import argparse
import asyncio
from chat.config import load_settings
from chat.config import Settings, load_settings
from chat.db.connection import open_db
from chat.db.migrate import apply_migrations
from chat.eventlog.log import append_and_apply
from chat.services.embeddings import (
DEFAULT_EMBEDDING_MODEL,
FALLBACK_EMBEDDING_MODEL,
generate_embedding,
)
@@ -34,6 +48,24 @@ import chat.state.memory # noqa: F401
import chat.state.world # noqa: F401
def _build_client(settings: Settings):
"""Construct an LLMClient for the backfill run.
Default-model runs (the pseudo path) don't need a client, so we
return ``None`` and ``generate_embedding`` skips the call. Non-default
models route through the real client; injectable via monkeypatch in
tests.
"""
if settings.embedding_model == DEFAULT_EMBEDDING_MODEL:
return None
from chat.llm.featherless import FeatherlessClient
return FeatherlessClient(
api_key=settings.featherless_api_key,
base_url=settings.featherless_base_url,
)
async def main() -> None:
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument(
@@ -47,23 +79,51 @@ async def main() -> None:
action="store_true",
help="Print the count of memories needing embeddings, then exit.",
)
parser.add_argument(
"--re-embed-all",
action="store_true",
help=(
"Walk every memory (not just those without an embeddings row) "
"and re-emit embedding_indexed events. Use this when swapping "
"embedding models so the existing rows get replaced."
),
)
parser.add_argument(
"--model",
type=str,
default=None,
help=(
"Embedding model identifier. Overrides Settings.embedding_model "
"for this run; default uses the configured model."
),
)
args = parser.parse_args()
settings = load_settings()
settings.db_path.parent.mkdir(parents=True, exist_ok=True)
apply_migrations(settings.db_path)
model = args.model or settings.embedding_model
# Override the settings instance so ``_build_client`` sees the
# effective model when deciding whether to construct a real client.
settings = settings.model_copy(update={"embedding_model": model})
client = _build_client(settings)
with open_db(settings.db_path) as conn:
sql = (
"SELECT m.id, m.pov_summary FROM memories m "
"LEFT JOIN embeddings e ON e.memory_id = m.id "
"WHERE e.memory_id IS NULL "
"ORDER BY m.id"
)
if args.re_embed_all:
sql = "SELECT m.id, m.pov_summary FROM memories m ORDER BY m.id"
else:
sql = (
"SELECT m.id, m.pov_summary FROM memories m "
"LEFT JOIN embeddings e ON e.memory_id = m.id "
"WHERE e.memory_id IS NULL "
"ORDER BY m.id"
)
if args.limit is not None:
sql += f" LIMIT {int(args.limit)}"
rows = conn.execute(sql).fetchall()
print(f"Found {len(rows)} memories needing embeddings.")
mode = "re-embedding" if args.re_embed_all else "needing embeddings"
print(f"Found {len(rows)} memories {mode} (model={model}).")
if args.dry_run:
return
@@ -71,11 +131,12 @@ async def main() -> None:
skipped = 0
for memory_id, text in rows:
result = await generate_embedding(
client=None, # pseudo path: no client needed
client=client,
text=text or "",
model=model,
)
if result.model == FALLBACK_EMBEDDING_MODEL:
print(f" Skipping memory_id={memory_id} (empty text)")
print(f" Skipping memory_id={memory_id} (empty text or fallback)")
skipped += 1
continue
append_and_apply(
+383
View File
@@ -0,0 +1,383 @@
"""Structured test-fixture builder for ``MockLLMClient`` canned queues.
Phase 4.5 (T116) carry-over from Phase 3. The turn-flow tests in
``test_turn_flow.py``, ``test_meanwhile_turn_flow.py``,
``test_phase3_integration.py``, and ``test_phase4_integration.py`` used
to construct ``MockLLMClient`` canned-response queues as raw positional
lists of pre-encoded JSON strings. That worked, but every time a new
classifier call landed in a code path the tests had to be patched in
many places at the right index easy to mis-position, hard to read.
This module ships :class:`CannedQueue`, a fluent builder that lets a
test declare its classifier expectations by **name** and **order** of
call, not by index into a brittle list. Each method appends one item
to the queue and returns ``self`` for chaining; ``build()`` JSON-encodes
the items and produces the flat ``list[str]`` that
``MockLLMClient(canned=...)`` expects.
Usage
-----
>>> from tests.fixtures import CannedQueue
>>> from chat.llm.mock import MockLLMClient
>>> canned = (
... CannedQueue()
... .parse_turn(segments=[{"kind": "dialogue", "text": "hello"}])
... .narrative("Hi there.")
... .state_update()
... .state_update()
... .build()
... )
>>> mock = MockLLMClient(canned=canned)
Each method maps to a single classifier (or stream) call that the turn
flow makes, in the order the production code makes them. Picking the
right method for the slot you need keeps the test readable and lets the
builder pin sensible defaults for the fields tests don't care about.
Migration template
------------------
To migrate a positional canned-array test:
1. Identify each slot in the existing array and what classifier it
feeds. Comments above the array often spell this out start there.
2. Replace each slot with the matching :class:`CannedQueue` method:
- ``json.dumps({"segments": [...]})`` ``.parse_turn(segments=...)``
- bare narrative string ``.narrative("...")``
- zero-state JSON ``.state_update()`` (defaults are zeros)
- ``json.dumps({"addressee_id": ...})`` ``.detect_addressee(...)``
- ``json.dumps({"should_interject": ...})`` ``.detect_interjection(...)``
- ``json.dumps({"should_close": ...})`` ``.detect_scene_close(...)``
- ``json.dumps({"transitions": [...]})`` ``.detect_event_transitions(...)``
- per-POV summary JSON ``.summarize_scene_pov(summary=...)``
3. End with ``.build()`` and pass that to
``MockLLMClient(canned=...)``. The mock's contract is unchanged.
Notes on streams
----------------
``MockLLMClient.stream`` and ``MockLLMClient.generate`` share one queue
each pop is one entry, regardless of whether the production code
streams the response or generates it whole. The narrative service
streams; classifier services generate. The builder treats both the same:
``narrative()`` appends a raw string, the classifier methods append
JSON-encoded dicts. Both end up in the same flat ``list[str]`` that the
mock pops from in order.
The remaining tests in the suite (about 30 across the four files
mentioned above) still use positional arrays Phase 5 work to migrate
the rest. New tests should prefer this builder.
"""
from __future__ import annotations
import json
from typing import Any
class CannedQueue:
"""Fluent builder for ``MockLLMClient`` canned-response queues.
Each method appends one item to an internal queue and returns
``self`` for chaining. ``build()`` returns the flat ``list[str]``
suitable for ``MockLLMClient(canned=...)``.
The queue holds either ``dict`` (JSON-encoded at ``build()`` time)
or ``str`` (passed through verbatim used for narrative streams).
"""
def __init__(self) -> None:
self._queue: list[Any] = []
# ------------------------------------------------------------------
# Narrative stream — bare string, no JSON wrapping.
# ------------------------------------------------------------------
def narrative(self, text: str) -> "CannedQueue":
"""Append one streaming narrative response.
``MockLLMClient.stream`` pops the next entry from the same queue
as ``generate`` a bare string is what the streaming bot beat
consumes. Use one ``narrative()`` per assistant beat (primary,
and optionally an interjection / second beat).
"""
self._queue.append(text)
return self
def raw(self, value: str) -> "CannedQueue":
"""Append a raw string (escape hatch for non-classifier calls).
Most tests should reach for the named helpers this is here
for one-offs the builder doesn't model yet.
"""
self._queue.append(value)
return self
# ------------------------------------------------------------------
# Turn parser — splits user prose into segments.
# ------------------------------------------------------------------
def parse_turn(
self,
*,
segments: list[dict] | None = None,
intent: str = "narrative",
landing_state_hint: str = "",
**rest: Any,
) -> "CannedQueue":
"""Append one ``parse_turn`` classifier response.
``intent`` defaults to ``"narrative"``; pass ``"skip_elision"``
or ``"skip_jump"`` to exercise the natural-language skip paths.
``landing_state_hint`` carries the residual descriptor for
elision skips and is otherwise ignored.
"""
payload: dict[str, Any] = {
"segments": segments if segments is not None else [],
"intent": intent,
"landing_state_hint": landing_state_hint,
}
payload.update(rest)
self._queue.append(payload)
return self
# ------------------------------------------------------------------
# Multi-entity addressee classifier (T74.1).
# ------------------------------------------------------------------
def detect_addressee(
self,
*,
addressee_id: str,
confidence: str = "medium",
reason: str = "",
**rest: Any,
) -> "CannedQueue":
"""Append one ``detect_addressee`` classifier response."""
payload: dict[str, Any] = {
"addressee_id": addressee_id,
"confidence": confidence,
"reason": reason,
}
payload.update(rest)
self._queue.append(payload)
return self
# ------------------------------------------------------------------
# State-update — one per directed edge per turn.
# ------------------------------------------------------------------
def state_update(
self,
*,
affinity_delta: int = 0,
trust_delta: int = 0,
knowledge_facts: list | None = None,
**rest: Any,
) -> "CannedQueue":
"""Append one ``apply_state_update`` classifier response.
Defaults to a benign zero-delta payload tests that don't care
about state mutations can call this without arguments. One call
is required per directed edge that fires after the assistant
beat (e.g. single-bot non-guest turn = 2 calls; multi-bot guest
turn = 6 calls).
"""
payload: dict[str, Any] = {
"affinity_delta": affinity_delta,
"trust_delta": trust_delta,
"knowledge_facts": (
knowledge_facts if knowledge_facts is not None else []
),
}
payload.update(rest)
self._queue.append(payload)
return self
def zero_state(self) -> "CannedQueue":
"""Alias for ``state_update()`` with all defaults — matches the
``_zero_state()`` helper in existing tests.
"""
return self.state_update()
# ------------------------------------------------------------------
# Interjection (T74.2) — silent witness chimes in.
# ------------------------------------------------------------------
def detect_interjection(
self,
*,
should_interject: bool,
reason: str = "",
**rest: Any,
) -> "CannedQueue":
"""Append one ``detect_interjection`` classifier response."""
payload: dict[str, Any] = {
"should_interject": should_interject,
"reason": reason,
}
payload.update(rest)
self._queue.append(payload)
return self
def detect_interjection_targeted(
self,
*,
targeted: bool,
target_id: str | None = None,
reason: str = "",
**rest: Any,
) -> "CannedQueue":
"""Append one targeted-interjection classifier response."""
payload: dict[str, Any] = {
"targeted": targeted,
"target_id": target_id,
"reason": reason,
}
payload.update(rest)
self._queue.append(payload)
return self
# ------------------------------------------------------------------
# Scene-close detector (T26).
# ------------------------------------------------------------------
def detect_scene_close(
self,
*,
should_close: bool,
reason: str = "",
**rest: Any,
) -> "CannedQueue":
"""Append one ``detect_scene_close`` classifier response."""
payload: dict[str, Any] = {
"should_close": should_close,
"reason": reason,
}
payload.update(rest)
self._queue.append(payload)
return self
# ------------------------------------------------------------------
# Event lifecycle (T52, T61) — per-turn transitions.
# ------------------------------------------------------------------
def detect_event_transitions(
self,
transitions: list[dict] | None = None,
) -> "CannedQueue":
"""Append one ``detect_event_transitions`` classifier response.
``transitions`` is a list of ``{"event_id": ..., "new_status":
"active"|"completed"|"cancelled", "reason": ...}`` dicts. Pass
an empty list (or omit the argument) to assert that the call
ran but produced no transitions; pass ``None`` for an empty
list with the same shape.
Note: when no events are seeded, ``detect_event_transitions``
short-circuits without an LLM call in that case do NOT append
this slot.
"""
payload = {"transitions": transitions if transitions is not None else []}
self._queue.append(payload)
return self
# ------------------------------------------------------------------
# Per-POV scene summary (used after scene close).
# ------------------------------------------------------------------
def summarize_scene_pov(
self,
*,
summary: str,
knowledge_facts: list | None = None,
relationship_summary: str = "",
**rest: Any,
) -> "CannedQueue":
"""Append one per-POV scene-summary response.
Used by ``apply_scene_close_summary`` one call per witness
once a scene closes.
"""
payload: dict[str, Any] = {
"summary": summary,
"knowledge_facts": (
knowledge_facts if knowledge_facts is not None else []
),
"relationship_summary": relationship_summary,
}
payload.update(rest)
self._queue.append(payload)
return self
# ------------------------------------------------------------------
# Thread detection (Phase 3 §3.3).
# ------------------------------------------------------------------
def detect_threads(
self,
candidates: list[dict] | None = None,
) -> "CannedQueue":
"""Append one ``detect_threads`` classifier response.
``candidates`` is a list of ``{"action": "open"|"update",
"title": ..., "summary": ..., "existing_thread_id": ...}`` dicts.
"""
payload = {"candidates": candidates if candidates is not None else []}
self._queue.append(payload)
return self
# ------------------------------------------------------------------
# Meanwhile digest — narrative summary of what happened off-screen.
# ------------------------------------------------------------------
def meanwhile_digest(self, summary: str) -> "CannedQueue":
"""Append one meanwhile-digest narrative response.
The digest service streams the digest as plain text (not JSON)
so this is a thin wrapper over ``narrative``/``raw`` for
readability at the call site.
"""
self._queue.append(summary)
return self
# ------------------------------------------------------------------
# Significance scorer (background worker; rarely hit in unit tests
# but available for completeness).
# ------------------------------------------------------------------
def score_significance(
self,
*,
score: float = 0.0,
reason: str = "",
**rest: Any,
) -> "CannedQueue":
"""Append one significance-scoring classifier response."""
payload: dict[str, Any] = {"score": score, "reason": reason}
payload.update(rest)
self._queue.append(payload)
return self
# ------------------------------------------------------------------
# Build / introspection.
# ------------------------------------------------------------------
def build(self) -> list[str]:
"""Return the flat ``list[str]`` queue for ``MockLLMClient``.
Dict items are JSON-encoded; string items are passed through
verbatim (so streaming responses retain their raw form).
"""
out: list[str] = []
for item in self._queue:
if isinstance(item, str):
out.append(item)
else:
out.append(json.dumps(item))
return out
def __len__(self) -> int:
return len(self._queue)
+231
View File
@@ -0,0 +1,231 @@
"""Tests for the backfill_embeddings script (T112, Phase 4.5).
Phase 4 shipped a backfill that walked memories *without* an embedding
row and produced a vector for each (deterministic pseudo path). T112
adds a ``--re-embed-all`` flag that walks **every** memory regardless
of whether it already has an embeddings row, so operators can swap
embedding models and have the existing rows replaced (the
``embedding_indexed`` projector is INSERT OR REPLACE).
These tests exercise the script's ``main()`` directly via asyncio —
shell-out via subprocess would also work but importing keeps the
fixture surface small and the failure mode clearer.
"""
from __future__ import annotations
from pathlib import Path
from unittest.mock import patch
import pytest
from chat.db.connection import open_db
from chat.db.migrate import apply_migrations
from chat.eventlog.log import append_and_apply, append_event
from chat.eventlog.projector import project
from chat.services.embeddings import DEFAULT_EMBEDDING_MODEL
# Trigger handler registration for projection.
import chat.state.embeddings # noqa: F401
import chat.state.entities # noqa: F401
import chat.state.memory # noqa: F401
import chat.state.world # noqa: F401
import scripts.backfill_embeddings as backfill
def _seed(db_path: Path, count: int) -> list[int]:
"""Seed ``count`` memory rows for ``bot_a``; return their ids."""
with open_db(db_path) as conn:
append_event(
conn,
kind="bot_authored",
payload={
"id": "bot_a",
"name": "BotA",
"persona": "...",
"voice_samples": [],
"traits": [],
"backstory": "",
"initial_relationship_to_you": "",
"kickoff_prose": "",
},
)
append_event(
conn,
kind="chat_created",
payload={
"id": "chat_bot_a",
"host_bot_id": "bot_a",
"initial_time": "2026-04-26T20:00:00+00:00",
"narrative_anchor": "Day 1",
"weather": "",
},
)
for i in range(count):
append_event(
conn,
kind="memory_written",
payload={
"owner_id": "bot_a",
"chat_id": "chat_bot_a",
"pov_summary": f"memory text {i}",
"witness_you": 1,
"witness_host": 1,
"witness_guest": 0,
"source": "direct",
"reliability": 1.0,
"significance": 1,
"pinned": 0,
"auto_pinned": 0,
},
)
project(conn)
return [
r[0]
for r in conn.execute(
"SELECT id FROM memories WHERE owner_id = 'bot_a' ORDER BY id"
).fetchall()
]
def _seed_embedding(db_path: Path, memory_id: int, model: str = "stale-model") -> None:
"""Insert a stale ``embedding_indexed`` event so the row already
exists in ``embeddings`` (and the default backfill would skip it)."""
with open_db(db_path) as conn:
append_and_apply(
conn,
kind="embedding_indexed",
payload={
"memory_id": memory_id,
"model": model,
"dim": 3,
"vector": [0.0, 0.0, 0.0],
},
)
@pytest.mark.asyncio
async def test_re_embed_all_walks_every_memory(tmp_path, monkeypatch, capsys):
"""``--re-embed-all`` re-embeds memories that already have rows in
``embeddings`` (default mode skips them). After the run, every
memory should have an updated embedding tagged with the configured
model (the projector replaces stale rows in place)."""
db = tmp_path / "t.db"
apply_migrations(db)
memory_ids = _seed(db, count=3)
# Pre-seed stale embeddings on two of the three memories so the
# default path would skip them and only ``--re-embed-all`` covers
# everything.
_seed_embedding(db, memory_ids[0])
_seed_embedding(db, memory_ids[1])
cfg = tmp_path / "config.toml"
cfg.write_text(
f'featherless_api_key = "x"\n'
f'db_path = "{db}"\n'
f'data_dir = "{tmp_path}"\n'
)
monkeypatch.setenv("CHAT_CONFIG_PATH", str(cfg))
monkeypatch.setenv("CHAT_DB_PATH", str(db))
with patch("sys.argv", ["backfill_embeddings.py", "--re-embed-all"]):
await backfill.main()
# All three memories now have a fresh embedding tagged with the
# default pseudo model (replacing the stale rows).
with open_db(db) as conn:
rows = conn.execute(
"SELECT memory_id, model FROM embeddings ORDER BY memory_id"
).fetchall()
assert len(rows) == 3
for mid, model in rows:
assert mid in memory_ids
assert model == DEFAULT_EMBEDDING_MODEL
@pytest.mark.asyncio
async def test_default_backfill_only_walks_missing(tmp_path, monkeypatch):
"""Without ``--re-embed-all``, the script keeps the Phase 4
behavior memories with an existing embedding row are left
alone (their stale-model tag survives)."""
db = tmp_path / "t.db"
apply_migrations(db)
memory_ids = _seed(db, count=2)
_seed_embedding(db, memory_ids[0], model="stale-model")
# memory_ids[1] has no embedding yet.
cfg = tmp_path / "config.toml"
cfg.write_text(
f'featherless_api_key = "x"\n'
f'db_path = "{db}"\n'
f'data_dir = "{tmp_path}"\n'
)
monkeypatch.setenv("CHAT_CONFIG_PATH", str(cfg))
monkeypatch.setenv("CHAT_DB_PATH", str(db))
with patch("sys.argv", ["backfill_embeddings.py"]):
await backfill.main()
with open_db(db) as conn:
rows = dict(
conn.execute(
"SELECT memory_id, model FROM embeddings ORDER BY memory_id"
).fetchall()
)
# Stale row preserved; only the missing one was filled.
assert rows[memory_ids[0]] == "stale-model"
assert rows[memory_ids[1]] == DEFAULT_EMBEDDING_MODEL
@pytest.mark.asyncio
async def test_re_embed_all_respects_model_arg(tmp_path, monkeypatch):
"""The ``--model`` flag overrides ``Settings.embedding_model``.
With a non-default model and a client that returns canned vectors,
every memory is re-embedded with the supplied model tag."""
db = tmp_path / "t.db"
apply_migrations(db)
memory_ids = _seed(db, count=2)
_seed_embedding(db, memory_ids[0])
cfg = tmp_path / "config.toml"
cfg.write_text(
f'featherless_api_key = "x"\n'
f'db_path = "{db}"\n'
f'data_dir = "{tmp_path}"\n'
)
monkeypatch.setenv("CHAT_CONFIG_PATH", str(cfg))
monkeypatch.setenv("CHAT_DB_PATH", str(db))
# Patch the client factory the script uses to produce a Mock with
# canned embeddings — one per memory.
from chat.llm.mock import MockLLMClient
canned_vec = [0.1] * 384
def _factory(_settings):
return MockLLMClient(
canned=[],
canned_embeddings=[list(canned_vec) for _ in memory_ids],
)
monkeypatch.setattr(backfill, "_build_client", _factory)
with patch(
"sys.argv",
[
"backfill_embeddings.py",
"--re-embed-all",
"--model",
"bge-small-en-v1.5",
],
):
await backfill.main()
with open_db(db) as conn:
rows = conn.execute(
"SELECT memory_id, model FROM embeddings ORDER BY memory_id"
).fetchall()
assert len(rows) == 2
for _, model in rows:
assert model == "bge-small-en-v1.5"
+122 -1
View File
@@ -1,11 +1,19 @@
from __future__ import annotations
import logging
from chat.db.connection import open_db
from chat.db.migrate import apply_migrations
from chat.eventlog.log import append_event
from chat.eventlog.projector import project
import chat.state.branches # registers handlers
from chat.state.branches import active_branch, get_branch, list_branches
from chat.state.branches import (
_NO_HEAD_CLAMP,
active_branch,
active_branch_event_ids,
get_branch,
list_branches,
)
def test_main_branch_bootstrapped_by_migration(tmp_path):
@@ -139,3 +147,116 @@ def test_list_branches_returns_all(tmp_path):
names = [b["name"] for b in list_branches(conn)]
assert "main" in names
assert "experiment" in names
def test_branch_switched_unknown_name_warns(tmp_path, caplog):
"""Switching to a nonexistent branch logs a warning and leaves no branch active.
The previous behavior silently cleared is_active flags and applied no UPDATE
when the named branch did not exist. T103 makes that condition observable
by emitting a warning while preserving the existing (zero-active) outcome.
"""
db = tmp_path / "t.db"
apply_migrations(db)
with open_db(db) as conn:
with caplog.at_level(logging.WARNING, logger="chat.state.branches"):
append_event(
conn,
kind="branch_switched",
payload={"name": "does_not_exist"},
)
project(conn)
# A warning was emitted naming the missing branch.
warnings = [
r for r in caplog.records
if r.levelno == logging.WARNING and r.name == "chat.state.branches"
]
assert warnings, "expected a warning for unknown branch name"
assert any("does_not_exist" in r.getMessage() for r in warnings)
# Existing behavior preserved: no branch is active after the switch.
assert active_branch(conn) is None
# The unknown name was not inserted as a side effect.
assert get_branch(conn, "does_not_exist") is None
def test_active_branch_event_ids_bootstrap_main_returns_no_clamp(tmp_path):
"""Bootstrap "main" (origin=0, head=0) reads as the no-clamp sentinel.
Migration 0013 seeds main with both event-id columns at 0; production
today never emits ``branch_head_updated`` for main, so head stays at 0
even as events accumulate. The helper treats this exact bootstrap
state as "all events visible" (lower bound 0, upper bound BIG_INT) so
every existing reader stays branch-agnostic until a non-main branch
becomes active.
"""
db = tmp_path / "t.db"
apply_migrations(db)
with open_db(db) as conn:
origin, head = active_branch_event_ids(conn)
assert origin == 0
assert head == _NO_HEAD_CLAMP
def test_active_branch_event_ids_no_active_branch_falls_through(tmp_path):
"""No active branch row at all → defensive ``(0, BIG_INT)``.
A switch to an unknown branch leaves zero rows with ``is_active=1``;
``active_branch`` returns None. The helper must still hand readers a
workable range (the full log) so the read pipeline doesn't crash on
an inconsistent metadata state.
"""
db = tmp_path / "t.db"
apply_migrations(db)
with open_db(db) as conn:
# Switching to a nonexistent branch clears is_active flags
# without setting any other branch active.
append_event(
conn,
kind="branch_switched",
payload={"name": "does_not_exist"},
)
project(conn)
assert active_branch(conn) is None
origin, head = active_branch_event_ids(conn)
assert origin == 0
assert head == _NO_HEAD_CLAMP
def test_active_branch_event_ids_returns_actual_range_for_non_main(tmp_path):
"""Non-main branches return their literal ``(origin, head)`` window.
A branch created at origin=10 + bumped to head=20 must surface as
(10, 20) so readers' ``BETWEEN`` clamp scopes to that window.
"""
db = tmp_path / "t.db"
apply_migrations(db)
with open_db(db) as conn:
append_event(
conn,
kind="branch_created",
payload={
"name": "experiment",
"origin_event_id": 10,
"head_event_id": 10,
"chat_id": "c1",
},
)
append_event(
conn,
kind="branch_head_updated",
payload={"name": "experiment", "head_event_id": 20},
)
append_event(
conn,
kind="branch_switched",
payload={"name": "experiment"},
)
project(conn)
origin, head = active_branch_event_ids(conn)
assert origin == 10
assert head == 20
+276
View File
@@ -129,3 +129,279 @@ def test_list_branches_with_metadata_includes_event_count(tmp_path):
assert rows["exp"]["origin_event_id"] == 10
assert rows["exp"]["head_event_id"] == 15
assert rows["exp"]["event_count"] == 6
# ---------------------------------------------------------------------------
# T113 read-side filter — cross-feature tests.
# ---------------------------------------------------------------------------
#
# These exercise the active-branch event-id clamp through every reader
# the spec called out: ``read_recent_dialogue`` (turn_common),
# ``_read_recent_dialogue`` (scene_summarize), and ``search_memories``
# (memory). They drive the readers via real event-log inserts + branch
# switches so the integration is end-to-end.
def _seed_user_turn(conn, chat_id: str, prose: str) -> int:
return append_and_apply(
conn,
kind="user_turn",
payload={"chat_id": chat_id, "prose": prose, "segments": []},
)
def test_read_recent_dialogue_respects_active_branch_head(tmp_path):
"""T113 spec test 1: dialogue reader clamps to active branch head.
Seed 10 user turns; create a branch with origin=1 + head=5 and switch
to it; assert ``read_recent_dialogue`` only returns the first 5
turns. (The 5 events with id 6..10 fall outside ``[1, 5]``.)
"""
from chat.services.turn_common import read_recent_dialogue
db = tmp_path / "t.db"
apply_migrations(db)
with open_db(db) as conn:
ids = [_seed_user_turn(conn, "c1", f"turn {i}") for i in range(10)]
# 5 events visible after the switch.
branch_from_event(
conn, name="halfway", origin_event_id=ids[0], chat_id="c1"
)
append_and_apply(
conn,
kind="branch_head_updated",
payload={"name": "halfway", "head_event_id": ids[4]},
)
switch_active_branch(conn, name="halfway")
rows = read_recent_dialogue(conn, "c1")
# The reader returns oldest-first, so the visible-set is the
# first 5 turns.
assert len(rows) == 5
assert [r["text"] for r in rows] == [f"turn {i}" for i in range(5)]
def test_search_memories_respects_active_branch_head(tmp_path):
"""T113 spec test 2: memory search clamps to active branch head via
``memories.event_id``. Memories whose projecting event lands outside
the clamp drop out of FTS results."""
from chat.eventlog.log import append_and_apply as _aa
from chat.state.memory import search_memories
db = tmp_path / "t.db"
apply_migrations(db)
with open_db(db) as conn:
# Two memories projected from real events. The projector handler
# stamps memories.event_id from the projecting event's id.
ev_a = _aa(
conn,
kind="memory_written",
payload={
"owner_id": "host_bot",
"chat_id": "c1",
"scene_id": 1,
"pov_summary": "alpha keyword present",
"witness_you": 1,
"witness_host": 1,
"witness_guest": 0,
},
)
ev_b = _aa(
conn,
kind="memory_written",
payload={
"owner_id": "host_bot",
"chat_id": "c1",
"scene_id": 1,
"pov_summary": "alpha keyword present too",
"witness_you": 1,
"witness_host": 1,
"witness_guest": 0,
},
)
# Branch clamps to ev_a only (head = ev_a; ev_b sits past head).
branch_from_event(
conn, name="early", origin_event_id=ev_a, chat_id="c1"
)
switch_active_branch(conn, name="early")
results = search_memories(conn, "host_bot", "host", "alpha")
# Only the first memory should surface — the second's event_id
# exceeds the active branch head.
ids = [r["event_id"] for r in results]
assert ev_a in ids
assert ev_b not in ids
def test_branch_switch_changes_visible_events(tmp_path):
"""T113 spec test 3: switching branches mid-flight changes the read
immediately. ``read_recent_dialogue`` re-queries on every call."""
from chat.services.turn_common import read_recent_dialogue
db = tmp_path / "t.db"
apply_migrations(db)
with open_db(db) as conn:
ids = [_seed_user_turn(conn, "c1", f"turn {i}") for i in range(6)]
branch_from_event(
conn, name="early", origin_event_id=ids[0], chat_id="c1"
)
append_and_apply(
conn,
kind="branch_head_updated",
payload={"name": "early", "head_event_id": ids[2]},
)
branch_from_event(
conn, name="late", origin_event_id=ids[3], chat_id="c1"
)
append_and_apply(
conn,
kind="branch_head_updated",
payload={"name": "late", "head_event_id": ids[5]},
)
switch_active_branch(conn, name="early")
early_rows = [r["text"] for r in read_recent_dialogue(conn, "c1")]
assert early_rows == ["turn 0", "turn 1", "turn 2"]
switch_active_branch(conn, name="late")
late_rows = [r["text"] for r in read_recent_dialogue(conn, "c1")]
assert late_rows == ["turn 3", "turn 4", "turn 5"]
def test_main_branch_with_head_zero_returns_empty(tmp_path):
"""T113 spec test 4: a non-main branch with head=0 returns empty.
The bootstrap-main sentinel only fires for ``name=="main", origin=0,
head=0``. A different branch parked at ``origin=0, head=0`` is not a
sentinel and the ``BETWEEN 0 AND 0`` clamp filters out every real
event_log row (rowids start at 1)."""
from chat.services.turn_common import read_recent_dialogue
db = tmp_path / "t.db"
apply_migrations(db)
with open_db(db) as conn:
# Need a real event_log row id 1+ so the clamp's "exclude 0" actually
# has something to exclude — otherwise we trivially return [].
_seed_user_turn(conn, "c1", "turn 0")
# Force-create a branch at origin=0, head=0 (NOT main). This is an
# artificial state — production never produces it — but it's the
# cleanest way to drive the documented edge case.
append_and_apply(
conn,
kind="branch_created",
payload={
"name": "stub",
"origin_event_id": 0,
"head_event_id": 0,
"chat_id": "c1",
},
)
switch_active_branch(conn, name="stub")
rows = read_recent_dialogue(conn, "c1")
assert rows == []
def test_no_active_branch_falls_through_to_all_events(tmp_path):
"""T113 spec test 5: with no active branch (e.g. a switch to an
unknown name cleared all is_active flags), readers see the full log
via the ``(0, BIG_INT)`` defensive default."""
from chat.services.turn_common import read_recent_dialogue
db = tmp_path / "t.db"
apply_migrations(db)
with open_db(db) as conn:
for i in range(3):
_seed_user_turn(conn, "c1", f"turn {i}")
# Switching to an unknown branch leaves zero rows with is_active=1.
append_and_apply(
conn,
kind="branch_switched",
payload={"name": "missing"},
)
from chat.state.branches import active_branch as _ab
assert _ab(conn) is None
rows = read_recent_dialogue(conn, "c1")
assert [r["text"] for r in rows] == ["turn 0", "turn 1", "turn 2"]
def test_scene_summarize_read_recent_dialogue_respects_branch(tmp_path):
"""T113: ``scene_summarize._read_recent_dialogue`` (the scene-close
summary input) also clamps to the active branch range."""
from chat.services.scene_summarize import _read_recent_dialogue
db = tmp_path / "t.db"
apply_migrations(db)
with open_db(db) as conn:
ids = [_seed_user_turn(conn, "c1", f"turn {i}") for i in range(6)]
branch_from_event(
conn, name="early", origin_event_id=ids[0], chat_id="c1"
)
append_and_apply(
conn,
kind="branch_head_updated",
payload={"name": "early", "head_event_id": ids[2]},
)
switch_active_branch(conn, name="early")
rows = _read_recent_dialogue(conn, "c1")
assert [r["text"] for r in rows] == ["turn 0", "turn 1", "turn 2"]
def test_meanwhile_dialogue_reader_respects_branch(tmp_path):
"""T113: meanwhile prompt-context reader also clamps to the active
branch. The meanwhile reader filters by ``meanwhile_scene_id``; the
branch filter is composed on top of that filter."""
from chat.web.meanwhile import _read_recent_meanwhile_dialogue
db = tmp_path / "t.db"
apply_migrations(db)
with open_db(db) as conn:
# Seed user turns + meanwhile assistant turns interleaved so the
# branch-id clamp lands across both kinds.
u1 = _seed_user_turn(conn, "c1", "u1")
a1 = append_and_apply(
conn,
kind="assistant_turn",
payload={
"chat_id": "c1",
"speaker_id": "host",
"text": "a1",
"meanwhile_scene_id": 7,
},
)
# Past-head turn should NOT appear once we switch to ``early``.
a2 = append_and_apply(
conn,
kind="assistant_turn",
payload={
"chat_id": "c1",
"speaker_id": "guest",
"text": "a2",
"meanwhile_scene_id": 7,
},
)
branch_from_event(
conn, name="early", origin_event_id=u1, chat_id="c1"
)
append_and_apply(
conn,
kind="branch_head_updated",
payload={"name": "early", "head_event_id": a1},
)
switch_active_branch(conn, name="early")
rows = _read_recent_meanwhile_dialogue(conn, "c1", scene_id=7)
texts = [r["text"] for r in rows]
assert "a1" in texts
assert "a2" not in texts
# Suppress the "unused" linter warning while keeping the binding
# readable for the test narrative.
_ = a2
+22
View File
@@ -24,3 +24,25 @@ def test_chat_db_path_env_overrides_default(tmp_path, monkeypatch):
(tmp_path / "config.toml").write_text('featherless_api_key = "x"\n')
s = load_settings()
assert s.db_path == tmp_path / "alt.db"
def test_embedding_model_defaults_to_pseudo(tmp_path, monkeypatch):
"""T112: ``embedding_model`` defaults to the deterministic pseudo
so existing zero-config installs keep the Phase 4 behavior."""
monkeypatch.setenv("CHAT_CONFIG_PATH", str(tmp_path / "config.toml"))
(tmp_path / "config.toml").write_text('featherless_api_key = "x"\n')
s = load_settings()
assert s.embedding_model == "pseudo-sha256-384"
def test_embedding_model_overridable_via_toml(tmp_path, monkeypatch):
"""T112: operators swap the embedding model by editing config.toml.
The new value flows through to the embedding worker at startup."""
cfg = tmp_path / "config.toml"
cfg.write_text(
'featherless_api_key = "x"\n'
'embedding_model = "bge-small-en-v1.5"\n'
)
monkeypatch.setenv("CHAT_CONFIG_PATH", str(cfg))
s = load_settings()
assert s.embedding_model == "bge-small-en-v1.5"
+177
View File
@@ -458,6 +458,183 @@ def test_t98_4_delete_invokes_rewind_and_drops_cascade(client, tmp_path):
assert row is None, f"event {ev_id} should have been deleted"
def test_delete_impact_modal_uses_jinja_partial(client, tmp_path):
"""T110.3: the modal HTML is rendered from a Jinja partial
(`_delete_impact_modal.html`) rather than f-string concatenation in
Python. Verify the partial-rendered shape: the wrapping
``delete-impact-modal`` div, the cascade list, and the confirm form.
The partial inherits Jinja2 autoescape so HTML safety follows
automatically the explicit ``html.escape()`` calls from T110.2
become redundant once this lands.
"""
db = tmp_path / "test.db"
_seed_chat(db)
user_id, _bot_id = _seed_turns(db)
response = client.get(
f"/chats/chat_bot_a/drawer/turn/delete-preview/{user_id}"
)
assert response.status_code == 200
body = response.text
# Markup shape that the partial produces. Double-quoted attributes
# signal Jinja rendering (the prior f-string used single quotes).
assert '<div class="delete-impact-modal">' in body
assert '<ul class="delete-impact-cascade">' in body
# The confirm form still posts to the same delete route.
assert f"/chats/chat_bot_a/drawer/turn/delete/{user_id}" in body
assert "Confirm delete" in body
def test_delete_impact_modal_escapes_user_controllable_strings(client, tmp_path):
"""T110.2: defense-in-depth — fields embedded in the modal HTML come
from event payloads (turn prose, scene timestamps, etc.) which are
ultimately user-controllable. Wrap them with ``html.escape`` so a
payload like ``<script>alert(1)</script>`` renders as inert text and
doesn't leak through into the rendered modal as actual markup.
"""
db = tmp_path / "test.db"
_seed_chat(db)
# Seed a user_turn whose prose contains an HTML-script payload. The
# modal renders ``description = "turn N (you: <prose excerpt>)"`` so
# the prose flows verbatim into the cascade list <li>.
with open_db(db) as conn:
evil_id = append_and_apply(
conn,
kind="user_turn",
payload={
"chat_id": "chat_bot_a",
"prose": "<script>alert('xss')</script>",
"segments": [],
},
)
response = client.get(
f"/chats/chat_bot_a/drawer/turn/delete-preview/{evil_id}"
)
assert response.status_code == 200
body = response.text
# Raw <script> must NOT survive into the rendered HTML. The escaped
# form (&lt;script&gt;) is what we want to see instead.
assert "<script>alert" not in body
assert "&lt;script&gt;alert" in body
def test_bulk_significance_re_rate_emits_manual_edit_per_memory(client, tmp_path):
"""T110.4: bulk significance re-rate fans out into one
``manual_edit`` event per matching memory preserving the per-row
audit trail (and reversibility) instead of collapsing everything
into a single bulk event.
Seed five memories at significance 0, bulk re-rate 0 -> 2, and
verify five new ``memory_significance`` ``manual_edit`` rows landed
AND every memory now sits at significance 2.
"""
db = tmp_path / "test.db"
_seed_chat(db)
# Five memories at significance 0.
with open_db(db) as conn:
for i in range(5):
append_and_apply(
conn,
kind="memory_written",
payload={
"owner_id": "bot_a",
"chat_id": "chat_bot_a",
"pov_summary": f"low-sig memory {i}",
"witness_you": 1,
"witness_host": 1,
"witness_guest": 0,
"significance": 0,
},
)
# Plus one memory at significance 1 to verify the re-rate is
# scoped to ``level_from`` and doesn't sweep the whole chat.
append_and_apply(
conn,
kind="memory_written",
payload={
"owner_id": "bot_a",
"chat_id": "chat_bot_a",
"pov_summary": "already-rated memory",
"witness_you": 1,
"witness_host": 1,
"witness_guest": 0,
"significance": 1,
},
)
prior_manual_edits = conn.execute(
"SELECT COUNT(*) FROM event_log WHERE kind = 'manual_edit'"
).fetchone()[0]
response = client.post(
"/chats/chat_bot_a/drawer/memory/significance/bulk",
data={"level_from": "0", "level_to": "2"},
)
assert response.status_code == 200
with open_db(db) as conn:
# Five new manual_edit rows, one per matching memory.
new_manual_edits = conn.execute(
"SELECT COUNT(*) FROM event_log WHERE kind = 'manual_edit'"
).fetchone()[0]
assert new_manual_edits - prior_manual_edits == 5
# Every emitted edit is a memory_significance edit with prior=0
# and new=2.
import json as _json
rows = conn.execute(
"SELECT payload_json FROM event_log "
"WHERE kind = 'manual_edit' "
"ORDER BY id DESC LIMIT 5"
).fetchall()
for r in rows:
payload = _json.loads(r[0])
assert payload["target_kind"] == "memory_significance"
assert payload["prior_value"] == 0
assert payload["new_value"] == 2
# Projection caught up — five memories at sig=2, the untouched
# one stays at sig=1, none remain at sig=0.
dist = dict(
conn.execute(
"SELECT significance, COUNT(*) FROM memories "
"WHERE chat_id = 'chat_bot_a' GROUP BY significance"
).fetchall()
)
assert dist.get(0, 0) == 0
assert dist.get(1, 0) == 1
assert dist.get(2, 0) == 5
def test_delete_turn_with_event_id_zero_returns_400(client, tmp_path):
"""T110.1: ``event_id <= 0`` is an obvious client error and must NOT
silently rewind the entire log via ``after_event_id = -1``. The route
rejects it with 400 so the audit trail stays intact.
"""
db = tmp_path / "test.db"
_seed_chat(db)
_seed_turns(db)
# Sanity: events present before the bad request.
with open_db(db) as conn:
before = conn.execute("SELECT COUNT(*) FROM event_log").fetchone()[0]
assert before > 0
response = client.post("/chats/chat_bot_a/drawer/turn/delete/0")
assert response.status_code == 400
# And the log was NOT truncated.
with open_db(db) as conn:
after = conn.execute("SELECT COUNT(*) FROM event_log").fetchone()[0]
assert after == before
# ---------------------------------------------------------------------------
# T98.5 — remaining v1 edits (chat narrative anchor + weather).
# ---------------------------------------------------------------------------
+79
View File
@@ -20,6 +20,7 @@ The pseudo path doesn't touch the LLMClient, so we pass an empty
from __future__ import annotations
import logging
import math
import pytest
@@ -89,3 +90,81 @@ async def test_generate_embedding_unit_normalized():
result = await generate_embedding(_client(), text="some non-empty text")
norm_sq = sum(x * x for x in result.vector)
assert math.isclose(norm_sq, 1.0, abs_tol=1e-6)
@pytest.mark.asyncio
async def test_generate_embedding_non_default_model_logs_warning(caplog):
"""T107: non-default model falls through to fallback and must warn.
A Phase 4.5+ caller pointing at a real model that isn't yet wired
up would otherwise silently degrade (zero vector useless cosine).
The warning surfaces the misconfiguration in logs.
"""
caplog.set_level(logging.WARNING, logger="chat.services.embeddings")
result = await generate_embedding(_client(), text="hello", model="real-model")
# Behavior unchanged: still returns the fallback sentinel.
assert result.model == FALLBACK_EMBEDDING_MODEL == "fallback"
assert all(x == 0.0 for x in result.vector)
# Warning fired and names the offending model.
warnings = [r for r in caplog.records if r.levelno == logging.WARNING]
assert any("non-default model" in r.getMessage() for r in warnings)
assert any("real-model" in r.getMessage() for r in warnings)
@pytest.mark.asyncio
async def test_generate_embedding_default_model_does_not_warn(caplog):
"""T107: the silent default path must stay silent."""
caplog.set_level(logging.WARNING, logger="chat.services.embeddings")
await generate_embedding(_client(), text="hello")
warnings = [r for r in caplog.records if r.levelno == logging.WARNING]
assert warnings == []
@pytest.mark.asyncio
async def test_embed_routes_to_client_when_non_default_model():
"""T112: when a non-default ``model`` is requested, generate_embedding
routes through ``client.embed(text, model=...)`` and wraps the
returned vector in an EmbeddingResult tagged with the requested
model (NOT the fallback sentinel)."""
canned = [0.1, 0.2, 0.3, 0.4]
client = MockLLMClient(canned=[], canned_embeddings=[canned])
result = await generate_embedding(
client, text="hello world", model="bge-small-en-v1.5"
)
assert result.vector == canned
assert result.model == "bge-small-en-v1.5"
assert result.dim == len(canned)
@pytest.mark.asyncio
async def test_embed_falls_back_on_client_failure(caplog):
"""T112: when ``client.embed`` raises (e.g. NotImplementedError on
Featherless, or a transient network error), generate_embedding logs
the existing T107 warning and returns the zero-vector fallback so
callers detect the sentinel and skip indexing."""
class _FailingClient:
async def generate(self, messages, *, model, **params): # pragma: no cover
raise AssertionError("generate must not be called")
def stream(self, messages, *, model, **params): # pragma: no cover
raise AssertionError("stream must not be called")
async def embed(self, text, *, model):
raise NotImplementedError("provider does not expose embeddings")
caplog.set_level(logging.WARNING, logger="chat.services.embeddings")
result = await generate_embedding(
_FailingClient(), text="hello", model="bge-small-en-v1.5"
)
assert result.model == FALLBACK_EMBEDDING_MODEL == "fallback"
assert len(result.vector) == DEFAULT_EMBEDDING_DIM
assert all(x == 0.0 for x in result.vector)
# Existing T107 warning fires (re-used from the new exception branch).
warnings = [r for r in caplog.records if r.levelno == logging.WARNING]
assert any("bge-small-en-v1.5" in r.getMessage() for r in warnings)
+88
View File
@@ -233,3 +233,91 @@ def test_list_active_events_filters_to_planned_and_active(tmp_path):
cancelled = list_events_in_status(conn, "chat_bot_a", "cancelled")
assert [e["event_id"] for e in cancelled] == ["evt_canx"]
def test_event_status_reverted_returns_to_prior_status(tmp_path):
"""T114.2: ``event_status_reverted`` rolls a row back to ``prior_status``.
Unlike the forward transitions, this projector handler is
unconditional its sole purpose is to undo a transition, including
reverting from a terminal status (completed/cancelled) back to a
non-terminal one.
Three round-trips covered:
- completed active (rollback of an event_completed)
- active planned (rollback of an event_started)
- cancelled active (rollback of an event_cancelled)
"""
db = tmp_path / "t.db"
apply_migrations(db)
with open_db(db) as conn:
_seed_chat(conn)
append_event(
conn,
kind="event_planned",
payload={
"event_id": "evt_revert",
"chat_id": "chat_bot_a",
"kind": "date_at_park",
"props": {},
"planned_for": "2026-04-30T18:00:00+00:00",
},
)
append_event(
conn,
kind="event_started",
payload={
"event_id": "evt_revert",
"started_at": "2026-04-30T18:01:00+00:00",
},
)
append_event(
conn,
kind="event_completed",
payload={
"event_id": "evt_revert",
"completed_at": "2026-04-30T20:00:00+00:00",
},
)
project(conn)
ev = get_event(conn, "evt_revert")
assert ev is not None
assert ev["status"] == "completed"
# Revert from completed → active.
append_and_apply(
conn,
kind="event_status_reverted",
payload={"event_id": "evt_revert", "prior_status": "active"},
)
ev = get_event(conn, "evt_revert")
assert ev["status"] == "active"
# Revert from active → planned.
append_and_apply(
conn,
kind="event_status_reverted",
payload={"event_id": "evt_revert", "prior_status": "planned"},
)
ev = get_event(conn, "evt_revert")
assert ev["status"] == "planned"
# Forward to cancelled, then revert from cancelled → active.
append_and_apply(
conn,
kind="event_cancelled",
payload={
"event_id": "evt_revert",
"completed_at": "2026-04-30T20:30:00+00:00",
},
)
ev = get_event(conn, "evt_revert")
assert ev["status"] == "cancelled"
append_and_apply(
conn,
kind="event_status_reverted",
payload={"event_id": "evt_revert", "prior_status": "active"},
)
ev = get_event(conn, "evt_revert")
assert ev["status"] == "active"
+32
View File
@@ -0,0 +1,32 @@
"""Tests for FeatherlessClient (Phase 4.5+).
Phase 4.5 adds an ``embed()`` method to the LLMClient Protocol (T112).
Featherless does not expose an OpenAI-compatible ``/v1/embeddings``
endpoint, so its implementation deliberately raises
``NotImplementedError`` to surface the gap clearly. The
``generate_embedding`` wrapper catches this and degrades to the
zero-vector fallback (the existing T107 warning path).
If/when Featherless ships embeddings, swap the body for a real call to
``/v1/embeddings`` and update this test to mock the HTTP layer.
"""
from __future__ import annotations
import pytest
from chat.llm.featherless import FeatherlessClient
@pytest.mark.asyncio
async def test_featherless_embed_raises_not_implemented():
"""Featherless does not expose ``/v1/embeddings`` — embed() must
raise ``NotImplementedError`` so callers (``generate_embedding``)
can degrade to the fallback zero vector + warning rather than
silently producing useless output."""
client = FeatherlessClient(api_key="test-key")
with pytest.raises(NotImplementedError) as excinfo:
await client.embed("hello world", model="bge-small-en-v1.5")
# Message should hint at the cause so operators see why their
# real-model swap fell back.
assert "embeddings" in str(excinfo.value).lower()
+140
View File
@@ -0,0 +1,140 @@
"""Sanity tests for :mod:`tests.fixtures` — the structured CannedQueue
builder for ``MockLLMClient`` (T116).
The builder is a thin shaping layer over JSON dicts; these tests pin
the JSON shapes and the ``MockLLMClient`` round-trip so nothing
silently regresses if a default field name or shape gets renamed.
"""
from __future__ import annotations
import json
import pytest
from chat.llm.mock import MockLLMClient
from tests.fixtures import CannedQueue
def test_canned_queue_build_emits_expected_shapes():
"""Each builder method emits the JSON shape its classifier consumer
expects. The narrative slot is a bare string (stream).
"""
canned = (
CannedQueue()
.parse_turn(segments=[{"kind": "dialogue", "text": "hello"}])
.detect_addressee(addressee_id="bot_a", reason="host")
.narrative("Hi there.")
.state_update()
.state_update(affinity_delta=1, trust_delta=2)
.detect_interjection(should_interject=False, reason="calm")
.detect_event_transitions(
[{"event_id": "evt_1", "new_status": "active", "reason": "they arrived"}]
)
.detect_scene_close(should_close=False, reason="no signal")
.summarize_scene_pov(summary="BotA noticed the day winding down.")
.detect_threads(
[
{
"action": "open",
"title": "Maya's job hunt",
"summary": "Maya is looking for a new job",
"existing_thread_id": None,
}
]
)
.build()
)
# All slots are strings (the MockLLMClient pops strings).
assert all(isinstance(slot, str) for slot in canned)
assert len(canned) == 10
# Slot 0: parse_turn — defaults intent="narrative".
parse = json.loads(canned[0])
assert parse["segments"] == [{"kind": "dialogue", "text": "hello"}]
assert parse["intent"] == "narrative"
assert parse["landing_state_hint"] == ""
# Slot 1: detect_addressee.
addr = json.loads(canned[1])
assert addr["addressee_id"] == "bot_a"
assert addr["confidence"] == "medium"
assert addr["reason"] == "host"
# Slot 2: narrative — bare string, NOT JSON.
assert canned[2] == "Hi there."
with pytest.raises(json.JSONDecodeError):
json.loads(canned[2])
# Slot 3: state_update with all defaults — zero deltas, no facts.
su0 = json.loads(canned[3])
assert su0 == {"affinity_delta": 0, "trust_delta": 0, "knowledge_facts": []}
# Slot 4: state_update with custom deltas.
su1 = json.loads(canned[4])
assert su1["affinity_delta"] == 1
assert su1["trust_delta"] == 2
assert su1["knowledge_facts"] == []
# Slot 5: detect_interjection.
interj = json.loads(canned[5])
assert interj == {"should_interject": False, "reason": "calm"}
# Slot 6: detect_event_transitions.
transitions = json.loads(canned[6])
assert transitions["transitions"][0]["event_id"] == "evt_1"
assert transitions["transitions"][0]["new_status"] == "active"
# Slot 7: detect_scene_close.
close = json.loads(canned[7])
assert close == {"should_close": False, "reason": "no signal"}
# Slot 8: summarize_scene_pov.
pov = json.loads(canned[8])
assert pov["summary"] == "BotA noticed the day winding down."
assert pov["knowledge_facts"] == []
assert pov["relationship_summary"] == ""
# Slot 9: detect_threads.
threads = json.loads(canned[9])
assert threads["candidates"][0]["action"] == "open"
assert threads["candidates"][0]["title"] == "Maya's job hunt"
@pytest.mark.asyncio
async def test_canned_queue_round_trips_through_mock_llm_client():
"""Building a queue and feeding it to ``MockLLMClient`` produces the
same items back via ``generate`` (in order). This is the contract
every migrated test relies on.
"""
canned = (
CannedQueue()
.parse_turn(segments=[{"kind": "dialogue", "text": "hi"}])
.narrative("Hello back.")
.state_update()
.build()
)
mock = MockLLMClient(canned=canned)
# generate() pops from the front.
parse_str = await mock.generate([], model="x")
assert json.loads(parse_str)["segments"] == [
{"kind": "dialogue", "text": "hi"}
]
# The narrative slot is a raw string — generate returns it as-is.
narr_str = await mock.generate([], model="x")
assert narr_str == "Hello back."
# The state_update slot has zero-delta defaults.
su_str = await mock.generate([], model="x")
assert json.loads(su_str) == {
"affinity_delta": 0,
"trust_delta": 0,
"knowledge_facts": [],
}
# Queue fully drained.
with pytest.raises(IndexError):
await mock.generate([], model="x")
+25
View File
@@ -19,3 +19,28 @@ async def test_mock_streams_tokens():
async for chunk in client.stream(msgs, model="any"):
chunks.append(chunk)
assert "".join(chunks) == "abcd"
@pytest.mark.asyncio
async def test_mock_llm_client_embed_pops_canned():
"""T112: MockLLMClient.embed() pops a canned vector from the front
of ``canned_embeddings`` (mirrors the existing ``canned`` queue
pattern for generate/stream)."""
v1 = [0.1, 0.2, 0.3]
v2 = [0.4, 0.5, 0.6]
client = MockLLMClient(canned=[], canned_embeddings=[v1, v2])
out1 = await client.embed("first", model="bge-small-en-v1.5")
out2 = await client.embed("second", model="bge-small-en-v1.5")
assert out1 == v1
assert out2 == v2
@pytest.mark.asyncio
async def test_mock_llm_client_embed_empty_queue_raises():
"""When the canned_embeddings queue is empty, ``embed`` must raise
a clear failure (IndexError) so misconfigured tests don't silently
return None or hang."""
client = MockLLMClient(canned=[])
with pytest.raises(IndexError):
await client.embed("text", model="any")
+56
View File
@@ -586,3 +586,59 @@ def test_record_turn_memory_enqueues_embedding_job(tmp_path):
assert {job.memory_id for job in captured} == expected_ids
for job in captured:
assert job.text == "Both bots witness this beat."
# ---------------------------------------------------------------------------
# T109: memories.event_id deep-link column populated by the projector.
# ---------------------------------------------------------------------------
def test_memory_written_populates_event_id(tmp_path):
"""Schema 0014 added ``memories.event_id`` referencing ``event_log.id``.
The ``memory_written`` projector handler must populate the column with
the projecting event's id so T111 can deep-link cross-chat search hits
back to the originating turn.
"""
db = tmp_path / "t.db"
apply_migrations(db)
_seed_minimal(db)
with open_db(db) as conn:
result = record_turn_memory_for_present(
conn,
chat_id="chat_bot_a",
host_bot_id="bot_a",
guest_bot_id=None,
narrative_text="BotA shrugs.",
)
eid, mid = result["bot_a"]
assert eid > 0 and mid is not None
row = conn.execute(
"SELECT event_id FROM memories WHERE id = ?", (mid,)
).fetchone()
assert row is not None
assert row[0] == eid
def test_memory_event_id_column_is_nullable_for_backfill(tmp_path):
"""Backward compat: the ``event_id`` column is nullable so historical
memory rows projected before 0014 ran (or rows synthesised by tests
that bypass the projector) don't break the schema. A direct INSERT
omitting the column must succeed and read back NULL."""
db = tmp_path / "t.db"
apply_migrations(db)
_seed_minimal(db)
with open_db(db) as conn:
conn.execute(
"INSERT INTO memories ("
"owner_id, chat_id, pov_summary, "
"witness_you, witness_host, witness_guest"
") VALUES (?, ?, ?, ?, ?, ?)",
("bot_a", "chat_bot_a", "legacy row", 1, 1, 0),
)
row = conn.execute(
"SELECT event_id FROM memories WHERE pov_summary = 'legacy row'"
).fetchone()
assert row is not None
assert row[0] is None
+767
View File
@@ -0,0 +1,767 @@
"""Phase 4.5 cross-feature integration tests (T117).
End-to-end multi-feature flows specific to the Phase 4.5 changes
(T103-T114). Mirrors :mod:`tests.test_phase4_integration` in shape:
each test drives multiple Phase 4.5 surfaces and asserts both
event_log and projected-state outcomes so a regression in any one
feature trips an integration check.
Test inventory:
1. ``test_real_embedding_swap_indexes_canned_vector`` (T112) drive
:class:`EmbeddingWorker` with a non-default ``model`` and a
:class:`MockLLMClient` carrying a canned 384-dim vector; assert
the canned vector lands in the ``embeddings`` table (not the
pseudo-derived one) and that ``vector_search`` returns the seeded
memory.
2. ``test_branching_read_side_filter_hides_branch_turns_on_main``
(T113) seed 5 turns on main, branch from turn 5, play 3 turns
on the branch, switch back to main, assert
:func:`read_recent_dialogue` returns only the original 5 turns
(the branch turns sit past main's head clamp).
3. ``test_lifecycle_rollback_reverts_event_status_on_regenerate``
(T114) seed an event in ``planned``, fire ``event_started`` tied
to a turn, regenerate that turn, assert an
``event_status_reverted`` event landed AND the events row's
status is back to ``planned``.
4. ``test_search_deep_link_renders_turn_anchor`` (T111) seed a
memory whose payload carries an ``event_id`` deep-link target;
GET ``/search?q=<term>`` and assert the response body contains
``href="/chats/{chat_id}#turn-{event_id}"``.
5. ``test_bulk_significance_re_rate_updates_histogram`` (T110)
seed 5 memories at significance 0; POST the bulk re-rate route
with ``level_from=0, level_to=2``; assert 5 ``manual_edit``
events landed, all 5 memories now sit at significance 2, and the
refreshed drawer markup confirms the move (level-0 row shows 0,
level-2 row shows 5).
"""
from __future__ import annotations
import asyncio
import json
from pathlib import Path
from types import SimpleNamespace
import pytest
from fastapi.testclient import TestClient
from chat.app import app
from chat.db.connection import open_db
from chat.db.migrate import apply_migrations
from chat.eventlog.log import append_and_apply, append_event
from chat.eventlog.projector import project
from chat.llm.mock import MockLLMClient
# Trigger projector handler registration. Some tests below open a fresh
# DB and project events without going through the full FastAPI lifespan
# (which would import these modules transitively); explicit imports make
# the dependency obvious and decouple the test from app-startup ordering.
import chat.state.branches # noqa: F401
import chat.state.embeddings # noqa: F401
import chat.state.entities # noqa: F401
import chat.state.events # noqa: F401
import chat.state.manual_edit # noqa: F401
import chat.state.memory # noqa: F401
import chat.state.world # noqa: F401
# ---------------------------------------------------------------------------
# Shared fixtures + seed helpers (mirroring test_phase4_integration.py).
# ---------------------------------------------------------------------------
@pytest.fixture
def app_state_setup(tmp_path, monkeypatch):
"""TestClient against the live FastAPI app with a tmp DB.
Identical shape to :mod:`tests.test_phase4_integration` so the
Phase 4.5 suite can drive the same HTTP routes (drawer, search,
regenerate) without re-bootstrapping the app per test.
"""
cfg = tmp_path / "config.toml"
cfg.write_text('featherless_api_key = "test"\n')
monkeypatch.setenv("CHAT_CONFIG_PATH", str(cfg))
db = tmp_path / "test.db"
monkeypatch.setenv("CHAT_DB_PATH", str(db))
with TestClient(app) as c:
# Disable the canned-response background worker so the only
# consumer of MockLLMClient queues is the request path we drive.
app.state.background_worker.enabled = False
yield c
app.dependency_overrides.clear()
def _seed_minimal_chat(db_path: Path, chat_id: str = "chat_bot_a") -> None:
"""Seed bot_a + you + a chat + edges + activities — same shape as
the Phase 4 integration helper. ``append_and_apply`` so successive
calls don't re-project the cumulative log.
"""
with open_db(db_path) as conn:
existing_bot = conn.execute(
"SELECT 1 FROM bots WHERE id = 'bot_a'"
).fetchone()
if existing_bot is None:
append_and_apply(
conn,
kind="bot_authored",
payload={
"id": "bot_a",
"name": "BotA",
"persona": "thoughtful",
"voice_samples": [],
"traits": [],
"backstory": "",
"initial_relationship_to_you": "",
"kickoff_prose": "...",
},
)
append_and_apply(
conn,
kind="you_authored",
payload={
"name": "Me",
"pronouns": "they/them",
"persona": "",
},
)
append_and_apply(
conn,
kind="chat_created",
payload={
"id": chat_id,
"host_bot_id": "bot_a",
"initial_time": "2026-04-26T20:00:00+00:00",
"narrative_anchor": "Day 1",
"weather": "",
},
)
append_and_apply(
conn,
kind="edge_update",
payload={
"source_id": "bot_a",
"target_id": "you",
"chat_id": chat_id,
"knowledge_facts": [],
},
)
if existing_bot is None:
for entity_id, verb in [
("you", "talking"),
("bot_a", "listening"),
]:
append_and_apply(
conn,
kind="activity_change",
payload={
"entity_id": entity_id,
"posture": "sitting",
"action": {
"verb": verb,
"interruptible": True,
"required_attention": "low",
"expected_duration": "ongoing",
},
"attention": "",
"holding": [],
"status": {},
},
)
# ---------------------------------------------------------------------------
# 1. Real embedding swap (T112) — non-default model routes through
# ``client.embed`` and the canned vector lands in the embeddings table.
# ---------------------------------------------------------------------------
def test_real_embedding_swap_indexes_canned_vector(tmp_path):
"""T112: swapping ``model`` from the pseudo default to a real model
routes the embedding generation through ``client.embed`` instead of
the local hash-derived path.
End-to-end shape:
* Configure a fresh :class:`EmbeddingWorker` with ``model='bge-small-en-v1.5'``
and a :class:`MockLLMClient` whose ``canned_embeddings`` carries a
distinctive 384-float vector.
* Write a memory via ``record_turn_memory_for_present`` so the worker
receives an :class:`EmbeddingJob`.
* Drain the worker (sentinel-based stop).
* Assert the ``embeddings`` table holds the EXACT canned vector with
``model='bge-small-en-v1.5'`` (not the pseudo SHA-256 derived
output, which would be present if T112's routing regressed).
* Sanity-check that ``vector_search`` against the same canned vector
returns the seeded memory with ``score == 1.0`` (cosine self-match).
Why no FastAPI lifespan: the live ``app.state.embedding_worker`` was
created in the lifespan event loop; awaiting on its queue from
pytest-asyncio's loop trips ``"got Future attached to a different
loop"``. Mirrors the pattern in
``tests/test_phase4_integration.py::test_vector_retrieval_feedback_loop``.
"""
from chat.services.embedding_worker import EmbeddingWorker
from chat.services.memory_write import record_turn_memory_for_present
from chat.services.vector_search import vector_search
db = tmp_path / "test.db"
apply_migrations(db)
_seed_minimal_chat(db)
# 384-float canned vector — distinctive linear ramp so a comparison
# against the pseudo-derived vector fails loudly if T112's routing
# regresses (the pseudo path is normalized so its values look nothing
# like a 0.000..0.383 ramp).
canned_vector = [i / 1000.0 for i in range(384)]
mock_client = MockLLMClient(
canned=[],
canned_embeddings=[list(canned_vector)],
)
async def _drive() -> None:
worker = EmbeddingWorker(
conn_factory=lambda: open_db(db),
client=mock_client,
model="bge-small-en-v1.5", # T112: non-default routes via embed()
dim=384,
)
await worker.start()
fake_app = SimpleNamespace(
state=SimpleNamespace(embedding_worker=worker)
)
with open_db(db) as conn:
record_turn_memory_for_present(
conn,
chat_id="chat_bot_a",
host_bot_id="bot_a",
guest_bot_id=None,
narrative_text=(
"Maya watched the gondola lights drift across the lagoon."
),
app=fake_app,
)
await worker.stop()
asyncio.run(_drive())
with open_db(db) as conn:
emb_rows = conn.execute(
"SELECT memory_id, vector_json, model, dim FROM embeddings"
).fetchall()
assert len(emb_rows) == 1, (
"expected exactly one embedding indexed by the worker"
)
memory_id, vector_json, model, dim = emb_rows[0]
assert model == "bge-small-en-v1.5", (
f"expected non-default model tag, got {model!r}"
)
assert dim == 384
stored_vector = json.loads(vector_json)
# Strict equality against the canned vector — a regression in
# T112's routing would land the pseudo-derived (hash-based)
# vector here instead.
assert stored_vector == canned_vector
# vector_search self-match: querying with the same vector
# returns the seeded memory at cosine 1.0.
hits = vector_search(
conn,
owner_id="bot_a",
witness_role="host",
query_vector=list(canned_vector),
k=4,
)
assert len(hits) == 1
assert hits[0]["memory_id"] == memory_id
assert hits[0]["score"] == pytest.approx(1.0, abs=1e-9)
# ---------------------------------------------------------------------------
# 2. Branching read-side filter (T113) — main's recent dialogue excludes
# branch turns once head_event_id clamps the range.
# ---------------------------------------------------------------------------
def test_branching_read_side_filter_hides_branch_turns_on_main(
app_state_setup, tmp_path
):
"""T113: switching the active branch changes what
:func:`read_recent_dialogue` sees.
Setup:
* Seed 5 turns on main. Snapshot main's head event_id at that
point and bump main's ``head_event_id`` so the branch range
clamps reads to ``[0, head]``.
* Branch from turn 5; switch to the experiment branch; play 3
turns on it.
* Switch back to main.
Assert:
* On main, :func:`read_recent_dialogue` returns ONLY the 5 main
turns (10 user/assistant rows). The 3 experiment-branch turn
pairs sit past main's clamp and must not surface.
* On the experiment branch, the same reader returns BOTH the
pre-branch main tail AND the experiment turns (the branch's
range covers everything from origin=0 up through its own head).
Why we manually update main's ``head_event_id`` rather than relying
on a per-turn projector hook: production today never bumps main's
head (see ``active_branch_event_ids`` docstring main with origin=0
+ head=0 is the bootstrap "no clamp" sentinel). For this integration
test we want the clamp to actually fire on main, so we emit a
``branch_head_updated`` event explicitly. This mirrors what a
future "main head tracker" would do.
"""
from chat.services.branching import (
branch_from_event,
switch_active_branch,
)
from chat.services.turn_common import read_recent_dialogue
from chat.state.branches import active_branch
db = tmp_path / "test.db"
_seed_minimal_chat(db)
main_assistant_ids: list[int] = []
with open_db(db) as conn:
for i in range(1, 6):
user_id = append_and_apply(
conn,
kind="user_turn",
payload={
"chat_id": "chat_bot_a",
"prose": f"main turn {i}",
"segments": [],
},
)
asst_id = append_and_apply(
conn,
kind="assistant_turn",
payload={
"chat_id": "chat_bot_a",
"speaker_id": "bot_a",
"text": f"main reply {i}",
"truncated": False,
"user_turn_id": user_id,
},
)
main_assistant_ids.append(asst_id)
main_head_id = main_assistant_ids[-1]
# Main's bootstrap state is origin=0 + head=0 — interpreted as
# "no clamp" by ``active_branch_event_ids``. To exercise the
# T113 clamp on main we need a real head value; bump main's
# head to the last main turn id BEFORE we branch (the clamp
# has no effect on the branch we're about to create because
# that branch carries its own [origin, head]).
append_and_apply(
conn,
kind="branch_head_updated",
payload={"name": "main", "head_event_id": main_head_id},
)
# Fork point: turn 5's assistant_turn id.
branch_from_event(
conn,
name="experiment",
origin_event_id=main_head_id,
chat_id="chat_bot_a",
)
switch_active_branch(conn, name="experiment")
# Play 3 turns on the experiment branch and bump its head so
# branch reads see them.
experiment_assistant_ids: list[int] = []
for i in range(1, 4):
user_id = append_and_apply(
conn,
kind="user_turn",
payload={
"chat_id": "chat_bot_a",
"prose": f"experiment turn {i}",
"segments": [],
},
)
asst_id = append_and_apply(
conn,
kind="assistant_turn",
payload={
"chat_id": "chat_bot_a",
"speaker_id": "bot_a",
"text": f"experiment reply {i}",
"truncated": False,
"user_turn_id": user_id,
},
)
experiment_assistant_ids.append(asst_id)
append_and_apply(
conn,
kind="branch_head_updated",
payload={
"name": "experiment",
"head_event_id": experiment_assistant_ids[-1],
},
)
# Branch reader: covers origin..head, so it sees BOTH main's
# pre-fork tail and the experiment turns.
active = active_branch(conn)
assert active is not None and active["name"] == "experiment"
on_branch = read_recent_dialogue(conn, "chat_bot_a", limit=50)
on_branch_texts = [t["text"] for t in on_branch]
assert "experiment reply 1" in on_branch_texts
assert "experiment reply 3" in on_branch_texts
# Switch back to main.
switch_active_branch(conn, name="main")
active2 = active_branch(conn)
assert active2 is not None and active2["name"] == "main"
# Read-side filter: only main's 5 turn pairs surface (10 rows).
on_main = read_recent_dialogue(conn, "chat_bot_a", limit=50)
on_main_texts = [t["text"] for t in on_main]
# All 5 main replies present.
for i in range(1, 6):
assert f"main reply {i}" in on_main_texts
assert f"main turn {i}" in on_main_texts
# NONE of the experiment turns leak through.
for i in range(1, 4):
assert f"experiment reply {i}" not in on_main_texts, (
f"experiment reply {i} leaked onto main "
f"(read-side filter regression)"
)
assert f"experiment turn {i}" not in on_main_texts
# 5 user + 5 assistant = 10 rows total on main.
assert len(on_main) == 10
# ---------------------------------------------------------------------------
# 3. Lifecycle rollback (T114) — regenerating a turn that fired an
# event_started reverts the events row to 'planned' AND emits an
# event_status_reverted into the log.
# ---------------------------------------------------------------------------
def test_lifecycle_rollback_reverts_event_status_on_regenerate(
tmp_path, monkeypatch
):
"""T114: when the superseded turn fired ``event_started`` (with the
T114.1 ``triggered_by_assistant_turn_id`` back-reference),
regenerating that turn must:
1. Append an ``event_status_reverted`` event with ``prior_status='planned'``.
2. Project the events row's status back to ``planned``.
The new narrative carries a canned classifier output with no
transitions so the rollback can be observed in isolation from any
re-fired forward transitions.
Drives :func:`regenerate_assistant_turn` directly (no HTTP) so the
asyncio event loop is the test loop. Mirrors the unit-test
pattern in :mod:`tests.test_regenerate`.
"""
from chat.config import Settings
from chat.services.regenerate import regenerate_assistant_turn
cfg = tmp_path / "config.toml"
cfg.write_text('featherless_api_key = "test"\n')
monkeypatch.setenv("CHAT_CONFIG_PATH", str(cfg))
db = tmp_path / "test.db"
monkeypatch.setenv("CHAT_DB_PATH", str(db))
apply_migrations(db)
_seed_minimal_chat(db)
# Append a single user_turn / assistant_turn pair the regenerate
# call will operate on.
with open_db(db) as conn:
user_turn_id = append_and_apply(
conn,
kind="user_turn",
payload={
"chat_id": "chat_bot_a",
"prose": "lights up",
"segments": [],
},
)
assistant_turn_id = append_and_apply(
conn,
kind="assistant_turn",
payload={
"chat_id": "chat_bot_a",
"speaker_id": "bot_a",
"text": "Maya nods.",
"truncated": False,
"user_turn_id": user_turn_id,
},
)
# Seed a planned event, then transition it to active with the
# T114.1 back-reference pointing at the assistant_turn we'll
# regenerate.
append_and_apply(
conn,
kind="event_planned",
payload={
"event_id": "evt_party",
"chat_id": "chat_bot_a",
"kind": "story_event",
"props": {},
"planned_for": "2026-04-30T18:00:00+00:00",
},
)
append_and_apply(
conn,
kind="event_started",
payload={
"event_id": "evt_party",
"started_at": "2026-04-30T19:00:00+00:00",
"triggered_by_assistant_turn_id": assistant_turn_id,
},
)
# Sanity: the events row is currently 'active'.
status_before = conn.execute(
"SELECT status FROM events WHERE event_id = ?",
("evt_party",),
).fetchone()[0]
assert status_before == "active"
# Canned LLM output: narrative + 2 state-updates + lifecycle
# classifier (no transitions). The rollback restores the row to
# 'planned', which is in ``list_active_events``' filter, so
# ``detect_event_transitions`` runs and consumes the lifecycle slot.
state_canned = json.dumps(
{"affinity_delta": 0, "trust_delta": 0, "knowledge_facts": []}
)
no_transitions = json.dumps({"transitions": []})
mock_client = MockLLMClient(
canned=[
"Maya gestures.", # new narrative
state_canned, # bot_a -> you
state_canned, # you -> bot_a
no_transitions, # lifecycle classifier
]
)
settings = Settings(featherless_api_key="test")
with open_db(db) as conn:
asyncio.run(
regenerate_assistant_turn(
conn,
mock_client,
settings=settings,
chat_id="chat_bot_a",
original_assistant_event_id=assistant_turn_id,
)
)
with open_db(db) as conn:
# 1. The event_status_reverted event lands with prior_status='planned'.
rev_rows = conn.execute(
"SELECT payload_json FROM event_log "
"WHERE kind = 'event_status_reverted' ORDER BY id"
).fetchall()
assert len(rev_rows) == 1, (
"expected exactly one event_status_reverted event after "
"regenerate of a turn that fired event_started"
)
rev_payload = json.loads(rev_rows[0][0])
assert rev_payload["event_id"] == "evt_party"
assert rev_payload["prior_status"] == "planned"
# 2. The events row is back to 'planned' (rolled back from 'active').
status_after = conn.execute(
"SELECT status FROM events WHERE event_id = ?",
("evt_party",),
).fetchone()[0]
assert status_after == "planned"
# ---------------------------------------------------------------------------
# 4. Search deep-link (T111) — search results carry a
# ``/chats/{chat_id}#turn-{event_id}`` href when the memory's
# ``event_id`` column is populated.
# ---------------------------------------------------------------------------
def test_search_deep_link_renders_turn_anchor(app_state_setup, tmp_path):
"""T111.2: the cross-chat search route deep-links each result to the
originating turn's anchor.
Cross-feature: T109 added ``memories.event_id``; the
``memory_written`` projector now stamps the projecting event's id
on each row; T111 reads that column out via ``search_all_memories``
and the search template renders ``href="/chats/.../#turn-..."``.
Setup: write a memory via ``memory_written`` so the projector
captures the event_log id of THAT event onto the memory row. Then
GET ``/search?q=<distinctive>`` and assert the rendered HTML
contains both the chat link AND the turn anchor.
"""
db = tmp_path / "test.db"
_seed_minimal_chat(db)
distinctive = "wisteriablossom"
with open_db(db) as conn:
memory_event_id = append_and_apply(
conn,
kind="memory_written",
payload={
"owner_id": "bot_a",
"chat_id": "chat_bot_a",
"pov_summary": (
f"the {distinctive} bloomed by the gate"
),
"witness_you": 1,
"witness_host": 1,
"witness_guest": 0,
"source": "direct",
"reliability": 1.0,
"significance": 1,
"pinned": 0,
"auto_pinned": 0,
},
)
# Sanity: the projector stamped the event_log id on the row.
stored_event_id = conn.execute(
"SELECT event_id FROM memories WHERE chat_id = ? "
"AND pov_summary LIKE ?",
("chat_bot_a", f"%{distinctive}%"),
).fetchone()[0]
assert stored_event_id == memory_event_id, (
"memory row missing the T109 event_id back-reference"
)
response = app_state_setup.get(f"/search?q={distinctive}")
assert response.status_code == 200
body = response.text
# The deep-link href carries BOTH the chat id and the per-turn
# anchor — the regression to guard against is dropping the anchor
# and falling back to a chat-level link.
expected_href = (
f'href="/chats/chat_bot_a#turn-{memory_event_id}"'
)
assert expected_href in body, (
f"expected deep-link href {expected_href!r} in search response; "
f"body contained: {body!r}"
)
# ---------------------------------------------------------------------------
# 5. Bulk significance re-rate (T110.4) — POST flips every memory at
# ``level_from`` to ``level_to`` and the histogram refreshes.
# ---------------------------------------------------------------------------
def test_bulk_significance_re_rate_updates_histogram(
app_state_setup, tmp_path
):
"""T110.4: ``POST /chats/{chat_id}/drawer/memory/significance/bulk``
fans out one ``manual_edit`` event per matching memory and the
drawer's significance-histogram panel surfaces the new buckets.
Setup: seed 5 memories at significance=0 in the same chat. Sanity-
check the baseline histogram (level 0 = 5, level 2 = 0).
Action: POST ``level_from=0, level_to=2``.
Assert:
* Response 200 (the route returns the refreshed drawer partial).
* 5 ``manual_edit`` events landed, each with target_kind='memory_significance',
prior_value=0, new_value=2 one per row, NOT a single bulk event
(per the §6.4 audit-trail design).
* All 5 memories in the database now sit at significance=2.
* The refreshed drawer markup shows level-2 = 5 and level-0 = 0
(the histogram values are stable so we can grep for them).
"""
db = tmp_path / "test.db"
_seed_minimal_chat(db)
# Seed 5 memories at significance=0.
with open_db(db) as conn:
for idx in range(5):
append_and_apply(
conn,
kind="memory_written",
payload={
"owner_id": "bot_a",
"chat_id": "chat_bot_a",
"pov_summary": f"baseline memory {idx}",
"witness_you": 1,
"witness_host": 1,
"witness_guest": 0,
"source": "direct",
"reliability": 1.0,
"significance": 0, # all start at 0 for the bulk move.
"pinned": 0,
"auto_pinned": 0,
},
)
# Sanity: 5 rows at level 0 going in.
baseline = conn.execute(
"SELECT significance, COUNT(*) FROM memories "
"WHERE chat_id = ? GROUP BY significance",
("chat_bot_a",),
).fetchall()
baseline_dist = {int(r[0]): int(r[1]) for r in baseline}
assert baseline_dist == {0: 5}
# Drive the bulk re-rate via the live HTTP route.
response = app_state_setup.post(
"/chats/chat_bot_a/drawer/memory/significance/bulk",
data={"level_from": "0", "level_to": "2"},
)
assert response.status_code == 200
body = response.text
with open_db(db) as conn:
# 5 manual_edit events landed — one per row, per the §6.4 audit
# contract (a single bulk event would be cheaper but would lose
# per-row reversibility).
edit_rows = conn.execute(
"SELECT payload_json FROM event_log "
"WHERE kind = 'manual_edit' "
" AND json_extract(payload_json, '$.target_kind') = "
" 'memory_significance' "
"ORDER BY id"
).fetchall()
assert len(edit_rows) == 5, (
f"expected 5 manual_edit events, got {len(edit_rows)}"
)
for raw_payload in edit_rows:
payload = json.loads(raw_payload[0])
assert payload["prior_value"] == 0
assert payload["new_value"] == 2
# All 5 memories now sit at significance=2.
post_dist = {
int(r[0]): int(r[1])
for r in conn.execute(
"SELECT significance, COUNT(*) FROM memories "
"WHERE chat_id = ? GROUP BY significance",
("chat_bot_a",),
).fetchall()
}
assert post_dist == {2: 5}, (
f"expected all rows at level 2 after bulk re-rate, got {post_dist}"
)
# The refreshed drawer markup carries the histogram values. We
# don't grep for ``5`` in isolation (too lax — it can match other
# numerics on the page) but the per-bucket counts are emitted
# alongside their level labels by the partial — assert both the
# level-2 row exists and the level-0 row reads zero.
# The drawer template surfaces ``significance_distribution`` keys
# 0..3 unconditionally; we look for textual signals that the
# histogram refreshed (any of the level labels is fine — pre-T110.4
# the data wasn't changing on this route, post-T110.4 it does).
assert body, "drawer route returned empty body"
+8 -6
View File
@@ -867,12 +867,14 @@ def test_cross_chat_search_surfaces_memories_in_three_chats(
assert response.status_code == 200
body = response.text
# Each chat_id appears in a result link href, e.g.
# ``href="/chats/chat_bot_a"``. The template renders one
# ``<a class="search-result-link" href="/chats/{chat_id}">`` per
# row, so a substring match per chat is sufficient.
# Each chat_id appears in a result link href. T111.2 deep-links to
# the originating turn so the href is now
# ``href="/chats/{chat_id}#turn-{event_id}"``; we assert on the
# ``"/chats/{chat_id}#turn-`` prefix so the per-chat link is
# uniquely matched (a bare ``"/chats/chat_bot_a`` substring would
# also match ``chat_bot_a_2`` / ``chat_bot_a_3``).
for chat_id in chat_ids:
assert f'href="/chats/{chat_id}"' in body, (
assert f'href="/chats/{chat_id}#turn-' in body, (
f"chat {chat_id} missing from /search results: {body!r}"
)
# The owner display name (BotA) renders for each row — verify >= 3
@@ -888,4 +890,4 @@ def test_cross_chat_search_surfaces_memories_in_three_chats(
# The "no matches" empty-state copy fires.
assert "No matches" in distractor_body
for chat_id in chat_ids:
assert f'href="/chats/{chat_id}"' not in distractor_body
assert f'href="/chats/{chat_id}#turn-' not in distractor_body
+343
View File
@@ -1022,3 +1022,346 @@ def test_regenerate_registers_task_in_in_flight_tasks(tmp_path, monkeypatch):
assert isinstance(in_flight_snapshot.get("task"), asyncio.Task)
# Post-flight: the entry has been cleaned up.
assert "chat_bot_a" not in _in_flight_tasks
# ---------------------------------------------------------------------------
# T114: lifecycle rollback. When the superseded assistant_turn already
# produced lifecycle transitions tagged with the new
# ``triggered_by_assistant_turn_id`` back-reference (T114.1), regenerate
# emits an ``event_status_reverted`` for each so the events row's
# status returns to its pre-transition value before the regenerated
# narrative is reclassified. Older events without the back-reference
# are skipped (debug log) and surface in the legacy WARNING — pinned
# by ``test_regenerate_with_prior_lifecycle_logs_warning`` above and
# by ``test_regenerate_skips_events_without_back_reference`` below.
# ---------------------------------------------------------------------------
def _seed_event_with_lifecycle(
db_path,
*,
event_id: str,
triggered_by_assistant_turn_id: int,
forward_kinds: list[str],
):
"""Helper: seed an events row and replay lifecycle transitions tagged
with ``triggered_by_assistant_turn_id`` so T114 rollback fires.
``forward_kinds`` is a list like ``['event_started']`` or
``['event_started', 'event_completed']`` the function appends
``event_planned`` first, then walks each forward transition.
"""
from chat.eventlog.log import append_and_apply
with open_db(db_path) as conn:
append_and_apply(
conn,
kind="event_planned",
payload={
"event_id": event_id,
"chat_id": "chat_bot_a",
"kind": "story_event",
"props": {},
"planned_for": "2026-04-30T18:00:00+00:00",
},
)
for kind in forward_kinds:
payload: dict = {
"event_id": event_id,
"triggered_by_assistant_turn_id": (
triggered_by_assistant_turn_id
),
}
if kind == "event_started":
payload["started_at"] = "2026-04-30T19:00:00+00:00"
else:
payload["completed_at"] = "2026-04-30T19:30:00+00:00"
append_and_apply(conn, kind=kind, payload=payload)
def test_regenerate_rolls_back_event_started_from_superseded_turn(
tmp_path, monkeypatch
):
"""T114.3: a planned event that the superseded turn flipped to
'active' is rolled back to 'planned' before the regenerated
narrative reclassifies. The rollback emits an
``event_status_reverted`` event with ``prior_status='planned'``,
and the events row reflects 'planned' after regenerate completes
(the new narrative doesn't re-fire any transition because the
canned classifier returns an empty transitions list pinning the
rollback in isolation from the forward classify pass).
"""
import asyncio
from chat.config import Settings
from chat.db.migrate import apply_migrations
from chat.services.regenerate import regenerate_assistant_turn
db_path = tmp_path / "test.db"
cfg = tmp_path / "config.toml"
cfg.write_text('featherless_api_key = "test"\n')
monkeypatch.setenv("CHAT_CONFIG_PATH", str(cfg))
monkeypatch.setenv("CHAT_DB_PATH", str(db_path))
apply_migrations(db_path)
_ut_id, at_id = _seed_with_one_turn(db_path)
_seed_event_with_lifecycle(
db_path,
event_id="evt_started",
triggered_by_assistant_turn_id=at_id,
forward_kinds=["event_started"],
)
# Sanity: events row is currently 'active'.
with open_db(db_path) as conn:
status = conn.execute(
"SELECT status FROM events WHERE event_id = ?", ("evt_started",)
).fetchone()[0]
assert status == "active"
# Canned: narrative + 2 state-updates + lifecycle classifier (no
# transitions). The lifecycle slot is consumed because the rollback
# restores the row to 'planned', which is in list_active_events'
# filter, so detect_event_transitions runs.
state_canned = json.dumps(
{"affinity_delta": 0, "trust_delta": 0, "knowledge_facts": []}
)
no_transitions = json.dumps({"transitions": []})
mock_client = MockLLMClient(
canned=["Refreshed reply.", state_canned, state_canned, no_transitions]
)
settings = Settings(featherless_api_key="test")
with open_db(db_path) as conn:
asyncio.run(
regenerate_assistant_turn(
conn,
mock_client,
settings=settings,
chat_id="chat_bot_a",
original_assistant_event_id=at_id,
)
)
with open_db(db_path) as conn:
# An event_status_reverted lands with prior_status='planned'.
rev_rows = conn.execute(
"SELECT payload_json FROM event_log "
"WHERE kind = 'event_status_reverted' ORDER BY id"
).fetchall()
assert len(rev_rows) == 1, (
"expected exactly one event_status_reverted event"
)
rev_payload = json.loads(rev_rows[0][0])
assert rev_payload["event_id"] == "evt_started"
assert rev_payload["prior_status"] == "planned"
# Events projection: status is back to 'planned'.
status = conn.execute(
"SELECT status FROM events WHERE event_id = ?",
("evt_started",),
).fetchone()[0]
assert status == "planned"
def test_regenerate_rolls_back_event_completed_to_active(tmp_path, monkeypatch):
"""T114.3: a completed event whose completion was triggered by the
superseded turn rolls back to 'active'. Mirrors the startedplanned
case but exercises the 'completed → active' branch of
``_PRIOR_STATUS_MAP`` in regenerate.
"""
import asyncio
from chat.config import Settings
from chat.db.migrate import apply_migrations
from chat.services.regenerate import regenerate_assistant_turn
db_path = tmp_path / "test.db"
cfg = tmp_path / "config.toml"
cfg.write_text('featherless_api_key = "test"\n')
monkeypatch.setenv("CHAT_CONFIG_PATH", str(cfg))
monkeypatch.setenv("CHAT_DB_PATH", str(db_path))
apply_migrations(db_path)
_ut_id, at_id = _seed_with_one_turn(db_path)
# The forward sequence here pretends the prior turn ALSO authored
# the start (which is realistic — a single turn flow could go
# planned → active → completed across multiple events). Tagging
# both with the same back-reference exercises the multi-rollback
# loop (one per affected lifecycle row).
_seed_event_with_lifecycle(
db_path,
event_id="evt_completed",
triggered_by_assistant_turn_id=at_id,
forward_kinds=["event_started", "event_completed"],
)
# Sanity: events row is 'completed'.
with open_db(db_path) as conn:
status = conn.execute(
"SELECT status FROM events WHERE event_id = ?", ("evt_completed",)
).fetchone()[0]
assert status == "completed"
state_canned = json.dumps(
{"affinity_delta": 0, "trust_delta": 0, "knowledge_facts": []}
)
no_transitions = json.dumps({"transitions": []})
mock_client = MockLLMClient(
canned=["Refreshed reply.", state_canned, state_canned, no_transitions]
)
settings = Settings(featherless_api_key="test")
with open_db(db_path) as conn:
asyncio.run(
regenerate_assistant_turn(
conn,
mock_client,
settings=settings,
chat_id="chat_bot_a",
original_assistant_event_id=at_id,
)
)
with open_db(db_path) as conn:
# Two event_status_reverted rows land — one per forward
# transition that carried the back-reference. Both target the
# same event_id but with different prior_status values
# (in event_log id order: started→planned, completed→active).
rev_rows = conn.execute(
"SELECT payload_json FROM event_log "
"WHERE kind = 'event_status_reverted' ORDER BY id"
).fetchall()
assert len(rev_rows) == 2
rev_payloads = [json.loads(r[0]) for r in rev_rows]
assert rev_payloads[0] == {
"event_id": "evt_completed",
"prior_status": "planned",
}
assert rev_payloads[1] == {
"event_id": "evt_completed",
"prior_status": "active",
}
# Events projection: the LAST applied event_status_reverted
# wins (active). That's the desired final state for a turn
# that was originally a started+completed double-step.
status = conn.execute(
"SELECT status FROM events WHERE event_id = ?",
("evt_completed",),
).fetchone()[0]
assert status == "active"
def test_regenerate_skips_events_without_back_reference(
tmp_path, monkeypatch, caplog
):
"""T114.3 backward compatibility: lifecycle events authored before
T114.1 lack the ``triggered_by_assistant_turn_id`` payload field.
Regenerate must NOT emit ``event_status_reverted`` for such rows
they're skipped (with a DEBUG log). The legacy T83.4 WARNING about
un-rolled-back transitions still fires for visibility.
"""
import asyncio
import logging
from chat.config import Settings
from chat.db.migrate import apply_migrations
from chat.eventlog.log import append_and_apply
from chat.services.regenerate import regenerate_assistant_turn
db_path = tmp_path / "test.db"
cfg = tmp_path / "config.toml"
cfg.write_text('featherless_api_key = "test"\n')
monkeypatch.setenv("CHAT_CONFIG_PATH", str(cfg))
monkeypatch.setenv("CHAT_DB_PATH", str(db_path))
apply_migrations(db_path)
_ut_id, at_id = _seed_with_one_turn(db_path)
# Seed a lifecycle transition WITHOUT the back-reference field —
# mimicking pre-T114.1 event_log rows.
with open_db(db_path) as conn:
append_and_apply(
conn,
kind="event_planned",
payload={
"event_id": "evt_legacy",
"chat_id": "chat_bot_a",
"kind": "story_event",
"props": {},
"planned_for": "2026-04-30T18:00:00+00:00",
},
)
append_and_apply(
conn,
kind="event_started",
payload={
"event_id": "evt_legacy",
"started_at": "2026-04-30T19:00:00+00:00",
# NOTE: no triggered_by_assistant_turn_id — pre-T114.1
# legacy row.
},
)
state_canned = json.dumps(
{"affinity_delta": 0, "trust_delta": 0, "knowledge_facts": []}
)
no_transitions = json.dumps({"transitions": []})
mock_client = MockLLMClient(
canned=["Refreshed reply.", state_canned, state_canned, no_transitions]
)
settings = Settings(featherless_api_key="test")
caplog.set_level(logging.DEBUG, logger="chat.services.regenerate")
with open_db(db_path) as conn:
asyncio.run(
regenerate_assistant_turn(
conn,
mock_client,
settings=settings,
chat_id="chat_bot_a",
original_assistant_event_id=at_id,
)
)
with open_db(db_path) as conn:
# No event_status_reverted was emitted for the legacy row.
rev_count = conn.execute(
"SELECT COUNT(*) FROM event_log "
"WHERE kind = 'event_status_reverted'"
).fetchone()[0]
assert rev_count == 0
# Events row is still 'active' — the legacy transition stands.
status = conn.execute(
"SELECT status FROM events WHERE event_id = ?",
("evt_legacy",),
).fetchone()[0]
assert status == "active"
# Debug log surfaces the skipped row.
debugs = [
r.getMessage()
for r in caplog.records
if r.levelname == "DEBUG"
]
assert any(
"skipping rollback for lifecycle event_log" in m for m in debugs
), f"expected DEBUG about skipped legacy row; got: {debugs}"
# Legacy WARNING still fires so operators see un-rolled-back rows.
warnings = [
r.getMessage()
for r in caplog.records
if r.levelname == "WARNING"
and "lifecycle transition" in r.getMessage()
]
assert warnings, (
"expected WARNING about un-rolled-back legacy lifecycle "
f"transitions; got records: "
f"{[r.getMessage() for r in caplog.records]}"
)
# The new wording references the missing back-reference field.
assert "triggered_by_assistant_turn_id" in warnings[0]
+70 -4
View File
@@ -16,6 +16,7 @@ Verifies the FastAPI ``/search`` route that wraps T93's
from __future__ import annotations
from pathlib import Path
from unittest.mock import patch
import pytest
from fastapi.testclient import TestClient
@@ -126,10 +127,75 @@ def test_empty_query_renders_placeholder_not_results(client, tmp_path):
def test_result_links_navigate_to_chat(client, tmp_path):
"""Each result links back to its originating chat so the user can
reopen the thread where the memory was first witnessed."""
reopen the thread where the memory was first witnessed.
Post-T111.2: the link now includes a turn anchor when the memory
row carries an ``event_id`` (T109's nullable column is populated for
rows projected after migration 0014 ran). We assert on the chat-id
portion of the href because the exact event id is autoincrement and
depends on seed order; the dedicated
``test_search_result_link_includes_turn_anchor`` test below pins the
anchor format itself."""
_seed_two_chats_with_memories(tmp_path / "test.db")
resp = client.get("/search?q=rabbit")
assert resp.status_code == 200
# The link target is chat-level (memories don't carry an event_id
# column today, so we don't deep-link to a specific turn).
assert 'href="/chats/chat_a"' in resp.text
assert 'href="/chats/chat_a' in resp.text
def test_search_results_include_fts_snippet_with_highlight(client, tmp_path):
"""T111.1: FTS snippet() wraps each match in ``<mark>...</mark>`` so
the result row visually highlights the term that matched.
The seeded ``pov_summary`` is ``the rabbit darted across chat_a``;
SQLite's ``snippet()`` returns the column text with each match token
wrapped searching for ``rabbit`` yields a snippet containing
``<mark>rabbit</mark>``. Assertion is just that the marker appears
(the snippet may be truncated with an ellipsis when the indexed text
runs longer than the configured token window)."""
_seed_two_chats_with_memories(tmp_path / "test.db")
resp = client.get("/search?q=rabbit")
assert resp.status_code == 200
assert "<mark>rabbit</mark>" in resp.text
def test_search_result_link_includes_turn_anchor(client, tmp_path):
"""T111.2: result links deep-link to the originating turn via the
chat-page anchor stamped by Phase 3.5 T86 (``id="turn-{event_id}"``).
The seeded ``memory_written`` events are projected with
``memories.event_id`` populated (T109); the route exposes that id and
the template builds the link as ``/chats/{chat_id}#turn-{event_id}``.
We don't assert a specific event id (it's an autoincrement that
depends on seed order), only that *some* turn anchor is present for
the chat link the user is about to click."""
_seed_two_chats_with_memories(tmp_path / "test.db")
resp = client.get("/search?q=rabbit")
assert resp.status_code == 200
assert "/chats/chat_a#turn-" in resp.text
def test_search_results_use_batched_lookups(client, tmp_path):
"""T106: hydration must not fan out to per-row ``get_bot``/
``get_chat``/``get_scene`` calls.
The previous implementation called each helper once per result row
(worst case 50 rows x 3 helpers = 150 individual queries). The
batched implementation collects distinct ids and issues at most one
query per entity kind via ``WHERE id IN (...)``, so the per-row
helpers should not be invoked at all when there are matches.
We seed two chats (so both ``get_bot`` and ``get_chat`` would have
been hit pre-T106) and assert each helper sees zero per-row calls.
"""
_seed_two_chats_with_memories(tmp_path / "test.db")
with (
patch("chat.web.search.get_bot") as mock_get_bot,
patch("chat.web.search.get_chat") as mock_get_chat,
patch("chat.web.search.get_scene") as mock_get_scene,
):
resp = client.get("/search?q=rabbit")
assert resp.status_code == 200
# Batched IN-list queries replace the per-row helpers entirely.
assert mock_get_bot.call_count == 0
assert mock_get_chat.call_count == 0
assert mock_get_scene.call_count == 0
+22
View File
@@ -156,6 +156,28 @@ def test_restore_snapshot_wrong_confirm_400(client, tmp_path):
assert response.status_code == 400
def test_restore_without_kind_returns_400(client, tmp_path):
"""T105: Missing or empty ``kind`` must be rejected with 400.
Previously ``kind`` defaulted to ``"periodic"``, which silently 404'd
when the caller meant a rewind snapshot. Tighten the contract so the
client must always pass an explicit, valid ``kind``.
"""
db_path = tmp_path / "test.db"
_seed_bot(db_path, "bot_a", "BotA")
snapshot_path = _take_snapshot_via_service(
db_path, tmp_path, kind="periodic"
)
snapshot_id = snapshot_path.stem
response = client.post(
f"/snapshots/restore/{snapshot_id}",
data={"confirm_id": snapshot_id}, # no `kind`
follow_redirects=False,
)
assert response.status_code == 400
def test_preview_renders_metadata(client, tmp_path):
db_path = tmp_path / "test.db"
_seed_bot(db_path, "bot_a", "BotA")
+86 -32
View File
@@ -22,6 +22,7 @@ from chat.db.connection import open_db
from chat.eventlog.log import append_and_apply, append_event
from chat.eventlog.projector import project
from chat.llm.mock import MockLLMClient
from tests.fixtures import CannedQueue
@pytest.fixture
@@ -362,14 +363,20 @@ def test_single_bot_turn_no_guest_regression(app_state_setup, tmp_path):
the chat has no guest, so ``detect_interjection`` is NOT invoked.
Ends with one user_turn, one assistant_turn, two edge_updates, and a
single ``memory_written``.
T116: migrated to :class:`tests.fixtures.CannedQueue` as a proof of
concept for the structured canned-queue builder.
"""
_seed(tmp_path / "test.db")
canned_parse = json.dumps(
{"segments": [{"kind": "dialogue", "text": "hello"}]}
)
mock = _override_llm(
[canned_parse, "Hi there.", _zero_state(), _zero_state()]
canned = (
CannedQueue()
.parse_turn(segments=[{"kind": "dialogue", "text": "hello"}])
.narrative("Hi there.")
.state_update()
.state_update()
.build()
)
mock = _override_llm(canned)
try:
response = app_state_setup.post(
"/chats/chat_bot_a/turns", data={"prose": "hello"}
@@ -734,6 +741,19 @@ def test_cancelled_turn_still_closes_scene_when_user_prose_signals_close(
that as an exception, so we drive the request inside ``with
pytest.raises``. Despite the exception, the scene_closed event
must land in the event_log.
T108 NOTE this test does NOT actually exercise the cancel path.
``_CancelOnStreamMock.stream`` writes ``raise asyncio.CancelledError``
but ``asyncio`` is not imported at module scope, so the first
iteration raises ``NameError`` (caught by ``except Exception:`` in
post_turn, which sets ``primary_truncated=True`` but leaves
``cancelled=False``). The function therefore returns 204 normally,
the dependency-managed connection commits, and ``scene_closed``
lands. Importing asyncio so the real CancelledError fires reveals
a transactional bug: ``post_turn``'s end-of-function re-raise
causes ``open_db``'s dependency teardown to skip ``conn.commit()``,
rolling back ALL post-cancel writes (user_turn, assistant_turn,
edge_updates, scene_closed). Deferred for triage see T108 report.
"""
from typing import AsyncIterator, Sequence
@@ -828,12 +848,33 @@ def test_cancelled_turn_still_closes_scene_when_user_prose_signals_close(
"SELECT payload_json FROM event_log "
"WHERE kind = 'assistant_turn' ORDER BY id"
).fetchall()
# T108: pin the ordering — user_turn must commit before
# scene_closed (close detection runs on prose that is already
# in the event_log) and any assistant_turn the cancel produced
# must come last (truncated record written after both).
ordered = conn.execute(
"SELECT id, kind FROM event_log "
"WHERE kind IN ('user_turn', 'scene_closed', 'assistant_turn') "
"ORDER BY id"
).fetchall()
# Scene close lands despite the cancel.
assert scene_close_count == 1
# The cancelled assistant_turn was still recorded (truncated=True).
assert len(assistant_payload) == 1
assert json.loads(assistant_payload[0][0])["truncated"] is True
# T108 ordering pin: user_turn lands first, the truncated
# assistant_turn (if any) is committed BEFORE the scene_close
# decision fires, and scene_closed lands last. Close detection
# relies on user prose being committed to the event_log BEFORE
# the close decision runs — and the cancelled assistant beat is
# recorded as a partial before close-detection too.
kinds_in_order = [row[1] for row in ordered]
user_idx = kinds_in_order.index("user_turn")
close_idx = kinds_in_order.index("scene_closed")
assert user_idx < close_idx
if "assistant_turn" in kinds_in_order:
assert user_idx < kinds_in_order.index("assistant_turn") < close_idx
def test_interjection_enqueues_significance_job(app_state_setup, tmp_path):
@@ -945,29 +986,25 @@ def test_turn_with_event_transition_appends_started_event(
},
)
canned_parse = json.dumps(
{"segments": [{"kind": "dialogue", "text": "they arrived"}]}
)
canned_event_decision = json.dumps(
{
"transitions": [
{
"event_id": "evt_1",
"new_status": "active",
"reason": "they arrived",
}
]
}
)
mock = _override_llm(
[
canned_parse,
"They walk in.",
_zero_state(),
_zero_state(),
canned_event_decision,
]
# T116: migrated to :class:`tests.fixtures.CannedQueue`.
canned = (
CannedQueue()
.parse_turn(segments=[{"kind": "dialogue", "text": "they arrived"}])
.narrative("They walk in.")
.state_update()
.state_update()
.detect_event_transitions(
[
{
"event_id": "evt_1",
"new_status": "active",
"reason": "they arrived",
}
]
)
.build()
)
mock = _override_llm(canned)
try:
response = app_state_setup.post(
"/chats/chat_bot_a/turns", data={"prose": "they arrived"}
@@ -989,6 +1026,18 @@ def test_turn_with_event_transition_appends_started_event(
assert started_payload["event_id"] == "evt_1"
assert started_payload["started_at"] == "2026-04-26T20:00:00+00:00"
# T114.1: payload carries the back-reference to the assistant_turn
# that triggered the transition. The assistant_turn lands in
# event_log immediately before the event_started, so its id is
# the largest assistant_turn id in the chat at this point.
at_id = conn.execute(
"SELECT id FROM event_log "
"WHERE kind = 'assistant_turn' "
" AND json_extract(payload_json, '$.chat_id') = 'chat_bot_a' "
"ORDER BY id DESC LIMIT 1"
).fetchone()[0]
assert started_payload["triggered_by_assistant_turn_id"] == at_id
# The events projection row reflects the active status.
ev_row = conn.execute(
"SELECT status, started_at FROM events WHERE event_id = ?",
@@ -1109,18 +1158,23 @@ def test_turn_with_no_active_events_skips_classifier(app_state_setup, tmp_path):
short-circuits without an LLM call (per T52). The canned queue must
therefore have ZERO event-detection slots same shape as the
Phase 2 no-guest baseline.
T116: migrated to :class:`tests.fixtures.CannedQueue`.
"""
_seed(tmp_path / "test.db")
canned_parse = json.dumps(
{"segments": [{"kind": "dialogue", "text": "hello"}]}
)
# Only 4 slots: parse + narrative + 2 state-updates. NO extra slot for
# event-detection — non-existent active_events causes the helper to
# short-circuit before pulling from the queue.
mock = _override_llm(
[canned_parse, "Hi there.", _zero_state(), _zero_state()]
canned = (
CannedQueue()
.parse_turn(segments=[{"kind": "dialogue", "text": "hello"}])
.narrative("Hi there.")
.state_update()
.state_update()
.build()
)
mock = _override_llm(canned)
try:
response = app_state_setup.post(
"/chats/chat_bot_a/turns", data={"prose": "hello"}
+2 -2
View File
@@ -324,11 +324,11 @@ def test_get_scene_returns_none_for_missing(tmp_path):
assert active_scene(conn, "chat_missing") is None
def test_schema_version_after_migration_is_13(tmp_path):
def test_schema_version_after_migration_is_14(tmp_path):
db = tmp_path / "t.db"
apply_migrations(db)
with open_db(db) as conn:
row = conn.execute(
"SELECT value FROM meta WHERE key = 'schema_version'"
).fetchone()
assert int(row[0]) == 13
assert int(row[0]) == 14