diff --git a/CLAUDE.md b/CLAUDE.md index ab0a5dc..1545301 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -322,53 +322,48 @@ Phase 4 polish shipped end-to-end across 15 tasks (T88–T102). Vector retrieval ### Phase 4.5 / 5 backlog -New follow-ups discovered during Phase 4 reviews and execution. None are blocking; pick up at any time. +All items shipped or deferred to Phase 5 (see "Phase 5 backlog" below). Final schema version: 14. -#### From T88 review +## Phase 4.5 status -- **`embeddings` FK lacks `ON DELETE CASCADE`**: deindex events are the only deletion path; if memories ever get deleted directly (raw SQL), embedding rows orphan. Defensible since projector model uses explicit deindex events, but worth a comment or `ON DELETE CASCADE` addition. +Phase 4.5 cleanup shipped 13 of 14 planned tasks (T103–T117 with T115 deferred; T118 is this docs sweep). Two CLAUDE.md backlogs (Phase 3.6/4, Phase 4.5/5) are now empty; deferred follow-ups discovered during execution are tracked in a new "Phase 5 backlog" section below. Schema baseline advanced from version 13 to **14** (migration 0014: `memories.event_id`). Test count grew from ~413 (Phase 4) to ~457 (+~44 new tests across the wave). -#### From T89 review +- **Wave 1 — trivial polish (parallel)**: + - **T103** branches polish — global-branch (`chat_id IS NULL`) leak documented in `list_branches`; branch-switch to nonexistent name now logs a warning. + - **T104** `memory.py` DRY — `MAX(id)` helper extracted; `fts_rank=None` contract documented for vector-only rows. + - **T105** `snapshots.py` polish — `datetime`/`timezone` imports hoisted to module level; strict `kind` validation in restore/preview (rejects missing); `created_at` from file mtime documented. + - **T106** `search.py` polish — `k=50` extracted to module constant; N+1 `get_bot`/`get_chat`/`get_scene` lookups batched. + - **T107** `embeddings.py` — `timeout_s` fallback-path warning when non-default model misconfigured. +- **Wave 2 — scene-close-on-cancel (single)**: + - **T108** strengthened the T74.3 regression test + documented rationale in `turns.py`. **Surfaced a deferred bug**: existing pin only passes because `asyncio` isn't imported in the test module (NameError caught instead of CancelledError). When CancelledError fires for real, `post_turn`'s end-of-function re-raise causes `open_db`'s dependency teardown to skip `conn.commit()`, rolling back ALL post-cancel writes. Documented and deferred to Phase 5 triage. +- **Wave 3 — schema 0014 (single)**: + - **T109** `memories.event_id` column (foundation for T111 deep-link). FK CASCADE on `embeddings.memory_id` deferred (memories rows are never deleted today; defensive constraint can't fire — saved for broader migration cleanup in Phase 5). +- **Wave 4 — drawer Phase 4.5 bundle (single)**: + - **T110** `event_id <= 0` guard in `delete_turn` + `html.escape()` on delete-impact modal + Jinja partial extraction + bulk significance re-rate per chat (one `manual_edit` event per memory). +- **Wave 5 — search UX (single)**: + - **T111** FTS snippet highlighting via `snippet()` + deep-link to turn via `memories.event_id`. +- **Wave 6 — real embedding model swap (single)**: + - **T112** `LLMClient.embed()` Protocol + Mock impl with `canned_embeddings` + `FeatherlessClient.embed()` (raises `NotImplementedError` — Featherless OAI-compat doesn't expose embeddings, gap documented) + `generate_embedding` routes non-default models through `client.embed()` with fallback + `--re-embed-all` backfill flag. +- **Wave 7 — branching read-side filter (single)**: + - **T113** `active_branch_event_ids(conn)` helper + applied to `read_recent_dialogue`, `scene_summarize._read_recent_dialogue`, `search_memories`, and `meanwhile._read_recent_meanwhile_dialogue`. Cross-chat search and projector queries deliberately NOT filtered (cross-chat is by design; projectors must see full log). Bootstrap "main" branch (origin=0, head=0) detected as the no-clamp sentinel. +- **Wave 8 — regenerate lifecycle rollback (single)**: + - **T114** `triggered_by_assistant_turn_id` payload back-reference on `event_started`/`event_completed`/`event_cancelled` + new `event_status_reverted` event kind + projector handler in `chat/state/events.py` + regenerate flow emits revert events for affected lifecycle transitions. +- **Wave 9 — final polish + integration (parallel)**: + - **T115** sqlite-vec swap — **DEFERRED to Phase 5**. Pre-flight failed: host Python build doesn't expose `sqlite3.Connection.enable_load_extension` (raises `AttributeError`). Requires either Python rebuild with `--enable-loadable-sqlite-extensions` or migration to `apsw`. Phase 4 pure-Python cosine remains in production. + - **T116** structured `CannedQueue` test fixture builder + 2–3 POC test migrations (Phase 5 to migrate the rest). + - **T117** Phase 4.5 cross-feature integration tests (5 minimum: real embedding swap, branching read-side filter, lifecycle rollback, search deep-link, bulk significance re-rate). + - **T118** documentation (this section). -- **`list_branches(chat_id=...)` filter leaks global branches** (`chat_id IS NULL`) into every chat scope. Intentional? Document. -- **Branch-switch to nonexistent silently leaves zero active branches** — log a warning when this would happen. +### Phase 5 backlog -#### From T91 review +New follow-ups discovered during Phase 4.5 reviews and execution, plus carry-over deferrals. None are blocking; pick up at any time. -- **Real embedding model swap**: Phase 4 ships pseudo-embedding (deterministic SHA-256 hash). Phase 4.5+ should swap to a real model (Featherless `bge-small-en-v1.5` if available; or local `sentence-transformers/all-MiniLM-L6-v2`). The 384-dim is hardcoded in `0012_embeddings.sql`; if dim changes, migrate first. -- **`timeout_s` unused on pseudo path** — fine, but log when non-default model falls through to fallback so misconfigured callers don't silently degrade. - -#### From T96 review - -- **Duplicate `MAX(id)` lookup** between `_composite_rerank` and the fused-path tail — DRY follow-up. -- **`fts_rank=None` for vector-only rows** — document downstream contract. - -#### From T98 review - -- **`event_id <= 0` guard in `delete_turn`** — currently silently rewinds everything if `event_id` is 0. Add `if event_id <= 0: 400`. -- **`html.escape()` on `compute_delete_impact` output rendered into the modal** — defense in depth (currently model-controlled strings, but if event payload fields ever appear in descriptions, autoescape needed). -- **Extract delete-impact modal HTML to a Jinja partial** — testability + autoescape inheritance. - -#### From T99 review - -- **Hoist `datetime`/`timezone` imports to module level** in `chat/web/snapshots.py`. -- **`kind` defaulting in restore/preview** — reject missing `kind` rather than silent 404. -- **`created_at` from file mtime** vs filename-encoded timestamp — small drift if files copied; document. - -#### From T100 review - -- **Hardcoded `k=50`** — extract to module constant. -- **N+1 lookups (`get_bot`/`get_chat`/`get_scene` per row)** — fine at `k=50`, revisit if `k` grows. -- **FTS highlighting via `snippet()`** — Phase 4 skipped this; UX nice-to-have. -- **Result links chat-level only** — `memories` table has no `event_id` column; deep-linking to specific turn requires schema addition. - -#### Deferred items - -- **sqlite-vec swap** when host Python supports `enable_load_extension`. -- **Real embedding model** with proper semantic similarity. -- **Branching read-side filter**: T89 ships data-model + UI but event readers don't yet consult `is_active`. Each branch is metadata-only labeled ranges. Consult-on-read is Phase 4.5+ work. -- **Bulk significance re-rate** in drawer (T98.2 deferred — only per-memory edit shipped). -- **Vector index optimization** (HNSW) — only relevant if memory counts grow past pure-Python feasibility. -- **`scene-close-on-cancel` UX revisit** (Phase 2.5 carry-over). -- **Cross-feature canned-queue brittleness fixture builder** (Phase 3 carry-over). -- **Full lifecycle-rollback in regenerate** — Phase 3.5 T83.4 shipped a warning log; proper rollback needs schema-level back-references (`triggered_by_assistant_turn_id` payload field). +- **T115 sqlite-vec swap** (environmental blocker): host Python's `sqlite3.Connection` does not expose `enable_load_extension` — `python -c "import sqlite3; sqlite3.connect(':memory:').enable_load_extension(True)"` raises `AttributeError`. Fix requires either a Python rebuild with `--enable-loadable-sqlite-extensions` or migration to `apsw`. Pure-Python cosine remains in production until then. +- **T108 follow-up: cancel-path commit bug** — `post_turn`'s re-raised `CancelledError` causes `open_db` dependency teardown to skip `conn.commit()`, rolling back all post-cancel writes. The existing T74.3 regression test passes only because `asyncio` isn't imported in the test module (NameError masks the real cancel path). Triage required — either commit before re-raise, or restructure the route to never re-raise after the close-detection branch. +- **`embeddings` FK CASCADE on `memory_id`** — deferred from T109; do as part of a broader migration consolidation in Phase 5. +- **`CannedQueue` fixture migration** — T116 shipped the builder + POC migrations; remaining tests still use positional canned arrays. Migrate in Phase 5. +- **Vector index optimization (HNSW)** — currently scales to a few thousand memories on the flat-index pure-Python cosine path; revisit when counts grow past flat-index feasibility. +- **Branch-isolated `event_log`** — each branch has its own physical `event_log` range vs the current shared id space + head filter; full branch isolation is Phase 5+. +- **Embedding model swap migration tooling** — T112 added `--re-embed-all`; a more orchestrated swap (drain old worker, re-seed all memories, swap config) is Phase 5+. +- **Real-time collaborative branching** (multi-user) — out of scope for v1. +- **Avatars / portraits** (multimodality) — deferred indefinitely per design §14. diff --git a/chat/app.py b/chat/app.py index 80b0553..7241cd0 100644 --- a/chat/app.py +++ b/chat/app.py @@ -94,9 +94,15 @@ async def lifespan(app: FastAPI): # Phase 4's pseudo-embedding path is local so the worker doesn't need # an LLM client; we still pass one so the Phase 4.5 swap to a real # model is a one-line change. + # T112 (Phase 4.5): the embedding model is now configurable via + # ``Settings.embedding_model``. Default ``"pseudo-sha256-384"`` + # keeps the local-only path; swapping to a real model routes + # through ``client.embed(...)`` and falls back to a zero vector + # plus warning if the provider doesn't support embeddings. embedding_worker = EmbeddingWorker( conn_factory=lambda: open_db(settings.db_path), client=_factory(), + model=settings.embedding_model, ) await embedding_worker.start() app.state.embedding_worker = embedding_worker diff --git a/chat/config.py b/chat/config.py index 8eb19b6..d10dea4 100644 --- a/chat/config.py +++ b/chat/config.py @@ -39,6 +39,14 @@ class Settings(BaseModel): data_dir: Path = REPO_ROOT / "data" bind_host: str = "127.0.0.1" bind_port: int = 8000 + # T112 (Phase 4.5): embedding model identifier. Default is the + # deterministic local pseudo (semantically meaningless but keeps the + # vector pipeline structurally valid). Swap to a real model name + # (e.g. "bge-small-en-v1.5") once the LLMClient implementation + # supports embed() — currently FeatherlessClient does NOT, so a + # non-default value will trigger the zero-vector fallback path + # plus a T107 warning until a different provider is wired in. + embedding_model: str = "pseudo-sha256-384" def load_settings() -> Settings: config_path = Path(os.environ.get("CHAT_CONFIG_PATH", DEFAULT_CONFIG)) diff --git a/chat/db/migrations/0014_phase45_schema.sql b/chat/db/migrations/0014_phase45_schema.sql new file mode 100644 index 0000000..0d7d491 --- /dev/null +++ b/chat/db/migrations/0014_phase45_schema.sql @@ -0,0 +1,25 @@ +-- 0014_phase45_schema.sql — Phase 4.5 Wave 2 schema bump (T109). +-- +-- Two schema concerns are bundled into this migration: +-- +-- 1. ``embeddings.memory_id`` FK should ideally carry ``ON DELETE CASCADE`` +-- (T88 review nit). DEFERRED to Phase 5: ``embeddings`` rows are only ever +-- deleted when the parent ``memories`` row is deleted, and ``memories`` +-- rows are never deleted today (memory hide is a soft flag; the surgical +-- ``deindex_event`` path operates on ``event_log`` and does NOT cascade +-- to projection rows). The CASCADE constraint therefore can't fire under +-- current usage — adding the SQLite table-rebuild dance (rename, recreate, +-- copy, drop, reindex) for a defensive constraint is unwarranted bloat +-- in a polish wave. Revisit during the broader Phase 5 migration cleanup +-- when other table reshapes make the rebuild worthwhile. +-- +-- 2. Add ``memories.event_id`` (NULLABLE INTEGER, references ``event_log.id``) +-- so cross-chat search results can deep-link back to the originating +-- turn (foundation for T111). The column is nullable so historical +-- memory rows projected before 0014 ran continue to round-trip cleanly; +-- new rows are populated by the ``memory_written`` projector handler +-- from the projecting event's id. This is a pure additive change — no +-- backfill is performed. Older rows simply read NULL until/unless a +-- later migration backfills them; T111 surfaces are coded to accept +-- NULL gracefully (no deep-link rendered). +ALTER TABLE memories ADD COLUMN event_id INTEGER REFERENCES event_log(id); diff --git a/chat/llm/client.py b/chat/llm/client.py index ca34a2d..5c079e1 100644 --- a/chat/llm/client.py +++ b/chat/llm/client.py @@ -12,3 +12,11 @@ class Message: class LLMClient(Protocol): async def generate(self, messages: Sequence[Message], *, model: str, **params) -> str: ... def stream(self, messages: Sequence[Message], *, model: str, **params) -> AsyncIterator[str]: ... + # T112 (Phase 4.5): real-embedding seam. Implementations either call a + # provider's ``/v1/embeddings`` endpoint or, when the provider doesn't + # expose embeddings (e.g. Featherless today), raise ``NotImplementedError`` + # so ``generate_embedding`` can catch it and degrade to the zero-vector + # fallback. The Protocol is structural, so this method only needs to + # exist on implementations; existing callers that don't use it are + # unaffected. + async def embed(self, text: str, *, model: str) -> list[float]: ... diff --git a/chat/llm/featherless.py b/chat/llm/featherless.py index cf1138b..2eff3de 100644 --- a/chat/llm/featherless.py +++ b/chat/llm/featherless.py @@ -53,3 +53,26 @@ class FeatherlessClient: delta = chunk.choices[0].delta.content or "" if delta: yield delta + + async def embed(self, text: str, *, model: str) -> list[float]: + """Embeddings via Featherless — currently unsupported. + + T112 (Phase 4.5) extends the LLMClient Protocol with ``embed()`` + for a future real-embedding swap. Featherless's OpenAI-compatible + surface does NOT expose ``/v1/embeddings`` at the time of writing, + so this implementation raises ``NotImplementedError`` rather than + attempting a request that would 404. The + :func:`chat.services.embeddings.generate_embedding` wrapper + catches this and degrades to the existing zero-vector fallback + (with the T107 warning), so misconfigured callers fail loudly in + logs but the request path keeps working. + + If Featherless ships embeddings, swap the body for an + ``self._client.embeddings.create(model=..., input=...)`` call + guarded by ``self._sem()`` (mirrors ``generate``/``stream``). + """ + raise NotImplementedError( + "Featherless does not expose /v1/embeddings; " + "configure a different embedding provider or stick with " + "the default pseudo-sha256-384 model." + ) diff --git a/chat/llm/mock.py b/chat/llm/mock.py index 75ab786..5afc1ef 100644 --- a/chat/llm/mock.py +++ b/chat/llm/mock.py @@ -4,8 +4,23 @@ from .client import Message class MockLLMClient: - def __init__(self, canned: list[str]): + """In-memory LLMClient for tests. + + ``canned`` feeds ``generate``/``stream`` (one entry per call, popped + from the front). ``canned_embeddings`` (T112, Phase 4.5) feeds + ``embed`` the same way — each call pops the next vector. An empty + queue raises ``IndexError`` so misconfigured tests fail loudly + rather than returning ``None`` or hanging. + """ + + def __init__( + self, + canned: list[str], + *, + canned_embeddings: list[list[float]] | None = None, + ): self._canned = list(canned) + self._canned_embeddings: list[list[float]] = list(canned_embeddings or []) async def generate(self, messages: Sequence[Message], *, model: str, **params) -> str: return self._canned.pop(0) @@ -14,3 +29,8 @@ class MockLLMClient: text = self._canned.pop(0) for ch in text: yield ch + + async def embed(self, text: str, *, model: str) -> list[float]: + # Mirrors the canned-queue pattern; empty queue raises so + # misconfigured tests surface clearly instead of returning None. + return self._canned_embeddings.pop(0) diff --git a/chat/services/cross_chat_search.py b/chat/services/cross_chat_search.py index cb0403f..d582610 100644 --- a/chat/services/cross_chat_search.py +++ b/chat/services/cross_chat_search.py @@ -26,13 +26,28 @@ def search_all_memories( """Search FTS5 across all owners and chats. Returns rows with ``{memory_id, owner_id, chat_id, scene_id, - pov_summary, significance, ts, fts_rank}``, sorted by FTS5 BM25 - rank ascending (lower rank = stronger match, surfaced first). + event_id, pov_summary, snippet, significance, ts, fts_rank}``, + sorted by FTS5 BM25 rank ascending (lower rank = stronger match, + surfaced first). + + ``event_id`` (T111.2 / T109) is the id of the ``event_log`` row that + drove the projecting ``memory_written`` event. May be ``None`` for + memory rows projected before the 0014 schema migration ran (the + column is nullable on purpose; T109 did not backfill historical + rows). The search-results UI uses it to deep-link to the originating + turn anchor (Phase 3.5 T86 stamps ``id="turn-{event_id}"`` on each + turn DOM node) and falls back to a chat-level link when ``None``. The ``memories`` table has no ``ts`` column; we expose ``created_at`` (the projector-side row insertion timestamp) under that key so the UI does not have to know the storage name. + ``snippet`` (T111.1) is the FTS5 ``snippet()`` output for the + matched ``pov_summary`` column: a windowed excerpt with each match + token wrapped in ``...`` for the search-results UI to + render verbatim. The full ``pov_summary`` is also returned so + non-highlighted callers (or fallbacks) keep the original string. + An empty / whitespace-only ``query`` short-circuits to ``[]`` to avoid an FTS5 ``MATCH ''`` syntax error and to keep the top-bar "no input yet" state from triggering a full-table scan. @@ -45,9 +60,20 @@ def search_all_memories( # from the content table because the FTS index only stores # ``pov_summary``. ORDER BY rank ASC because BM25 in FTS5 returns # negative scores where lower is better. + # + # ``snippet(memories_fts, 0, ...)`` (T111.1) targets column 0 of the + # FTS virtual table, which is ``pov_summary`` (the only column + # indexed by ``CREATE VIRTUAL TABLE memories_fts USING fts5( + # pov_summary, ...)`` in migration 0006). SQLite passes the raw + # column text through verbatim aside from inserting the configured + # before/after match markers, so the only HTML in the output is the + # ```` we injected — safe to render with ``|safe`` server-side. rows = conn.execute( - "SELECT m.id, m.owner_id, m.chat_id, m.scene_id, " - " m.pov_summary, m.significance, m.created_at, " + "SELECT m.id, m.owner_id, m.chat_id, m.scene_id, m.event_id, " + " m.pov_summary, " + " snippet(memories_fts, 0, '', '', '…', 32) " + " AS snippet, " + " m.significance, m.created_at, " " memories_fts.rank " "FROM memories_fts " "JOIN memories m ON m.id = memories_fts.rowid " @@ -63,10 +89,12 @@ def search_all_memories( "owner_id": r[1], "chat_id": r[2], "scene_id": r[3], - "pov_summary": r[4], - "significance": r[5], - "ts": r[6], - "fts_rank": r[7], + "event_id": r[4], + "pov_summary": r[5], + "snippet": r[6], + "significance": r[7], + "ts": r[8], + "fts_rank": r[9], } for r in rows ] diff --git a/chat/services/embeddings.py b/chat/services/embeddings.py index ece6eae..e38fde4 100644 --- a/chat/services/embeddings.py +++ b/chat/services/embeddings.py @@ -10,6 +10,7 @@ EmbeddingResult shape stays the same, only the generator changes. from __future__ import annotations import hashlib +import logging import math import struct @@ -18,6 +19,8 @@ from pydantic import BaseModel from chat.llm.client import LLMClient +_log = logging.getLogger(__name__) + DEFAULT_EMBEDDING_DIM = 384 DEFAULT_EMBEDDING_MODEL = "pseudo-sha256-384" FALLBACK_EMBEDDING_MODEL = "fallback" @@ -92,11 +95,27 @@ async def generate_embedding( # Pure-local pseudo path — no LLMClient call. return EmbeddingResult(vector=_pseudo_embed(text, dim), model=model, dim=dim) - # Future: real embedding via client.embed(...). Phase 4.5 work. - # For Phase 4, any non-default model falls through to fallback. - return EmbeddingResult( - vector=[0.0] * dim, model=FALLBACK_EMBEDDING_MODEL, dim=dim - ) + # T112 (Phase 4.5): non-default model — route through the client's + # ``embed()`` method. On any failure (including ``NotImplementedError`` + # from providers that don't expose embeddings, e.g. Featherless today), + # fall back to the zero vector and re-fire the T107 warning so + # misconfigured callers see the issue in logs rather than silently + # producing useless cosine results. + try: + vector = await client.embed(text, model=model) + return EmbeddingResult(vector=list(vector), model=model, dim=len(vector)) + except Exception as exc: # noqa: BLE001 — any failure must degrade gracefully + _log.warning( + "generate_embedding: non-default model %r returned fallback " + "(client.embed() raised %s: %s); " + "downstream search will degrade silently. Configure a supported model.", + model, + type(exc).__name__, + exc, + ) + return EmbeddingResult( + vector=[0.0] * dim, model=FALLBACK_EMBEDDING_MODEL, dim=dim + ) __all__ = [ diff --git a/chat/services/regenerate.py b/chat/services/regenerate.py index 6442bb2..de88049 100644 --- a/chat/services/regenerate.py +++ b/chat/services/regenerate.py @@ -95,6 +95,27 @@ from chat.web.render import render_turn_html _log = logging.getLogger(__name__) +# T114.3: map a lifecycle-transition event kind to the events-table +# status it implicitly transitioned *from*. Regenerate uses this to pick +# the ``prior_status`` value for the ``event_status_reverted`` rollback +# event so the projector sets the row back to where it was before the +# superseded turn fired the transition. +# +# - ``event_started`` was emitted when the row was 'planned' → revert to +# 'planned'. +# - ``event_completed`` was emitted when the row was 'active' → revert +# to 'active'. +# - ``event_cancelled`` could have fired from either 'planned' or +# 'active'. Best-effort default: 'active'. The forward transitions +# below only fire detect_event_transitions for currently-active rows, +# so 'active' is the realistic prior in practice. +_PRIOR_STATUS_MAP: dict[str, str] = { + "event_started": "planned", + "event_completed": "active", + "event_cancelled": "active", +} + + async def regenerate_assistant_turn( conn: Connection, client, @@ -115,17 +136,18 @@ async def regenerate_assistant_turn( cannot be found — the FastAPI route translates this to 404. .. note:: - **Lifecycle-rollback limitation (T83.4, Phase 4 follow-up).** + **Lifecycle rollback (T114, Phase 4.5).** When the superseded turn already produced lifecycle transitions (``event_started`` / ``event_completed`` / ``event_cancelled``), - this function does NOT roll those rows back before re-running - ``detect_event_transitions`` against the regenerated text. A - regenerate-after-completion can therefore double-emit promotion - artifacts if the new text re-completes the same event. Phase 3.5 - only documents the gap and emits a WARNING log naming the - affected event_log ids; the actual undo pass is invasive - (re-projection / inverse-handler dispatch) and is deferred to - Phase 4. See the ``# T83.4`` block below for the warning emit. + this function emits an ``event_status_reverted`` event for each + so the events row's status returns to its prior value before the + regenerated narrative is reclassified. Backward compatibility: + lifecycle events authored before T114.1 lack the + ``triggered_by_assistant_turn_id`` payload field; rollback skips + those (logged at DEBUG) so historic rows are not retroactively + reverted. A WARNING about un-rolled-back transitions is still + emitted when stragglers are found — the rollback handles the + common case while older logs continue to need manual review. """ chat = get_chat(conn, chat_id) if chat is None: @@ -158,20 +180,21 @@ async def regenerate_assistant_turn( original_assistant_payload = json.loads(row[0]) original_user_turn_id = original_assistant_payload.get("user_turn_id") - # T83.4: scan for downstream lifecycle transitions emitted by the - # superseded turn — they're not being rolled back (see method - # docstring). Heuristic: any ``event_started`` / ``event_completed`` - # / ``event_cancelled`` event_log row with id strictly greater than - # the original assistant_turn's id was emitted as part of (or after) - # that turn's processing. Lifecycle events don't carry ``chat_id`` - # in their payload (their payload references an ``event_id`` FK to - # the ``events`` table, which holds chat_id), so we join through - # ``events`` to scope to this chat. - # - # A WARNING log surfaces the affected event ids so operators can - # spot double-emit cases until the Phase 4 rollback pass lands. + # T114.3: roll back lifecycle transitions emitted by the superseded + # turn. The scan uses the same id-greater-than-superseded-turn + # heuristic as the legacy T83.4 warning, joined to ``events`` for + # chat scoping (lifecycle events don't carry chat_id in their + # payload — they reference an ``event_id`` FK to the ``events`` + # table, which holds chat_id). For each row whose payload carries + # ``triggered_by_assistant_turn_id == original_assistant_event_id`` + # (T114.1 back-reference), emit an ``event_status_reverted`` event + # so the events-row status returns to the pre-transition value. + # Lifecycle rows authored before T114.1 lack the back-reference; + # those are skipped (DEBUG log) and a WARNING tracks their count so + # operators still see legacy stragglers — preserves the T83.4 + # observability contract for un-rolled-back transitions. unrolled_lifecycle = conn.execute( - "SELECT el.id, el.kind FROM event_log AS el " + "SELECT el.id, el.kind, el.payload_json FROM event_log AS el " "JOIN events AS ev " " ON ev.event_id = json_extract(el.payload_json, '$.event_id') " "WHERE el.kind IN (" @@ -182,18 +205,73 @@ async def regenerate_assistant_turn( "ORDER BY el.id ASC", (chat_id, original_assistant_event_id), ).fetchall() - if unrolled_lifecycle: - # T90.2: phrased as "at-or-after turn " rather than "from - # superseded turn" because regenerating an OLDER turn lists - # intervening-turn transitions that legitimately stand on their - # own — those weren't authored by the superseded turn itself. + rolled_back_ids: list[int] = [] + skipped_no_backref: list[int] = [] + for el_id, el_kind, el_payload_json in unrolled_lifecycle: + try: + lifecycle_payload = json.loads(el_payload_json) + except (TypeError, ValueError): + skipped_no_backref.append(el_id) + continue + triggered_by = lifecycle_payload.get("triggered_by_assistant_turn_id") + if triggered_by != original_assistant_event_id: + # Either a legacy row (no field) or a transition triggered + # by a *different* turn — leave it alone. DEBUG so the + # message is available under verbose logging without + # spamming the default WARNING channel. + _log.debug( + "regenerate_assistant_turn: skipping rollback for " + "lifecycle event_log id=%d (kind=%s) — no back-reference " + "or different turn (triggered_by=%r vs superseded=%d)", + el_id, + el_kind, + triggered_by, + original_assistant_event_id, + ) + if triggered_by is None: + skipped_no_backref.append(el_id) + continue + prior_status = _PRIOR_STATUS_MAP.get(el_kind) + if prior_status is None: + # Defensive: the SQL filter already restricts to the three + # known kinds, but a future schema addition shouldn't crash + # the rollback path. + continue + target_event_id = lifecycle_payload.get("event_id") + if target_event_id is None: + continue + append_and_apply( + conn, + kind="event_status_reverted", + payload={ + "event_id": target_event_id, + "prior_status": prior_status, + }, + ) + rolled_back_ids.append(el_id) + if rolled_back_ids: + _log.info( + "regenerate_assistant_turn: rolled back %d lifecycle " + "transition(s) triggered by superseded turn %s " + "(event_log ids: %s)", + len(rolled_back_ids), + original_assistant_event_id, + rolled_back_ids, + ) + if skipped_no_backref: + # T83.4 (legacy) compatibility: still warn about stragglers + # without the back-reference so operators can spot pre-T114 + # double-emit risks. Phrased as "at-or-after turn " per + # T90.2 — older transitions may legitimately belong to other + # turns. _log.warning( "regenerate_assistant_turn: %d lifecycle transition(s) " - "at-or-after turn %s are NOT being rolled back (Phase 4 " - "follow-up). Affected event ids: %s", - len(unrolled_lifecycle), + "at-or-after turn %s are NOT being rolled back (no " + "triggered_by_assistant_turn_id back-reference). " + "Affected event ids: %s", + len(skipped_no_backref), original_assistant_event_id, - [r[0] for r in unrolled_lifecycle], + skipped_no_backref, ) # 1a. Look up any sibling interjection beat in the same turn group @@ -716,11 +794,13 @@ async def regenerate_assistant_turn( # runs inline after a completion so promotion artifacts land in the # same regenerate path. # - # T83.4 follow-up: when a regenerate replaces a turn that had - # already produced event transitions, those original transitions - # are NOT undone here (Phase 4 work). A WARNING log earlier in this - # function names the affected event_log ids — see the T83.4 block - # near the function entry. + # T114.3: original-turn transitions emitted before this regenerate + # ran were rolled back at the top of the function (see the + # ``# T114.3`` block) by appending ``event_status_reverted`` for + # each. The classify-and-emit pass below now operates against an + # ``events`` projection that has already been reverted, so it can + # safely re-fire transitions for the regenerated narrative without + # double-emitting promotion artifacts. new_active_events = list_active_events(conn, chat_id) if new_active_events: lifecycle_decision = await detect_event_transitions( @@ -738,6 +818,12 @@ async def regenerate_assistant_turn( payload={ "event_id": transition.event_id, "started_at": chat.get("time"), + # T114.1: back-reference to the assistant_turn + # that triggered this transition (see turns.py + # for rationale). + "triggered_by_assistant_turn_id": ( + new_assistant_event_id + ), }, ) elif transition.new_status == "completed": @@ -747,6 +833,10 @@ async def regenerate_assistant_turn( payload={ "event_id": transition.event_id, "completed_at": chat.get("time"), + # T114.1: back-reference (see above). + "triggered_by_assistant_turn_id": ( + new_assistant_event_id + ), }, ) promote_completed_event( @@ -762,6 +852,10 @@ async def regenerate_assistant_turn( payload={ "event_id": transition.event_id, "completed_at": chat.get("time"), + # T114.1: back-reference (see above). + "triggered_by_assistant_turn_id": ( + new_assistant_event_id + ), }, ) diff --git a/chat/services/scene_summarize.py b/chat/services/scene_summarize.py index 7551f8b..f6b6aa1 100644 --- a/chat/services/scene_summarize.py +++ b/chat/services/scene_summarize.py @@ -144,23 +144,36 @@ def _read_recent_dialogue( ``id >= since_event_id`` so callers needing a scene-scoped view (e.g. thread detection on close) don't pull turns that landed before the closing scene's ``scene_opened`` event. + + T113: also clamps by the active branch's ``[origin, head]`` event-id + range so scene-summary inputs respect the user's current branch. + Bootstrap-main and "no active branch" fall through to ``(0, BIG_INT)`` + so existing flows are unchanged. """ + from chat.state.branches import active_branch_event_ids + + origin, head = active_branch_event_ids(conn) if since_event_id is None: cur = conn.execute( "SELECT kind, payload_json FROM event_log " "WHERE kind IN ('user_turn', 'assistant_turn') " " AND superseded_by IS NULL AND hidden = 0 " + " AND id BETWEEN ? AND ? " "ORDER BY id DESC LIMIT ?", - (limit,), + (origin, head, limit), ) else: + # Compose ``since_event_id`` with the branch lower bound — readers + # want the tightest ``id >= max(since, origin)`` clamp without an + # extra Python pass. + lower = max(origin, since_event_id) cur = conn.execute( "SELECT kind, payload_json FROM event_log " "WHERE kind IN ('user_turn', 'assistant_turn') " " AND superseded_by IS NULL AND hidden = 0 " - " AND id >= ? " + " AND id BETWEEN ? AND ? " "ORDER BY id DESC LIMIT ?", - (since_event_id, limit), + (lower, head, limit), ) rows = list(reversed(cur.fetchall())) out: list[dict] = [] diff --git a/chat/services/turn_common.py b/chat/services/turn_common.py index 3c63420..91ecd22 100644 --- a/chat/services/turn_common.py +++ b/chat/services/turn_common.py @@ -30,6 +30,7 @@ from __future__ import annotations import json from sqlite3 import Connection +from chat.state.branches import active_branch_event_ids from chat.state.edges import get_edge @@ -60,15 +61,22 @@ def read_recent_dialogue( previous implementation filtered chat_id post-fetch in Python, which let foreign-chat rows fill the LIMIT and yield fewer than N relevant rows in busy multi-chat databases. + + T113: clamp by the active branch's ``[origin, head]`` event-id range so + switching branches actually changes what dialogue this read sees. + Bootstrap-main and "no active branch" both fall through to ``(0, + BIG_INT)`` — no functional change for the metadata-only Phase 4 era. """ + origin, head = active_branch_event_ids(conn) if exclude_event_id is None: cur = conn.execute( "SELECT id, kind, payload_json FROM event_log " "WHERE kind IN ('user_turn', 'user_turn_edit', 'assistant_turn') " " AND superseded_by IS NULL AND hidden = 0 " + " AND id BETWEEN ? AND ? " " AND json_extract(payload_json, '$.chat_id') = ? " "ORDER BY id DESC LIMIT ?", - (chat_id, limit), + (origin, head, chat_id, limit), ) else: cur = conn.execute( @@ -76,9 +84,10 @@ def read_recent_dialogue( "WHERE kind IN ('user_turn', 'user_turn_edit', 'assistant_turn') " " AND id != ? " " AND superseded_by IS NULL AND hidden = 0 " + " AND id BETWEEN ? AND ? " " AND json_extract(payload_json, '$.chat_id') = ? " "ORDER BY id DESC LIMIT ?", - (exclude_event_id, chat_id, limit), + (exclude_event_id, origin, head, chat_id, limit), ) rows = list(reversed(cur.fetchall())) out: list[dict] = [] diff --git a/chat/state/branches.py b/chat/state/branches.py index 101627e..4681fed 100644 --- a/chat/state/branches.py +++ b/chat/state/branches.py @@ -9,11 +9,15 @@ existing event readers remain branch-agnostic. """ from __future__ import annotations + +import logging from sqlite3 import Connection from chat.eventlog.projector import on from chat.eventlog.log import Event +logger = logging.getLogger(__name__) + @on("branch_created") def _apply_branch_created(conn: Connection, e: Event) -> None: @@ -37,9 +41,26 @@ def _apply_branch_switched(conn: Connection, e: Event) -> None: """Set is_active=1 on the named branch and is_active=0 on all others. Atomic via two UPDATEs ordered to avoid the unique-active-index race. + + If the named branch does not exist, a warning is emitted and the + is_active flags are still cleared (preserving prior behavior — the + second UPDATE simply matches no rows). Callers should validate the + name upstream; this guard surfaces accidental mismatches in the log. """ p = e.payload name = p["name"] + # Warn (don't raise) if the target branch is missing. The existing + # outcome — zero active branches — is preserved; this just makes the + # condition observable instead of silent. + exists = conn.execute( + "SELECT 1 FROM branches WHERE name = ? LIMIT 1", + (name,), + ).fetchone() + if exists is None: + logger.warning( + "branch_switched to unknown branch name %r; no branch will be active", + name, + ) # Clear ALL is_active flags first (avoids the unique-index trip). conn.execute("UPDATE branches SET is_active = 0 WHERE is_active = 1") conn.execute( @@ -79,6 +100,16 @@ def get_branch(conn: Connection, name: str) -> dict | None: def list_branches(conn: Connection, chat_id: str | None = None) -> list[dict]: + """Return branch rows, optionally scoped to a chat. + + When ``chat_id`` is provided the filter is ``chat_id = ? OR chat_id IS NULL``, + so global (null-chat) branches are returned in *every* per-chat scope. This + is intentional: the bootstrapped ``"main"`` branch (and any future + null-chat branches) are global by design — they belong to no single chat + and should appear alongside per-chat branches in any chat-scoped listing. + Callers that want only per-chat branches should filter the result on + ``chat_id is not None``. + """ if chat_id is None: rows = conn.execute( "SELECT id, name, origin_event_id, head_event_id, chat_id, " @@ -126,8 +157,58 @@ def active_branch(conn: Connection) -> dict | None: } +# T113: sentinel "no upper bound" used by ``active_branch_event_ids`` when the +# active branch's head is unset (the bootstrap "main" branch with origin=0 + +# head=0). Readers compose ``id BETWEEN origin AND head`` so a value larger +# than any possible row id behaves as "no clamp" without needing a separate +# code path. ``2**63 - 1`` is SQLite's max signed-int — safe forever. +_NO_HEAD_CLAMP = 2**63 - 1 + + +def active_branch_event_ids(conn: Connection) -> tuple[int, int]: + """Return ``(origin_event_id, head_event_id)`` for the currently active + branch, suitable as bounds for an ``event_log.id BETWEEN ? AND ?`` clamp + on user-facing reads (T113). + + Defensive defaults: + + * **No active branch row** (``active_branch`` returns ``None``) — return + ``(0, _NO_HEAD_CLAMP)`` so readers see all events. This preserves the + Phase 4 "branches are metadata-only" contract for any code path that + somehow runs without the migration-0013 bootstrap. + * **Bootstrap "main"** — the canonical ``name="main", origin=0, head=0`` + row inserted by migration 0013. Production today never emits + ``branch_head_updated`` for main, so head stays at 0 even as events + accumulate. We treat this exact bootstrap state as "no clamp" and + return ``(0, _NO_HEAD_CLAMP)`` so all events remain visible. This is + what every existing test (which never configures branches) relies on. + * **Any other branch** — return the literal ``(origin, head)`` from the + branch row. A branch created at origin=N has head=N initially (per + ``branch_from_event``), so ``BETWEEN N AND N`` returns just that one + seed event until the head is bumped via ``branch_head_updated``. + + Note on the schema mismatch with the T113 spec: the spec describes + ``head_event_id`` as nullable, but migration 0013 declared it + ``NOT NULL DEFAULT 0``. We read head=0 on bootstrap main as the + "unset" sentinel; non-main branches never reach head=0 in normal + flow (creation sets head=origin, and origin=0 only for main). + """ + branch = active_branch(conn) + if branch is None: + return (0, _NO_HEAD_CLAMP) + origin = int(branch.get("origin_event_id") or 0) + head = int(branch.get("head_event_id") or 0) + # Bootstrap "main" sentinel — see docstring above. Detect by name + + # both ids being 0 to avoid mis-firing on a hypothetical future + # branch that legitimately starts at origin=0. + if branch.get("name") == "main" and origin == 0 and head == 0: + return (0, _NO_HEAD_CLAMP) + return (origin, head) + + __all__ = [ "get_branch", "list_branches", "active_branch", + "active_branch_event_ids", ] diff --git a/chat/state/events.py b/chat/state/events.py index b2f0b1c..13b2424 100644 --- a/chat/state/events.py +++ b/chat/state/events.py @@ -67,6 +67,29 @@ def _apply_event_expired(conn: Connection, e: Event) -> None: ) +@on("event_status_reverted") +def _apply_event_status_reverted(conn: Connection, e: Event) -> None: + """T114.2: Revert an event row's status to ``prior_status``. + + Emitted by ``regenerate_assistant_turn`` when a superseded turn had + triggered a lifecycle transition (event_started / event_completed / + event_cancelled). The rollback step needs an inverse projection that + sets the row's status back to whatever it was *before* the now- + superseded transition fired. + + Unlike the forward transitions (which guard against terminal-status + overwrites) this handler is unconditional — the entire purpose is to + reverse a transition, including reverting from a terminal status + (completed/cancelled) back to a non-terminal one. + """ + p = e.payload + conn.execute( + "UPDATE events SET status = ?, updated_at = datetime('now') " + "WHERE event_id = ?", + (p["prior_status"], p["event_id"]), + ) + + def get_event(conn: Connection, event_id: str) -> dict | None: row = conn.execute( "SELECT event_id, chat_id, kind, status, props_json, planned_for, " diff --git a/chat/state/memory.py b/chat/state/memory.py index 42a7e95..0c2ab9d 100644 --- a/chat/state/memory.py +++ b/chat/state/memory.py @@ -13,13 +13,18 @@ def _row_to_dict(conn: Connection, row: tuple) -> dict: @on("memory_written") def _apply_memory_written(conn: Connection, e: Event) -> None: + # T109 (schema 0014): persist the projecting event's id on the memory + # row so cross-chat search results can deep-link back to the + # originating turn (T111). Older memory rows projected before 0014 + # ran read NULL here — the column is nullable for that reason. p = e.payload conn.execute( "INSERT INTO memories (" "owner_id, chat_id, scene_id, pov_summary, " "witness_you, witness_host, witness_guest, " - "chat_clock_at, source, reliability, significance, pinned, auto_pinned" - ") VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)", + "chat_clock_at, source, reliability, significance, pinned, auto_pinned, " + "event_id" + ") VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)", ( p["owner_id"], p["chat_id"], @@ -34,6 +39,7 @@ def _apply_memory_written(conn: Connection, e: Event) -> None: int(p.get("significance", 1)), int(p.get("pinned", 0)), int(p.get("auto_pinned", 0)), + e.id, ), ) @@ -112,6 +118,25 @@ SIGNIFICANCE_RANK_BIAS = 0.5 RRF_CONST = 60 +def _max_event_id(conn: Connection, owner_id: str) -> int: + """Return the largest ``memories.id`` for ``owner_id`` (1 if none exist). + + Used as the recency-boost denominator by both ``_composite_rerank`` and + ``_rrf_fuse_and_rerank`` (T104). The row id is a monotonic recency proxy + — newer memories have larger ids — so dividing by the per-owner max keeps + the boost in [0, 1] regardless of how many memories the owner has. + + Returns 1 (not 0) when the owner has no rows so callers can divide by + the result without a guard. The "no memories" case never actually hits + this helper because the FTS query above would have returned no rows, + but the safe default keeps the helper trivially reusable. + """ + row = conn.execute( + "SELECT MAX(id) FROM memories WHERE owner_id = ?", (owner_id,) + ).fetchone() + return row[0] if row and row[0] else 1 + + def search_memories( conn: Connection, owner_id: str, @@ -163,6 +188,14 @@ def search_memories( When ``query_vector`` is None: FTS-only behaviour unchanged — all Phase 1-3.5 callers see the same row shape and ordering as before. + + **Row-shape contract (T104):** every returned dict carries an + ``fts_rank`` key. For FTS hits this is the BM25 score (a negative float, + lower-is-better). For *vector-only* hits surfaced by the fused path — + rows that matched the query embedding but did NOT match FTS — the + ``fts_rank`` value is ``None``. Downstream consumers must accept + ``None`` here; do not assume ``fts_rank`` is always numeric. The + ``composite_score`` is always a float on every returned row. """ if witness_role not in _VALID_WITNESS_ROLES: raise ValueError( @@ -180,12 +213,20 @@ def search_memories( # channel) so memories that are weak in FTS but strong in vector — and # vice versa — make it into the merge pool. over_fetch = max(k * 2, 20) if query_vector is not None else max(k * 4, 20) + # T113: branch-scope filter on ``m.event_id`` (T109's column). Memories + # whose ``event_id`` is NULL — projected before the 0014 schema migration + # ran — are *included* unconditionally so the branch filter never breaks + # legacy retrieval. Newer rows respect the active branch's bounds. + from chat.state.branches import active_branch_event_ids + + origin, head = active_branch_event_ids(conn) sql = ( f"SELECT {select_list}, memories_fts.rank AS fts_rank " "FROM memories_fts " "JOIN memories m ON m.id = memories_fts.rowid " f"WHERE m.owner_id = ? AND m.{witness_col} = 1 " "AND memories_fts MATCH ? " + "AND (m.event_id IS NULL OR m.event_id BETWEEN ? AND ?) " # T57: significance multiplier biases the FTS over-fetch order. BM25 # ``rank`` is lower-is-better, so subtracting ``significance * BIAS`` # surfaces higher-significance rows above lower-significance rows with @@ -194,7 +235,10 @@ def search_memories( "ORDER BY (memories_fts.rank - m.significance * ?) ASC " "LIMIT ?" ) - cur = conn.execute(sql, (owner_id, query, SIGNIFICANCE_RANK_BIAS, over_fetch)) + cur = conn.execute( + sql, + (owner_id, query, origin, head, SIGNIFICANCE_RANK_BIAS, over_fetch), + ) rows = cur.fetchall() # FTS-only path: preserve pre-T96 behaviour exactly. @@ -227,10 +271,7 @@ def _composite_rerank( Extracted from ``search_memories`` so the no-vector path stays a single call and the fused path can re-use the same boost formulae after RRF. """ - max_id_row = conn.execute( - "SELECT MAX(id) FROM memories WHERE owner_id = ?", (owner_id,) - ).fetchone() - max_id = max_id_row[0] if max_id_row and max_id_row[0] else 1 + max_id = _max_event_id(conn, owner_id) result_cols = cols + ["fts_rank"] enriched: list[dict] = [] @@ -301,6 +342,28 @@ def _rrf_fuse_and_rerank( query_vector=query_vector, k=vec_over_fetch, ) + # T113: drop vector hits that fall outside the active branch's event-id + # range. ``vector_search`` is a generic service used elsewhere; the + # branch filter applied to the FTS leg also has to apply here so the + # fused result respects the same scope. Memories with NULL event_id + # (legacy rows projected before T109's 0014 schema migration) are + # included unconditionally — same policy as the FTS leg. + from chat.state.branches import _NO_HEAD_CLAMP, active_branch_event_ids + + vec_origin, vec_head = active_branch_event_ids(conn) + if vec_hits and (vec_origin > 0 or vec_head < _NO_HEAD_CLAMP): + vec_ids = [h["memory_id"] for h in vec_hits] + placeholders_v = ",".join("?" * len(vec_ids)) + in_range = { + row[0] + for row in conn.execute( + f"SELECT id FROM memories " + f"WHERE id IN ({placeholders_v}) " + f" AND (event_id IS NULL OR event_id BETWEEN ? AND ?)", + (*vec_ids, vec_origin, vec_head), + ).fetchall() + } + vec_hits = [h for h in vec_hits if h["memory_id"] in in_range] vec_rank_by_id: dict[int, int] = { hit["memory_id"]: rank for rank, hit in enumerate(vec_hits) } @@ -343,10 +406,7 @@ def _rrf_fuse_and_rerank( # Final composite re-rank: significance + recency boosts on top of the # negated fusion score so the sort direction matches the FTS-only path. - max_id_row = conn.execute( - "SELECT MAX(id) FROM memories WHERE owner_id = ?", (owner_id,) - ).fetchone() - max_id = max_id_row[0] if max_id_row and max_id_row[0] else 1 + max_id = _max_event_id(conn, owner_id) result_cols = cols + ["fts_rank"] enriched: list[dict] = [] diff --git a/chat/templates/_delete_impact_modal.html b/chat/templates/_delete_impact_modal.html new file mode 100644 index 0000000..e5bab40 --- /dev/null +++ b/chat/templates/_delete_impact_modal.html @@ -0,0 +1,34 @@ +{# T110.3: delete-impact modal partial. + +Rendered from :func:`chat.web.drawer.delete_preview` via a Jinja2 +TemplateResponse so HTML autoescape covers user-controllable fields +(item.kind, item.description, notes) automatically — the prior +f-string assembly required explicit html.escape() calls (T110.2) +which become redundant under autoescape. + +Inputs: + ``chat_id`` — the URL chat id (used to build the confirm form action). + ``impact`` — an :class:`~chat.services.delete_impact.ImpactReport`. +#} +
+

Delete event {{ impact.target_event_id }}?

+

This will discard {{ impact.cascading|length }} events. Cascade:

+
    + {% if impact.cascading %} + {% for item in impact.cascading %} +
  • {{ item.kind }}: {{ item.description }}
  • + {% endfor %} + {% else %} +
  • none
  • + {% endif %} +
+
    + {% for note in impact.notes %} +
  • {{ note }}
  • + {% endfor %} +
+
+ +
+
diff --git a/chat/templates/_drawer.html b/chat/templates/_drawer.html index 8cfdd5f..6bbfeeb 100644 --- a/chat/templates/_drawer.html +++ b/chat/templates/_drawer.html @@ -547,6 +547,25 @@ {% endif %} + {# T110.4: bulk significance re-rate. Move every memory in this chat + at level_from to level_to with one manual_edit event per row, so + the audit trail stays per-memory. #} +
+ Bulk re-rate significance +
+ + + +
+
diff --git a/chat/templates/search.html b/chat/templates/search.html index ee61c24..ce0e8c7 100644 --- a/chat/templates/search.html +++ b/chat/templates/search.html @@ -21,14 +21,29 @@
    {% for r in results %}
  • - + {# T111.2: deep-link to the originating turn via the + ``id="turn-{event_id}"`` anchor stamped by Phase 3.5 T86. + ``event_id`` may be NULL for memory rows projected before the + 0014 migration ran (T109 did not backfill historical rows); in + that case fall back to a chat-level link with no anchor so we + never emit ``#turn-None``. #} +
    {{ r.owner_name }} · {{ r.chat_id }} {% if r.chat_name %}· {{ r.chat_name }}{% endif %} {% if r.scene_label %}· scene {{ r.scene_label }}{% endif %}
    -
    {{ r.pov_summary }}
    + {# T111.1: ``r.snippet`` is the FTS5 ``snippet()`` excerpt with + each match wrapped in ``...``. ``|safe`` is + required so the marker tags survive Jinja's auto-escape; the + snippet is built by SQLite from indexed text, so the only + HTML in the string is the ```` we configured (any + special chars from the source content are passed through as + literal text, NOT as HTML). This is the only ``|safe`` filter + on the page — chat_id, owner_name, etc. remain auto-escaped. #} +
    {{ r.snippet|safe }}
  • {% endfor %} diff --git a/chat/web/drawer.py b/chat/web/drawer.py index 5396ae8..5f94957 100644 --- a/chat/web/drawer.py +++ b/chat/web/drawer.py @@ -411,6 +411,64 @@ async def edit_memory_significance( return await drawer(chat_id, request, conn) +@router.post( + "/chats/{chat_id}/drawer/memory/significance/bulk", + response_class=HTMLResponse, +) +async def bulk_re_rate_significance( + chat_id: str, + request: Request, + level_from: int = Form(...), + level_to: int = Form(...), + conn=Depends(get_conn), +): + """T110.4: bulk re-rate every memory in this chat at ``level_from`` + to ``level_to``. + + Fans out into one ``manual_edit`` event per matching memory rather + than a single bulk event so the §6.4 audit trail stays per-row — + each affected memory carries its own ``prior_value -> new_value`` + snapshot, so an inverse edit can restore an individual row without + needing to inspect a bulk payload's member list. The drawer's + significance-distribution panel surfaces the new buckets on the + refreshed partial. + + Both levels are clamped to 0..3 (matching ``edit_memory_significance``) + and a no-op (``level_from == level_to``) is rejected with 400 so a + misclick can't pad the event log with empty edits. + """ + chat = get_chat(conn, chat_id) + if chat is None: + raise HTTPException(status_code=404, detail=f"chat not found: {chat_id}") + + lf = max(0, min(3, int(level_from))) + lt = max(0, min(3, int(level_to))) + if lf == lt: + raise HTTPException( + status_code=400, + detail=f"level_from and level_to must differ (both = {lf})", + ) + + rows = conn.execute( + "SELECT id FROM memories WHERE chat_id = ? AND significance = ? " + "ORDER BY id ASC", + (chat_id, lf), + ).fetchall() + for row in rows: + memory_id = int(row[0]) + append_and_apply( + conn, + kind="manual_edit", + payload={ + "target_kind": "memory_significance", + "target_id": memory_id, + "prior_value": lf, + "new_value": lt, + }, + ) + return await drawer(chat_id, request, conn) + + @router.post( "/chats/{chat_id}/drawer/memory/{memory_id}/pin", response_class=HTMLResponse, @@ -1234,28 +1292,18 @@ async def delete_preview( report = compute_delete_impact(conn, target_event_id=int(event_id)) - # Build the modal HTML directly — the impact report is small and - # reusing the drawer template would require a fragment include just - # for this surface. Mirrors the rewind-preview style in - # :func:`chat.web.turns.rewind_preview`. - items_html = "".join( - f"
  • {item.kind}: {item.description}
  • " - for item in report.cascading + # T110.3: render via the ``_delete_impact_modal.html`` Jinja partial + # so HTML autoescape covers user-controllable fields (item.kind, + # item.description, notes) automatically. The prior implementation + # built the modal HTML via raw f-string concatenation and required + # explicit ``html.escape()`` calls (T110.2) on each interpolated + # field; under autoescape those calls become redundant. Mirrors the + # rewind-preview style in :func:`chat.web.turns.rewind_preview`. + return TEMPLATES.TemplateResponse( + request, + "_delete_impact_modal.html", + {"chat_id": chat_id, "impact": report}, ) - notes_html = "".join(f"
  • {note}
  • " for note in report.notes) - body = ( - "
    " - f"

    Delete event {report.target_event_id}?

    " - f"

    This will discard {len(report.cascading)} events. Cascade:

    " - f"
      {items_html or '
    • none
    • '}
    " - f"
      {notes_html}
    " - f"
    " - "" - "
    " - "
    " - ) - return HTMLResponse(body) @router.post( @@ -1278,7 +1326,19 @@ async def delete_turn( A snapshot is taken before truncation (inside ``execute_rewind``) so the user can recover via the snapshot index. + + T110.1 guards ``event_id <= 0``: a stale tab or hand-crafted request + posting ``event_id=0`` would otherwise compute ``after_event_id=-1`` + and silently truncate the entire log. ``id`` is auto-assigned by + SQLite starting at 1 so any caller's "real" id is always >= 1; a + zero or negative value can only mean a client bug, surfaced as 400. """ + if int(event_id) <= 0: + raise HTTPException( + status_code=400, + detail=f"event_id must be a positive integer, got {event_id}", + ) + chat = get_chat(conn, chat_id) if chat is None: raise HTTPException(status_code=404, detail=f"chat not found: {chat_id}") diff --git a/chat/web/meanwhile.py b/chat/web/meanwhile.py index 52a91bc..f82db0d 100644 --- a/chat/web/meanwhile.py +++ b/chat/web/meanwhile.py @@ -71,18 +71,27 @@ def _read_recent_meanwhile_dialogue( that already match — avoids an unbounded scan as ``event_log`` grows. The user-side rows match on chat_id only since they aren't tagged with a scene id (they ride the chat-wide log). + + T113: clamp by the active branch's ``[origin, head]`` event-id range + so meanwhile prompt context respects the user's current branch. + Bootstrap-main and "no active branch" both fall through to ``(0, + BIG_INT)`` — no functional change for the metadata-only Phase 4 era. """ + from chat.state.branches import active_branch_event_ids + + origin, head = active_branch_event_ids(conn) cur = conn.execute( "SELECT id, kind, payload_json FROM event_log " "WHERE kind IN ('user_turn', 'user_turn_edit', 'assistant_turn') " " AND superseded_by IS NULL AND hidden = 0 " + " AND id BETWEEN ? AND ? " " AND json_extract(payload_json, '$.chat_id') = ? " " AND (" " kind IN ('user_turn', 'user_turn_edit') " " OR json_extract(payload_json, '$.meanwhile_scene_id') = ?" " ) " "ORDER BY id DESC LIMIT ?", - (chat_id, scene_id, limit), + (origin, head, chat_id, scene_id, limit), ) rows = cur.fetchall() rows.reverse() diff --git a/chat/web/search.py b/chat/web/search.py index 51d75ea..c8450bb 100644 --- a/chat/web/search.py +++ b/chat/web/search.py @@ -14,6 +14,12 @@ For each match we hydrate just enough metadata to render a row: * the originating scene title when one exists, * and the ``pov_summary`` itself. +T106 (Phase 4.5): hydration is batched. Pre-T106 the route called +``get_bot``/``get_chat``/``get_scene`` once per result row — N+1 with +``DEFAULT_SEARCH_K=50`` meaning up to 150 individual SELECTs per page +load. We now collect distinct ids first and fan-in via three +``WHERE id IN (...)`` queries, then map back per row. + We deliberately keep this module synchronous and template-only — no HTMX swaps, no JSON API — because the search box is a "leave the current chat to look something up" surface, not an inline drawer. @@ -21,7 +27,9 @@ current chat to look something up" surface, not an inline drawer. from __future__ import annotations +import json from pathlib import Path +from sqlite3 import Connection from fastapi import APIRouter, Depends, Request from fastapi.responses import HTMLResponse @@ -36,29 +44,145 @@ TEMPLATES = Jinja2Templates( directory=str(Path(__file__).resolve().parent.parent / "templates") ) +#: Maximum cross-chat FTS matches surfaced per ``/search`` page load. +#: Extracted as a module-level constant (T106) so the cap is tunable +#: without touching the route body. ``search_all_memories`` itself +#: defaults to a smaller ``k=20``; we override here because the +#: top-bar search is a "scan everything I've seen" surface, not an +#: inline drawer. +DEFAULT_SEARCH_K = 50 + router = APIRouter() +def _fetch_bots_by_ids(conn: Connection, ids: set[str]) -> dict[str, dict]: + """Batched sibling of :func:`chat.state.entities.get_bot`. + + Inlined here (not exported from ``state.entities``) to keep T106's + scope confined to ``search.py`` per the Phase 4.5 plan. Returns + ``{bot_id: bot_dict}`` for every id present in ``ids``; ids with + no matching row are simply absent from the map (the caller falls + back to the raw id string the same way it did pre-T106). + + Empty ``ids`` short-circuits to ``{}`` because SQLite rejects + ``WHERE id IN ()`` as a syntax error. + """ + if not ids: + return {} + placeholders = ",".join("?" * len(ids)) + cols = [c[1] for c in conn.execute("PRAGMA table_info(bots)").fetchall()] + rows = conn.execute( + f"SELECT * FROM bots WHERE id IN ({placeholders})", + tuple(ids), + ).fetchall() + out: dict[str, dict] = {} + for row in rows: + d = dict(zip(cols, row)) + d["voice_samples"] = json.loads(d.pop("voice_samples_json")) + d["traits"] = json.loads(d.pop("traits_json")) + out[d["id"]] = d + return out + + +def _fetch_chats_by_ids(conn: Connection, ids: set[str]) -> dict[str, dict]: + """Batched sibling of :func:`chat.state.world.get_chat`. + + Mirrors that helper's ``chats``/``chat_state`` JOIN so the returned + dicts have the same shape (``narrative_anchor``, ``time``, + ``weather``, ``active_scene_id``, etc.). Empty ``ids`` returns + ``{}`` to dodge the ``IN ()`` syntax error. + """ + if not ids: + return {} + placeholders = ",".join("?" * len(ids)) + rows = conn.execute( + "SELECT c.id, c.host_bot_id, c.guest_bot_id, c.created_at, " + " s.time, s.weather, s.active_scene_id, s.narrative_anchor " + f"FROM chats c JOIN chat_state s ON s.chat_id = c.id " + f"WHERE c.id IN ({placeholders})", + tuple(ids), + ).fetchall() + return { + row[0]: { + "id": row[0], + "host_bot_id": row[1], + "guest_bot_id": row[2], + "created_at": row[3], + "time": row[4], + "weather": row[5], + "active_scene_id": row[6], + "narrative_anchor": row[7], + } + for row in rows + } + + +def _fetch_scenes_by_ids(conn: Connection, ids: set[int]) -> dict[int, dict]: + """Batched sibling of :func:`chat.state.world.get_scene`. + + Returns ``{scene_id: scene_dict}`` with ``participants`` already + JSON-decoded so callers see the same shape as the per-row helper. + Empty ``ids`` returns ``{}``. + """ + if not ids: + return {} + placeholders = ",".join("?" * len(ids)) + cols = [c[1] for c in conn.execute("PRAGMA table_info(scenes)").fetchall()] + rows = conn.execute( + f"SELECT * FROM scenes WHERE id IN ({placeholders})", + tuple(ids), + ).fetchall() + out: dict[int, dict] = {} + for row in rows: + d = dict(zip(cols, row)) + d["participants"] = json.loads(d.pop("participants_json")) + out[d["id"]] = d + return out + + @router.get("/search", response_class=HTMLResponse) async def search(request: Request, q: str = "", conn=Depends(get_conn)): - """Render ``search.html`` with up to 50 cross-chat FTS matches. + """Render ``search.html`` with up to :data:`DEFAULT_SEARCH_K` matches. ``q`` is intentionally allowed to be empty — that path renders the page's "enter a query" placeholder rather than a 400, because the top-bar form submits to this URL even with an empty input. T93's service short-circuits whitespace-only queries to ``[]`` so there is no FTS5 ``MATCH ''`` syntax error to guard against here. - """ - raw_results = search_all_memories(conn, query=q, k=50) if q else [] - # Hydrate display fields per row. We do this in the route (not the - # service) so the service stays a pure FTS shim that other UIs - # can reuse. + Hydration (T106) is batched: rather than calling ``get_bot`` / + ``get_chat`` / ``get_scene`` per row (worst case 3 * k individual + SELECTs), we collect distinct ids and issue one ``IN (...)`` query + per entity kind, then map back during the row build. ``get_bot`` + et al. remain imported for test-time monkeypatching but are no + longer invoked on the hot path. + """ + raw_results = ( + search_all_memories(conn, query=q, k=DEFAULT_SEARCH_K) if q else [] + ) + + # Collect distinct ids up front so the IN-list queries dedupe (a + # popular bot or scene shows up many times across the result set). + bot_ids: set[str] = {r["owner_id"] for r in raw_results if r["owner_id"]} + chat_ids: set[str] = {r["chat_id"] for r in raw_results if r["chat_id"]} + scene_ids: set[int] = { + r["scene_id"] for r in raw_results if r["scene_id"] + } + + bots_by_id = _fetch_bots_by_ids(conn, bot_ids) + chats_by_id = _fetch_chats_by_ids(conn, chat_ids) + scenes_by_id = _fetch_scenes_by_ids(conn, scene_ids) + + # Hydrate display fields per row from the batched maps. We do this + # in the route (not the service) so the service stays a pure FTS + # shim that other UIs can reuse. results = [] for row in raw_results: - bot = get_bot(conn, row["owner_id"]) - chat = get_chat(conn, row["chat_id"]) - scene = get_scene(conn, row["scene_id"]) if row["scene_id"] else None + bot = bots_by_id.get(row["owner_id"]) + chat = chats_by_id.get(row["chat_id"]) + scene = ( + scenes_by_id.get(row["scene_id"]) if row["scene_id"] else None + ) results.append( { "memory_id": row["memory_id"], @@ -69,6 +193,13 @@ async def search(request: Request, q: str = "", conn=Depends(get_conn)): chat.get("narrative_anchor") if chat else None ), "scene_id": row["scene_id"], + # T111.2: event_id deep-links to the originating turn + # via the ``id="turn-{event_id}"`` anchor that Phase 3.5 + # T86 stamps on each turn DOM node. May be ``None`` for + # memory rows projected before the 0014 migration ran + # (T109 did not backfill historical rows); the template + # falls back to a chat-level link in that case. + "event_id": row["event_id"], # Scenes have no ``title`` column today; surface the # ``started_at`` timestamp as a human-friendly label # when a scene is set, otherwise leave it blank. @@ -76,6 +207,14 @@ async def search(request: Request, q: str = "", conn=Depends(get_conn)): scene.get("started_at") if scene else None ), "pov_summary": row["pov_summary"], + # T111.1: ``snippet`` is the FTS5 windowed excerpt with + # ```` tags around each match. Falls back to the + # full ``pov_summary`` if the row lacks a snippet (which + # shouldn't happen on this code path because every + # ``raw_results`` row came from a MATCH query, but we + # guard defensively so the template never renders + # ``None``). + "snippet": row.get("snippet") or row["pov_summary"], "significance": row["significance"], "ts": row["ts"], } diff --git a/chat/web/snapshots.py b/chat/web/snapshots.py index ae3cc30..c169e4f 100644 --- a/chat/web/snapshots.py +++ b/chat/web/snapshots.py @@ -8,20 +8,27 @@ Routes: * ``GET /snapshots`` list all snapshots (both kinds) * ``POST /snapshots/take`` take a periodic snapshot now -* ``POST /snapshots/restore/{id}`` restore (requires matching ``confirm_id``) +* ``POST /snapshots/restore/{id}`` restore (requires matching ``confirm_id`` and ``kind``) * ``GET /snapshots/{id}/preview`` show metadata + delta vs current The ``snapshot_id`` is the filename stem (the UTC timestamp written by :func:`chat.services.snapshot.take_snapshot`) — there's no separate UUID, and the timestamp filename is already unique per snapshot kind. Both periodic and rewind snapshots share the same id space lookup-wise, so -the restore + preview routes accept ``kind`` as a form/query param to -disambiguate. +the restore + preview routes require ``kind`` as a form/query param to +disambiguate (a missing/empty ``kind`` is a 400, not a silent default). + +Note on ``created_at`` mtime drift: the listing's ``created_at`` comes +from the file's mtime, not the encoded filename timestamp. ``cp -p`` +preserves mtime, but plain ``cp`` resets it to "now" — so a copied +snapshot can show a misleading ``created_at`` while its filename still +reflects the original UTC capture time. """ from __future__ import annotations import json +from datetime import datetime, timezone from pathlib import Path from fastapi import APIRouter, Depends, Form, HTTPException, Request @@ -52,8 +59,6 @@ def _list_all_snapshots(data_dir: Path) -> list[dict]: ``last_event_id`` (parsed from the JSON body — small enough that listing isn't a performance concern for the handful of files we keep). """ - from datetime import datetime, timezone - rows: list[dict] = [] for kind in SNAPSHOT_KINDS: snap_dir = data_dir / "snapshots" / kind @@ -85,12 +90,26 @@ def _list_all_snapshots(data_dir: Path) -> list[dict]: return rows +def _require_kind(kind: str) -> str: + """Reject missing/empty/unknown ``kind`` with 400. + + Defaulting silently to ``"periodic"`` made rewind-snapshot lookups + appear as 404s, which is confusing — make the client always state + the kind explicitly. + """ + if not kind or kind not in SNAPSHOT_KINDS: + raise HTTPException( + status_code=400, + detail=f"kind must be one of {SNAPSHOT_KINDS}", + ) + return kind + + def _resolve_snapshot_path( data_dir: Path, snapshot_id: str, kind: str ) -> Path: """Map an ``(id, kind)`` pair to the on-disk file, or 404.""" - if kind not in SNAPSHOT_KINDS: - raise HTTPException(status_code=400, detail=f"unknown kind: {kind}") + _require_kind(kind) path = data_dir / "snapshots" / kind / f"{snapshot_id}.json" if not path.exists(): raise HTTPException(status_code=404, detail="snapshot not found") @@ -127,7 +146,7 @@ async def snapshots_restore( snapshot_id: str, request: Request, confirm_id: str = Form(""), - kind: str = Form("periodic"), + kind: str = Form(""), conn=Depends(get_conn), ): """Hard-confirm restore: ``confirm_id`` must equal the path id. @@ -135,7 +154,11 @@ async def snapshots_restore( Mismatched confirm → 400 (without touching the DB). On match, the existing :func:`restore_from_snapshot` clears projected tables and re-loads them from the dump. + + ``kind`` is required (must be ``"periodic"`` or ``"rewind"``) — a + missing or empty value 400s rather than silently defaulting. """ + _require_kind(kind) if confirm_id != snapshot_id: raise HTTPException( status_code=400, @@ -151,7 +174,7 @@ async def snapshots_restore( async def snapshots_preview( snapshot_id: str, request: Request, - kind: str = "periodic", + kind: str = "", conn=Depends(get_conn), ): """Show snapshot metadata + a basic delta against the current event log. @@ -159,7 +182,10 @@ async def snapshots_preview( Phase 4 keeps this simple: the snapshot's ``last_event_id`` plus the current ``MAX(event_log.id)`` is enough to tell the user how far the log has moved on. A richer per-table diff is a Phase 4.5+ concern. + + ``kind`` is required — see :func:`snapshots_restore`. """ + _require_kind(kind) settings = request.app.state.settings path = _resolve_snapshot_path(settings.data_dir, snapshot_id, kind) dump = json.loads(path.read_text()) diff --git a/chat/web/turns.py b/chat/web/turns.py index 97ef4a6..623390d 100644 --- a/chat/web/turns.py +++ b/chat/web/turns.py @@ -812,6 +812,14 @@ async def post_turn( payload={ "event_id": transition.event_id, "started_at": chat.get("time"), + # T114.1: back-reference to the assistant_turn that + # triggered this transition. Regenerate uses this + # to roll back lifecycle transitions when the turn + # is superseded. Forward-only — older events + # without this field are skipped by rollback. + "triggered_by_assistant_turn_id": ( + primary_assistant_event_id + ), }, ) elif transition.new_status == "completed": @@ -821,6 +829,10 @@ async def post_turn( payload={ "event_id": transition.event_id, "completed_at": chat.get("time"), + # T114.1: back-reference (see above). + "triggered_by_assistant_turn_id": ( + primary_assistant_event_id + ), }, ) # Run promotion inline so the artifact-emitting events @@ -842,6 +854,10 @@ async def post_turn( payload={ "event_id": transition.event_id, "completed_at": chat.get("time"), + # T114.1: back-reference (see above). + "triggered_by_assistant_turn_id": ( + primary_assistant_event_id + ), }, ) # Any other ``new_status`` value falls through silently — @@ -873,6 +889,20 @@ async def post_turn( # mid-stream still meant to close the scene — the cancelled bot # beat doesn't invalidate that intent. Pinned by # test_cancelled_turn_still_closes_scene_when_user_prose_signals_close. + # + # T108 NOTE — the in-memory append order is correct, but the cancel + # path re-raises ``CancelledError`` at the end of ``post_turn`` + # (see step 11 below). The ``open_db`` dependency teardown skips + # ``conn.commit()`` when the consumer raises, which means in + # production a genuine cancel currently rolls back ALL post-cancel + # writes — including this scene_closed event, the truncated + # assistant_turn record, edge updates, and per-POV summaries. The + # T74.3 regression test passes only because of a missing + # ``import asyncio`` in the test module: the inline mock raises + # ``NameError`` instead of ``CancelledError``, which is caught by + # the ``except Exception:`` branch and leaves ``cancelled=False``, + # so the function returns 204 normally and the commit fires. This + # is a transactional bug deferred for triage (T108 report). if scene is not None and prose.strip(): container = None if scene.get("container_id") is not None: diff --git a/docs/plans/2026-04-26-v1-requirements-design.md b/docs/plans/2026-04-26-v1-requirements-design.md index 5db1623..8e0f78a 100644 --- a/docs/plans/2026-04-26-v1-requirements-design.md +++ b/docs/plans/2026-04-26-v1-requirements-design.md @@ -522,6 +522,8 @@ Written per witness when a scene closes. Different details, different interpreta **Status: shipped 2026-04-27** (T88–T102, 15 tasks across 8 waves; +70 tests). See "Phase 4 status" in CLAUDE.md for the per-task breakdown. Vector retrieval shipped via pure-Python cosine over a JSON-blob embeddings table (sqlite-vec deferred — host Python lacks loadable extensions); branching is data-model + drawer UI; significance review, hide-from-view soft delete, surgical delete with cascade preview, snapshot UX, and cross-chat search all surface from the drawer or top-bar. +**Phase 4.5 cleanup: shipped 2026-04-27** (T103–T118, 13 of 14 planned tasks; T115 sqlite-vec swap deferred to Phase 5 due to host Python lacking `enable_load_extension`; +~44 tests; schema baseline now 14). See "Phase 4.5 status" in CLAUDE.md for the per-task breakdown — notable shipped: real embedding model swap path (`LLMClient.embed()` + `--re-embed-all`), branching read-side filter (`active_branch_event_ids`), regenerate lifecycle rollback (`event_status_reverted`), FTS snippet highlighting + deep-link to turn (`memories.event_id`), bulk significance re-rate. + - Vector retrieval (sqlite-vss or sqlite-vec). - Branching UI. - Drawer-edit on every field. diff --git a/scripts/backfill_embeddings.py b/scripts/backfill_embeddings.py index f5c15bb..e823d2b 100644 --- a/scripts/backfill_embeddings.py +++ b/scripts/backfill_embeddings.py @@ -8,8 +8,21 @@ Phase 4 ships the deterministic local pseudo-embedding so this script runs synchronously without a network round-trip — the LLMClient argument is not needed on the pseudo path. Phase 4.5+ will need a real client. +T112 (Phase 4.5) adds two flags: + +* ``--re-embed-all`` walks **every** memory regardless of whether it + already has an ``embeddings`` row. Useful when swapping embedding + models — the projector is INSERT OR REPLACE, so re-emitting an event + for an existing memory replaces the prior vector. Without this flag, + the script keeps the Phase 4 behavior of only filling in gaps. +* ``--model M`` overrides ``Settings.embedding_model`` for this run. + Defaults to the configured model (which itself defaults to + ``"pseudo-sha256-384"``). + Run from the repo root: .venv/bin/python scripts/backfill_embeddings.py [--limit N] [--dry-run] + .venv/bin/python scripts/backfill_embeddings.py --re-embed-all + .venv/bin/python scripts/backfill_embeddings.py --re-embed-all --model bge-small-en-v1.5 """ from __future__ import annotations @@ -17,11 +30,12 @@ from __future__ import annotations import argparse import asyncio -from chat.config import load_settings +from chat.config import Settings, load_settings from chat.db.connection import open_db from chat.db.migrate import apply_migrations from chat.eventlog.log import append_and_apply from chat.services.embeddings import ( + DEFAULT_EMBEDDING_MODEL, FALLBACK_EMBEDDING_MODEL, generate_embedding, ) @@ -34,6 +48,24 @@ import chat.state.memory # noqa: F401 import chat.state.world # noqa: F401 +def _build_client(settings: Settings): + """Construct an LLMClient for the backfill run. + + Default-model runs (the pseudo path) don't need a client, so we + return ``None`` and ``generate_embedding`` skips the call. Non-default + models route through the real client; injectable via monkeypatch in + tests. + """ + if settings.embedding_model == DEFAULT_EMBEDDING_MODEL: + return None + from chat.llm.featherless import FeatherlessClient + + return FeatherlessClient( + api_key=settings.featherless_api_key, + base_url=settings.featherless_base_url, + ) + + async def main() -> None: parser = argparse.ArgumentParser(description=__doc__) parser.add_argument( @@ -47,23 +79,51 @@ async def main() -> None: action="store_true", help="Print the count of memories needing embeddings, then exit.", ) + parser.add_argument( + "--re-embed-all", + action="store_true", + help=( + "Walk every memory (not just those without an embeddings row) " + "and re-emit embedding_indexed events. Use this when swapping " + "embedding models so the existing rows get replaced." + ), + ) + parser.add_argument( + "--model", + type=str, + default=None, + help=( + "Embedding model identifier. Overrides Settings.embedding_model " + "for this run; default uses the configured model." + ), + ) args = parser.parse_args() settings = load_settings() settings.db_path.parent.mkdir(parents=True, exist_ok=True) apply_migrations(settings.db_path) + model = args.model or settings.embedding_model + # Override the settings instance so ``_build_client`` sees the + # effective model when deciding whether to construct a real client. + settings = settings.model_copy(update={"embedding_model": model}) + client = _build_client(settings) + with open_db(settings.db_path) as conn: - sql = ( - "SELECT m.id, m.pov_summary FROM memories m " - "LEFT JOIN embeddings e ON e.memory_id = m.id " - "WHERE e.memory_id IS NULL " - "ORDER BY m.id" - ) + if args.re_embed_all: + sql = "SELECT m.id, m.pov_summary FROM memories m ORDER BY m.id" + else: + sql = ( + "SELECT m.id, m.pov_summary FROM memories m " + "LEFT JOIN embeddings e ON e.memory_id = m.id " + "WHERE e.memory_id IS NULL " + "ORDER BY m.id" + ) if args.limit is not None: sql += f" LIMIT {int(args.limit)}" rows = conn.execute(sql).fetchall() - print(f"Found {len(rows)} memories needing embeddings.") + mode = "re-embedding" if args.re_embed_all else "needing embeddings" + print(f"Found {len(rows)} memories {mode} (model={model}).") if args.dry_run: return @@ -71,11 +131,12 @@ async def main() -> None: skipped = 0 for memory_id, text in rows: result = await generate_embedding( - client=None, # pseudo path: no client needed + client=client, text=text or "", + model=model, ) if result.model == FALLBACK_EMBEDDING_MODEL: - print(f" Skipping memory_id={memory_id} (empty text)") + print(f" Skipping memory_id={memory_id} (empty text or fallback)") skipped += 1 continue append_and_apply( diff --git a/tests/fixtures.py b/tests/fixtures.py new file mode 100644 index 0000000..6ad952b --- /dev/null +++ b/tests/fixtures.py @@ -0,0 +1,383 @@ +"""Structured test-fixture builder for ``MockLLMClient`` canned queues. + +Phase 4.5 (T116) carry-over from Phase 3. The turn-flow tests in +``test_turn_flow.py``, ``test_meanwhile_turn_flow.py``, +``test_phase3_integration.py``, and ``test_phase4_integration.py`` used +to construct ``MockLLMClient`` canned-response queues as raw positional +lists of pre-encoded JSON strings. That worked, but every time a new +classifier call landed in a code path the tests had to be patched in +many places at the right index — easy to mis-position, hard to read. + +This module ships :class:`CannedQueue`, a fluent builder that lets a +test declare its classifier expectations by **name** and **order** of +call, not by index into a brittle list. Each method appends one item +to the queue and returns ``self`` for chaining; ``build()`` JSON-encodes +the items and produces the flat ``list[str]`` that +``MockLLMClient(canned=...)`` expects. + +Usage +----- + +>>> from tests.fixtures import CannedQueue +>>> from chat.llm.mock import MockLLMClient +>>> canned = ( +... CannedQueue() +... .parse_turn(segments=[{"kind": "dialogue", "text": "hello"}]) +... .narrative("Hi there.") +... .state_update() +... .state_update() +... .build() +... ) +>>> mock = MockLLMClient(canned=canned) + +Each method maps to a single classifier (or stream) call that the turn +flow makes, in the order the production code makes them. Picking the +right method for the slot you need keeps the test readable and lets the +builder pin sensible defaults for the fields tests don't care about. + +Migration template +------------------ + +To migrate a positional canned-array test: + +1. Identify each slot in the existing array and what classifier it + feeds. Comments above the array often spell this out — start there. +2. Replace each slot with the matching :class:`CannedQueue` method: + + - ``json.dumps({"segments": [...]})`` → ``.parse_turn(segments=...)`` + - bare narrative string → ``.narrative("...")`` + - zero-state JSON → ``.state_update()`` (defaults are zeros) + - ``json.dumps({"addressee_id": ...})`` → ``.detect_addressee(...)`` + - ``json.dumps({"should_interject": ...})`` → ``.detect_interjection(...)`` + - ``json.dumps({"should_close": ...})`` → ``.detect_scene_close(...)`` + - ``json.dumps({"transitions": [...]})`` → ``.detect_event_transitions(...)`` + - per-POV summary JSON → ``.summarize_scene_pov(summary=...)`` +3. End with ``.build()`` and pass that to + ``MockLLMClient(canned=...)``. The mock's contract is unchanged. + +Notes on streams +---------------- + +``MockLLMClient.stream`` and ``MockLLMClient.generate`` share one queue +— each pop is one entry, regardless of whether the production code +streams the response or generates it whole. The narrative service +streams; classifier services generate. The builder treats both the same: +``narrative()`` appends a raw string, the classifier methods append +JSON-encoded dicts. Both end up in the same flat ``list[str]`` that the +mock pops from in order. + +The remaining tests in the suite (about 30 across the four files +mentioned above) still use positional arrays — Phase 5 work to migrate +the rest. New tests should prefer this builder. +""" + +from __future__ import annotations + +import json +from typing import Any + + +class CannedQueue: + """Fluent builder for ``MockLLMClient`` canned-response queues. + + Each method appends one item to an internal queue and returns + ``self`` for chaining. ``build()`` returns the flat ``list[str]`` + suitable for ``MockLLMClient(canned=...)``. + + The queue holds either ``dict`` (JSON-encoded at ``build()`` time) + or ``str`` (passed through verbatim — used for narrative streams). + """ + + def __init__(self) -> None: + self._queue: list[Any] = [] + + # ------------------------------------------------------------------ + # Narrative stream — bare string, no JSON wrapping. + # ------------------------------------------------------------------ + + def narrative(self, text: str) -> "CannedQueue": + """Append one streaming narrative response. + + ``MockLLMClient.stream`` pops the next entry from the same queue + as ``generate`` — a bare string is what the streaming bot beat + consumes. Use one ``narrative()`` per assistant beat (primary, + and optionally an interjection / second beat). + """ + self._queue.append(text) + return self + + def raw(self, value: str) -> "CannedQueue": + """Append a raw string (escape hatch for non-classifier calls). + + Most tests should reach for the named helpers — this is here + for one-offs the builder doesn't model yet. + """ + self._queue.append(value) + return self + + # ------------------------------------------------------------------ + # Turn parser — splits user prose into segments. + # ------------------------------------------------------------------ + + def parse_turn( + self, + *, + segments: list[dict] | None = None, + intent: str = "narrative", + landing_state_hint: str = "", + **rest: Any, + ) -> "CannedQueue": + """Append one ``parse_turn`` classifier response. + + ``intent`` defaults to ``"narrative"``; pass ``"skip_elision"`` + or ``"skip_jump"`` to exercise the natural-language skip paths. + ``landing_state_hint`` carries the residual descriptor for + elision skips and is otherwise ignored. + """ + payload: dict[str, Any] = { + "segments": segments if segments is not None else [], + "intent": intent, + "landing_state_hint": landing_state_hint, + } + payload.update(rest) + self._queue.append(payload) + return self + + # ------------------------------------------------------------------ + # Multi-entity addressee classifier (T74.1). + # ------------------------------------------------------------------ + + def detect_addressee( + self, + *, + addressee_id: str, + confidence: str = "medium", + reason: str = "", + **rest: Any, + ) -> "CannedQueue": + """Append one ``detect_addressee`` classifier response.""" + payload: dict[str, Any] = { + "addressee_id": addressee_id, + "confidence": confidence, + "reason": reason, + } + payload.update(rest) + self._queue.append(payload) + return self + + # ------------------------------------------------------------------ + # State-update — one per directed edge per turn. + # ------------------------------------------------------------------ + + def state_update( + self, + *, + affinity_delta: int = 0, + trust_delta: int = 0, + knowledge_facts: list | None = None, + **rest: Any, + ) -> "CannedQueue": + """Append one ``apply_state_update`` classifier response. + + Defaults to a benign zero-delta payload — tests that don't care + about state mutations can call this without arguments. One call + is required per directed edge that fires after the assistant + beat (e.g. single-bot non-guest turn = 2 calls; multi-bot guest + turn = 6 calls). + """ + payload: dict[str, Any] = { + "affinity_delta": affinity_delta, + "trust_delta": trust_delta, + "knowledge_facts": ( + knowledge_facts if knowledge_facts is not None else [] + ), + } + payload.update(rest) + self._queue.append(payload) + return self + + def zero_state(self) -> "CannedQueue": + """Alias for ``state_update()`` with all defaults — matches the + ``_zero_state()`` helper in existing tests. + """ + return self.state_update() + + # ------------------------------------------------------------------ + # Interjection (T74.2) — silent witness chimes in. + # ------------------------------------------------------------------ + + def detect_interjection( + self, + *, + should_interject: bool, + reason: str = "", + **rest: Any, + ) -> "CannedQueue": + """Append one ``detect_interjection`` classifier response.""" + payload: dict[str, Any] = { + "should_interject": should_interject, + "reason": reason, + } + payload.update(rest) + self._queue.append(payload) + return self + + def detect_interjection_targeted( + self, + *, + targeted: bool, + target_id: str | None = None, + reason: str = "", + **rest: Any, + ) -> "CannedQueue": + """Append one targeted-interjection classifier response.""" + payload: dict[str, Any] = { + "targeted": targeted, + "target_id": target_id, + "reason": reason, + } + payload.update(rest) + self._queue.append(payload) + return self + + # ------------------------------------------------------------------ + # Scene-close detector (T26). + # ------------------------------------------------------------------ + + def detect_scene_close( + self, + *, + should_close: bool, + reason: str = "", + **rest: Any, + ) -> "CannedQueue": + """Append one ``detect_scene_close`` classifier response.""" + payload: dict[str, Any] = { + "should_close": should_close, + "reason": reason, + } + payload.update(rest) + self._queue.append(payload) + return self + + # ------------------------------------------------------------------ + # Event lifecycle (T52, T61) — per-turn transitions. + # ------------------------------------------------------------------ + + def detect_event_transitions( + self, + transitions: list[dict] | None = None, + ) -> "CannedQueue": + """Append one ``detect_event_transitions`` classifier response. + + ``transitions`` is a list of ``{"event_id": ..., "new_status": + "active"|"completed"|"cancelled", "reason": ...}`` dicts. Pass + an empty list (or omit the argument) to assert that the call + ran but produced no transitions; pass ``None`` for an empty + list with the same shape. + + Note: when no events are seeded, ``detect_event_transitions`` + short-circuits without an LLM call — in that case do NOT append + this slot. + """ + payload = {"transitions": transitions if transitions is not None else []} + self._queue.append(payload) + return self + + # ------------------------------------------------------------------ + # Per-POV scene summary (used after scene close). + # ------------------------------------------------------------------ + + def summarize_scene_pov( + self, + *, + summary: str, + knowledge_facts: list | None = None, + relationship_summary: str = "", + **rest: Any, + ) -> "CannedQueue": + """Append one per-POV scene-summary response. + + Used by ``apply_scene_close_summary`` — one call per witness + once a scene closes. + """ + payload: dict[str, Any] = { + "summary": summary, + "knowledge_facts": ( + knowledge_facts if knowledge_facts is not None else [] + ), + "relationship_summary": relationship_summary, + } + payload.update(rest) + self._queue.append(payload) + return self + + # ------------------------------------------------------------------ + # Thread detection (Phase 3 §3.3). + # ------------------------------------------------------------------ + + def detect_threads( + self, + candidates: list[dict] | None = None, + ) -> "CannedQueue": + """Append one ``detect_threads`` classifier response. + + ``candidates`` is a list of ``{"action": "open"|"update", + "title": ..., "summary": ..., "existing_thread_id": ...}`` dicts. + """ + payload = {"candidates": candidates if candidates is not None else []} + self._queue.append(payload) + return self + + # ------------------------------------------------------------------ + # Meanwhile digest — narrative summary of what happened off-screen. + # ------------------------------------------------------------------ + + def meanwhile_digest(self, summary: str) -> "CannedQueue": + """Append one meanwhile-digest narrative response. + + The digest service streams the digest as plain text (not JSON) + so this is a thin wrapper over ``narrative``/``raw`` for + readability at the call site. + """ + self._queue.append(summary) + return self + + # ------------------------------------------------------------------ + # Significance scorer (background worker; rarely hit in unit tests + # but available for completeness). + # ------------------------------------------------------------------ + + def score_significance( + self, + *, + score: float = 0.0, + reason: str = "", + **rest: Any, + ) -> "CannedQueue": + """Append one significance-scoring classifier response.""" + payload: dict[str, Any] = {"score": score, "reason": reason} + payload.update(rest) + self._queue.append(payload) + return self + + # ------------------------------------------------------------------ + # Build / introspection. + # ------------------------------------------------------------------ + + def build(self) -> list[str]: + """Return the flat ``list[str]`` queue for ``MockLLMClient``. + + Dict items are JSON-encoded; string items are passed through + verbatim (so streaming responses retain their raw form). + """ + out: list[str] = [] + for item in self._queue: + if isinstance(item, str): + out.append(item) + else: + out.append(json.dumps(item)) + return out + + def __len__(self) -> int: + return len(self._queue) diff --git a/tests/test_backfill_embeddings.py b/tests/test_backfill_embeddings.py new file mode 100644 index 0000000..d0f33b3 --- /dev/null +++ b/tests/test_backfill_embeddings.py @@ -0,0 +1,231 @@ +"""Tests for the backfill_embeddings script (T112, Phase 4.5). + +Phase 4 shipped a backfill that walked memories *without* an embedding +row and produced a vector for each (deterministic pseudo path). T112 +adds a ``--re-embed-all`` flag that walks **every** memory regardless +of whether it already has an embeddings row, so operators can swap +embedding models and have the existing rows replaced (the +``embedding_indexed`` projector is INSERT OR REPLACE). + +These tests exercise the script's ``main()`` directly via asyncio — +shell-out via subprocess would also work but importing keeps the +fixture surface small and the failure mode clearer. +""" + +from __future__ import annotations + +from pathlib import Path +from unittest.mock import patch + +import pytest + +from chat.db.connection import open_db +from chat.db.migrate import apply_migrations +from chat.eventlog.log import append_and_apply, append_event +from chat.eventlog.projector import project +from chat.services.embeddings import DEFAULT_EMBEDDING_MODEL + +# Trigger handler registration for projection. +import chat.state.embeddings # noqa: F401 +import chat.state.entities # noqa: F401 +import chat.state.memory # noqa: F401 +import chat.state.world # noqa: F401 + +import scripts.backfill_embeddings as backfill + + +def _seed(db_path: Path, count: int) -> list[int]: + """Seed ``count`` memory rows for ``bot_a``; return their ids.""" + with open_db(db_path) as conn: + append_event( + conn, + kind="bot_authored", + payload={ + "id": "bot_a", + "name": "BotA", + "persona": "...", + "voice_samples": [], + "traits": [], + "backstory": "", + "initial_relationship_to_you": "", + "kickoff_prose": "", + }, + ) + append_event( + conn, + kind="chat_created", + payload={ + "id": "chat_bot_a", + "host_bot_id": "bot_a", + "initial_time": "2026-04-26T20:00:00+00:00", + "narrative_anchor": "Day 1", + "weather": "", + }, + ) + for i in range(count): + append_event( + conn, + kind="memory_written", + payload={ + "owner_id": "bot_a", + "chat_id": "chat_bot_a", + "pov_summary": f"memory text {i}", + "witness_you": 1, + "witness_host": 1, + "witness_guest": 0, + "source": "direct", + "reliability": 1.0, + "significance": 1, + "pinned": 0, + "auto_pinned": 0, + }, + ) + project(conn) + return [ + r[0] + for r in conn.execute( + "SELECT id FROM memories WHERE owner_id = 'bot_a' ORDER BY id" + ).fetchall() + ] + + +def _seed_embedding(db_path: Path, memory_id: int, model: str = "stale-model") -> None: + """Insert a stale ``embedding_indexed`` event so the row already + exists in ``embeddings`` (and the default backfill would skip it).""" + with open_db(db_path) as conn: + append_and_apply( + conn, + kind="embedding_indexed", + payload={ + "memory_id": memory_id, + "model": model, + "dim": 3, + "vector": [0.0, 0.0, 0.0], + }, + ) + + +@pytest.mark.asyncio +async def test_re_embed_all_walks_every_memory(tmp_path, monkeypatch, capsys): + """``--re-embed-all`` re-embeds memories that already have rows in + ``embeddings`` (default mode skips them). After the run, every + memory should have an updated embedding tagged with the configured + model (the projector replaces stale rows in place).""" + db = tmp_path / "t.db" + apply_migrations(db) + memory_ids = _seed(db, count=3) + # Pre-seed stale embeddings on two of the three memories so the + # default path would skip them and only ``--re-embed-all`` covers + # everything. + _seed_embedding(db, memory_ids[0]) + _seed_embedding(db, memory_ids[1]) + + cfg = tmp_path / "config.toml" + cfg.write_text( + f'featherless_api_key = "x"\n' + f'db_path = "{db}"\n' + f'data_dir = "{tmp_path}"\n' + ) + monkeypatch.setenv("CHAT_CONFIG_PATH", str(cfg)) + monkeypatch.setenv("CHAT_DB_PATH", str(db)) + + with patch("sys.argv", ["backfill_embeddings.py", "--re-embed-all"]): + await backfill.main() + + # All three memories now have a fresh embedding tagged with the + # default pseudo model (replacing the stale rows). + with open_db(db) as conn: + rows = conn.execute( + "SELECT memory_id, model FROM embeddings ORDER BY memory_id" + ).fetchall() + assert len(rows) == 3 + for mid, model in rows: + assert mid in memory_ids + assert model == DEFAULT_EMBEDDING_MODEL + + +@pytest.mark.asyncio +async def test_default_backfill_only_walks_missing(tmp_path, monkeypatch): + """Without ``--re-embed-all``, the script keeps the Phase 4 + behavior — memories with an existing embedding row are left + alone (their stale-model tag survives).""" + db = tmp_path / "t.db" + apply_migrations(db) + memory_ids = _seed(db, count=2) + _seed_embedding(db, memory_ids[0], model="stale-model") + # memory_ids[1] has no embedding yet. + + cfg = tmp_path / "config.toml" + cfg.write_text( + f'featherless_api_key = "x"\n' + f'db_path = "{db}"\n' + f'data_dir = "{tmp_path}"\n' + ) + monkeypatch.setenv("CHAT_CONFIG_PATH", str(cfg)) + monkeypatch.setenv("CHAT_DB_PATH", str(db)) + + with patch("sys.argv", ["backfill_embeddings.py"]): + await backfill.main() + + with open_db(db) as conn: + rows = dict( + conn.execute( + "SELECT memory_id, model FROM embeddings ORDER BY memory_id" + ).fetchall() + ) + # Stale row preserved; only the missing one was filled. + assert rows[memory_ids[0]] == "stale-model" + assert rows[memory_ids[1]] == DEFAULT_EMBEDDING_MODEL + + +@pytest.mark.asyncio +async def test_re_embed_all_respects_model_arg(tmp_path, monkeypatch): + """The ``--model`` flag overrides ``Settings.embedding_model``. + With a non-default model and a client that returns canned vectors, + every memory is re-embedded with the supplied model tag.""" + db = tmp_path / "t.db" + apply_migrations(db) + memory_ids = _seed(db, count=2) + _seed_embedding(db, memory_ids[0]) + + cfg = tmp_path / "config.toml" + cfg.write_text( + f'featherless_api_key = "x"\n' + f'db_path = "{db}"\n' + f'data_dir = "{tmp_path}"\n' + ) + monkeypatch.setenv("CHAT_CONFIG_PATH", str(cfg)) + monkeypatch.setenv("CHAT_DB_PATH", str(db)) + + # Patch the client factory the script uses to produce a Mock with + # canned embeddings — one per memory. + from chat.llm.mock import MockLLMClient + + canned_vec = [0.1] * 384 + + def _factory(_settings): + return MockLLMClient( + canned=[], + canned_embeddings=[list(canned_vec) for _ in memory_ids], + ) + + monkeypatch.setattr(backfill, "_build_client", _factory) + + with patch( + "sys.argv", + [ + "backfill_embeddings.py", + "--re-embed-all", + "--model", + "bge-small-en-v1.5", + ], + ): + await backfill.main() + + with open_db(db) as conn: + rows = conn.execute( + "SELECT memory_id, model FROM embeddings ORDER BY memory_id" + ).fetchall() + assert len(rows) == 2 + for _, model in rows: + assert model == "bge-small-en-v1.5" diff --git a/tests/test_branches_state.py b/tests/test_branches_state.py index ace2e8e..12d6030 100644 --- a/tests/test_branches_state.py +++ b/tests/test_branches_state.py @@ -1,11 +1,19 @@ from __future__ import annotations +import logging + from chat.db.connection import open_db from chat.db.migrate import apply_migrations from chat.eventlog.log import append_event from chat.eventlog.projector import project import chat.state.branches # registers handlers -from chat.state.branches import active_branch, get_branch, list_branches +from chat.state.branches import ( + _NO_HEAD_CLAMP, + active_branch, + active_branch_event_ids, + get_branch, + list_branches, +) def test_main_branch_bootstrapped_by_migration(tmp_path): @@ -139,3 +147,116 @@ def test_list_branches_returns_all(tmp_path): names = [b["name"] for b in list_branches(conn)] assert "main" in names assert "experiment" in names + + +def test_branch_switched_unknown_name_warns(tmp_path, caplog): + """Switching to a nonexistent branch logs a warning and leaves no branch active. + + The previous behavior silently cleared is_active flags and applied no UPDATE + when the named branch did not exist. T103 makes that condition observable + by emitting a warning while preserving the existing (zero-active) outcome. + """ + db = tmp_path / "t.db" + apply_migrations(db) + with open_db(db) as conn: + with caplog.at_level(logging.WARNING, logger="chat.state.branches"): + append_event( + conn, + kind="branch_switched", + payload={"name": "does_not_exist"}, + ) + project(conn) + + # A warning was emitted naming the missing branch. + warnings = [ + r for r in caplog.records + if r.levelno == logging.WARNING and r.name == "chat.state.branches" + ] + assert warnings, "expected a warning for unknown branch name" + assert any("does_not_exist" in r.getMessage() for r in warnings) + + # Existing behavior preserved: no branch is active after the switch. + assert active_branch(conn) is None + + # The unknown name was not inserted as a side effect. + assert get_branch(conn, "does_not_exist") is None + + +def test_active_branch_event_ids_bootstrap_main_returns_no_clamp(tmp_path): + """Bootstrap "main" (origin=0, head=0) reads as the no-clamp sentinel. + + Migration 0013 seeds main with both event-id columns at 0; production + today never emits ``branch_head_updated`` for main, so head stays at 0 + even as events accumulate. The helper treats this exact bootstrap + state as "all events visible" (lower bound 0, upper bound BIG_INT) so + every existing reader stays branch-agnostic until a non-main branch + becomes active. + """ + db = tmp_path / "t.db" + apply_migrations(db) + with open_db(db) as conn: + origin, head = active_branch_event_ids(conn) + assert origin == 0 + assert head == _NO_HEAD_CLAMP + + +def test_active_branch_event_ids_no_active_branch_falls_through(tmp_path): + """No active branch row at all → defensive ``(0, BIG_INT)``. + + A switch to an unknown branch leaves zero rows with ``is_active=1``; + ``active_branch`` returns None. The helper must still hand readers a + workable range (the full log) so the read pipeline doesn't crash on + an inconsistent metadata state. + """ + db = tmp_path / "t.db" + apply_migrations(db) + with open_db(db) as conn: + # Switching to a nonexistent branch clears is_active flags + # without setting any other branch active. + append_event( + conn, + kind="branch_switched", + payload={"name": "does_not_exist"}, + ) + project(conn) + assert active_branch(conn) is None + + origin, head = active_branch_event_ids(conn) + assert origin == 0 + assert head == _NO_HEAD_CLAMP + + +def test_active_branch_event_ids_returns_actual_range_for_non_main(tmp_path): + """Non-main branches return their literal ``(origin, head)`` window. + + A branch created at origin=10 + bumped to head=20 must surface as + (10, 20) so readers' ``BETWEEN`` clamp scopes to that window. + """ + db = tmp_path / "t.db" + apply_migrations(db) + with open_db(db) as conn: + append_event( + conn, + kind="branch_created", + payload={ + "name": "experiment", + "origin_event_id": 10, + "head_event_id": 10, + "chat_id": "c1", + }, + ) + append_event( + conn, + kind="branch_head_updated", + payload={"name": "experiment", "head_event_id": 20}, + ) + append_event( + conn, + kind="branch_switched", + payload={"name": "experiment"}, + ) + project(conn) + + origin, head = active_branch_event_ids(conn) + assert origin == 10 + assert head == 20 diff --git a/tests/test_branching.py b/tests/test_branching.py index 610bb2e..3b8c3f4 100644 --- a/tests/test_branching.py +++ b/tests/test_branching.py @@ -129,3 +129,279 @@ def test_list_branches_with_metadata_includes_event_count(tmp_path): assert rows["exp"]["origin_event_id"] == 10 assert rows["exp"]["head_event_id"] == 15 assert rows["exp"]["event_count"] == 6 + + +# --------------------------------------------------------------------------- +# T113 read-side filter — cross-feature tests. +# --------------------------------------------------------------------------- +# +# These exercise the active-branch event-id clamp through every reader +# the spec called out: ``read_recent_dialogue`` (turn_common), +# ``_read_recent_dialogue`` (scene_summarize), and ``search_memories`` +# (memory). They drive the readers via real event-log inserts + branch +# switches so the integration is end-to-end. + + +def _seed_user_turn(conn, chat_id: str, prose: str) -> int: + return append_and_apply( + conn, + kind="user_turn", + payload={"chat_id": chat_id, "prose": prose, "segments": []}, + ) + + +def test_read_recent_dialogue_respects_active_branch_head(tmp_path): + """T113 spec test 1: dialogue reader clamps to active branch head. + + Seed 10 user turns; create a branch with origin=1 + head=5 and switch + to it; assert ``read_recent_dialogue`` only returns the first 5 + turns. (The 5 events with id 6..10 fall outside ``[1, 5]``.) + """ + from chat.services.turn_common import read_recent_dialogue + + db = tmp_path / "t.db" + apply_migrations(db) + with open_db(db) as conn: + ids = [_seed_user_turn(conn, "c1", f"turn {i}") for i in range(10)] + # 5 events visible after the switch. + branch_from_event( + conn, name="halfway", origin_event_id=ids[0], chat_id="c1" + ) + append_and_apply( + conn, + kind="branch_head_updated", + payload={"name": "halfway", "head_event_id": ids[4]}, + ) + switch_active_branch(conn, name="halfway") + + rows = read_recent_dialogue(conn, "c1") + # The reader returns oldest-first, so the visible-set is the + # first 5 turns. + assert len(rows) == 5 + assert [r["text"] for r in rows] == [f"turn {i}" for i in range(5)] + + +def test_search_memories_respects_active_branch_head(tmp_path): + """T113 spec test 2: memory search clamps to active branch head via + ``memories.event_id``. Memories whose projecting event lands outside + the clamp drop out of FTS results.""" + from chat.eventlog.log import append_and_apply as _aa + from chat.state.memory import search_memories + + db = tmp_path / "t.db" + apply_migrations(db) + with open_db(db) as conn: + # Two memories projected from real events. The projector handler + # stamps memories.event_id from the projecting event's id. + ev_a = _aa( + conn, + kind="memory_written", + payload={ + "owner_id": "host_bot", + "chat_id": "c1", + "scene_id": 1, + "pov_summary": "alpha keyword present", + "witness_you": 1, + "witness_host": 1, + "witness_guest": 0, + }, + ) + ev_b = _aa( + conn, + kind="memory_written", + payload={ + "owner_id": "host_bot", + "chat_id": "c1", + "scene_id": 1, + "pov_summary": "alpha keyword present too", + "witness_you": 1, + "witness_host": 1, + "witness_guest": 0, + }, + ) + # Branch clamps to ev_a only (head = ev_a; ev_b sits past head). + branch_from_event( + conn, name="early", origin_event_id=ev_a, chat_id="c1" + ) + switch_active_branch(conn, name="early") + + results = search_memories(conn, "host_bot", "host", "alpha") + # Only the first memory should surface — the second's event_id + # exceeds the active branch head. + ids = [r["event_id"] for r in results] + assert ev_a in ids + assert ev_b not in ids + + +def test_branch_switch_changes_visible_events(tmp_path): + """T113 spec test 3: switching branches mid-flight changes the read + immediately. ``read_recent_dialogue`` re-queries on every call.""" + from chat.services.turn_common import read_recent_dialogue + + db = tmp_path / "t.db" + apply_migrations(db) + with open_db(db) as conn: + ids = [_seed_user_turn(conn, "c1", f"turn {i}") for i in range(6)] + + branch_from_event( + conn, name="early", origin_event_id=ids[0], chat_id="c1" + ) + append_and_apply( + conn, + kind="branch_head_updated", + payload={"name": "early", "head_event_id": ids[2]}, + ) + branch_from_event( + conn, name="late", origin_event_id=ids[3], chat_id="c1" + ) + append_and_apply( + conn, + kind="branch_head_updated", + payload={"name": "late", "head_event_id": ids[5]}, + ) + + switch_active_branch(conn, name="early") + early_rows = [r["text"] for r in read_recent_dialogue(conn, "c1")] + assert early_rows == ["turn 0", "turn 1", "turn 2"] + + switch_active_branch(conn, name="late") + late_rows = [r["text"] for r in read_recent_dialogue(conn, "c1")] + assert late_rows == ["turn 3", "turn 4", "turn 5"] + + +def test_main_branch_with_head_zero_returns_empty(tmp_path): + """T113 spec test 4: a non-main branch with head=0 returns empty. + + The bootstrap-main sentinel only fires for ``name=="main", origin=0, + head=0``. A different branch parked at ``origin=0, head=0`` is not a + sentinel and the ``BETWEEN 0 AND 0`` clamp filters out every real + event_log row (rowids start at 1).""" + from chat.services.turn_common import read_recent_dialogue + + db = tmp_path / "t.db" + apply_migrations(db) + with open_db(db) as conn: + # Need a real event_log row id 1+ so the clamp's "exclude 0" actually + # has something to exclude — otherwise we trivially return []. + _seed_user_turn(conn, "c1", "turn 0") + + # Force-create a branch at origin=0, head=0 (NOT main). This is an + # artificial state — production never produces it — but it's the + # cleanest way to drive the documented edge case. + append_and_apply( + conn, + kind="branch_created", + payload={ + "name": "stub", + "origin_event_id": 0, + "head_event_id": 0, + "chat_id": "c1", + }, + ) + switch_active_branch(conn, name="stub") + + rows = read_recent_dialogue(conn, "c1") + assert rows == [] + + +def test_no_active_branch_falls_through_to_all_events(tmp_path): + """T113 spec test 5: with no active branch (e.g. a switch to an + unknown name cleared all is_active flags), readers see the full log + via the ``(0, BIG_INT)`` defensive default.""" + from chat.services.turn_common import read_recent_dialogue + + db = tmp_path / "t.db" + apply_migrations(db) + with open_db(db) as conn: + for i in range(3): + _seed_user_turn(conn, "c1", f"turn {i}") + + # Switching to an unknown branch leaves zero rows with is_active=1. + append_and_apply( + conn, + kind="branch_switched", + payload={"name": "missing"}, + ) + from chat.state.branches import active_branch as _ab + + assert _ab(conn) is None + + rows = read_recent_dialogue(conn, "c1") + assert [r["text"] for r in rows] == ["turn 0", "turn 1", "turn 2"] + + +def test_scene_summarize_read_recent_dialogue_respects_branch(tmp_path): + """T113: ``scene_summarize._read_recent_dialogue`` (the scene-close + summary input) also clamps to the active branch range.""" + from chat.services.scene_summarize import _read_recent_dialogue + + db = tmp_path / "t.db" + apply_migrations(db) + with open_db(db) as conn: + ids = [_seed_user_turn(conn, "c1", f"turn {i}") for i in range(6)] + + branch_from_event( + conn, name="early", origin_event_id=ids[0], chat_id="c1" + ) + append_and_apply( + conn, + kind="branch_head_updated", + payload={"name": "early", "head_event_id": ids[2]}, + ) + switch_active_branch(conn, name="early") + + rows = _read_recent_dialogue(conn, "c1") + assert [r["text"] for r in rows] == ["turn 0", "turn 1", "turn 2"] + + +def test_meanwhile_dialogue_reader_respects_branch(tmp_path): + """T113: meanwhile prompt-context reader also clamps to the active + branch. The meanwhile reader filters by ``meanwhile_scene_id``; the + branch filter is composed on top of that filter.""" + from chat.web.meanwhile import _read_recent_meanwhile_dialogue + + db = tmp_path / "t.db" + apply_migrations(db) + with open_db(db) as conn: + # Seed user turns + meanwhile assistant turns interleaved so the + # branch-id clamp lands across both kinds. + u1 = _seed_user_turn(conn, "c1", "u1") + a1 = append_and_apply( + conn, + kind="assistant_turn", + payload={ + "chat_id": "c1", + "speaker_id": "host", + "text": "a1", + "meanwhile_scene_id": 7, + }, + ) + # Past-head turn should NOT appear once we switch to ``early``. + a2 = append_and_apply( + conn, + kind="assistant_turn", + payload={ + "chat_id": "c1", + "speaker_id": "guest", + "text": "a2", + "meanwhile_scene_id": 7, + }, + ) + + branch_from_event( + conn, name="early", origin_event_id=u1, chat_id="c1" + ) + append_and_apply( + conn, + kind="branch_head_updated", + payload={"name": "early", "head_event_id": a1}, + ) + switch_active_branch(conn, name="early") + + rows = _read_recent_meanwhile_dialogue(conn, "c1", scene_id=7) + texts = [r["text"] for r in rows] + assert "a1" in texts + assert "a2" not in texts + # Suppress the "unused" linter warning while keeping the binding + # readable for the test narrative. + _ = a2 diff --git a/tests/test_config.py b/tests/test_config.py index abffd57..bb723bd 100644 --- a/tests/test_config.py +++ b/tests/test_config.py @@ -24,3 +24,25 @@ def test_chat_db_path_env_overrides_default(tmp_path, monkeypatch): (tmp_path / "config.toml").write_text('featherless_api_key = "x"\n') s = load_settings() assert s.db_path == tmp_path / "alt.db" + + +def test_embedding_model_defaults_to_pseudo(tmp_path, monkeypatch): + """T112: ``embedding_model`` defaults to the deterministic pseudo + so existing zero-config installs keep the Phase 4 behavior.""" + monkeypatch.setenv("CHAT_CONFIG_PATH", str(tmp_path / "config.toml")) + (tmp_path / "config.toml").write_text('featherless_api_key = "x"\n') + s = load_settings() + assert s.embedding_model == "pseudo-sha256-384" + + +def test_embedding_model_overridable_via_toml(tmp_path, monkeypatch): + """T112: operators swap the embedding model by editing config.toml. + The new value flows through to the embedding worker at startup.""" + cfg = tmp_path / "config.toml" + cfg.write_text( + 'featherless_api_key = "x"\n' + 'embedding_model = "bge-small-en-v1.5"\n' + ) + monkeypatch.setenv("CHAT_CONFIG_PATH", str(cfg)) + s = load_settings() + assert s.embedding_model == "bge-small-en-v1.5" diff --git a/tests/test_drawer_phase4.py b/tests/test_drawer_phase4.py index f94f266..be4d854 100644 --- a/tests/test_drawer_phase4.py +++ b/tests/test_drawer_phase4.py @@ -458,6 +458,183 @@ def test_t98_4_delete_invokes_rewind_and_drops_cascade(client, tmp_path): assert row is None, f"event {ev_id} should have been deleted" +def test_delete_impact_modal_uses_jinja_partial(client, tmp_path): + """T110.3: the modal HTML is rendered from a Jinja partial + (`_delete_impact_modal.html`) rather than f-string concatenation in + Python. Verify the partial-rendered shape: the wrapping + ``delete-impact-modal`` div, the cascade list, and the confirm form. + + The partial inherits Jinja2 autoescape so HTML safety follows + automatically — the explicit ``html.escape()`` calls from T110.2 + become redundant once this lands. + """ + db = tmp_path / "test.db" + _seed_chat(db) + user_id, _bot_id = _seed_turns(db) + + response = client.get( + f"/chats/chat_bot_a/drawer/turn/delete-preview/{user_id}" + ) + assert response.status_code == 200 + body = response.text + + # Markup shape that the partial produces. Double-quoted attributes + # signal Jinja rendering (the prior f-string used single quotes). + assert '
    ' in body + assert '
      ' in body + # The confirm form still posts to the same delete route. + assert f"/chats/chat_bot_a/drawer/turn/delete/{user_id}" in body + assert "Confirm delete" in body + + +def test_delete_impact_modal_escapes_user_controllable_strings(client, tmp_path): + """T110.2: defense-in-depth — fields embedded in the modal HTML come + from event payloads (turn prose, scene timestamps, etc.) which are + ultimately user-controllable. Wrap them with ``html.escape`` so a + payload like ```` renders as inert text and + doesn't leak through into the rendered modal as actual markup. + """ + db = tmp_path / "test.db" + _seed_chat(db) + + # Seed a user_turn whose prose contains an HTML-script payload. The + # modal renders ``description = "turn N (you: )"`` so + # the prose flows verbatim into the cascade list
    • . + with open_db(db) as conn: + evil_id = append_and_apply( + conn, + kind="user_turn", + payload={ + "chat_id": "chat_bot_a", + "prose": "", + "segments": [], + }, + ) + + response = client.get( + f"/chats/chat_bot_a/drawer/turn/delete-preview/{evil_id}" + ) + assert response.status_code == 200 + body = response.text + + # Raw