fix: drawer modal close button + tab redesign

Two bugs and a redesign: 1) **X close button didn't close the modal**. The previous JS bound close via event delegation on the modal root, but panel.addEventListener('click', e => e.stopPropagation()) swallowed the X click before it ever bubbled up. Switched to direct binding on every [data-drawer-close] element with an idempotent guard so HTMX swaps that re-render the panel don't double-bind. 2) **Stale legacy header in the server-rendered drawer body**. The /chats/<id>/drawer endpoint renders its own <header class="drawer-header"> with a duplicate <h2> and a broken inline-onclick close (targets the OLD id="drawer" semantics). Post-process: lift the bot name out of the legacy header into the modal title, then remove the header. 3) **Tabs**. The drawer has 10 sections — too dense as a single stack. Group into 4 tabs: Scene : Scene + Activity Cast : Guest + Group + Edges Story : Events + Threads + Branches Turns : Recent turns + Significance review Implementation is client-side post-swap so the /chats/<id>/drawer server response stays unchanged. Walks .drawer-section blocks, buckets by their <h3>, builds a <nav role="tablist"> and <section role="tabpanel"> tree, and toggles visibility on click. Empty buckets (e.g. no Guest tab on a 1:1 chat) are hidden. Re-runs on every HTMX afterSwap so in-drawer form submits keep the tabs. CSS tabs match the editorial aesthetic: no pills, no fills — a single muted-amber underline rule under the active tab, Newsreader serif label, ink-faint inactive / ink-default active. Empty hover state, focus ring uses the amber accent.
feat: drawer is now a centered modal popup (director's notebook aesthetic)
2026-04-27 15:23:04 -04:00 · 2026-04-27 15:17:34 -04:00 · 2026-04-27 15:07:39 -04:00 · 2026-04-27 14:55:21 -04:00 · 2026-04-27 14:51:49 -04:00 · 2026-04-27 14:43:16 -04:00
70 changed files with 6643 additions and 354 deletions
@@ -5,6 +5,7 @@ data/
 # Python
 .venv/
 .mlx-venv/
 __pycache__/
 *.pyc
 .pytest_cache/
@@ -322,53 +322,48 @@ Phase 4 polish shipped end-to-end across 15 tasks (T88–T102). Vector retrieval
 ### Phase 4.5 / 5 backlog
-New follow-ups discovered during Phase 4 reviews and execution. None are blocking; pick up at any time.
+All items shipped or deferred to Phase 5 (see "Phase 5 backlog" below). Final schema version: 14.
-#### From T88 review
+## Phase 4.5 status
- **`embeddings` FK lacks `ON DELETE CASCADE`**: deindex events are the only deletion path; if memories ever get deleted directly (raw SQL), embedding rows orphan. Defensible since projector model uses explicit deindex events, but worth a comment or `ON DELETE CASCADE` addition.
+Phase 4.5 cleanup shipped 13 of 14 planned tasks (T103–T117 with T115 deferred; T118 is this docs sweep). Two CLAUDE.md backlogs (Phase 3.6/4, Phase 4.5/5) are now empty; deferred follow-ups discovered during execution are tracked in a new "Phase 5 backlog" section below. Schema baseline advanced from version 13 to **14** (migration 0014: `memories.event_id`). Test count grew from ~413 (Phase 4) to ~457 (+~44 new tests across the wave).
-#### From T89 review
+- **Wave 1 — trivial polish (parallel)**:
  - **T103** branches polish — global-branch (`chat_id IS NULL`) leak documented in `list_branches`; branch-switch to nonexistent name now logs a warning.
  - **T104** `memory.py` DRY — `MAX(id)` helper extracted; `fts_rank=None` contract documented for vector-only rows.
  - **T105** `snapshots.py` polish — `datetime`/`timezone` imports hoisted to module level; strict `kind` validation in restore/preview (rejects missing); `created_at` from file mtime documented.
  - **T106** `search.py` polish — `k=50` extracted to module constant; N+1 `get_bot`/`get_chat`/`get_scene` lookups batched.
  - **T107** `embeddings.py` — `timeout_s` fallback-path warning when non-default model misconfigured.
 - **Wave 2 — scene-close-on-cancel (single)**:
  - **T108** strengthened the T74.3 regression test + documented rationale in `turns.py`. **Surfaced a deferred bug**: existing pin only passes because `asyncio` isn't imported in the test module (NameError caught instead of CancelledError). When CancelledError fires for real, `post_turn`'s end-of-function re-raise causes `open_db`'s dependency teardown to skip `conn.commit()`, rolling back ALL post-cancel writes. Documented and deferred to Phase 5 triage.
 - **Wave 3 — schema 0014 (single)**:
  - **T109** `memories.event_id` column (foundation for T111 deep-link). FK CASCADE on `embeddings.memory_id` deferred (memories rows are never deleted today; defensive constraint can't fire — saved for broader migration cleanup in Phase 5).
 - **Wave 4 — drawer Phase 4.5 bundle (single)**:
  - **T110** `event_id <= 0` guard in `delete_turn` + `html.escape()` on delete-impact modal + Jinja partial extraction + bulk significance re-rate per chat (one `manual_edit` event per memory).
 - **Wave 5 — search UX (single)**:
  - **T111** FTS snippet highlighting via `snippet()` + deep-link to turn via `memories.event_id`.
 - **Wave 6 — real embedding model swap (single)**:
  - **T112** `LLMClient.embed()` Protocol + Mock impl with `canned_embeddings` + `FeatherlessClient.embed()` (raises `NotImplementedError` — Featherless OAI-compat doesn't expose embeddings, gap documented) + `generate_embedding` routes non-default models through `client.embed()` with fallback + `--re-embed-all` backfill flag.
 - **Wave 7 — branching read-side filter (single)**:
  - **T113** `active_branch_event_ids(conn)` helper + applied to `read_recent_dialogue`, `scene_summarize._read_recent_dialogue`, `search_memories`, and `meanwhile._read_recent_meanwhile_dialogue`. Cross-chat search and projector queries deliberately NOT filtered (cross-chat is by design; projectors must see full log). Bootstrap "main" branch (origin=0, head=0) detected as the no-clamp sentinel.
 - **Wave 8 — regenerate lifecycle rollback (single)**:
  - **T114** `triggered_by_assistant_turn_id` payload back-reference on `event_started`/`event_completed`/`event_cancelled` + new `event_status_reverted` event kind + projector handler in `chat/state/events.py` + regenerate flow emits revert events for affected lifecycle transitions.
 - **Wave 9 — final polish + integration (parallel)**:
  - **T115** sqlite-vec swap — **DEFERRED to Phase 5**. Pre-flight failed: host Python build doesn't expose `sqlite3.Connection.enable_load_extension` (raises `AttributeError`). Requires either Python rebuild with `--enable-loadable-sqlite-extensions` or migration to `apsw`. Phase 4 pure-Python cosine remains in production.
  - **T116** structured `CannedQueue` test fixture builder + 2–3 POC test migrations (Phase 5 to migrate the rest).
  - **T117** Phase 4.5 cross-feature integration tests (5 minimum: real embedding swap, branching read-side filter, lifecycle rollback, search deep-link, bulk significance re-rate).
  - **T118** documentation (this section).
- **`list_branches(chat_id=...)` filter leaks global branches** (`chat_id IS NULL`) into every chat scope. Intentional? Document.
+### Phase 5 backlog
 - **Branch-switch to nonexistent silently leaves zero active branches** — log a warning when this would happen.
-#### From T91 review
+New follow-ups discovered during Phase 4.5 reviews and execution, plus carry-over deferrals. None are blocking; pick up at any time.
- **Real embedding model swap**: Phase 4 ships pseudo-embedding (deterministic SHA-256 hash). Phase 4.5+ should swap to a real model (Featherless `bge-small-en-v1.5` if available; or local `sentence-transformers/all-MiniLM-L6-v2`). The 384-dim is hardcoded in `0012_embeddings.sql`; if dim changes, migrate first.
+- **T115 sqlite-vec swap** (environmental blocker): host Python's `sqlite3.Connection` does not expose `enable_load_extension` — `python -c "import sqlite3; sqlite3.connect(':memory:').enable_load_extension(True)"` raises `AttributeError`. Fix requires either a Python rebuild with `--enable-loadable-sqlite-extensions` or migration to `apsw`. Pure-Python cosine remains in production until then.
- **`timeout_s` unused on pseudo path** — fine, but log when non-default model falls through to fallback so misconfigured callers don't silently degrade.
+- **T108 follow-up: cancel-path commit bug** — `post_turn`'s re-raised `CancelledError` causes `open_db` dependency teardown to skip `conn.commit()`, rolling back all post-cancel writes. The existing T74.3 regression test passes only because `asyncio` isn't imported in the test module (NameError masks the real cancel path). Triage required — either commit before re-raise, or restructure the route to never re-raise after the close-detection branch.
-
+- **`embeddings` FK CASCADE on `memory_id`** — deferred from T109; do as part of a broader migration consolidation in Phase 5.
-#### From T96 review
+- **`CannedQueue` fixture migration** — T116 shipped the builder + POC migrations; remaining tests still use positional canned arrays. Migrate in Phase 5.
-
+- **Vector index optimization (HNSW)** — currently scales to a few thousand memories on the flat-index pure-Python cosine path; revisit when counts grow past flat-index feasibility.
- **Duplicate `MAX(id)` lookup** between `_composite_rerank` and the fused-path tail — DRY follow-up.
+- **Branch-isolated `event_log`** — each branch has its own physical `event_log` range vs the current shared id space + head filter; full branch isolation is Phase 5+.
- **`fts_rank=None` for vector-only rows** — document downstream contract.
+- **Embedding model swap migration tooling** — T112 added `--re-embed-all`; a more orchestrated swap (drain old worker, re-seed all memories, swap config) is Phase 5+.
-
+- **Real-time collaborative branching** (multi-user) — out of scope for v1.
-#### From T98 review
+- **Avatars / portraits** (multimodality) — deferred indefinitely per design §14.
 - **`event_id <= 0` guard in `delete_turn`** — currently silently rewinds everything if `event_id` is 0. Add `if event_id <= 0: 400`.
 - **`html.escape()` on `compute_delete_impact` output rendered into the modal** — defense in depth (currently model-controlled strings, but if event payload fields ever appear in descriptions, autoescape needed).
 - **Extract delete-impact modal HTML to a Jinja partial** — testability + autoescape inheritance.
 #### From T99 review
 - **Hoist `datetime`/`timezone` imports to module level** in `chat/web/snapshots.py`.
 - **`kind` defaulting in restore/preview** — reject missing `kind` rather than silent 404.
 - **`created_at` from file mtime** vs filename-encoded timestamp — small drift if files copied; document.
 #### From T100 review
 - **Hardcoded `k=50`** — extract to module constant.
 - **N+1 lookups (`get_bot`/`get_chat`/`get_scene` per row)** — fine at `k=50`, revisit if `k` grows.
 - **FTS highlighting via `snippet()`** — Phase 4 skipped this; UX nice-to-have.
 - **Result links chat-level only** — `memories` table has no `event_id` column; deep-linking to specific turn requires schema addition.
 #### Deferred items
 - **sqlite-vec swap** when host Python supports `enable_load_extension`.
 - **Real embedding model** with proper semantic similarity.
 - **Branching read-side filter**: T89 ships data-model + UI but event readers don't yet consult `is_active`. Each branch is metadata-only labeled ranges. Consult-on-read is Phase 4.5+ work.
 - **Bulk significance re-rate** in drawer (T98.2 deferred — only per-memory edit shipped).
 - **Vector index optimization** (HNSW) — only relevant if memory counts grow past pure-Python feasibility.
 - **`scene-close-on-cancel` UX revisit** (Phase 2.5 carry-over).
 - **Cross-feature canned-queue brittleness fixture builder** (Phase 3 carry-over).
 - **Full lifecycle-rollback in regenerate** — Phase 3.5 T83.4 shipped a warning log; proper rollback needs schema-level back-references (`triggered_by_assistant_turn_id` payload field).
@@ -72,17 +72,40 @@ async def lifespan(app: FastAPI):
    # (free / lower paid tiers cap at 2). Shared across all
    # FeatherlessClient instances in the process.
    from chat.llm.featherless import FeatherlessClient
    from chat.llm.local_mlx import LocalMLXClient
    from chat.llm.router import RoutedLLMClient
    FeatherlessClient.configure_concurrency(settings.featherless_max_concurrent)
    LocalMLXClient.configure_concurrency(settings.local_mlx_max_concurrent)
-    # Background worker for the async significance pass (T22). Each job
+    # Background workers (significance scoring, embedding indexer)
-    # constructs a fresh FeatherlessClient via the factory; tests can
+    # construct a fresh client per job via the factory. Workers route
-    # disable enqueue by toggling ``app.state.background_worker.enabled``.
+    # through the same RoutedLLMClient as request-time traffic so the
    # narrative model still goes to Featherless and the classifier +
    # embeddings hit the local MLX server.
    def _factory():
-        return FeatherlessClient(
+        narrative = FeatherlessClient(
            api_key=settings.featherless_api_key,
            base_url=settings.featherless_base_url,
        )
        classifier = None
        if settings.classifier_provider_order:
            classifier = FeatherlessClient(
                api_key=settings.featherless_api_key,
                base_url=settings.featherless_base_url,
                default_extra_body={
                    "provider": {
                        "order": list(settings.classifier_provider_order)
                    }
                },
            )
        local = LocalMLXClient(base_url=settings.local_mlx_base_url)
        return RoutedLLMClient(
            narrative=narrative,
            classifier=classifier,
            local=local,
            narrative_model=settings.narrative_model,
        )
    worker = BackgroundWorker(settings, llm_client_factory=_factory)
    await worker.start()
@@ -94,9 +117,15 @@ async def lifespan(app: FastAPI):
    # Phase 4's pseudo-embedding path is local so the worker doesn't need
    # an LLM client; we still pass one so the Phase 4.5 swap to a real
    # model is a one-line change.
    # T112 (Phase 4.5): the embedding model is now configurable via
    # ``Settings.embedding_model``. Default ``"pseudo-sha256-384"``
    # keeps the local-only path; swapping to a real model routes
    # through ``client.embed(...)`` and falls back to a zero vector
    # plus warning if the provider doesn't support embeddings.
    embedding_worker = EmbeddingWorker(
        conn_factory=lambda: open_db(settings.db_path),
        client=_factory(),
        model=settings.embedding_model,
    )
    await embedding_worker.start()
    app.state.embedding_worker = embedding_worker
@@ -23,13 +23,22 @@ class Settings(BaseModel):
    retrieval_k: int = 4
    narrative_budget_hard: int = 8000
    narrative_budget_soft: int = 6000
-    # Cap on each generated bot response. ~400 tokens ≈ 1–2 short paragraphs.
+    # Cap on each generated bot response. The asterisk-action format
-    # Bump if you want longer scenes; drop to 200 for terse banter.
+    # (see ``_closing_instruction`` in chat/services/prompt.py) targets
-    narrative_max_tokens: int = 400
+    # 2-3 short interleaved action+dialogue beats. Verbose roleplay
    # narrators (Cydonia, Magnum) ignore the prompt's cap and keep
    # going; ``trim_to_max_beats`` in chat/services/prompt.py handles
    # the actual cap by trimming at a beat boundary post-stream. This
    # max_tokens setting just gives the third beat enough room to
    # complete naturally before max_tokens cuts mid-action: 160 fits
    # 3 substantive beats with margin. Bump to 250 for longer scenes;
    # drop to 80 for terse banter.
    narrative_max_tokens: int = 160
    # Sampling temperature for narrative generation. 0.7 = grounded /
-    # consistent; 0.85 = creative-but-in-character (default); 1.0 = wide
+    # instruction-compliant (current — Cydonia is verbose-by-default and
-    # variety, can drift; >1.0 = often off-the-rails.
+    # tighter temperature helps it respect the 2-3-beat cap);
-    narrative_temperature: float = 0.85
+    # 0.85 = creative; 1.0 = wide variety; >1.0 = often off-the-rails.
    narrative_temperature: float = 0.7
    classifier_budget_hard: int = 4000
    classifier_timeout_s: float = 30.0
    # Featherless free tier and lower paid tiers cap concurrent connections.
@@ -39,6 +48,27 @@ class Settings(BaseModel):
    data_dir: Path = REPO_ROOT / "data"
    bind_host: str = "127.0.0.1"
    bind_port: int = 8000
    # Local MLX server (e.g. ``mlx-omni-server``) — serves any model
    # whose id starts with one of ``local_prefixes`` (default
    # ``"mlx-community/"``). The :class:`RoutedLLMClient` inspects the
    # ``model`` kwarg at call time: local-prefix -> local, else -> remote.
    # ``embed()`` always routes local.
    local_mlx_base_url: str = "http://127.0.0.1:10240/v1"
    local_mlx_max_concurrent: int = 1
    # Optional OpenRouter-style provider pinning for the classifier
    # client. Maps to the ``provider`` field on chat.completions.create
    # via ``extra_body``; the FeatherlessClient (which is just an
    # AsyncOpenAI wrapper) merges it into every call. Useful for forcing
    # Llama-3.1-8B classifier traffic onto Cerebras (~423 tok/s, 10x
    # the default Nebius). Empty list = no pin (provider is
    # OpenRouter's choice).
    classifier_provider_order: list[str] = Field(default_factory=list)
    # T112 (Phase 4.5): embedding model identifier. Default is the
    # deterministic local pseudo so fresh installs / tests don't need
    # any external infra. Override via config.toml to a real model id
    # (e.g. ``"mlx-community/bge-small-en-v1.5-bf16"``) once a local
    # MLX server is running.
    embedding_model: str = "pseudo-sha256-384"
 def load_settings() -> Settings:
    config_path = Path(os.environ.get("CHAT_CONFIG_PATH", DEFAULT_CONFIG))
@@ -7,7 +7,20 @@ from pathlib import Path
@contextmanager
 def open_db(path: Path, *, check_same_thread: bool = True):
    path.parent.mkdir(parents=True, exist_ok=True)
-    conn = sqlite3.connect(path, check_same_thread=check_same_thread)
+    # ``timeout`` here sets SQLite's busy_timeout, in seconds: how long
    # ``conn.execute`` blocks when another connection holds the WAL
    # write lock. The Python default is 5.0, which is fatal for the
    # async chat app: ``conn.execute``'s busy-wait does NOT release the
    # GIL, so a contending background worker (e.g. the embedding worker
    # writing ``embedding_indexed`` while the request handler holds an
    # open transaction) freezes the whole asyncio event loop for up to
    # 5 seconds — silently turning every concurrent LLM call into a 5s
    # wall-clock hit. 0.1s lets contending writers fail fast; callers
    # that need durability should retry, and the embedding worker
    # already logs failures so a missed embedding can be backfilled.
    conn = sqlite3.connect(
        path, check_same_thread=check_same_thread, timeout=0.1
    )
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA foreign_keys=ON")
    try:
@@ -0,0 +1,25 @@
 -- 0014_phase45_schema.sql — Phase 4.5 Wave 2 schema bump (T109).
 --
 -- Two schema concerns are bundled into this migration:
 --
 -- 1. ``embeddings.memory_id`` FK should ideally carry ``ON DELETE CASCADE``
 --    (T88 review nit). DEFERRED to Phase 5: ``embeddings`` rows are only ever
 --    deleted when the parent ``memories`` row is deleted, and ``memories``
 --    rows are never deleted today (memory hide is a soft flag; the surgical
 --    ``deindex_event`` path operates on ``event_log`` and does NOT cascade
 --    to projection rows). The CASCADE constraint therefore can't fire under
 --    current usage — adding the SQLite table-rebuild dance (rename, recreate,
 --    copy, drop, reindex) for a defensive constraint is unwarranted bloat
 --    in a polish wave. Revisit during the broader Phase 5 migration cleanup
 --    when other table reshapes make the rebuild worthwhile.
 --
 -- 2. Add ``memories.event_id`` (NULLABLE INTEGER, references ``event_log.id``)
 --    so cross-chat search results can deep-link back to the originating
 --    turn (foundation for T111). The column is nullable so historical
 --    memory rows projected before 0014 ran continue to round-trip cleanly;
 --    new rows are populated by the ``memory_written`` projector handler
 --    from the projecting event's id. This is a pure additive change — no
 --    backfill is performed. Older rows simply read NULL until/unless a
 --    later migration backfills them; T111 surfaces are coded to accept
 --    NULL gracefully (no deep-link rendered).
 ALTER TABLE memories ADD COLUMN event_id INTEGER REFERENCES event_log(id);
@@ -1,8 +1,13 @@
 from __future__ import annotations
 import asyncio
 import json
 import logging
 from dataclasses import dataclass
-from typing import Any, Iterator
+from typing import Any, Callable, ContextManager, Iterator
-from sqlite3 import Connection
+from sqlite3 import Connection, OperationalError
 _log = logging.getLogger(__name__)
@dataclass
@@ -63,6 +68,52 @@ def append_and_apply(
    return eid
 async def append_and_apply_with_retry(
    conn_factory: Callable[[], ContextManager[Connection]],
    *,
    kind: str,
    payload: dict[str, Any],
    branch_id: int = 1,
    attempts: int = 30,
    base_sleep_s: float = 0.05,
    max_sleep_s: float = 0.5,
 ) -> int | None:
    """Append-and-apply that retries on ``database is locked``.
    Background workers (embedding indexer, significance scorer) write
    events to the same SQLite file as the request handler. The chat
    app sets a tight ``busy_timeout=100ms`` on every connection so a
    contending worker can't freeze the request's asyncio event loop.
    This helper restores durability for workers: it retries up to
    ``attempts`` times with exponential backoff (capped at
    ``max_sleep_s``) until the lock clears.
    Returns the appended event's id, or ``None`` if all retries failed
    (logged at WARNING). Each retry opens a fresh connection via
    ``conn_factory`` because the failed write may have left the prior
    connection in an unusable state.
    """
    sleep = base_sleep_s
    for attempt in range(attempts):
        try:
            with conn_factory() as conn:
                return append_and_apply(
                    conn, kind=kind, payload=payload, branch_id=branch_id
                )
        except OperationalError as exc:
            if "database is locked" not in str(exc).lower():
                raise
            if attempt == attempts - 1:
                _log.warning(
                    "append_and_apply_with_retry: gave up after %d attempts "
                    "(kind=%s): %s",
                    attempts, kind, exc,
                )
                return None
            await asyncio.sleep(sleep)
            sleep = min(sleep * 2, max_sleep_s)
 def read_events(conn: Connection, branch_id: int = 1, after_id: int = 0) -> Iterator[Event]:
    cur = conn.execute(
        "SELECT id, branch_id, ts, kind, payload_json, superseded_by, hidden "
@@ -1,11 +1,13 @@
 from __future__ import annotations
 import json
 import asyncio
 import logging
 from typing import TypeVar
 from pydantic import BaseModel, ValidationError
 from .client import LLMClient, Message
 T = TypeVar("T", bound=BaseModel)
 _log = logging.getLogger(__name__)
 REFUSAL_PATTERNS = ("i can't", "i cannot", "i'm sorry, but", "as an ai")
@@ -31,6 +33,7 @@ async def classify(
    schema: type[T],
    default: T | None = None,
    timeout_s: float = 10.0,
    max_tokens: int = 512,
 ) -> T:
    schema_json = json.dumps(schema.model_json_schema(), indent=2)
    schema_block = (
@@ -41,22 +44,47 @@ async def classify(
        Message(role="system", content=system + schema_block),
        Message(role="user", content=user),
    ]
    # Cap output length so a misbehaving model (e.g. one that ignores
    # ``response_format=json_object`` and generates prose) can't burn
    # several seconds on tokens we'll never use. Classifier responses
    # are small JSON objects — 512 tokens is generous; usual completions
    # are 50-150.
    last_text = None
    last_error: BaseException | None = None
    for attempt in range(3):
        try:
            text = await asyncio.wait_for(
-                client.generate(msgs, model=model, response_format={"type": "json_object"}),
+                client.generate(
                    msgs,
                    model=model,
                    response_format={"type": "json_object"},
                    max_tokens=max_tokens,
                ),
                timeout=timeout_s,
            )
            last_text = text
            cleaned = _strip_json_fences(text)
            if any(p in cleaned.lower()[:80] for p in REFUSAL_PATTERNS) and not cleaned.lstrip().startswith("{"):
                raise ValueError("refusal-shaped response")
            return schema.model_validate_json(cleaned)
-        except (ValidationError, ValueError, json.JSONDecodeError, asyncio.TimeoutError):
+        except (ValidationError, ValueError, json.JSONDecodeError, asyncio.TimeoutError) as exc:
            last_error = exc
            msgs[0] = Message(
                role="system",
                content=system + schema_block + "\n\nRespond with valid JSON ONLY. No prose, no markdown fences.",
            )
            continue
    # Log when we're falling back so flapping classifiers are
    # diagnosable without taking down the request.
    snippet = (last_text or "")[:200].replace("\n", " ")
    _log.warning(
        "classify(%s) exhausted 3 attempts; last_error=%s last_text=%r; "
        "falling back to %s",
        schema.__name__,
        type(last_error).__name__ if last_error else "?",
        snippet,
        "default" if default is not None else "RuntimeError (no default)",
    )
    if default is None:
        raise RuntimeError(f"classify failed for schema {schema.__name__} with no default")
    return default
@@ -12,3 +12,11 @@ class Message:
 class LLMClient(Protocol):
    async def generate(self, messages: Sequence[Message], *, model: str, **params) -> str: ...
    def stream(self, messages: Sequence[Message], *, model: str, **params) -> AsyncIterator[str]: ...
    # T112 (Phase 4.5): real-embedding seam. Implementations either call a
    # provider's ``/v1/embeddings`` endpoint or, when the provider doesn't
    # expose embeddings (e.g. Featherless today), raise ``NotImplementedError``
    # so ``generate_embedding`` can catch it and degrade to the zero-vector
    # fallback. The Protocol is structural, so this method only needs to
    # exist on implementations; existing callers that don't use it are
    # unaffected.
    async def embed(self, text: str, *, model: str) -> list[float]: ...
@@ -29,19 +29,60 @@ class FeatherlessClient:
            cls._semaphore = asyncio.Semaphore(2)
        return cls._semaphore
-    def __init__(self, api_key: str, base_url: str = "https://api.featherless.ai/v1"):
+    def __init__(
        self,
        api_key: str,
        base_url: str = "https://api.featherless.ai/v1",
        *,
        default_extra_body: dict | None = None,
    ):
        self._client = AsyncOpenAI(api_key=api_key, base_url=base_url)
        # ``default_extra_body`` is merged into every chat.completions.create
        # call's ``extra_body``. Useful with OpenRouter to pin specific
        # upstream providers (e.g. ``{"provider": {"order": ["Cerebras"]}}``
        # for 10x throughput on Llama-3.1-8B). Featherless ignores the
        # field, so it's safe to leave set even when ``base_url`` points
        # back at Featherless.
        self._default_extra_body = default_extra_body or {}
    def _merge_extra_body(self, params: dict) -> dict:
        if not self._default_extra_body:
            return params
        eb = dict(self._default_extra_body)
        eb.update(params.pop("extra_body", {}) or {})
        params["extra_body"] = eb
        return params
    async def generate(self, messages: Sequence[Message], *, model: str, **params) -> str:
        params = self._merge_extra_body(dict(params))
        async with self._sem():
            resp = await self._client.chat.completions.create(
                model=model,
                messages=[{"role": m.role, "content": m.content} for m in messages],
                **params,
            )
            # Diagnostic: stash provider+usage on a side-channel for the
            # router timing log to pick up. OpenRouter sticks a 'provider'
            # field on the response (not part of the OAI spec, but the
            # SDK passes it through on its model dict).
            try:  # pragma: no cover — diagnostic only
                import os as _os
                if _os.environ.get("CHAT_LLM_TIMING") == "1":
                    prov = getattr(resp, "provider", None)
                    usage = getattr(resp, "usage", None)
                    ct = getattr(usage, "completion_tokens", "?") if usage else "?"
                    pt = getattr(usage, "prompt_tokens", "?") if usage else "?"
                    import logging as _logging
                    _logging.getLogger("chat.llm.router").info(
                        "  ↪ provider=%s prompt_toks=%s completion_toks=%s",
                        prov, pt, ct,
                    )
            except Exception:  # pragma: no cover
                pass
            return resp.choices[0].message.content or ""
    async def stream(self, messages: Sequence[Message], *, model: str, **params) -> AsyncIterator[str]:
        params = self._merge_extra_body(dict(params))
        async with self._sem():
            stream = await self._client.chat.completions.create(
                model=model,
@@ -53,3 +94,34 @@ class FeatherlessClient:
                delta = chunk.choices[0].delta.content or ""
                if delta:
                    yield delta
    async def embed(self, text: str, *, model: str) -> list[float]:
        """Embeddings via Featherless — unsupported in practice.
        T112 (Phase 4.5) extends the LLMClient Protocol with ``embed()``
        for a future real-embedding swap. Featherless's OpenAI-compatible
        surface routes ``/v1/embeddings`` (no 404), but every request
        returns HTTP 500 ``{"error": {"type": "completions_error", ...}}``
        — including standard names like ``text-embedding-3-small`` and
        ``BAAI/bge-small-en-v1.5``. ``/v1/models`` confirms it: the
        catalog has no embedding-class entries, only chat/completion
        classes (``llama3-*``, ``gemma3-*``, ``glm5-*``, etc.).
        Rather than ship a request that always 500s, this implementation
        raises ``NotImplementedError``. The
        :func:`chat.services.embeddings.generate_embedding` wrapper
        catches it and degrades to the existing zero-vector fallback
        (with the T107 warning), so misconfigured callers fail loudly in
        logs but the request path keeps working.
        For real embeddings, configure a different provider (OpenAI
        direct, Cohere, Voyage, Together, self-hosted Ollama /
        sentence-transformers). The Mock + routing seam from T112 keeps
        the swap to a one-class change in ``chat/llm/``.
        """
        raise NotImplementedError(
            "Featherless /v1/embeddings always returns 500 "
            '("completions_error") and the model catalog has no '
            "embedding class; configure a different embedding provider "
            "or stick with the default pseudo-sha256-384 model."
        )
@@ -0,0 +1,95 @@
 """Local MLX OpenAI-compatible client.
 Talks to a locally-running MLX server (e.g., ``mlx-omni-server``) over
 the same OpenAI surface that :class:`chat.llm.featherless.FeatherlessClient`
 uses, via :class:`openai.AsyncOpenAI`. The underlying server runs MLX
 models on Apple Silicon (M-series) for chat completions AND embeddings.
 Use cases (Phase 4.5+):
 - Classifier traffic moved off Featherless to local MLX (cost + latency).
 - Embeddings via ``client.embed`` actually work — Featherless's
  ``/v1/embeddings`` always returns 500.
 Constructor takes a ``base_url`` (e.g., ``"http://127.0.0.1:10240/v1"``)
 and an optional ``api_key`` (most local MLX servers don't authenticate;
 the OpenAI SDK requires *some* string, so we default to a placeholder).
 """
 from __future__ import annotations
 import asyncio
 from typing import AsyncIterator, Sequence
 from openai import AsyncOpenAI
 from .client import Message
 class LocalMLXClient:
    """OpenAI-compatible client for a local MLX server.
    The server is single-process by default (``mlx-omni-server`` loads
    one model at a time and swaps on demand). The class-level semaphore
    serializes concurrent requests so we never queue more than
    ``max_concurrent`` at a time — defaults to 1, since MLX inference
    on a single M-series device is sequential anyway.
    """
    _semaphore: asyncio.Semaphore | None = None
    @classmethod
    def configure_concurrency(cls, max_concurrent: int) -> None:
        cls._semaphore = asyncio.Semaphore(max(1, int(max_concurrent)))
    @classmethod
    def _sem(cls) -> asyncio.Semaphore:
        if cls._semaphore is None:
            cls._semaphore = asyncio.Semaphore(1)
        return cls._semaphore
    def __init__(
        self,
        base_url: str = "http://127.0.0.1:10240/v1",
        api_key: str = "not-needed",
    ):
        self._client = AsyncOpenAI(api_key=api_key, base_url=base_url)
    async def generate(
        self, messages: Sequence[Message], *, model: str, **params
    ) -> str:
        async with self._sem():
            resp = await self._client.chat.completions.create(
                model=model,
                messages=[{"role": m.role, "content": m.content} for m in messages],
                **params,
            )
            return resp.choices[0].message.content or ""
    async def stream(
        self, messages: Sequence[Message], *, model: str, **params
    ) -> AsyncIterator[str]:
        async with self._sem():
            stream = await self._client.chat.completions.create(
                model=model,
                messages=[{"role": m.role, "content": m.content} for m in messages],
                stream=True,
                **params,
            )
            async for chunk in stream:
                delta = chunk.choices[0].delta.content or ""
                if delta:
                    yield delta
    async def embed(self, text: str, *, model: str) -> list[float]:
        """Return an embedding vector for ``text`` using the named model.
        Targets ``/v1/embeddings`` on the local MLX server; the server
        loads the model on first request and caches it. The embedding
        model is independent of the chat model loaded for ``generate``
        / ``stream`` (the server can serve both).
        """
        async with self._sem():
            resp = await self._client.embeddings.create(
                model=model,
                input=text,
            )
            return list(resp.data[0].embedding)
@@ -4,8 +4,23 @@ from .client import Message
 class MockLLMClient:
-    def __init__(self, canned: list[str]):
+    """In-memory LLMClient for tests.
    ``canned`` feeds ``generate``/``stream`` (one entry per call, popped
    from the front). ``canned_embeddings`` (T112, Phase 4.5) feeds
    ``embed`` the same way — each call pops the next vector. An empty
    queue raises ``IndexError`` so misconfigured tests fail loudly
    rather than returning ``None`` or hanging.
    """
    def __init__(
        self,
        canned: list[str],
        *,
        canned_embeddings: list[list[float]] | None = None,
    ):
        self._canned = list(canned)
        self._canned_embeddings: list[list[float]] = list(canned_embeddings or [])
    async def generate(self, messages: Sequence[Message], *, model: str, **params) -> str:
        return self._canned.pop(0)
@@ -14,3 +29,8 @@ class MockLLMClient:
        text = self._canned.pop(0)
        for ch in text:
            yield ch
    async def embed(self, text: str, *, model: str) -> list[float]:
        # Mirrors the canned-queue pattern; empty queue raises so
        # misconfigured tests surface clearly instead of returning None.
        return self._canned_embeddings.pop(0)
@@ -0,0 +1,149 @@
 """Routed LLM client — splits traffic across multiple backends by model.
 Phase 4.5+ deployment: the 24B narrative model stays on Featherless,
 the 8B classifier model moves to local MLX, and embeddings run on a
 local BGE/MLX model. One :class:`LLMClient` interface, two underlying
 backends, dispatched by the ``model`` argument at every call site.
 Routing rule: requests whose ``model`` argument matches the configured
 ``narrative_model`` go to the narrative backend; everything else
 (classifier, embeddings, future locally-hosted models) goes to the
 local backend.
 Set the env var ``CHAT_LLM_TIMING=1`` to log per-call timing at INFO
 level. Useful for finding the slow link in a turn.
 """
 from __future__ import annotations
 import logging
 import os
 import time
 from typing import AsyncIterator, Sequence
 from .client import LLMClient, Message
 _log = logging.getLogger(__name__)
 _TIMING = os.environ.get("CHAT_LLM_TIMING") == "1"
 if _TIMING and not _log.handlers:
    # Wire a stderr handler when timing is enabled so the per-call
    # logs show up under uvicorn (which doesn't configure non-uvicorn
    # loggers by default).
    _h = logging.StreamHandler()
    _h.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s"))
    _log.addHandler(_h)
    _log.setLevel(logging.INFO)
    _log.propagate = False
 class RoutedLLMClient:
    """Delegates to one of two underlying clients based on ``model``.
    Routing rule: any model id starting with one of ``local_prefixes``
    goes to the local backend (e.g. ``"mlx-community/"`` for models
    served by ``mlx-omni-server``). Everything else — narrative model,
    remote classifiers, anything on a hosted provider — routes to the
    remote backend.
    ``embed`` always routes locally (the remote provider doesn't
    expose a working ``/v1/embeddings``; see
    :class:`chat.llm.featherless.FeatherlessClient.embed`).
    """
    def __init__(
        self,
        *,
        narrative: LLMClient,
        local: LLMClient,
        narrative_model: str,
        classifier: LLMClient | None = None,
        local_prefixes: tuple[str, ...] = ("mlx-community/",),
    ) -> None:
        # ``classifier`` is an optional separate backend for the
        # classifier model. Useful when classifier and narrative both
        # live on a remote OpenRouter-style provider but need different
        # provider-pinning (e.g. Cerebras for the 8B classifier,
        # default Friendli/etc. for the narrative). When ``classifier``
        # is None, classifier traffic falls through to ``narrative``
        # (the remote client) so old wiring keeps working.
        self._narrative = narrative
        self._classifier = classifier
        self._local = local
        self._narrative_model = narrative_model
        self._local_prefixes = local_prefixes
    def _pick(self, model: str) -> LLMClient:
        if any(model.startswith(p) for p in self._local_prefixes):
            return self._local
        if model == self._narrative_model:
            return self._narrative
        # Anything else (most importantly, the classifier model) goes
        # to the classifier client when configured, otherwise to the
        # narrative remote client.
        return self._classifier or self._narrative
    async def generate(
        self, messages: Sequence[Message], *, model: str, **params
    ) -> str:
        client = self._pick(model)
        backend = (
            "narrative" if client is self._narrative else
            "classifier" if client is self._classifier else
            "local"
        )
        if not _TIMING:
            return await client.generate(messages, model=model, **params)
        in_chars = sum(len(m.content) for m in messages)
        _log.info("LLM generate START [%s] %s in_chars=%d", backend, model, in_chars)
        t0 = time.perf_counter()
        try:
            return await client.generate(messages, model=model, **params)
        finally:
            _log.info(
                "LLM generate END   [%s] %s in_chars=%d %.2fs",
                backend, model, in_chars, time.perf_counter() - t0,
            )
    async def stream(
        self, messages: Sequence[Message], *, model: str, **params
    ) -> AsyncIterator[str]:
        client = self._pick(model)
        backend = (
            "narrative" if client is self._narrative else
            "classifier" if client is self._classifier else
            "local"
        )
        if not _TIMING:
            async for chunk in client.stream(messages, model=model, **params):
                yield chunk
            return
        t0 = time.perf_counter()
        ttft = None
        chars_out = 0
        try:
            async for chunk in client.stream(messages, model=model, **params):
                if ttft is None:
                    ttft = time.perf_counter() - t0
                chars_out += len(chunk)
                yield chunk
        finally:
            dt = time.perf_counter() - t0
            in_chars = sum(len(m.content) for m in messages)
            _log.info(
                "LLM stream   [%s] %s in_chars=%d out_chars=%d ttft=%.2fs total=%.2fs",
                backend, model, in_chars, chars_out, ttft or 0.0, dt,
            )
    async def embed(self, text: str, *, model: str) -> list[float]:
        # Embeddings always run on the local backend — the remote
        # provider doesn't expose a working ``/v1/embeddings`` endpoint.
        if not _TIMING:
            return await self._local.embed(text, model=model)
        t0 = time.perf_counter()
        try:
            return await self._local.embed(text, model=model)
        finally:
            _log.info(
                "LLM embed    [local] %s in_chars=%d %.2fs",
                model, len(text), time.perf_counter() - t0,
            )
@@ -30,7 +30,7 @@ from typing import Callable
 from chat.config import Settings
 from chat.db.connection import open_db
-from chat.eventlog.log import append_and_apply
+from chat.eventlog.log import append_and_apply, append_and_apply_with_retry
 from chat.llm.client import LLMClient
 from chat.services.backup import (
    prune_backups,
@@ -169,16 +169,22 @@ class BackgroundWorker:
            narrative_text=job.narrative_text,
            prior_dialogue=job.prior_dialogue,
        )
-        with open_db(self._settings.db_path) as conn:
+        # Retry-on-lock: see chat/eventlog/log.py's
-            append_and_apply(
+        # ``append_and_apply_with_retry`` docstring for why workers
-                conn,
+        # need to retry while the request handler's open transaction
-                kind="memory_significance_set",
+        # holds the WAL write lock briefly.
-                payload={
+        appended_id = await append_and_apply_with_retry(
-                    "memory_id": job.memory_id,
+            lambda: open_db(self._settings.db_path),
-                    "significance": score,
+            kind="memory_significance_set",
-                },
+            payload={
-            )
+                "memory_id": job.memory_id,
-            if score >= 3:
+                "significance": score,
            },
        )
        # Auto-pin requires a separate connection because retry-helper
        # closed its own. Skip if the significance event itself failed.
        if appended_id is not None and score >= 3:
            with open_db(self._settings.db_path) as conn:
                _auto_pin_with_cap(
                    conn,
                    owner_id=job.host_bot_id,
@@ -26,13 +26,28 @@ def search_all_memories(
    """Search FTS5 across all owners and chats.
    Returns rows with ``{memory_id, owner_id, chat_id, scene_id,
-    pov_summary, significance, ts, fts_rank}``, sorted by FTS5 BM25
+    event_id, pov_summary, snippet, significance, ts, fts_rank}``,
-    rank ascending (lower rank = stronger match, surfaced first).
+    sorted by FTS5 BM25 rank ascending (lower rank = stronger match,
    surfaced first).
    ``event_id`` (T111.2 / T109) is the id of the ``event_log`` row that
    drove the projecting ``memory_written`` event. May be ``None`` for
    memory rows projected before the 0014 schema migration ran (the
    column is nullable on purpose; T109 did not backfill historical
    rows). The search-results UI uses it to deep-link to the originating
    turn anchor (Phase 3.5 T86 stamps ``id="turn-{event_id}"`` on each
    turn DOM node) and falls back to a chat-level link when ``None``.
    The ``memories`` table has no ``ts`` column; we expose ``created_at``
    (the projector-side row insertion timestamp) under that key so the
    UI does not have to know the storage name.
    ``snippet`` (T111.1) is the FTS5 ``snippet()`` output for the
    matched ``pov_summary`` column: a windowed excerpt with each match
    token wrapped in ``<mark>...</mark>`` for the search-results UI to
    render verbatim. The full ``pov_summary`` is also returned so
    non-highlighted callers (or fallbacks) keep the original string.
    An empty / whitespace-only ``query`` short-circuits to ``[]`` to
    avoid an FTS5 ``MATCH ''`` syntax error and to keep the top-bar
    "no input yet" state from triggering a full-table scan.
@@ -45,9 +60,20 @@ def search_all_memories(
    # from the content table because the FTS index only stores
    # ``pov_summary``. ORDER BY rank ASC because BM25 in FTS5 returns
    # negative scores where lower is better.
    #
    # ``snippet(memories_fts, 0, ...)`` (T111.1) targets column 0 of the
    # FTS virtual table, which is ``pov_summary`` (the only column
    # indexed by ``CREATE VIRTUAL TABLE memories_fts USING fts5(
    # pov_summary, ...)`` in migration 0006). SQLite passes the raw
    # column text through verbatim aside from inserting the configured
    # before/after match markers, so the only HTML in the output is the
    # ``<mark>`` we injected — safe to render with ``|safe`` server-side.
    rows = conn.execute(
-        "SELECT m.id, m.owner_id, m.chat_id, m.scene_id, "
+        "SELECT m.id, m.owner_id, m.chat_id, m.scene_id, m.event_id, "
-        "       m.pov_summary, m.significance, m.created_at, "
+        "       m.pov_summary, "
        "       snippet(memories_fts, 0, '<mark>', '</mark>', '…', 32) "
        "       AS snippet, "
        "       m.significance, m.created_at, "
        "       memories_fts.rank "
        "FROM memories_fts "
        "JOIN memories m ON m.id = memories_fts.rowid "
@@ -63,10 +89,12 @@ def search_all_memories(
            "owner_id": r[1],
            "chat_id": r[2],
            "scene_id": r[3],
-            "pov_summary": r[4],
+            "event_id": r[4],
-            "significance": r[5],
+            "pov_summary": r[5],
-            "ts": r[6],
+            "snippet": r[6],
-            "fts_rank": r[7],
+            "significance": r[7],
            "ts": r[8],
            "fts_rank": r[9],
        }
        for r in rows
    ]
@@ -26,7 +26,7 @@ from dataclasses import dataclass
 from sqlite3 import Connection
 from typing import Callable
-from chat.eventlog.log import append_and_apply
+from chat.eventlog.log import append_and_apply_with_retry
 from chat.services.embeddings import (
    DEFAULT_EMBEDDING_DIM,
    DEFAULT_EMBEDDING_MODEL,
@@ -121,17 +121,22 @@ class EmbeddingWorker:
                job.memory_id,
            )
            return
-        with self._conn_factory() as conn:
+        # Retry-on-lock: the request handler holds an open transaction
-            append_and_apply(
+        # for the duration of post_turn (a few seconds), so any worker
-                conn,
+        # write started during that window blocks. open_db's
-                kind="embedding_indexed",
+        # busy_timeout is 100ms (so the request path itself can't get
-                payload={
+        # stuck on a worker), so retry here with backoff. Each retry
-                    "memory_id": job.memory_id,
+        # opens a fresh connection via ``conn_factory``.
-                    "model": result.model,
+        await append_and_apply_with_retry(
-                    "dim": result.dim,
+            self._conn_factory,
-                    "vector": result.vector,
+            kind="embedding_indexed",
-                },
+            payload={
-            )
+                "memory_id": job.memory_id,
                "model": result.model,
                "dim": result.dim,
                "vector": result.vector,
            },
        )
 __all__ = ["EmbeddingJob", "EmbeddingWorker"]
@@ -10,6 +10,7 @@ EmbeddingResult shape stays the same, only the generator changes.
 from __future__ import annotations
 import hashlib
 import logging
 import math
 import struct
@@ -18,6 +19,8 @@ from pydantic import BaseModel
 from chat.llm.client import LLMClient
 _log = logging.getLogger(__name__)
 DEFAULT_EMBEDDING_DIM = 384
 DEFAULT_EMBEDDING_MODEL = "pseudo-sha256-384"
 FALLBACK_EMBEDDING_MODEL = "fallback"
@@ -92,11 +95,27 @@ async def generate_embedding(
        # Pure-local pseudo path — no LLMClient call.
        return EmbeddingResult(vector=_pseudo_embed(text, dim), model=model, dim=dim)
-    # Future: real embedding via client.embed(...). Phase 4.5 work.
+    # T112 (Phase 4.5): non-default model — route through the client's
-    # For Phase 4, any non-default model falls through to fallback.
+    # ``embed()`` method. On any failure (including ``NotImplementedError``
-    return EmbeddingResult(
+    # from providers that don't expose embeddings, e.g. Featherless today),
-        vector=[0.0] * dim, model=FALLBACK_EMBEDDING_MODEL, dim=dim
+    # fall back to the zero vector and re-fire the T107 warning so
-    )
+    # misconfigured callers see the issue in logs rather than silently
    # producing useless cosine results.
    try:
        vector = await client.embed(text, model=model)
        return EmbeddingResult(vector=list(vector), model=model, dim=len(vector))
    except Exception as exc:  # noqa: BLE001 — any failure must degrade gracefully
        _log.warning(
            "generate_embedding: non-default model %r returned fallback "
            "(client.embed() raised %s: %s); "
            "downstream search will degrade silently. Configure a supported model.",
            model,
            type(exc).__name__,
            exc,
        )
        return EmbeddingResult(
            vector=[0.0] * dim, model=FALLBACK_EMBEDDING_MODEL, dim=dim
        )
 __all__ = [
@@ -4,13 +4,15 @@ Wraps single-pair compute_state_update to run state updates for ALL
 directed pairs of present entities. With 3 present entities (you, host,
 guest) that's 6 directed pairs. With 2 present (you, host) it's 2 pairs.
-Calls run sequentially to respect Featherless's 2-connection cap (the
+Pairs run concurrently via :func:`asyncio.gather`; the underlying
-client-level semaphore would serialize them anyway, but doing it here
+client should impose its own concurrency cap if the upstream provider
-keeps the failure surface clean — a hung pair doesn't queue behind
+needs it (e.g., Featherless's 2-conn semaphore). Returning order is
-itself).
+preserved (natural iteration over ``present_ids x present_ids``,
 src != tgt) so downstream event-append order stays deterministic.
 """
 from __future__ import annotations
 import asyncio
 from chat.llm.client import LLMClient
 from chat.services.state_update import StateUpdate, compute_state_update
@@ -28,35 +30,44 @@ async def compute_state_updates_for_present(
    timeout_s: float = 30.0,
 ) -> list[tuple[str, str, StateUpdate]]:
    """Run compute_state_update for every directed pair (src != tgt) over
-    ``present_ids``. Returns list of ``(source_id, target_id, update)``
+    ``present_ids``, concurrently. Returns list of
-    tuples in the natural iteration order over ``present_ids x present_ids``.
+    ``(source_id, target_id, update)`` tuples in the natural iteration
    order over ``present_ids x present_ids`` — concurrent dispatch does
    not change the returned order.
    A single failing pair falls back to the schema-default StateUpdate
-    (zero deltas, empty facts) inside ``compute_state_update``; the batch
+    (zero deltas, empty facts) inside ``compute_state_update``; sibling
-    keeps going.
+    pairs continue independently because each call is wrapped in its
    own try/except inside ``compute_state_update``.
    """
-    out: list[tuple[str, str, StateUpdate]] = []
+    pair_keys: list[tuple[str, str]] = [
-    for src in present_ids:
+        (src, tgt)
-        for tgt in present_ids:
+        for src in present_ids
-            if src == tgt:
+        for tgt in present_ids
-                continue
+        if src != tgt
-            edge = prior_edges.get((src, tgt), {})
+    ]
-            update = await compute_state_update(
+    if not pair_keys:
-                client,
+        return []
-                model=classifier_model,
+
-                source_id=src,
+    async def _one(src: str, tgt: str) -> StateUpdate:
-                target_id=tgt,
+        edge = prior_edges.get((src, tgt), {})
-                source_name=present_names.get(src, src),
+        return await compute_state_update(
-                source_persona=personas.get(src, "") or "",
+            client,
-                target_name=present_names.get(tgt, tgt),
+            model=classifier_model,
-                prior_affinity=int(edge.get("affinity", 50)),
+            source_id=src,
-                prior_trust=int(edge.get("trust", 50)),
+            target_id=tgt,
-                prior_summary=edge.get("summary", "") or "",
+            source_name=present_names.get(src, src),
-                recent_dialogue=recent_dialogue,
+            source_persona=personas.get(src, "") or "",
-                timeout_s=timeout_s,
+            target_name=present_names.get(tgt, tgt),
-            )
+            prior_affinity=int(edge.get("affinity", 50)),
-            out.append((src, tgt, update))
+            prior_trust=int(edge.get("trust", 50)),
-    return out
+            prior_summary=edge.get("summary", "") or "",
            recent_dialogue=recent_dialogue,
            timeout_s=timeout_s,
        )
    updates = await asyncio.gather(*(_one(src, tgt) for src, tgt in pair_keys))
    return [(src, tgt, upd) for (src, tgt), upd in zip(pair_keys, updates)]
 __all__ = ["compute_state_updates_for_present"]
@@ -325,14 +325,59 @@ def _build_open_threads_block(threads: list[dict]) -> str | None:
    return "\n".join(lines)
 def trim_to_max_beats(text: str, max_beats: int = 3) -> str:
    """Truncate ``text`` to at most ``max_beats`` asterisk-action beats.
    A "beat" is one ``*action*`` markdown-italic block plus the dialogue
    that follows it; counting ``*`` characters works as a deterministic
    boundary detector since each complete beat contributes exactly two
    asterisks (open + close). The (2*max_beats + 1)th asterisk is the
    opening of an over-the-cap beat; we trim immediately before it and
    strip trailing whitespace.
    Belt-and-suspenders for verbose roleplay-tuned narrators (Cydonia,
    Magnum, etc.) that reliably ignore "HARD CAP: 2-3 beats" prompt
    instructions and keep going. A physical max_tokens cap helps but
    truncates mid-word; this trims at a beat boundary instead.
    Idempotent and safe on outputs with fewer beats than the cap (just
    returns the text unchanged after a single pass).
    """
    if max_beats <= 0:
        return ""
    target = max_beats * 2
    count = 0
    for i, ch in enumerate(text):
        if ch == "*":
            count += 1
            if count > target:
                return text[:i].rstrip()
    return text
 def _closing_instruction(speaker_name: str, addressee_name: str) -> str:
    return (
-        f"Continue the scene as {speaker_name}, in their voice, responding "
+        f"Continue as {speaker_name}. Format strictly:\n"
-        "naturally. Use *asterisks* for actions and quotes for dialogue. "
+        f"- Wrap actions and gestures in *asterisks*, third person "
-        f"Stay in character. Do not narrate {addressee_name}'s actions or "
+        f"({speaker_name}/she/he/they) — never first person, never inner "
-        "thoughts. "
+        "thoughts inside asterisks.\n"
-        "Keep your response to a single beat — one or two short paragraphs "
+        "- Speak dialogue as plain text between action beats, no quote "
-        "at most. Don't monologue; leave room for the other person to react."
+        "marks. Keep speech fragmented, not paragraphs.\n"
        "- HARD CAP: 2-3 beats per response. A beat is one *asterisk "
        "action* paired with a short dialogue fragment. After the "
        "third beat, STOP — do not add a fourth, do not summarize, do "
        f"not narrate {addressee_name}'s reaction. Long responses break "
        "the scene's rhythm.\n"
        "- Each beat is one concrete gesture or sensory image. No "
        "explanation, no inner monologue, no stage-direction adverbs.\n"
        "- Trailing ellipses (...) are fine for emotional weight.\n"
        "EXAMPLE (3 beats, stops cleanly):\n"
        "*She turns with soapy hands to cup your face* That's how I know "
        "it's real... *She kisses you softly* You love me when I'm messy... "
        "*She smiles tearfully* ...and every moment in between.\n"
        f"Show only what {addressee_name} could externally observe of "
        f"{speaker_name}; never narrate {addressee_name}'s actions, "
        "thoughts, or speech. One response — leave room to react."
    )
@@ -95,6 +95,27 @@ from chat.web.render import render_turn_html
 _log = logging.getLogger(__name__)
 # T114.3: map a lifecycle-transition event kind to the events-table
 # status it implicitly transitioned *from*. Regenerate uses this to pick
 # the ``prior_status`` value for the ``event_status_reverted`` rollback
 # event so the projector sets the row back to where it was before the
 # superseded turn fired the transition.
 #
 # - ``event_started`` was emitted when the row was 'planned' → revert to
 #   'planned'.
 # - ``event_completed`` was emitted when the row was 'active' → revert
 #   to 'active'.
 # - ``event_cancelled`` could have fired from either 'planned' or
 #   'active'. Best-effort default: 'active'. The forward transitions
 #   below only fire detect_event_transitions for currently-active rows,
 #   so 'active' is the realistic prior in practice.
 _PRIOR_STATUS_MAP: dict[str, str] = {
    "event_started": "planned",
    "event_completed": "active",
    "event_cancelled": "active",
 }
 async def regenerate_assistant_turn(
    conn: Connection,
    client,
@@ -115,17 +136,18 @@ async def regenerate_assistant_turn(
    cannot be found — the FastAPI route translates this to 404.
    .. note::
-       **Lifecycle-rollback limitation (T83.4, Phase 4 follow-up).**
+       **Lifecycle rollback (T114, Phase 4.5).**
       When the superseded turn already produced lifecycle transitions
       (``event_started`` / ``event_completed`` / ``event_cancelled``),
-       this function does NOT roll those rows back before re-running
+       this function emits an ``event_status_reverted`` event for each
-       ``detect_event_transitions`` against the regenerated text. A
+       so the events row's status returns to its prior value before the
-       regenerate-after-completion can therefore double-emit promotion
+       regenerated narrative is reclassified. Backward compatibility:
-       artifacts if the new text re-completes the same event. Phase 3.5
+       lifecycle events authored before T114.1 lack the
-       only documents the gap and emits a WARNING log naming the
+       ``triggered_by_assistant_turn_id`` payload field; rollback skips
-       affected event_log ids; the actual undo pass is invasive
+       those (logged at DEBUG) so historic rows are not retroactively
-       (re-projection / inverse-handler dispatch) and is deferred to
+       reverted. A WARNING about un-rolled-back transitions is still
-       Phase 4. See the ``# T83.4`` block below for the warning emit.
+       emitted when stragglers are found — the rollback handles the
       common case while older logs continue to need manual review.
    """
    chat = get_chat(conn, chat_id)
    if chat is None:
@@ -158,20 +180,21 @@ async def regenerate_assistant_turn(
    original_assistant_payload = json.loads(row[0])
    original_user_turn_id = original_assistant_payload.get("user_turn_id")
-    # T83.4: scan for downstream lifecycle transitions emitted by the
+    # T114.3: roll back lifecycle transitions emitted by the superseded
-    # superseded turn — they're not being rolled back (see method
+    # turn. The scan uses the same id-greater-than-superseded-turn
-    # docstring). Heuristic: any ``event_started`` / ``event_completed``
+    # heuristic as the legacy T83.4 warning, joined to ``events`` for
-    # / ``event_cancelled`` event_log row with id strictly greater than
+    # chat scoping (lifecycle events don't carry chat_id in their
-    # the original assistant_turn's id was emitted as part of (or after)
+    # payload — they reference an ``event_id`` FK to the ``events``
-    # that turn's processing. Lifecycle events don't carry ``chat_id``
+    # table, which holds chat_id). For each row whose payload carries
-    # in their payload (their payload references an ``event_id`` FK to
+    # ``triggered_by_assistant_turn_id == original_assistant_event_id``
-    # the ``events`` table, which holds chat_id), so we join through
+    # (T114.1 back-reference), emit an ``event_status_reverted`` event
-    # ``events`` to scope to this chat.
+    # so the events-row status returns to the pre-transition value.
-    #
+    # Lifecycle rows authored before T114.1 lack the back-reference;
-    # A WARNING log surfaces the affected event ids so operators can
+    # those are skipped (DEBUG log) and a WARNING tracks their count so
-    # spot double-emit cases until the Phase 4 rollback pass lands.
+    # operators still see legacy stragglers — preserves the T83.4
    # observability contract for un-rolled-back transitions.
    unrolled_lifecycle = conn.execute(
-        "SELECT el.id, el.kind FROM event_log AS el "
+        "SELECT el.id, el.kind, el.payload_json FROM event_log AS el "
        "JOIN events AS ev "
        "  ON ev.event_id = json_extract(el.payload_json, '$.event_id') "
        "WHERE el.kind IN ("
@@ -182,18 +205,73 @@ async def regenerate_assistant_turn(
        "ORDER BY el.id ASC",
        (chat_id, original_assistant_event_id),
    ).fetchall()
-    if unrolled_lifecycle:
+    rolled_back_ids: list[int] = []
-        # T90.2: phrased as "at-or-after turn <id>" rather than "from
+    skipped_no_backref: list[int] = []
-        # superseded turn" because regenerating an OLDER turn lists
+    for el_id, el_kind, el_payload_json in unrolled_lifecycle:
-        # intervening-turn transitions that legitimately stand on their
+        try:
-        # own — those weren't authored by the superseded turn itself.
+            lifecycle_payload = json.loads(el_payload_json)
        except (TypeError, ValueError):
            skipped_no_backref.append(el_id)
            continue
        triggered_by = lifecycle_payload.get("triggered_by_assistant_turn_id")
        if triggered_by != original_assistant_event_id:
            # Either a legacy row (no field) or a transition triggered
            # by a *different* turn — leave it alone. DEBUG so the
            # message is available under verbose logging without
            # spamming the default WARNING channel.
            _log.debug(
                "regenerate_assistant_turn: skipping rollback for "
                "lifecycle event_log id=%d (kind=%s) — no back-reference "
                "or different turn (triggered_by=%r vs superseded=%d)",
                el_id,
                el_kind,
                triggered_by,
                original_assistant_event_id,
            )
            if triggered_by is None:
                skipped_no_backref.append(el_id)
            continue
        prior_status = _PRIOR_STATUS_MAP.get(el_kind)
        if prior_status is None:
            # Defensive: the SQL filter already restricts to the three
            # known kinds, but a future schema addition shouldn't crash
            # the rollback path.
            continue
        target_event_id = lifecycle_payload.get("event_id")
        if target_event_id is None:
            continue
        append_and_apply(
            conn,
            kind="event_status_reverted",
            payload={
                "event_id": target_event_id,
                "prior_status": prior_status,
            },
        )
        rolled_back_ids.append(el_id)
    if rolled_back_ids:
        _log.info(
            "regenerate_assistant_turn: rolled back %d lifecycle "
            "transition(s) triggered by superseded turn %s "
            "(event_log ids: %s)",
            len(rolled_back_ids),
            original_assistant_event_id,
            rolled_back_ids,
        )
    if skipped_no_backref:
        # T83.4 (legacy) compatibility: still warn about stragglers
        # without the back-reference so operators can spot pre-T114
        # double-emit risks. Phrased as "at-or-after turn <id>" per
        # T90.2 — older transitions may legitimately belong to other
        # turns.
        _log.warning(
            "regenerate_assistant_turn: %d lifecycle transition(s) "
-            "at-or-after turn %s are NOT being rolled back (Phase 4 "
+            "at-or-after turn %s are NOT being rolled back (no "
-            "follow-up). Affected event ids: %s",
+            "triggered_by_assistant_turn_id back-reference). "
-            len(unrolled_lifecycle),
+            "Affected event ids: %s",
            len(skipped_no_backref),
            original_assistant_event_id,
-            [r[0] for r in unrolled_lifecycle],
+            skipped_no_backref,
        )
    # 1a. Look up any sibling interjection beat in the same turn group
@@ -716,11 +794,13 @@ async def regenerate_assistant_turn(
    # runs inline after a completion so promotion artifacts land in the
    # same regenerate path.
    #
-    # T83.4 follow-up: when a regenerate replaces a turn that had
+    # T114.3: original-turn transitions emitted before this regenerate
-    # already produced event transitions, those original transitions
+    # ran were rolled back at the top of the function (see the
-    # are NOT undone here (Phase 4 work). A WARNING log earlier in this
+    # ``# T114.3`` block) by appending ``event_status_reverted`` for
-    # function names the affected event_log ids — see the T83.4 block
+    # each. The classify-and-emit pass below now operates against an
-    # near the function entry.
+    # ``events`` projection that has already been reverted, so it can
    # safely re-fire transitions for the regenerated narrative without
    # double-emitting promotion artifacts.
    new_active_events = list_active_events(conn, chat_id)
    if new_active_events:
        lifecycle_decision = await detect_event_transitions(
@@ -738,6 +818,12 @@ async def regenerate_assistant_turn(
                    payload={
                        "event_id": transition.event_id,
                        "started_at": chat.get("time"),
                        # T114.1: back-reference to the assistant_turn
                        # that triggered this transition (see turns.py
                        # for rationale).
                        "triggered_by_assistant_turn_id": (
                            new_assistant_event_id
                        ),
                    },
                )
            elif transition.new_status == "completed":
@@ -747,6 +833,10 @@ async def regenerate_assistant_turn(
                    payload={
                        "event_id": transition.event_id,
                        "completed_at": chat.get("time"),
                        # T114.1: back-reference (see above).
                        "triggered_by_assistant_turn_id": (
                            new_assistant_event_id
                        ),
                    },
                )
                promote_completed_event(
@@ -762,6 +852,10 @@ async def regenerate_assistant_turn(
                    payload={
                        "event_id": transition.event_id,
                        "completed_at": chat.get("time"),
                        # T114.1: back-reference (see above).
                        "triggered_by_assistant_turn_id": (
                            new_assistant_event_id
                        ),
                    },
                )
@@ -144,23 +144,36 @@ def _read_recent_dialogue(
    ``id >= since_event_id`` so callers needing a scene-scoped view (e.g.
    thread detection on close) don't pull turns that landed before the
    closing scene's ``scene_opened`` event.
    T113: also clamps by the active branch's ``[origin, head]`` event-id
    range so scene-summary inputs respect the user's current branch.
    Bootstrap-main and "no active branch" fall through to ``(0, BIG_INT)``
    so existing flows are unchanged.
    """
    from chat.state.branches import active_branch_event_ids
    origin, head = active_branch_event_ids(conn)
    if since_event_id is None:
        cur = conn.execute(
            "SELECT kind, payload_json FROM event_log "
            "WHERE kind IN ('user_turn', 'assistant_turn') "
            "  AND superseded_by IS NULL AND hidden = 0 "
            "  AND id BETWEEN ? AND ? "
            "ORDER BY id DESC LIMIT ?",
-            (limit,),
+            (origin, head, limit),
        )
    else:
        # Compose ``since_event_id`` with the branch lower bound — readers
        # want the tightest ``id >= max(since, origin)`` clamp without an
        # extra Python pass.
        lower = max(origin, since_event_id)
        cur = conn.execute(
            "SELECT kind, payload_json FROM event_log "
            "WHERE kind IN ('user_turn', 'assistant_turn') "
            "  AND superseded_by IS NULL AND hidden = 0 "
-            "  AND id >= ? "
+            "  AND id BETWEEN ? AND ? "
            "ORDER BY id DESC LIMIT ?",
-            (since_event_id, limit),
+            (lower, head, limit),
        )
    rows = list(reversed(cur.fetchall()))
    out: list[dict] = []
@@ -30,6 +30,7 @@ from __future__ import annotations
 import json
 from sqlite3 import Connection
 from chat.state.branches import active_branch_event_ids
 from chat.state.edges import get_edge
@@ -60,15 +61,22 @@ def read_recent_dialogue(
    previous implementation filtered chat_id post-fetch in Python, which
    let foreign-chat rows fill the LIMIT and yield fewer than N relevant
    rows in busy multi-chat databases.
    T113: clamp by the active branch's ``[origin, head]`` event-id range so
    switching branches actually changes what dialogue this read sees.
    Bootstrap-main and "no active branch" both fall through to ``(0,
    BIG_INT)`` — no functional change for the metadata-only Phase 4 era.
    """
    origin, head = active_branch_event_ids(conn)
    if exclude_event_id is None:
        cur = conn.execute(
            "SELECT id, kind, payload_json FROM event_log "
            "WHERE kind IN ('user_turn', 'user_turn_edit', 'assistant_turn') "
            "  AND superseded_by IS NULL AND hidden = 0 "
            "  AND id BETWEEN ? AND ? "
            "  AND json_extract(payload_json, '$.chat_id') = ? "
            "ORDER BY id DESC LIMIT ?",
-            (chat_id, limit),
+            (origin, head, chat_id, limit),
        )
    else:
        cur = conn.execute(
@@ -76,9 +84,10 @@ def read_recent_dialogue(
            "WHERE kind IN ('user_turn', 'user_turn_edit', 'assistant_turn') "
            "  AND id != ? "
            "  AND superseded_by IS NULL AND hidden = 0 "
            "  AND id BETWEEN ? AND ? "
            "  AND json_extract(payload_json, '$.chat_id') = ? "
            "ORDER BY id DESC LIMIT ?",
-            (exclude_event_id, chat_id, limit),
+            (exclude_event_id, origin, head, chat_id, limit),
        )
    rows = list(reversed(cur.fetchall()))
    out: list[dict] = []
@@ -107,13 +107,23 @@ async def parse_turn(
    without an LLM call (the classifier would error on empty input
    anyway, and the result is unambiguous).
-    Raises ``RuntimeError`` if the classifier fails twice — no default
+    Falls back to a single dialogue-shaped segment containing the
-    is supplied, since the caller (T19's turn flow) is responsible for
+    whole prose if the classifier flaps after retries — the turn flow
-    surfacing the error to the user.
+    can keep moving (the narrative will still fire on the prose) at
    the cost of finer-grained segment classification. The original
    code raised ``RuntimeError`` here, which 500'd the whole request
    and was particularly painful in multi-bot scenes where every
    user turn paid the classifier round-trip.
    """
    if not prose.strip():
        return ParsedTurn(segments=[])
    fallback = ParsedTurn(
        segments=[TurnSegment(kind="dialogue", text=prose)],
        intent="narrative",
        landing_state_hint="",
    )
    user_prompt = f"INPUT:\n{prose}"
    return await classify(
        client,
@@ -121,5 +131,6 @@ async def parse_turn(
        system=_SYSTEM_PROMPT,
        user=user_prompt,
        schema=ParsedTurn,
        default=fallback,
        timeout_s=timeout_s,
    )
@@ -9,11 +9,15 @@ existing event readers remain branch-agnostic.
 """
 from __future__ import annotations
 import logging
 from sqlite3 import Connection
 from chat.eventlog.projector import on
 from chat.eventlog.log import Event
 logger = logging.getLogger(__name__)
@on("branch_created")
 def _apply_branch_created(conn: Connection, e: Event) -> None:
@@ -37,9 +41,26 @@ def _apply_branch_switched(conn: Connection, e: Event) -> None:
    """Set is_active=1 on the named branch and is_active=0 on all others.
    Atomic via two UPDATEs ordered to avoid the unique-active-index race.
    If the named branch does not exist, a warning is emitted and the
    is_active flags are still cleared (preserving prior behavior — the
    second UPDATE simply matches no rows). Callers should validate the
    name upstream; this guard surfaces accidental mismatches in the log.
    """
    p = e.payload
    name = p["name"]
    # Warn (don't raise) if the target branch is missing. The existing
    # outcome — zero active branches — is preserved; this just makes the
    # condition observable instead of silent.
    exists = conn.execute(
        "SELECT 1 FROM branches WHERE name = ? LIMIT 1",
        (name,),
    ).fetchone()
    if exists is None:
        logger.warning(
            "branch_switched to unknown branch name %r; no branch will be active",
            name,
        )
    # Clear ALL is_active flags first (avoids the unique-index trip).
    conn.execute("UPDATE branches SET is_active = 0 WHERE is_active = 1")
    conn.execute(
@@ -79,6 +100,16 @@ def get_branch(conn: Connection, name: str) -> dict | None:
 def list_branches(conn: Connection, chat_id: str | None = None) -> list[dict]:
    """Return branch rows, optionally scoped to a chat.
    When ``chat_id`` is provided the filter is ``chat_id = ? OR chat_id IS NULL``,
    so global (null-chat) branches are returned in *every* per-chat scope. This
    is intentional: the bootstrapped ``"main"`` branch (and any future
    null-chat branches) are global by design — they belong to no single chat
    and should appear alongside per-chat branches in any chat-scoped listing.
    Callers that want only per-chat branches should filter the result on
    ``chat_id is not None``.
    """
    if chat_id is None:
        rows = conn.execute(
            "SELECT id, name, origin_event_id, head_event_id, chat_id, "
@@ -126,8 +157,58 @@ def active_branch(conn: Connection) -> dict | None:
    }
 # T113: sentinel "no upper bound" used by ``active_branch_event_ids`` when the
 # active branch's head is unset (the bootstrap "main" branch with origin=0 +
 # head=0). Readers compose ``id BETWEEN origin AND head`` so a value larger
 # than any possible row id behaves as "no clamp" without needing a separate
 # code path. ``2**63 - 1`` is SQLite's max signed-int — safe forever.
 _NO_HEAD_CLAMP = 2**63 - 1
 def active_branch_event_ids(conn: Connection) -> tuple[int, int]:
    """Return ``(origin_event_id, head_event_id)`` for the currently active
    branch, suitable as bounds for an ``event_log.id BETWEEN ? AND ?`` clamp
    on user-facing reads (T113).
    Defensive defaults:
    * **No active branch row** (``active_branch`` returns ``None``) — return
      ``(0, _NO_HEAD_CLAMP)`` so readers see all events. This preserves the
      Phase 4 "branches are metadata-only" contract for any code path that
      somehow runs without the migration-0013 bootstrap.
    * **Bootstrap "main"** — the canonical ``name="main", origin=0, head=0``
      row inserted by migration 0013. Production today never emits
      ``branch_head_updated`` for main, so head stays at 0 even as events
      accumulate. We treat this exact bootstrap state as "no clamp" and
      return ``(0, _NO_HEAD_CLAMP)`` so all events remain visible. This is
      what every existing test (which never configures branches) relies on.
    * **Any other branch** — return the literal ``(origin, head)`` from the
      branch row. A branch created at origin=N has head=N initially (per
      ``branch_from_event``), so ``BETWEEN N AND N`` returns just that one
      seed event until the head is bumped via ``branch_head_updated``.
    Note on the schema mismatch with the T113 spec: the spec describes
    ``head_event_id`` as nullable, but migration 0013 declared it
    ``NOT NULL DEFAULT 0``. We read head=0 on bootstrap main as the
    "unset" sentinel; non-main branches never reach head=0 in normal
    flow (creation sets head=origin, and origin=0 only for main).
    """
    branch = active_branch(conn)
    if branch is None:
        return (0, _NO_HEAD_CLAMP)
    origin = int(branch.get("origin_event_id") or 0)
    head = int(branch.get("head_event_id") or 0)
    # Bootstrap "main" sentinel — see docstring above. Detect by name +
    # both ids being 0 to avoid mis-firing on a hypothetical future
    # branch that legitimately starts at origin=0.
    if branch.get("name") == "main" and origin == 0 and head == 0:
        return (0, _NO_HEAD_CLAMP)
    return (origin, head)
 __all__ = [
    "get_branch",
    "list_branches",
    "active_branch",
    "active_branch_event_ids",
 ]
@@ -67,6 +67,29 @@ def _apply_event_expired(conn: Connection, e: Event) -> None:
    )
@on("event_status_reverted")
 def _apply_event_status_reverted(conn: Connection, e: Event) -> None:
    """T114.2: Revert an event row's status to ``prior_status``.
    Emitted by ``regenerate_assistant_turn`` when a superseded turn had
    triggered a lifecycle transition (event_started / event_completed /
    event_cancelled). The rollback step needs an inverse projection that
    sets the row's status back to whatever it was *before* the now-
    superseded transition fired.
    Unlike the forward transitions (which guard against terminal-status
    overwrites) this handler is unconditional — the entire purpose is to
    reverse a transition, including reverting from a terminal status
    (completed/cancelled) back to a non-terminal one.
    """
    p = e.payload
    conn.execute(
        "UPDATE events SET status = ?, updated_at = datetime('now') "
        "WHERE event_id = ?",
        (p["prior_status"], p["event_id"]),
    )
 def get_event(conn: Connection, event_id: str) -> dict | None:
    row = conn.execute(
        "SELECT event_id, chat_id, kind, status, props_json, planned_for, "
@@ -13,13 +13,18 @@ def _row_to_dict(conn: Connection, row: tuple) -> dict:
@on("memory_written")
 def _apply_memory_written(conn: Connection, e: Event) -> None:
    # T109 (schema 0014): persist the projecting event's id on the memory
    # row so cross-chat search results can deep-link back to the
    # originating turn (T111). Older memory rows projected before 0014
    # ran read NULL here — the column is nullable for that reason.
    p = e.payload
    conn.execute(
        "INSERT INTO memories ("
        "owner_id, chat_id, scene_id, pov_summary, "
        "witness_you, witness_host, witness_guest, "
-        "chat_clock_at, source, reliability, significance, pinned, auto_pinned"
+        "chat_clock_at, source, reliability, significance, pinned, auto_pinned, "
-        ") VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
+        "event_id"
        ") VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
        (
            p["owner_id"],
            p["chat_id"],
@@ -34,6 +39,7 @@ def _apply_memory_written(conn: Connection, e: Event) -> None:
            int(p.get("significance", 1)),
            int(p.get("pinned", 0)),
            int(p.get("auto_pinned", 0)),
            e.id,
        ),
    )
@@ -112,6 +118,25 @@ SIGNIFICANCE_RANK_BIAS = 0.5
 RRF_CONST = 60
 def _max_event_id(conn: Connection, owner_id: str) -> int:
    """Return the largest ``memories.id`` for ``owner_id`` (1 if none exist).
    Used as the recency-boost denominator by both ``_composite_rerank`` and
    ``_rrf_fuse_and_rerank`` (T104). The row id is a monotonic recency proxy
    — newer memories have larger ids — so dividing by the per-owner max keeps
    the boost in [0, 1] regardless of how many memories the owner has.
    Returns 1 (not 0) when the owner has no rows so callers can divide by
    the result without a guard. The "no memories" case never actually hits
    this helper because the FTS query above would have returned no rows,
    but the safe default keeps the helper trivially reusable.
    """
    row = conn.execute(
        "SELECT MAX(id) FROM memories WHERE owner_id = ?", (owner_id,)
    ).fetchone()
    return row[0] if row and row[0] else 1
 def search_memories(
    conn: Connection,
    owner_id: str,
@@ -163,6 +188,14 @@ def search_memories(
    When ``query_vector`` is None: FTS-only behaviour unchanged — all
    Phase 1-3.5 callers see the same row shape and ordering as before.
    **Row-shape contract (T104):** every returned dict carries an
    ``fts_rank`` key. For FTS hits this is the BM25 score (a negative float,
    lower-is-better). For *vector-only* hits surfaced by the fused path —
    rows that matched the query embedding but did NOT match FTS — the
    ``fts_rank`` value is ``None``. Downstream consumers must accept
    ``None`` here; do not assume ``fts_rank`` is always numeric. The
    ``composite_score`` is always a float on every returned row.
    """
    if witness_role not in _VALID_WITNESS_ROLES:
        raise ValueError(
@@ -180,12 +213,20 @@ def search_memories(
    # channel) so memories that are weak in FTS but strong in vector — and
    # vice versa — make it into the merge pool.
    over_fetch = max(k * 2, 20) if query_vector is not None else max(k * 4, 20)
    # T113: branch-scope filter on ``m.event_id`` (T109's column). Memories
    # whose ``event_id`` is NULL — projected before the 0014 schema migration
    # ran — are *included* unconditionally so the branch filter never breaks
    # legacy retrieval. Newer rows respect the active branch's bounds.
    from chat.state.branches import active_branch_event_ids
    origin, head = active_branch_event_ids(conn)
    sql = (
        f"SELECT {select_list}, memories_fts.rank AS fts_rank "
        "FROM memories_fts "
        "JOIN memories m ON m.id = memories_fts.rowid "
        f"WHERE m.owner_id = ? AND m.{witness_col} = 1 "
        "AND memories_fts MATCH ? "
        "AND (m.event_id IS NULL OR m.event_id BETWEEN ? AND ?) "
        # T57: significance multiplier biases the FTS over-fetch order. BM25
        # ``rank`` is lower-is-better, so subtracting ``significance * BIAS``
        # surfaces higher-significance rows above lower-significance rows with
@@ -194,7 +235,10 @@ def search_memories(
        "ORDER BY (memories_fts.rank - m.significance * ?) ASC "
        "LIMIT ?"
    )
-    cur = conn.execute(sql, (owner_id, query, SIGNIFICANCE_RANK_BIAS, over_fetch))
+    cur = conn.execute(
        sql,
        (owner_id, query, origin, head, SIGNIFICANCE_RANK_BIAS, over_fetch),
    )
    rows = cur.fetchall()
    # FTS-only path: preserve pre-T96 behaviour exactly.
@@ -227,10 +271,7 @@ def _composite_rerank(
    Extracted from ``search_memories`` so the no-vector path stays a single
    call and the fused path can re-use the same boost formulae after RRF.
    """
-    max_id_row = conn.execute(
+    max_id = _max_event_id(conn, owner_id)
        "SELECT MAX(id) FROM memories WHERE owner_id = ?", (owner_id,)
    ).fetchone()
    max_id = max_id_row[0] if max_id_row and max_id_row[0] else 1
    result_cols = cols + ["fts_rank"]
    enriched: list[dict] = []
@@ -301,6 +342,28 @@ def _rrf_fuse_and_rerank(
        query_vector=query_vector,
        k=vec_over_fetch,
    )
    # T113: drop vector hits that fall outside the active branch's event-id
    # range. ``vector_search`` is a generic service used elsewhere; the
    # branch filter applied to the FTS leg also has to apply here so the
    # fused result respects the same scope. Memories with NULL event_id
    # (legacy rows projected before T109's 0014 schema migration) are
    # included unconditionally — same policy as the FTS leg.
    from chat.state.branches import _NO_HEAD_CLAMP, active_branch_event_ids
    vec_origin, vec_head = active_branch_event_ids(conn)
    if vec_hits and (vec_origin > 0 or vec_head < _NO_HEAD_CLAMP):
        vec_ids = [h["memory_id"] for h in vec_hits]
        placeholders_v = ",".join("?" * len(vec_ids))
        in_range = {
            row[0]
            for row in conn.execute(
                f"SELECT id FROM memories "
                f"WHERE id IN ({placeholders_v}) "
                f"  AND (event_id IS NULL OR event_id BETWEEN ? AND ?)",
                (*vec_ids, vec_origin, vec_head),
            ).fetchall()
        }
        vec_hits = [h for h in vec_hits if h["memory_id"] in in_range]
    vec_rank_by_id: dict[int, int] = {
        hit["memory_id"]: rank for rank, hit in enumerate(vec_hits)
    }
@@ -343,10 +406,7 @@ def _rrf_fuse_and_rerank(
    # Final composite re-rank: significance + recency boosts on top of the
    # negated fusion score so the sort direction matches the FTS-only path.
-    max_id_row = conn.execute(
+    max_id = _max_event_id(conn, owner_id)
        "SELECT MAX(id) FROM memories WHERE owner_id = ?", (owner_id,)
    ).fetchone()
    max_id = max_id_row[0] if max_id_row and max_id_row[0] else 1
    result_cols = cols + ["fts_rank"]
    enriched: list[dict] = []
@@ -5,7 +5,12 @@ body {
    color: #1c1c1c;
    background: #fafafa;
    display: flex;
-    min-height: 100vh;
+    /* Locked to viewport (was ``min-height: 100vh``) so flex children
       like the chat ``.timeline`` get a bounded height and can use
       ``overflow-y: auto`` to scroll independently. The other pages
       have ``.content`` with ``overflow: auto`` so their own
       overflow still scrolls inside the right pane. */
    height: 100vh;
 }
 .rail {
    width: 200px;
@@ -101,12 +106,291 @@ code { font-family: ui-monospace, "SF Mono", Menlo, monospace; }
 }
 .turn-input { display: flex; flex-direction: column; gap: 8px; padding-top: 12px; border-top: 1px solid #e5e5e5; }
 .turn-input textarea { padding: 8px; font: inherit; border: 1px solid #ccc; border-radius: 3px; resize: vertical; }
-.drawer { position: fixed; top: 0; right: 0; width: 360px; height: 100vh; background: #fff; border-left: 1px solid #e5e5e5; padding: 16px; overflow-y: auto; z-index: 10; }
+/* ===========================================================
-.drawer[hidden] { display: none; }
+   Drawer — director's notebook overlay
-.drawer-content { display: flex; flex-direction: column; gap: 16px; }
+   ===========================================================
-.drawer-header { display: flex; align-items: center; justify-content: space-between; padding-bottom: 8px; border-bottom: 1px solid #e5e5e5; }
+   Editorial popup design: a warm-paper panel floats over an inky
-.drawer-close { border: none; background: transparent; color: #1c1c1c; font-size: 24px; padding: 0 4px; cursor: pointer; }
+   blurred backdrop. Single accent serif (Newsreader) at the title,
-.drawer-section h3 { margin: 0 0 8px; font-size: 14px; text-transform: uppercase; letter-spacing: 0.5px; color: #666; }
+   single muted-amber accent for primary interactives, generous
   spacing, controlled motion.
   Design tokens (scoped to the drawer so the rest of the app stays
   on its existing palette).
 */
 .drawer-modal {
  --paper:           #f6f1e8;   /* warm off-white panel */
  --paper-edge:      #e7dfce;
  --ink:             #1a1d29;   /* deep ink-blue */
  --ink-soft:        #38405a;
  --ink-faint:       #6c7390;
  --accent:          #b97e30;   /* muted amber */
  --accent-soft:     #efd9b1;
  --rule:            rgba(26, 29, 41, 0.10);
  --shadow-near:     0 1px 2px rgba(26, 29, 41, 0.08);
  --shadow-far:      0 32px 64px -24px rgba(26, 29, 41, 0.45),
                     0 12px 24px -12px rgba(26, 29, 41, 0.25);
  --serif:           "Newsreader", "Iowan Old Style", Georgia, serif;
  --duration:        180ms;
  --ease:            cubic-bezier(0.22, 0.61, 0.36, 1);
  position: fixed;
  inset: 0;
  z-index: 100;
  display: flex;
  align-items: center;
  justify-content: center;
  padding: clamp(16px, 4vw, 48px);
  /* Open/close transitions live here so the backdrop and panel
     can fade together; .is-open promotes both to their visible
     end-states. */
  opacity: 0;
  transition: opacity var(--duration) var(--ease);
 }
 .drawer-modal[hidden] { display: none; }
 .drawer-modal.is-open { opacity: 1; }
 .drawer-modal-backdrop {
  position: absolute;
  inset: 0;
  background:
    radial-gradient(circle at 30% 25%, rgba(26, 29, 41, 0.55), rgba(26, 29, 41, 0.85) 75%);
  backdrop-filter: blur(6px) saturate(1.05);
  -webkit-backdrop-filter: blur(6px) saturate(1.05);
 }
 /* The chat behind the modal stops scrolling and loses focus
   entirely. body class set by the JS; resets on close. */
 body.drawer-modal-open { overflow: hidden; }
 .drawer-panel {
  position: relative;
  width: 100%;
  max-width: 720px;
  max-height: min(82vh, 760px);
  display: flex;
  flex-direction: column;
  background: var(--paper);
  border-radius: 6px;
  box-shadow: var(--shadow-far);
  /* Subtle warm-paper texture: a single soft inner highlight at the
     top edge plus a faint vignette toward the bottom. Cheap, no
     external image. */
  background-image:
    linear-gradient(180deg,
      rgba(255, 255, 255, 0.50) 0%,
      rgba(255, 255, 255, 0.00) 18%,
      rgba(0, 0, 0, 0.00) 80%,
      rgba(120, 100, 70, 0.06) 100%);
  /* A 1px ink rule at the very top, set INSIDE the radius so the
     corners stay clean. ::before serves as a hairline accent. */
  overflow: hidden;
  /* Open/close: the backdrop fades; the panel additionally lifts
     slightly and scales from 98% to 100%. Controlled, no bounce. */
  transform: translateY(8px) scale(0.98);
  transition:
    transform var(--duration) var(--ease),
    opacity var(--duration) var(--ease);
  opacity: 0.98;
 }
 .drawer-modal.is-open .drawer-panel {
  transform: translateY(0) scale(1);
  opacity: 1;
 }
 .drawer-panel::before {
  content: "";
  position: absolute;
  top: 0; left: 0; right: 0;
  height: 2px;
  background: linear-gradient(90deg,
    transparent 0%, var(--accent) 14%, var(--accent) 86%, transparent 100%);
  opacity: 0.85;
 }
 .drawer-panel-header {
  display: flex;
  align-items: baseline;
  justify-content: space-between;
  gap: 16px;
  padding: 22px 28px 14px;
  border-bottom: 1px solid var(--rule);
  flex-shrink: 0;
 }
 .drawer-panel-header h2 {
  margin: 0;
  font-family: var(--serif);
  font-weight: 500;
  font-size: clamp(22px, 2.4vw, 28px);
  letter-spacing: -0.01em;
  color: var(--ink);
  /* Tiny editorial flourish: lowercase the title so it reads like
     a column header in a printed broadside. */
  text-transform: lowercase;
 }
 .drawer-panel-header h2::after {
  content: "";
  display: inline-block;
  width: 6px;
  height: 6px;
  margin-left: 10px;
  border-radius: 50%;
  background: var(--accent);
  vertical-align: middle;
  transform: translateY(-2px);
 }
 .drawer-panel-close {
  appearance: none;
  background: transparent;
  border: none;
  border-radius: 4px;
  color: var(--ink-soft);
  font-family: var(--serif);
  font-size: 28px;
  line-height: 1;
  width: 36px;
  height: 36px;
  cursor: pointer;
  transition:
    background-color var(--duration) var(--ease),
    color var(--duration) var(--ease),
    transform var(--duration) var(--ease);
 }
 .drawer-panel-close:hover {
  background: rgba(26, 29, 41, 0.06);
  color: var(--ink);
  transform: rotate(90deg);
 }
 .drawer-panel-close:focus-visible {
  outline: 2px solid var(--accent);
  outline-offset: 2px;
 }
 .drawer-panel-body {
  flex: 1 1 auto;
  min-height: 0;
  overflow-y: auto;
  padding: 18px 28px 28px;
  /* Restrict typography inside the body to the existing app font
     so the existing drawer markup (forms, lists, buttons rendered
     by /chats/<id>/drawer) keeps its current density and read-flow.
     We only re-color a few items so they sit on the warm paper. */
  color: var(--ink);
 }
 .drawer-panel-body .drawer-panel-loading {
  font-family: var(--serif);
  font-style: italic;
  color: var(--ink-faint);
 }
 /* Scoped overrides for the drawer-content the server renders into
   .drawer-panel-body. Keeps the existing class names working but
   re-tunes them for the warm-paper context. */
 /* Tabs nav — sits at the top of .drawer-content and lets the user
   pivot between Scene / Cast / Story / Turns groups. Underline-style
   active indicator (a single muted-amber rule) keeps the editorial
   feel — no pills, no boxes, no hover-fills. */
 .drawer-panel-body .drawer-tabs {
  display: flex;
  gap: 6px;
  margin: 0 -8px 14px;  /* bleed the divider rule slightly past the body padding */
  padding: 0 8px 0;
  border-bottom: 1px solid var(--rule);
  flex-wrap: wrap;
 }
 .drawer-panel-body .drawer-tab {
  appearance: none;
  background: transparent;
  border: none;
  padding: 10px 14px 12px;
  margin-bottom: -1px;  /* sit on top of the parent's border-bottom */
  font-family: var(--serif);
  font-size: 15px;
  font-weight: 400;
  letter-spacing: 0.02em;
  color: var(--ink-faint);
  border-bottom: 2px solid transparent;
  cursor: pointer;
  transition:
    color var(--duration) var(--ease),
    border-color var(--duration) var(--ease);
  border-radius: 0;  /* strip the global button radius */
 }
 .drawer-panel-body .drawer-tab:hover {
  color: var(--ink);
  background: transparent;
  border-color: transparent;
 }
 .drawer-panel-body .drawer-tab.is-active {
  color: var(--ink);
  border-bottom-color: var(--accent);
  background: transparent;
 }
 .drawer-panel-body .drawer-tab.is-active:hover {
  background: transparent;
  color: var(--ink);
 }
 .drawer-panel-body .drawer-tab:focus-visible {
  outline: 2px solid var(--accent);
  outline-offset: 2px;
  border-radius: 2px;
 }
 /* Panes — only one visible at a time. Uses [hidden] so the JS can
   toggle attribute-driven instead of class-driven. */
 .drawer-panel-body .drawer-tab-pane[hidden] { display: none; }
 /* Sections inside a pane: drop the section-level rules since the
   tabs already segment the content. Keep the section h3 as a sub-
   heading inside its pane — useful when a tab groups multiple
   sections (e.g. Cast = Guest + Group + Edges). */
 .drawer-panel-body .drawer-section {
  padding: 14px 0 18px;
  border-bottom: 1px solid var(--rule);
 }
 .drawer-panel-body .drawer-tab-pane > .drawer-section:first-child { padding-top: 6px; }
 .drawer-panel-body .drawer-tab-pane > .drawer-section:last-child { border-bottom: none; padding-bottom: 4px; }
 /* When a pane has only one section, suppress the redundant h3 since
   the tab label is the same name. */
 .drawer-panel-body .drawer-tab-pane:has(> .drawer-section:only-child) > .drawer-section > h3 {
  display: none;
 }
 .drawer-panel-body .drawer-section h3 {
  margin: 0 0 10px;
  font-family: var(--serif);
  font-weight: 500;
  font-size: 12px;
  letter-spacing: 0.16em;
  text-transform: uppercase;
  color: var(--accent);
 }
 .drawer-panel-body .activity-row,
 .drawer-panel-body .edge-row { margin-bottom: 12px; }
 .drawer-panel-body .activity-row strong,
 .drawer-panel-body .edge-row strong { display: block; color: var(--ink); }
 .drawer-panel-body .muted { color: var(--ink-faint); }
 .drawer-panel-body button,
 .drawer-panel-body .btn {
  background: var(--ink);
  border: 1px solid var(--ink);
  color: var(--paper);
  border-radius: 3px;
 }
 .drawer-panel-body button:hover,
 .drawer-panel-body .btn:hover {
  background: var(--accent);
  border-color: var(--accent);
  color: var(--ink);
 }
 /* Respect reduced-motion preference: no scale, no rotate, no
   blur transition — just the opacity fade. */
@media (prefers-reduced-motion: reduce) {
  .drawer-modal,
  .drawer-panel,
  .drawer-panel-close { transition-duration: 0ms; }
  .drawer-panel { transform: none; }
  .drawer-panel-close:hover { transform: none; }
 }
 .activity-row, .edge-row { margin-bottom: 12px; }
 .activity-row strong, .edge-row strong { display: block; }
 .memory-list { list-style: none; padding: 0; margin: 0; }
@@ -0,0 +1,34 @@
 {# T110.3: delete-impact modal partial.
 Rendered from :func:`chat.web.drawer.delete_preview` via a Jinja2
 TemplateResponse so HTML autoescape covers user-controllable fields
 (item.kind, item.description, notes) automatically — the prior
 f-string assembly required explicit html.escape() calls (T110.2)
 which become redundant under autoescape.
 Inputs:
  ``chat_id`` — the URL chat id (used to build the confirm form action).
  ``impact``  — an :class:`~chat.services.delete_impact.ImpactReport`.
 #}
 <div class="delete-impact-modal">
  <h3>Delete event {{ impact.target_event_id }}?</h3>
  <p>This will discard {{ impact.cascading|length }} events. Cascade:</p>
  <ul class="delete-impact-cascade">
    {% if impact.cascading %}
      {% for item in impact.cascading %}
        <li><strong>{{ item.kind }}</strong>: {{ item.description }}</li>
      {% endfor %}
    {% else %}
      <li>none</li>
    {% endif %}
  </ul>
  <ul class="delete-impact-notes">
    {% for note in impact.notes %}
      <li>{{ note }}</li>
    {% endfor %}
  </ul>
  <form hx-post="/chats/{{ chat_id }}/drawer/turn/delete/{{ impact.target_event_id }}"
        hx-target="#drawer" hx-swap="innerHTML">
    <button type="submit">Confirm delete</button>
  </form>
 </div>
@@ -547,6 +547,25 @@
        </ul>
      </details>
    {% endif %}
    {# T110.4: bulk significance re-rate. Move every memory in this chat
       at level_from to level_to with one manual_edit event per row, so
       the audit trail stays per-memory. #}
    <details class="bulk-significance">
      <summary>Bulk re-rate significance</summary>
      <form class="inline-edit"
            hx-post="/chats/{{ chat.id }}/drawer/memory/significance/bulk"
            hx-target="#drawer" hx-swap="innerHTML">
        <label>
          From:
          <input type="number" name="level_from" min="0" max="3" value="0" required>
        </label>
        <label>
          To:
          <input type="number" name="level_to" min="0" max="3" value="1" required>
        </label>
        <button type="submit">Re-rate all</button>
      </form>
    </details>
  </section>
  <section class="drawer-section">
@@ -5,7 +5,18 @@
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>{% block title %}chat{% endblock %}</title>
    <link rel="stylesheet" href="/static/app.css">
    <!-- Newsreader: refined editorial serif for accent typography
         (drawer modal title, etc.). Body stays system-ui for read-
         flow legibility. Subset to the weight we use to keep the
         payload tiny. -->
    <link rel="preconnect" href="https://fonts.googleapis.com">
    <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
    <link rel="stylesheet" href="https://fonts.googleapis.com/css2?family=Newsreader:opsz,wght@6..72,400;6..72,500&display=swap">
    <script src="https://unpkg.com/htmx.org@1.9.12" defer></script>
    <!-- htmx 1.x bundles its SSE extension at /dist/ext/sse.js. The
         standalone htmx-ext-sse@2.x package is for htmx 2.x and is
         not compatible with the 1.x ext API. -->
    <script src="https://unpkg.com/htmx.org@1.9.12/dist/ext/sse.js" defer></script>
 </head>
 <body>
    {% block body %}{% endblock %}
@@ -7,7 +7,9 @@
  <header class="chat-header">
    <h1>{{ host_bot.name }}</h1>
    <div class="chat-meta muted">{{ chat.time }}</div>
-    <button class="drawer-toggle" type="button" aria-controls="drawer" aria-expanded="false">Drawer</button>
+    <button class="drawer-toggle" type="button"
            aria-controls="drawer-modal" aria-expanded="false"
            aria-haspopup="dialog">Drawer</button>
  </header>
  <section class="timeline" id="timeline"
@@ -30,21 +32,251 @@
    <button type="submit">Send</button>
  </form>
  <aside class="drawer" id="drawer" hidden
         hx-get="/chats/{{ chat.id }}/drawer"
         hx-trigger="revealed"
         hx-swap="innerHTML">
    <p class="muted">Loading drawer&hellip;</p>
  </aside>
 </div>
 <!-- Drawer modal — director's notebook overlay.
     Sits outside .chat-shell so its position:fixed backdrop covers the
     whole viewport. The panel still pulls its inner HTML from
     /chats/<id>/drawer via HTMX; trigger is a custom 'drawer-open'
     event that the open/close script dispatches each time the modal
     opens, so the content refreshes on every open. -->
 <div class="drawer-modal" id="drawer-modal" hidden
     role="dialog"
     aria-modal="true"
     aria-labelledby="drawer-modal-title">
  <div class="drawer-modal-backdrop" data-drawer-close></div>
  <article class="drawer-panel">
    <header class="drawer-panel-header">
      <h2 id="drawer-modal-title">Drawer</h2>
      <button class="drawer-panel-close" type="button"
              data-drawer-close
              aria-label="Close drawer">×</button>
    </header>
    <div class="drawer-panel-body" id="drawer"
         hx-get="/chats/{{ chat.id }}/drawer"
         hx-trigger="drawer-open from:body"
         hx-swap="innerHTML">
      <p class="muted drawer-panel-loading">Loading&hellip;</p>
    </div>
  </article>
 </div>
 <script>
-document.querySelector('.drawer-toggle')?.addEventListener('click', (e) => {
+// Drawer modal — open/close, focus management, and post-swap
-  const drawer = document.getElementById('drawer');
+// tab-grouping. The server's /chats/<id>/drawer response is left
-  const isHidden = drawer.hasAttribute('hidden');
+// unchanged; this script post-processes the swapped HTML to:
-  if (isHidden) drawer.removeAttribute('hidden');
+//   1. Pull the bot name from the legacy <header><h2> and use it as
-  else drawer.setAttribute('hidden', '');
+//      the modal title.
-  e.target.setAttribute('aria-expanded', String(isHidden));
+//   2. Remove the legacy header (it has its own onclick="hidden"
-});
+//      close that targets the OLD drawer semantics — broken now).
 //   3. Walk .drawer-section blocks and group them into 4 tabs by
 //      their <h3> title:
 //        Scene  : Scene, Activity
 //        Cast   : Guest, Group, Edges
 //        Story  : Events, Threads, Branches
 //        Turns  : Recent turns, Significance review
 //      A tab nav is rendered above the sections; clicking switches
 //      which group is visible. Empty tabs (no matching sections) are
 //      hidden.
 (function () {
  const modal = document.getElementById('drawer-modal');
  const toggle = document.querySelector('.drawer-toggle');
  if (!modal || !toggle) return;
  const titleEl = modal.querySelector('#drawer-modal-title');
  const body = modal.querySelector('.drawer-panel-body');
  const panel = modal.querySelector('.drawer-panel');
  let lastFocus = null;
  function open() {
    if (!modal.hasAttribute('hidden')) return;
    lastFocus = document.activeElement;
    modal.removeAttribute('hidden');
    // Force reflow so the .is-open class triggers the transition.
    void modal.offsetWidth;
    modal.classList.add('is-open');
    toggle.setAttribute('aria-expanded', 'true');
    document.body.classList.add('drawer-modal-open');
    // Re-fetch drawer content via the panel's hx-trigger.
    document.body.dispatchEvent(new CustomEvent('drawer-open'));
    // Focus the close button so Escape / Enter both work
    // immediately and screen readers announce the dialog.
    requestAnimationFrame(() => {
      const closeBtn = modal.querySelector('.drawer-panel-close');
      if (closeBtn) closeBtn.focus();
    });
  }
  function close() {
    if (modal.hasAttribute('hidden')) return;
    modal.classList.remove('is-open');
    toggle.setAttribute('aria-expanded', 'false');
    document.body.classList.remove('drawer-modal-open');
    // Wait for the fade-out before fully hiding so the transition
    // can play. Match the CSS duration.
    setTimeout(() => {
      modal.setAttribute('hidden', '');
      if (lastFocus && typeof lastFocus.focus === 'function') {
        lastFocus.focus();
      }
    }, 180);
  }
  toggle.addEventListener('click', open);
  // Bind close DIRECTLY to every element flagged data-drawer-close.
  // Event delegation through .stopPropagation() previously swallowed
  // the close button's click (it sits inside .drawer-panel, which
  // stops propagation to keep backdrop clicks from leaking through
  // the panel itself). Direct binding sidesteps that and keeps the
  // panel-stops-propagation rule for everything else.
  function bindCloseTargets(root) {
    root.querySelectorAll('[data-drawer-close]').forEach((el) => {
      // Idempotent: only bind once per element.
      if (el.dataset.drawerCloseBound === '1') return;
      el.dataset.drawerCloseBound = '1';
      el.addEventListener('click', (e) => {
        e.preventDefault();
        e.stopPropagation();
        close();
      });
    });
  }
  bindCloseTargets(modal);
  // Clicks inside the panel that AREN'T close targets must not
  // reach the backdrop click handler. (We don't have one currently
  // — backdrop close is via data-drawer-close on the backdrop div —
  // but stopPropagation here is defensive against future handlers.)
  panel.addEventListener('click', (e) => e.stopPropagation());
  // Escape closes only when the modal is open.
  document.addEventListener('keydown', (e) => {
    if (e.key === 'Escape' && !modal.hasAttribute('hidden')) {
      e.preventDefault();
      close();
    }
  });
  // ---- Tabs: group server-rendered .drawer-section blocks ----
  const TAB_GROUPS = [
    { id: 'scene', label: 'Scene', sections: ['Scene', 'Activity'] },
    { id: 'cast',  label: 'Cast',  sections: ['Guest', 'Group', 'Edges'] },
    { id: 'story', label: 'Story', sections: ['Events', 'Threads', 'Branches'] },
    { id: 'turns', label: 'Turns', sections: ['Recent turns', 'Significance review'] },
  ];
  function tabIdForSection(h3Text) {
    const t = (h3Text || '').trim();
    for (const g of TAB_GROUPS) {
      if (g.sections.includes(t)) return g.id;
    }
    return 'scene'; // unknown sections fall into the first tab
  }
  function buildTabs() {
    // Clean up the legacy server-rendered header inside the body
    // (duplicate close + duplicate title).
    const legacyHeader = body.querySelector(':scope > .drawer-content > .drawer-header');
    if (legacyHeader) {
      // Promote the bot name to the modal title before discarding.
      const h2 = legacyHeader.querySelector('h2');
      if (h2 && h2.textContent.trim()) {
        titleEl.textContent = h2.textContent.trim();
      }
      legacyHeader.remove();
    }
    // The drawer-content wrapper holds all the sections. Group them.
    const content = body.querySelector('.drawer-content');
    if (!content) return;
    const sections = Array.from(content.querySelectorAll(':scope > .drawer-section'));
    if (sections.length === 0) return;
    // Bucket sections by tab id.
    const buckets = new Map(TAB_GROUPS.map((g) => [g.id, []]));
    for (const sec of sections) {
      const h3 = sec.querySelector(':scope > h3');
      const tabId = tabIdForSection(h3 ? h3.textContent : '');
      buckets.get(tabId).push(sec);
    }
    // Build the tab nav. Skip empty buckets so the nav reflects
    // what the chat actually has (e.g. no Guest tab when 1:1).
    const nav = document.createElement('nav');
    nav.className = 'drawer-tabs';
    nav.setAttribute('role', 'tablist');
    const panes = document.createElement('div');
    panes.className = 'drawer-tab-panes';
    let firstActive = null;
    for (const group of TAB_GROUPS) {
      const items = buckets.get(group.id);
      if (!items.length) continue;
      const btn = document.createElement('button');
      btn.type = 'button';
      btn.className = 'drawer-tab';
      btn.setAttribute('role', 'tab');
      btn.id = `drawer-tab-${group.id}`;
      btn.dataset.tabTarget = group.id;
      btn.textContent = group.label;
      btn.setAttribute('aria-controls', `drawer-pane-${group.id}`);
      nav.appendChild(btn);
      const pane = document.createElement('section');
      pane.className = 'drawer-tab-pane';
      pane.id = `drawer-pane-${group.id}`;
      pane.setAttribute('role', 'tabpanel');
      pane.setAttribute('aria-labelledby', `drawer-tab-${group.id}`);
      // Move the section nodes into the pane (preserves any HTMX
      // event listeners and the sections' interactive forms).
      for (const sec of items) pane.appendChild(sec);
      panes.appendChild(pane);
      if (!firstActive) firstActive = group.id;
    }
    // Replace the existing content with [nav][panes].
    content.innerHTML = '';
    content.appendChild(nav);
    content.appendChild(panes);
    // Tab click handler.
    nav.addEventListener('click', (e) => {
      const target = e.target;
      if (!(target instanceof HTMLElement)) return;
      const tabId = target.dataset.tabTarget;
      if (!tabId) return;
      activateTab(content, tabId);
    });
    if (firstActive) activateTab(content, firstActive);
  }
  function activateTab(content, tabId) {
    content.querySelectorAll('.drawer-tab').forEach((btn) => {
      const isActive = btn.dataset.tabTarget === tabId;
      btn.classList.toggle('is-active', isActive);
      btn.setAttribute('aria-selected', String(isActive));
      btn.setAttribute('tabindex', isActive ? '0' : '-1');
    });
    content.querySelectorAll('.drawer-tab-pane').forEach((pane) => {
      const isActive = pane.id === `drawer-pane-${tabId}`;
      pane.toggleAttribute('hidden', !isActive);
    });
  }
  // Run after every HTMX swap into the panel body. Covers the
  // initial open AND any subsequent server-driven re-render
  // (e.g. an in-drawer form submit that returns refreshed HTML).
  body.addEventListener('htmx:afterSwap', () => {
    buildTabs();
    bindCloseTargets(modal);
  });
 })();
 </script>
 <script>
 // Streaming UX (T34): typing indicator, Stop button, send-lock,
@@ -66,6 +298,44 @@ document.querySelector('.drawer-toggle')?.addEventListener('click', (e) => {
  let isStreaming = false;
  let typingEl = null;
  // Sticky-bottom autoscroll: scroll the timeline to the latest
  // message when new content arrives, but ONLY if the user is
  // already pinned to the bottom. Once they scroll up to read older
  // turns, we leave their position alone until they manually scroll
  // back down.
  //
  // ``isPinnedToBottom`` flips on every scroll event based on
  // distance-from-bottom (with a small tolerance so a few pixels of
  // overshoot from a layout shift doesn't unpin). A MutationObserver
  // catches every node added to the timeline — covers the SSE-
  // injected ``turn_html`` swap, the optimistic ``appendUserTurn``
  // render, and the streaming typing-indicator updates.
  const STICK_TOLERANCE_PX = 64;
  let isPinnedToBottom = true;
  function distanceFromBottom() {
    return timeline.scrollHeight - timeline.scrollTop - timeline.clientHeight;
  }
  function scrollToBottom() {
    timeline.scrollTop = timeline.scrollHeight;
  }
  // Initial state: stick to the bottom on page load so the latest
  // turn is visible without manual scrolling.
  requestAnimationFrame(scrollToBottom);
  timeline.addEventListener('scroll', () => {
    isPinnedToBottom = distanceFromBottom() <= STICK_TOLERANCE_PX;
  }, { passive: true });
  const timelineObserver = new MutationObserver(() => {
    if (isPinnedToBottom) scrollToBottom();
  });
  timelineObserver.observe(timeline, {
    childList: true,
    subtree: true,
    characterData: true,  // streaming token-by-token edits
  });
  function ensureTypingEl() {
    if (typingEl) return typingEl;
    typingEl = document.createElement('div');
@@ -162,13 +432,62 @@ document.querySelector('.drawer-toggle')?.addEventListener('click', (e) => {
    }
  });
-  form.addEventListener('submit', () => {
+  // Enter-to-send (Shift+Enter for newline). Submits via the form's
-    isStreaming = true;
+  // own submit event so all the optimistic-render + fetch logic
  // below applies uniformly to keyboard and click submissions.
  if (textarea) {
    textarea.addEventListener('keydown', (e) => {
      if (e.key === 'Enter' && !e.shiftKey && !e.isComposing) {
        e.preventDefault();
        if (typeof form.requestSubmit === 'function') {
          form.requestSubmit();
        } else {
          form.dispatchEvent(new Event('submit', { cancelable: true }));
        }
      }
    });
  }
  // Render the user's prose optimistically as a turn-you DOM node.
  // Without this the user can't see what they just sent until the page
  // reloads — the server persists ``user_turn`` events but doesn't
  // publish a turn_html for them (the SSE channel is bot-output-only).
  function appendUserTurn(prose) {
    const div = document.createElement('div');
    div.className = 'turn turn-you';
    const strong = document.createElement('strong');
    strong.textContent = 'you';
    const p = document.createElement('p');
    p.textContent = prose;
    div.appendChild(strong);
    div.appendChild(p);
    // Sending a message means the user wants to see it land — force
    // sticky-bottom even if they were scrolled up reading older
    // turns. The MutationObserver handles the actual scroll.
    isPinnedToBottom = true;
    timeline.appendChild(div);
  }
  // Intercept the form submit and POST via fetch so we can:
  //   1. Render the user's prose immediately (optimistic).
  //   2. Clear the textarea immediately.
  //   3. Keep the page state intact while the bot streams its
  //      response over SSE — vanilla form POST + 204 leaves the
  //      browser in a half-loaded state with the textarea unflushed.
  form.addEventListener('submit', async (e) => {
    e.preventDefault();
    if (isStreaming) return;
    const prose = textarea ? (textarea.value || '').trim() : '';
    if (!prose) return;
    appendUserTurn(prose);
    if (textarea) {
      textarea.value = '';
      textarea.readOnly = true;
    }
    if (sendBtn) sendBtn.disabled = true;
-    // readOnly (not disabled) — disabled fields are excluded from the
+    isStreaming = true;
-    // form submission, which would send prose="" and trigger the
+
    // server's empty-prose 400.
    if (textarea) textarea.readOnly = true;
    if (!shell.querySelector('.stop-streaming')) {
      const stopBtn = document.createElement('button');
      stopBtn.type = 'button';
@@ -186,6 +505,25 @@ document.querySelector('.drawer-toggle')?.addEventListener('click', (e) => {
      });
      form.parentElement.insertBefore(stopBtn, form);
    }
    // Fire the actual POST. The bot's response arrives via SSE
    // (``turn_html`` event swaps into the timeline; ``unlock()`` runs
    // on receipt to clear streaming state and re-enable the form).
    try {
      const body = new URLSearchParams({ prose }).toString();
      const resp = await fetch(form.action, {
        method: 'POST',
        headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
        body,
      });
      if (!resp.ok && resp.status !== 204) {
        showBanner('send failed (HTTP ' + resp.status + ') — try again');
        unlock();
      }
    } catch (err) {
      showBanner('send failed — check your connection');
      unlock();
    }
  });
 })();
 </script>
@@ -21,14 +21,29 @@
  <ul class="search-results">
    {% for r in results %}
    <li class="search-result">
-      <a class="search-result-link" href="/chats/{{ r.chat_id }}">
+      {# T111.2: deep-link to the originating turn via the
         ``id="turn-{event_id}"`` anchor stamped by Phase 3.5 T86.
         ``event_id`` may be NULL for memory rows projected before the
         0014 migration ran (T109 did not backfill historical rows); in
         that case fall back to a chat-level link with no anchor so we
         never emit ``#turn-None``. #}
      <a class="search-result-link"
         href="/chats/{{ r.chat_id }}{% if r.event_id %}#turn-{{ r.event_id }}{% endif %}">
        <div class="search-result-meta muted">
          <strong>{{ r.owner_name }}</strong>
          <span>&middot; {{ r.chat_id }}</span>
          {% if r.chat_name %}<span>&middot; {{ r.chat_name }}</span>{% endif %}
          {% if r.scene_label %}<span>&middot; scene {{ r.scene_label }}</span>{% endif %}
        </div>
-        <div class="search-result-summary">{{ r.pov_summary }}</div>
+        {# T111.1: ``r.snippet`` is the FTS5 ``snippet()`` excerpt with
           each match wrapped in ``<mark>...</mark>``. ``|safe`` is
           required so the marker tags survive Jinja's auto-escape; the
           snippet is built by SQLite from indexed text, so the only
           HTML in the string is the ``<mark>`` we configured (any
           special chars from the source content are passed through as
           literal text, NOT as HTML). This is the only ``|safe`` filter
           on the page — chat_id, owner_name, etc. remain auto-escaped. #}
        <div class="search-result-summary">{{ r.snippet|safe }}</div>
      </a>
    </li>
    {% endfor %}
@@ -5,9 +5,9 @@ from fastapi.responses import RedirectResponse, HTMLResponse
 from fastapi.templating import Jinja2Templates
 from chat.db.connection import open_db
-from chat.eventlog.log import append_event
+from chat.eventlog.log import append_and_apply
-from chat.eventlog.projector import project
+from chat.state.entities import get_bot, list_bots
-from chat.state.entities import list_bots
+from chat.state.world import get_chat
 TEMPLATES = Jinja2Templates(directory=str(Path(__file__).resolve().parent.parent / "templates"))
@@ -108,11 +108,33 @@ async def bot_create(
        "initial_relationship_to_you": initial_relationship_to_you.strip(),
        "kickoff_prose": kickoff_prose.strip(),
    }
-    append_event(conn, kind="bot_authored", payload=payload)
+    # Per-event apply (NOT project()) — see docs/audits/2026-04-27-project-callers.md.
-    project(conn)
+    # ``project()`` replays the full log, which trips raw-INSERT handlers like
    # ``_apply_chat_created`` once a second bot's events are present.
    append_and_apply(conn, kind="bot_authored", payload=payload)
    return RedirectResponse(url=f"/bots/{payload['id']}/kickoff", status_code=303)
@router.get("/bots/{bot_id}")
 async def bot_detail(bot_id: str, conn=Depends(get_conn)):
    """Click-through from the bots list. Routes to the bot's existing
    chat when there is one (the v1 model is one-chat-per-host-bot,
    keyed by ``chat_<bot_id>``), otherwise to the kickoff page so the
    user can author the chat's opening state. 404 if the bot itself
    doesn't exist.
    Defined AFTER the ``/bots/new`` and ``/bots/{bot_id}/kickoff``
    routes — FastAPI matches in declaration order, and a path
    parameter would otherwise swallow ``/bots/new``.
    """
    if get_bot(conn, bot_id) is None:
        raise HTTPException(status_code=404, detail="bot not found")
    chat_id = f"chat_{bot_id}"
    if get_chat(conn, chat_id) is not None:
        return RedirectResponse(url=f"/chats/{chat_id}", status_code=303)
    return RedirectResponse(url=f"/bots/{bot_id}/kickoff", status_code=303)
@router.post("/bots/{bot_id}/reset")
 async def reset_bot_route(
    bot_id: str,
@@ -411,6 +411,64 @@ async def edit_memory_significance(
    return await drawer(chat_id, request, conn)
@router.post(
    "/chats/{chat_id}/drawer/memory/significance/bulk",
    response_class=HTMLResponse,
 )
 async def bulk_re_rate_significance(
    chat_id: str,
    request: Request,
    level_from: int = Form(...),
    level_to: int = Form(...),
    conn=Depends(get_conn),
 ):
    """T110.4: bulk re-rate every memory in this chat at ``level_from``
    to ``level_to``.
    Fans out into one ``manual_edit`` event per matching memory rather
    than a single bulk event so the §6.4 audit trail stays per-row —
    each affected memory carries its own ``prior_value -> new_value``
    snapshot, so an inverse edit can restore an individual row without
    needing to inspect a bulk payload's member list. The drawer's
    significance-distribution panel surfaces the new buckets on the
    refreshed partial.
    Both levels are clamped to 0..3 (matching ``edit_memory_significance``)
    and a no-op (``level_from == level_to``) is rejected with 400 so a
    misclick can't pad the event log with empty edits.
    """
    chat = get_chat(conn, chat_id)
    if chat is None:
        raise HTTPException(status_code=404, detail=f"chat not found: {chat_id}")
    lf = max(0, min(3, int(level_from)))
    lt = max(0, min(3, int(level_to)))
    if lf == lt:
        raise HTTPException(
            status_code=400,
            detail=f"level_from and level_to must differ (both = {lf})",
        )
    rows = conn.execute(
        "SELECT id FROM memories WHERE chat_id = ? AND significance = ? "
        "ORDER BY id ASC",
        (chat_id, lf),
    ).fetchall()
    for row in rows:
        memory_id = int(row[0])
        append_and_apply(
            conn,
            kind="manual_edit",
            payload={
                "target_kind": "memory_significance",
                "target_id": memory_id,
                "prior_value": lf,
                "new_value": lt,
            },
        )
    return await drawer(chat_id, request, conn)
@router.post(
    "/chats/{chat_id}/drawer/memory/{memory_id}/pin",
    response_class=HTMLResponse,
@@ -1234,28 +1292,18 @@ async def delete_preview(
    report = compute_delete_impact(conn, target_event_id=int(event_id))
-    # Build the modal HTML directly — the impact report is small and
+    # T110.3: render via the ``_delete_impact_modal.html`` Jinja partial
-    # reusing the drawer template would require a fragment include just
+    # so HTML autoescape covers user-controllable fields (item.kind,
-    # for this surface. Mirrors the rewind-preview style in
+    # item.description, notes) automatically. The prior implementation
-    # :func:`chat.web.turns.rewind_preview`.
+    # built the modal HTML via raw f-string concatenation and required
-    items_html = "".join(
+    # explicit ``html.escape()`` calls (T110.2) on each interpolated
-        f"<li><strong>{item.kind}</strong>: {item.description}</li>"
+    # field; under autoescape those calls become redundant. Mirrors the
-        for item in report.cascading
+    # rewind-preview style in :func:`chat.web.turns.rewind_preview`.
    return TEMPLATES.TemplateResponse(
        request,
        "_delete_impact_modal.html",
        {"chat_id": chat_id, "impact": report},
    )
    notes_html = "".join(f"<li>{note}</li>" for note in report.notes)
    body = (
        "<div class='delete-impact-modal'>"
        f"<h3>Delete event {report.target_event_id}?</h3>"
        f"<p>This will discard {len(report.cascading)} events. Cascade:</p>"
        f"<ul class='delete-impact-cascade'>{items_html or '<li>none</li>'}</ul>"
        f"<ul class='delete-impact-notes'>{notes_html}</ul>"
        f"<form hx-post='/chats/{chat_id}/drawer/turn/delete/{report.target_event_id}' "
        "hx-target='#drawer' hx-swap='innerHTML'>"
        "<button type='submit'>Confirm delete</button>"
        "</form>"
        "</div>"
    )
    return HTMLResponse(body)
@router.post(
@@ -1278,7 +1326,19 @@ async def delete_turn(
    A snapshot is taken before truncation (inside ``execute_rewind``)
    so the user can recover via the snapshot index.
    T110.1 guards ``event_id <= 0``: a stale tab or hand-crafted request
    posting ``event_id=0`` would otherwise compute ``after_event_id=-1``
    and silently truncate the entire log. ``id`` is auto-assigned by
    SQLite starting at 1 so any caller's "real" id is always >= 1; a
    zero or negative value can only mean a client bug, surfaced as 400.
    """
    if int(event_id) <= 0:
        raise HTTPException(
            status_code=400,
            detail=f"event_id must be a positive integer, got {event_id}",
        )
    chat = get_chat(conn, chat_id)
    if chat is None:
        raise HTTPException(status_code=404, detail=f"chat not found: {chat_id}")
@@ -17,8 +17,7 @@ from fastapi import APIRouter, Depends, Form, HTTPException, Request
 from fastapi.responses import HTMLResponse, RedirectResponse
 from fastapi.templating import Jinja2Templates
-from chat.eventlog.log import append_event
+from chat.eventlog.log import append_and_apply
 from chat.eventlog.projector import project
 from chat.llm.client import LLMClient
 from chat.services.kickoff import parse_kickoff
 from chat.state.entities import get_bot, get_you
@@ -32,14 +31,97 @@ router = APIRouter()
 def get_llm_client(request: Request) -> LLMClient:
-    """Production LLM client. Tests override this via ``app.dependency_overrides``."""
+    """Production LLM client. Tests override this via ``app.dependency_overrides``.
    Returns a :class:`chat.llm.router.RoutedLLMClient` that splits
    traffic: the narrative model goes to Featherless, the classifier
    + embeddings go to the local MLX server (``mlx-omni-server``).
    Both backends share the OpenAI-compatible surface, so the routing
    is invisible to call sites — they just pass ``model=...`` and the
    router picks the backend.
    """
    settings = request.app.state.settings
    from chat.llm.featherless import FeatherlessClient
    from chat.llm.local_mlx import LocalMLXClient
    from chat.llm.router import RoutedLLMClient
-    return FeatherlessClient(
+    narrative = FeatherlessClient(
        api_key=settings.featherless_api_key,
        base_url=settings.featherless_base_url,
    )
    # Dedicated classifier client when a provider pin is configured —
    # routes Llama-3.1-8B (or whatever ``classifier_model`` is) onto a
    # specific upstream like Cerebras for ~10x throughput. When the
    # pin is empty, ``classifier`` is None and the router falls back
    # to the narrative client for classifier traffic.
    classifier = None
    if settings.classifier_provider_order:
        classifier = FeatherlessClient(
            api_key=settings.featherless_api_key,
            base_url=settings.featherless_base_url,
            default_extra_body={
                "provider": {"order": list(settings.classifier_provider_order)}
            },
        )
    local = LocalMLXClient(base_url=settings.local_mlx_base_url)
    return RoutedLLMClient(
        narrative=narrative,
        classifier=classifier,
        local=local,
        narrative_model=settings.narrative_model,
    )
 def _coerce_iso_time(value: str) -> str:
    """Permissive parser that returns a canonical ISO 8601 datetime.
    The kickoff classifier (chat/services/kickoff.py) returns
    ``initial_time_iso`` as a free-form string; in practice it emits
    things like ``"Sun 2024-05-12 07:00:00"``,
    ``"Tuesday, May 14, 2024 7:00 AM"``, or proper ISO. The strict
    ``datetime.fromisoformat`` would 400 on those, so this helper
    tries a sequence of common classifier-emitted formats and
    returns a canonical ``YYYY-MM-DDTHH:MM:SS+00:00`` form.
    Raises ``ValueError`` when nothing parses, so the caller can 400
    cleanly.
    """
    from datetime import datetime, timezone
    s = (value or "").strip()
    if not s:
        return s
    # Strict ISO first (covers "2026-04-26T20:00:00+00:00" and friends).
    try:
        dt = datetime.fromisoformat(s)
    except ValueError:
        dt = None
    if dt is None:
        # Common classifier-emitted formats, in rough frequency order.
        formats = [
            "%a %Y-%m-%d %H:%M:%S",       # Sun 2024-05-12 07:00:00
            "%A %Y-%m-%d %H:%M:%S",       # Sunday 2024-05-12 07:00:00
            "%Y-%m-%d %H:%M:%S",          # 2024-05-12 07:00:00
            "%Y-%m-%d %H:%M",             # 2024-05-12 07:00
            "%Y-%m-%d",                   # 2024-05-12 (date only)
            "%a %b %d %Y %H:%M:%S",       # Sun May 12 2024 07:00:00
            "%A, %B %d, %Y %I:%M %p",     # Tuesday, May 14, 2024 7:00 AM
            "%B %d, %Y %I:%M %p",         # May 14, 2024 7:00 AM
            "%a %b %d %H:%M:%S %Y",       # Sun May 12 07:00:00 2024 (asctime-ish)
        ]
        for fmt in formats:
            try:
                dt = datetime.strptime(s, fmt)
                break
            except ValueError:
                continue
    if dt is None:
        raise ValueError(f"could not parse {value!r} as a datetime")
    # Naive datetimes assumed UTC (the v1 model is single-user, single
    # timezone — keeping it consistent with chat_state.time defaults).
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.isoformat(timespec="seconds")
 def _parse_holding(text: str) -> list[str]:
@@ -157,11 +239,13 @@ async def kickoff_post(
    if bot is None:
        raise HTTPException(status_code=404, detail=f"bot not found: {bot_id}")
-    # Loose ISO 8601 validation. ``datetime.fromisoformat`` accepts the offset
+    # Permissive datetime parsing — the classifier emits a variety of
-    # form ``2026-04-26T20:00:00+00:00`` we use; reject anything it can't parse.
+    # human-readable formats ("Sun 2024-05-12 07:00:00",
    # "Tuesday, May 14, 2024 7:00 AM", proper ISO, etc.). We coerce
    # to canonical ISO and only 400 if NOTHING parses.
    if initial_time_iso.strip():
        try:
-            datetime.fromisoformat(initial_time_iso.strip())
+            initial_time_iso = _coerce_iso_time(initial_time_iso)
        except ValueError:
            raise HTTPException(
                status_code=400,
@@ -178,8 +262,14 @@ async def kickoff_post(
    ).fetchone()
    container_id = next_container_row[0]
    # Use ``append_and_apply`` per event (live-path pattern) rather than
    # appending all-then-project. ``project()`` replays the *entire*
    # event log; non-idempotent handlers like ``_apply_chat_created``
    # (raw INSERT into chats) then 500 with UNIQUE constraint failures
    # for any chats that already exist from prior kickoffs.
    # 1. chat_created
-    append_event(
+    append_and_apply(
        conn,
        kind="chat_created",
        payload={
@@ -192,7 +282,7 @@ async def kickoff_post(
    )
    # 2. container_created
-    append_event(
+    append_and_apply(
        conn,
        kind="container_created",
        payload={
@@ -208,7 +298,7 @@ async def kickoff_post(
    bot_interruptible = bool(bot_activity_action_interruptible)
    # 3. activity_change for "you"
-    append_event(
+    append_and_apply(
        conn,
        kind="activity_change",
        payload={
@@ -229,7 +319,7 @@ async def kickoff_post(
    )
    # 4. activity_change for bot
-    append_event(
+    append_and_apply(
        conn,
        kind="activity_change",
        payload={
@@ -250,7 +340,7 @@ async def kickoff_post(
    )
    # 5. scene_opened
-    append_event(
+    append_and_apply(
        conn,
        kind="scene_opened",
        payload={
@@ -267,7 +357,7 @@ async def kickoff_post(
    facts = _parse_facts(edge_seed_knowledge_facts)
    if edge_seed_summary.strip():
        facts.insert(0, f"[summary] {edge_seed_summary.strip()}")
-    append_event(
+    append_and_apply(
        conn,
        kind="edge_update",
        payload={
@@ -278,9 +368,4 @@ async def kickoff_post(
        },
    )
    # Project all events at once. ``bot_authored`` (already in log from prior
    # POST) is idempotent (INSERT OR REPLACE); the new events project cleanly
    # because they're being applied for the first time.
    project(conn)
    return RedirectResponse(url=f"/chats/{chat_id}", status_code=303)
@@ -71,18 +71,27 @@ def _read_recent_meanwhile_dialogue(
    that already match — avoids an unbounded scan as ``event_log``
    grows. The user-side rows match on chat_id only since they aren't
    tagged with a scene id (they ride the chat-wide log).
    T113: clamp by the active branch's ``[origin, head]`` event-id range
    so meanwhile prompt context respects the user's current branch.
    Bootstrap-main and "no active branch" both fall through to ``(0,
    BIG_INT)`` — no functional change for the metadata-only Phase 4 era.
    """
    from chat.state.branches import active_branch_event_ids
    origin, head = active_branch_event_ids(conn)
    cur = conn.execute(
        "SELECT id, kind, payload_json FROM event_log "
        "WHERE kind IN ('user_turn', 'user_turn_edit', 'assistant_turn') "
        "  AND superseded_by IS NULL AND hidden = 0 "
        "  AND id BETWEEN ? AND ? "
        "  AND json_extract(payload_json, '$.chat_id') = ? "
        "  AND ("
        "    kind IN ('user_turn', 'user_turn_edit') "
        "    OR json_extract(payload_json, '$.meanwhile_scene_id') = ?"
        "  ) "
        "ORDER BY id DESC LIMIT ?",
-        (chat_id, scene_id, limit),
+        (origin, head, chat_id, scene_id, limit),
    )
    rows = cur.fetchall()
    rows.reverse()
@@ -14,6 +14,12 @@ For each match we hydrate just enough metadata to render a row:
 * the originating scene title when one exists,
 * and the ``pov_summary`` itself.
 T106 (Phase 4.5): hydration is batched. Pre-T106 the route called
 ``get_bot``/``get_chat``/``get_scene`` once per result row — N+1 with
 ``DEFAULT_SEARCH_K=50`` meaning up to 150 individual SELECTs per page
 load. We now collect distinct ids first and fan-in via three
 ``WHERE id IN (...)`` queries, then map back per row.
 We deliberately keep this module synchronous and template-only — no
 HTMX swaps, no JSON API — because the search box is a "leave the
 current chat to look something up" surface, not an inline drawer.
@@ -21,7 +27,9 @@ current chat to look something up" surface, not an inline drawer.
 from __future__ import annotations
 import json
 from pathlib import Path
 from sqlite3 import Connection
 from fastapi import APIRouter, Depends, Request
 from fastapi.responses import HTMLResponse
@@ -36,29 +44,145 @@ TEMPLATES = Jinja2Templates(
    directory=str(Path(__file__).resolve().parent.parent / "templates")
 )
 #: Maximum cross-chat FTS matches surfaced per ``/search`` page load.
 #: Extracted as a module-level constant (T106) so the cap is tunable
 #: without touching the route body. ``search_all_memories`` itself
 #: defaults to a smaller ``k=20``; we override here because the
 #: top-bar search is a "scan everything I've seen" surface, not an
 #: inline drawer.
 DEFAULT_SEARCH_K = 50
 router = APIRouter()
 def _fetch_bots_by_ids(conn: Connection, ids: set[str]) -> dict[str, dict]:
    """Batched sibling of :func:`chat.state.entities.get_bot`.
    Inlined here (not exported from ``state.entities``) to keep T106's
    scope confined to ``search.py`` per the Phase 4.5 plan. Returns
    ``{bot_id: bot_dict}`` for every id present in ``ids``; ids with
    no matching row are simply absent from the map (the caller falls
    back to the raw id string the same way it did pre-T106).
    Empty ``ids`` short-circuits to ``{}`` because SQLite rejects
    ``WHERE id IN ()`` as a syntax error.
    """
    if not ids:
        return {}
    placeholders = ",".join("?" * len(ids))
    cols = [c[1] for c in conn.execute("PRAGMA table_info(bots)").fetchall()]
    rows = conn.execute(
        f"SELECT * FROM bots WHERE id IN ({placeholders})",
        tuple(ids),
    ).fetchall()
    out: dict[str, dict] = {}
    for row in rows:
        d = dict(zip(cols, row))
        d["voice_samples"] = json.loads(d.pop("voice_samples_json"))
        d["traits"] = json.loads(d.pop("traits_json"))
        out[d["id"]] = d
    return out
 def _fetch_chats_by_ids(conn: Connection, ids: set[str]) -> dict[str, dict]:
    """Batched sibling of :func:`chat.state.world.get_chat`.
    Mirrors that helper's ``chats``/``chat_state`` JOIN so the returned
    dicts have the same shape (``narrative_anchor``, ``time``,
    ``weather``, ``active_scene_id``, etc.). Empty ``ids`` returns
    ``{}`` to dodge the ``IN ()`` syntax error.
    """
    if not ids:
        return {}
    placeholders = ",".join("?" * len(ids))
    rows = conn.execute(
        "SELECT c.id, c.host_bot_id, c.guest_bot_id, c.created_at, "
        "       s.time, s.weather, s.active_scene_id, s.narrative_anchor "
        f"FROM chats c JOIN chat_state s ON s.chat_id = c.id "
        f"WHERE c.id IN ({placeholders})",
        tuple(ids),
    ).fetchall()
    return {
        row[0]: {
            "id": row[0],
            "host_bot_id": row[1],
            "guest_bot_id": row[2],
            "created_at": row[3],
            "time": row[4],
            "weather": row[5],
            "active_scene_id": row[6],
            "narrative_anchor": row[7],
        }
        for row in rows
    }
 def _fetch_scenes_by_ids(conn: Connection, ids: set[int]) -> dict[int, dict]:
    """Batched sibling of :func:`chat.state.world.get_scene`.
    Returns ``{scene_id: scene_dict}`` with ``participants`` already
    JSON-decoded so callers see the same shape as the per-row helper.
    Empty ``ids`` returns ``{}``.
    """
    if not ids:
        return {}
    placeholders = ",".join("?" * len(ids))
    cols = [c[1] for c in conn.execute("PRAGMA table_info(scenes)").fetchall()]
    rows = conn.execute(
        f"SELECT * FROM scenes WHERE id IN ({placeholders})",
        tuple(ids),
    ).fetchall()
    out: dict[int, dict] = {}
    for row in rows:
        d = dict(zip(cols, row))
        d["participants"] = json.loads(d.pop("participants_json"))
        out[d["id"]] = d
    return out
@router.get("/search", response_class=HTMLResponse)
 async def search(request: Request, q: str = "", conn=Depends(get_conn)):
-    """Render ``search.html`` with up to 50 cross-chat FTS matches.
+    """Render ``search.html`` with up to :data:`DEFAULT_SEARCH_K` matches.
    ``q`` is intentionally allowed to be empty — that path renders the
    page's "enter a query" placeholder rather than a 400, because the
    top-bar form submits to this URL even with an empty input. T93's
    service short-circuits whitespace-only queries to ``[]`` so there
    is no FTS5 ``MATCH ''`` syntax error to guard against here.
    """
    raw_results = search_all_memories(conn, query=q, k=50) if q else []
-    # Hydrate display fields per row. We do this in the route (not the
+    Hydration (T106) is batched: rather than calling ``get_bot`` /
-    # service) so the service stays a pure FTS shim that other UIs
+    ``get_chat`` / ``get_scene`` per row (worst case 3 * k individual
-    # can reuse.
+    SELECTs), we collect distinct ids and issue one ``IN (...)`` query
    per entity kind, then map back during the row build. ``get_bot``
    et al. remain imported for test-time monkeypatching but are no
    longer invoked on the hot path.
    """
    raw_results = (
        search_all_memories(conn, query=q, k=DEFAULT_SEARCH_K) if q else []
    )
    # Collect distinct ids up front so the IN-list queries dedupe (a
    # popular bot or scene shows up many times across the result set).
    bot_ids: set[str] = {r["owner_id"] for r in raw_results if r["owner_id"]}
    chat_ids: set[str] = {r["chat_id"] for r in raw_results if r["chat_id"]}
    scene_ids: set[int] = {
        r["scene_id"] for r in raw_results if r["scene_id"]
    }
    bots_by_id = _fetch_bots_by_ids(conn, bot_ids)
    chats_by_id = _fetch_chats_by_ids(conn, chat_ids)
    scenes_by_id = _fetch_scenes_by_ids(conn, scene_ids)
    # Hydrate display fields per row from the batched maps. We do this
    # in the route (not the service) so the service stays a pure FTS
    # shim that other UIs can reuse.
    results = []
    for row in raw_results:
-        bot = get_bot(conn, row["owner_id"])
+        bot = bots_by_id.get(row["owner_id"])
-        chat = get_chat(conn, row["chat_id"])
+        chat = chats_by_id.get(row["chat_id"])
-        scene = get_scene(conn, row["scene_id"]) if row["scene_id"] else None
+        scene = (
            scenes_by_id.get(row["scene_id"]) if row["scene_id"] else None
        )
        results.append(
            {
                "memory_id": row["memory_id"],
@@ -69,6 +193,13 @@ async def search(request: Request, q: str = "", conn=Depends(get_conn)):
                    chat.get("narrative_anchor") if chat else None
                ),
                "scene_id": row["scene_id"],
                # T111.2: event_id deep-links to the originating turn
                # via the ``id="turn-{event_id}"`` anchor that Phase 3.5
                # T86 stamps on each turn DOM node. May be ``None`` for
                # memory rows projected before the 0014 migration ran
                # (T109 did not backfill historical rows); the template
                # falls back to a chat-level link in that case.
                "event_id": row["event_id"],
                # Scenes have no ``title`` column today; surface the
                # ``started_at`` timestamp as a human-friendly label
                # when a scene is set, otherwise leave it blank.
@@ -76,6 +207,14 @@ async def search(request: Request, q: str = "", conn=Depends(get_conn)):
                    scene.get("started_at") if scene else None
                ),
                "pov_summary": row["pov_summary"],
                # T111.1: ``snippet`` is the FTS5 windowed excerpt with
                # ``<mark>`` tags around each match. Falls back to the
                # full ``pov_summary`` if the row lacks a snippet (which
                # shouldn't happen on this code path because every
                # ``raw_results`` row came from a MATCH query, but we
                # guard defensively so the template never renders
                # ``None``).
                "snippet": row.get("snippet") or row["pov_summary"],
                "significance": row["significance"],
                "ts": row["ts"],
            }
@@ -4,8 +4,7 @@ from fastapi import APIRouter, Depends, Form, HTTPException, Request
 from fastapi.responses import HTMLResponse
 from fastapi.templating import Jinja2Templates
-from chat.eventlog.log import append_event
+from chat.eventlog.log import append_and_apply
 from chat.eventlog.projector import project
 from chat.state.entities import get_you
 from chat.web.bots import get_conn
@@ -40,8 +39,10 @@ async def settings_post(
        "pronouns": pronouns.strip(),
        "persona": persona.strip(),
    }
-    append_event(conn, kind="you_authored", payload=payload)
+    # Per-event apply (NOT project()) — see docs/audits/2026-04-27-project-callers.md.
-    project(conn)
+    # ``project()`` replays the full log, which trips raw-INSERT handlers like
    # ``_apply_chat_created`` once chat events are present.
    append_and_apply(conn, kind="you_authored", payload=payload)
    return TEMPLATES.TemplateResponse(
        request,
@@ -8,20 +8,27 @@ Routes:
 * ``GET  /snapshots``                    list all snapshots (both kinds)
 * ``POST /snapshots/take``               take a periodic snapshot now
-* ``POST /snapshots/restore/{id}``       restore (requires matching ``confirm_id``)
+* ``POST /snapshots/restore/{id}``       restore (requires matching ``confirm_id`` and ``kind``)
 * ``GET  /snapshots/{id}/preview``       show metadata + delta vs current
 The ``snapshot_id`` is the filename stem (the UTC timestamp written by
 :func:`chat.services.snapshot.take_snapshot`) — there's no separate UUID,
 and the timestamp filename is already unique per snapshot kind. Both
 periodic and rewind snapshots share the same id space lookup-wise, so
-the restore + preview routes accept ``kind`` as a form/query param to
+the restore + preview routes require ``kind`` as a form/query param to
-disambiguate.
+disambiguate (a missing/empty ``kind`` is a 400, not a silent default).
 Note on ``created_at`` mtime drift: the listing's ``created_at`` comes
 from the file's mtime, not the encoded filename timestamp. ``cp -p``
 preserves mtime, but plain ``cp`` resets it to "now" — so a copied
 snapshot can show a misleading ``created_at`` while its filename still
 reflects the original UTC capture time.
 """
 from __future__ import annotations
 import json
 from datetime import datetime, timezone
 from pathlib import Path
 from fastapi import APIRouter, Depends, Form, HTTPException, Request
@@ -52,8 +59,6 @@ def _list_all_snapshots(data_dir: Path) -> list[dict]:
    ``last_event_id`` (parsed from the JSON body — small enough that
    listing isn't a performance concern for the handful of files we keep).
    """
    from datetime import datetime, timezone
    rows: list[dict] = []
    for kind in SNAPSHOT_KINDS:
        snap_dir = data_dir / "snapshots" / kind
@@ -85,12 +90,26 @@ def _list_all_snapshots(data_dir: Path) -> list[dict]:
    return rows
 def _require_kind(kind: str) -> str:
    """Reject missing/empty/unknown ``kind`` with 400.
    Defaulting silently to ``"periodic"`` made rewind-snapshot lookups
    appear as 404s, which is confusing — make the client always state
    the kind explicitly.
    """
    if not kind or kind not in SNAPSHOT_KINDS:
        raise HTTPException(
            status_code=400,
            detail=f"kind must be one of {SNAPSHOT_KINDS}",
        )
    return kind
 def _resolve_snapshot_path(
    data_dir: Path, snapshot_id: str, kind: str
 ) -> Path:
    """Map an ``(id, kind)`` pair to the on-disk file, or 404."""
-    if kind not in SNAPSHOT_KINDS:
+    _require_kind(kind)
        raise HTTPException(status_code=400, detail=f"unknown kind: {kind}")
    path = data_dir / "snapshots" / kind / f"{snapshot_id}.json"
    if not path.exists():
        raise HTTPException(status_code=404, detail="snapshot not found")
@@ -127,7 +146,7 @@ async def snapshots_restore(
    snapshot_id: str,
    request: Request,
    confirm_id: str = Form(""),
-    kind: str = Form("periodic"),
+    kind: str = Form(""),
    conn=Depends(get_conn),
 ):
    """Hard-confirm restore: ``confirm_id`` must equal the path id.
@@ -135,7 +154,11 @@ async def snapshots_restore(
    Mismatched confirm → 400 (without touching the DB). On match, the
    existing :func:`restore_from_snapshot` clears projected tables and
    re-loads them from the dump.
    ``kind`` is required (must be ``"periodic"`` or ``"rewind"``) — a
    missing or empty value 400s rather than silently defaulting.
    """
    _require_kind(kind)
    if confirm_id != snapshot_id:
        raise HTTPException(
            status_code=400,
@@ -151,7 +174,7 @@ async def snapshots_restore(
 async def snapshots_preview(
    snapshot_id: str,
    request: Request,
-    kind: str = "periodic",
+    kind: str = "",
    conn=Depends(get_conn),
 ):
    """Show snapshot metadata + a basic delta against the current event log.
@@ -159,7 +182,10 @@ async def snapshots_preview(
    Phase 4 keeps this simple: the snapshot's ``last_event_id`` plus the
    current ``MAX(event_log.id)`` is enough to tell the user how far the
    log has moved on. A richer per-table diff is a Phase 4.5+ concern.
    ``kind`` is required — see :func:`snapshots_restore`.
    """
    _require_kind(kind)
    settings = request.app.state.settings
    path = _resolve_snapshot_path(settings.data_dir, snapshot_id, kind)
    dump = json.loads(path.read_text())
@@ -67,6 +67,7 @@ from chat.services.multi_state_update import compute_state_updates_for_present
 from chat.services.prompt import (
    assemble_narrative_prompt,
    consume_pending_meanwhile_digests,
    trim_to_max_beats,
 )
 from chat.services.rewind import compute_rewind_preview, execute_rewind
 from chat.services.scene_close import detect_scene_close
@@ -482,6 +483,11 @@ async def post_turn(
        _in_flight_tasks.pop(chat_id, None)
    primary_text = "".join(primary_accumulated)
    # Belt-and-suspenders: trim to 3 beats max even if the model
    # ignored the "HARD CAP: 2-3 beats" prompt instruction. Roleplay-
    # tuned narrators are reliably verbose; a physical max_tokens
    # truncates mid-word, this trims at a beat boundary.
    primary_text = trim_to_max_beats(primary_text, max_beats=3)
    # 7. Append the assistant_turn with the final text. (See note above on
    # why we skip ``project`` for these transcript-only event kinds.)
@@ -677,6 +683,10 @@ async def post_turn(
                _in_flight_tasks.pop(chat_id, None)
            interjection_text = "".join(interject_accumulated)
            # Same beat-cap as the primary turn — interjections are
            # by definition short, but Cydonia-class narrators ignore
            # that. 2 beats is plenty for a chime-in.
            interjection_text = trim_to_max_beats(interjection_text, max_beats=2)
            # Capture the event id (T86 follow-up) so the SSE fragment
            # below carries ``id="turn-<n>"`` for in-place swap.
@@ -812,6 +822,14 @@ async def post_turn(
                    payload={
                        "event_id": transition.event_id,
                        "started_at": chat.get("time"),
                        # T114.1: back-reference to the assistant_turn that
                        # triggered this transition. Regenerate uses this
                        # to roll back lifecycle transitions when the turn
                        # is superseded. Forward-only — older events
                        # without this field are skipped by rollback.
                        "triggered_by_assistant_turn_id": (
                            primary_assistant_event_id
                        ),
                    },
                )
            elif transition.new_status == "completed":
@@ -821,6 +839,10 @@ async def post_turn(
                    payload={
                        "event_id": transition.event_id,
                        "completed_at": chat.get("time"),
                        # T114.1: back-reference (see above).
                        "triggered_by_assistant_turn_id": (
                            primary_assistant_event_id
                        ),
                    },
                )
                # Run promotion inline so the artifact-emitting events
@@ -842,6 +864,10 @@ async def post_turn(
                    payload={
                        "event_id": transition.event_id,
                        "completed_at": chat.get("time"),
                        # T114.1: back-reference (see above).
                        "triggered_by_assistant_turn_id": (
                            primary_assistant_event_id
                        ),
                    },
                )
            # Any other ``new_status`` value falls through silently —
@@ -873,6 +899,20 @@ async def post_turn(
    # mid-stream still meant to close the scene — the cancelled bot
    # beat doesn't invalidate that intent. Pinned by
    # test_cancelled_turn_still_closes_scene_when_user_prose_signals_close.
    #
    # T108 NOTE — the in-memory append order is correct, but the cancel
    # path re-raises ``CancelledError`` at the end of ``post_turn``
    # (see step 11 below). The ``open_db`` dependency teardown skips
    # ``conn.commit()`` when the consumer raises, which means in
    # production a genuine cancel currently rolls back ALL post-cancel
    # writes — including this scene_closed event, the truncated
    # assistant_turn record, edge updates, and per-POV summaries. The
    # T74.3 regression test passes only because of a missing
    # ``import asyncio`` in the test module: the inline mock raises
    # ``NameError`` instead of ``CancelledError``, which is caught by
    # the ``except Exception:`` branch and leaves ``cancelled=False``,
    # so the function returns 204 normally and the commit fires. This
    # is a transactional bug deferred for triage (T108 report).
    if scene is not None and prose.strip():
        container = None
        if scene.get("container_id") is not None:
@@ -0,0 +1,205 @@
 # Audit: `project()` callers and non-idempotent projector handlers
 **Date:** 2026-04-27
 **Triggering incident:** commit `0f8bf94` — `kickoff_post` 500'd with
 `sqlite3.IntegrityError: UNIQUE constraint failed: chats.id` after a
 second bot's kickoff. Root cause: the route appended events with
 `append_event()` and then called `project(conn)`, which replays the
 *entire* event log. The `chat_created` handler in `chat/state/world.py`
 uses raw `INSERT INTO chats ...` (no `OR REPLACE`/`OR IGNORE`), so on a
 DB that already had a first bot's chat row, the replay re-hit that row
 and raised.
 This audit walks the rest of the live request paths to make sure no
 other route has the same shape, and inventories every projector handler
 that uses raw `INSERT` so the trade-offs are documented for future
 hardening passes.
 ---
 ## Step 1 — `project()` callers
 `grep -rn "project(" chat/ --include="*.py"` (excluding the definition
 itself in `chat/eventlog/projector.py:17` and the local `project_id`
 type variables that the regex doesn't actually catch):
 | File:line | Caller | Classification |
 |---|---|---|
 | `chat/web/bots.py:113` | `bot_create` route — append `bot_authored` then `project(conn)` | **Unsafe (live path) — fixed** |
 | `chat/web/settings.py:44` | `settings_post` route — append `you_authored` then `project(conn)` | **Unsafe (live path) — fixed** |
 | `chat/services/rewind.py:110` | `execute_rewind` — clears every projected table then re-projects from the truncated log | **Safe (replay-only)** |
 | `chat/eventlog/projector.py:17` | Definition site, not a call | n/a |
 | `tests/test_*.py` (~50 tests) | Test setup pattern: append a sequence of synthetic events into a fresh DB, then `project(conn)` to materialise | **Safe (replay-only)** — projects against an empty/fresh DB; not a live request path |
 ### Safe (replay-only)
 - **`chat/services/rewind.py:110`** — `execute_rewind` is the canonical
  "rebuild the projection" entry point. Lines 95–104 explicitly
  `DELETE FROM` every projected table (`memories`, `activity`, `scenes`,
  `containers`, `chat_state`, `chats`, `edges`, `bots`, `you_entity`,
  `classifier_failures`) before calling `project(conn)`. The handler
  registry then walks the truncated log against empty tables, so even
  the raw-INSERT handlers run safely on a clean slate. The module
  docstring (lines 1–21) calls out exactly why a full replay (rather
  than a "revert delta") is the right move here: the `edge_update`
  handler is a delta accumulator with no clean inverse. **Do not
  change.**
 - **Test suite** — every `from chat.eventlog.projector import project`
  in `tests/` is a setup helper. They open a fresh in-memory or
  tmp-path DB, append a hand-crafted sequence of events, and call
  `project(conn)` once. There is no second-replay risk because the DB
  starts empty. These are not live paths.
 ### Unsafe (live-path) — fixed in this audit
 Both fixes follow the pattern established by `0f8bf94`: drop the
 `append_event` + `project` pair in favour of `append_and_apply` (defined
 in `chat/eventlog/log.py:32`), which appends and runs *only the
 brand-new event* through its registered handler.
 - **`chat/web/bots.py:113` — `bot_create`**
  Was: `append_event(conn, kind="bot_authored", ...); project(conn)`.
  Now: `append_and_apply(conn, kind="bot_authored", ...)`.
  In isolation, `_apply_bot_authored` is itself idempotent (`INSERT OR
  REPLACE INTO bots`), so the *route* didn't fail today. The bug is
  latent: as soon as any kickoff ran first (which produces
  `chat_created` events), the next call to `bot_create` would replay
  that prior `chat_created` and trip the same UNIQUE constraint. We
  saw this happen in `0f8bf94` — fixing the symmetric route prevents
  the next variant of the same incident.
  Removed unused imports: `append_event`, `project`.
 - **`chat/web/settings.py:44` — `settings_post`**
  Was: `append_event(conn, kind="you_authored", ...); project(conn)`.
  Now: `append_and_apply(conn, kind="you_authored", ...)`.
  Same shape as `bot_create`. `_apply_you_authored` is idempotent on
  its own (`INSERT OR REPLACE INTO you_entity`), but `project()` walks
  the *whole* log, including any `chat_created` / `container_created`
  / `scene_opened` events that have accumulated. Editing the user's
  own settings on a non-empty DB would 500 with the same UNIQUE
  constraint error — not because the new event is unsafe, but because
  the replay is. Fixed by per-event apply.
  Removed unused imports: `append_event`, `project`.
 ### Unsafe — still to fix
 None. The two unsafe live-path call sites identified above were both
 fixed in this commit. Future hardening: a CI lint that flags
 `project(` outside `chat/services/rewind.py` and `tests/` would catch a
 regression, but that's out of scope here.
 ---
 ## Step 2 — non-idempotent projector handler inventory
 Output of `grep -n "INSERT INTO\|INSERT OR REPLACE\|INSERT OR IGNORE"
 chat/state/*.py`, classified.
 ### Replay-safe handlers
 These either use `INSERT OR REPLACE` / `INSERT OR IGNORE` (so a second
 apply is a no-op or an overwrite of identical data), or are pure
 `UPDATE` against rows the prior event created.
 | Handler | File | Statement | Why safe |
 |---|---|---|---|
 | `_apply_bot_authored` | `chat/state/entities.py:12` | `INSERT OR REPLACE INTO bots` | `id` is the natural PK; replay overwrites with identical payload. |
 | `_apply_you_authored` | `chat/state/entities.py:29` | `INSERT OR REPLACE INTO you_entity` | Singleton row keyed on `id=1`. |
 | `_apply_activity_change` | `chat/state/world.py:98` | `INSERT OR REPLACE INTO activity` | Activity is keyed on `entity_id` — last write wins is exactly the intended semantics. |
 | `_apply_thread_opened` | `chat/state/threads.py:12` | `INSERT OR IGNORE INTO threads` | `thread_id` is the natural PK. |
 | `_apply_event_planned` | `chat/state/events.py:16` | `INSERT OR IGNORE INTO events` | `event_id` is the natural PK. |
 | `_apply_branch_created` | `chat/state/branches.py:27` | `INSERT OR IGNORE INTO branches` | Branch `name` is unique. |
 | `_apply_group_node_initialized` | `chat/state/group_node.py:12` | `INSERT OR REPLACE INTO group_node` | One row per `chat_id`. |
 | `_apply_embedding_indexed` | `chat/state/embeddings.py:28` | `INSERT OR REPLACE INTO embeddings` | One vector per `memory_id`. |
 | Pure-`UPDATE` handlers | various — `_apply_time_skip_*`, `_apply_guest_added`/`_removed`, `_apply_scene_closed`, `_apply_memory_significance_set`, `_apply_memory_pin_changed`, `_apply_meanwhile_scene_closed`, `_apply_meanwhile_digest_consumed`, `_apply_thread_updated`, `_apply_event_started`/`_completed`/`_cancelled` (etc.), `_apply_group_node_updated` | n/a | Idempotent: re-applying the same UPDATE produces the same row state. |
 ### Unsafe-on-replay handlers (raw `INSERT`)
 | Handler | File | Statement | Failure mode on replay |
 |---|---|---|---|
 | `_apply_chat_created` | `chat/state/world.py:14` | `INSERT INTO chats`, `INSERT INTO chat_state` | `chats.id` is PK — second insert raises `IntegrityError: UNIQUE constraint failed: chats.id`. **This is the `0f8bf94` bug.** `chat_state.chat_id` is also unique; would raise too. |
 | `_apply_container_created` | `chat/state/world.py:78` | `INSERT INTO containers` | `containers.id` is `INTEGER PRIMARY KEY AUTOINCREMENT` — replay does NOT raise (a new id is assigned), but it silently creates a duplicate row, fragmenting downstream lookups by `(chat_id, name)`. **Silent corruption, not a crash.** |
 | `_apply_scene_opened` | `chat/state/world.py:115` | `INSERT INTO scenes` | Same shape: autoincrement `id`. Replay creates a duplicate scene row and re-points `chat_state.active_scene_id` to the new copy. **Silent corruption.** |
 | `_apply_memory_written` | `chat/state/memory.py:14` | `INSERT INTO memories` | Autoincrement `id`. Replay duplicates the memory; FTS5 trigger then double-indexes the same `pov_summary`. **Silent corruption + double-counting in retrieval.** |
 | `_apply_meanwhile_scene_started` | `chat/state/meanwhile.py:29` | `INSERT INTO scenes` (with explicit `scene_id`) | Caller supplies `scene_id` (deterministic). Replay raises `IntegrityError: UNIQUE constraint failed: scenes.id`. **Hard crash, like `chat_created`.** |
 | `_apply_meanwhile_digest_created` | `chat/state/meanwhile.py:67` | `INSERT INTO meanwhile_digest_pending` | Autoincrement `id`. Replay creates a duplicate pending digest, surfacing the same summary twice in the next you-scene's prompt. **Silent corruption.** |
 | `_apply_edge_update` | `chat/state/edges.py:12` | `INSERT OR IGNORE INTO edges` followed by `UPDATE … SET affinity = ? + delta` | The `INSERT OR IGNORE` is fine, but the handler is *delta-shaped* — each replay re-adds `affinity_delta` and `trust_delta`, and re-extends `knowledge_json`. **Silent corruption: scores drift up; knowledge facts duplicate.** Already called out in `chat/eventlog/log.py:39-46` as the canonical reason `append_and_apply` exists. |
 ### Trade-offs — why we are NOT switching every handler to `INSERT OR REPLACE`
 This is the part the audit is here to nail down before someone "fixes
 it" with a one-line s/`INSERT INTO`/`INSERT OR REPLACE INTO`/.
 1. **Autoincrement-id handlers (`containers`, `scenes`, `memories`,
   `meanwhile_digest_pending`)** — `INSERT OR REPLACE` doesn't help.
   Each event's payload doesn't carry the row's eventual id — the id
   comes from `lastrowid` *at projection time*. There is no key for
   `OR REPLACE` to match on. The fix here is either (a) make the event
   carry a deterministic id derived from the event's own id (large
   refactor — payload schemas, downstream FK lookups, FTS rowid
   alignment), or (b) keep the handler raw-INSERT and ensure every
   live path uses `append_and_apply` (the path we're on). We are on
   path (b), and this audit makes it explicit.
 2. **`chat_created`** — `chats.id` IS keyed on the natural PK, so
   `INSERT OR REPLACE INTO chats ...` would technically work for the
   chat row. *But* it would silently overwrite `chat_state` columns
   that other events legitimately mutate later: `chat_state.time` is
   bumped by `time_skip_elision`, `active_scene_id` is set/cleared by
   `scene_opened`/`scene_closed`. On replay the
   `chat_created` overwrite would clobber those subsequent updates,
   then later events would re-set them — *if* the events themselves
   appear in order (they do today). It would work in practice, but it
   would erase the invariant that "each handler is responsible for one
   table-shape change" and make the projector's correctness depend on
   strict event-order replay through `chat_state`. Not worth the
   subtle coupling; keep the raw INSERT and treat replay as an
   explicit "wipe + replay" operation (the rewind path does exactly
   that).
 3. **`meanwhile_scene_started`** — could be made idempotent (the
   payload supplies `scene_id`), but it shares the `scenes` table with
   `_apply_scene_opened` (autoincrement) — making one half of the
   table writers `OR REPLACE` and the other half raw-INSERT is asking
   for a future bug. Keep both raw, lean on `append_and_apply`.
 4. **`edge_update`** — fundamentally cannot be made idempotent under
   replay without either changing the event schema (carry absolute
   values, not deltas) or recording per-event-id "already applied"
   flags. Either is a multi-week project. The current contract is
   "edge_update is a delta event; never apply it twice"; the
   `append_and_apply` rule enforces that contract from the call site.
 **Conclusion:** the handler layer is *correctly* non-idempotent for
 event-sourcing semantics. The defect class lives in the *caller* layer
 (routes that mistakenly call `project()` instead of `append_and_apply`).
 This audit fixes the two known offenders and pins the contract with a
 regression test (see Step 3).
 ---
 ## Step 3 — regression test
 Added `tests/test_chat_created_non_idempotent.py`. The test:
 1. Opens a fresh DB and runs the migration chain.
 2. Appends one `chat_created` event and projects — first projection
   succeeds.
 3. Appends a *second* `chat_created` for the same chat id and projects
   again — asserts that the second projection raises
   `sqlite3.IntegrityError`.
 The point isn't that the test catches a future "make it idempotent"
 change automatically; it's that any such change MUST update this test,
 forcing a deliberate review of all the trade-offs documented above.
 ---
 ## Files changed
 - `chat/web/bots.py` — swap `append_event`+`project` → `append_and_apply`,
  drop unused imports.
 - `chat/web/settings.py` — same swap.
 - `tests/test_chat_created_non_idempotent.py` — new regression test.
 - `docs/audits/2026-04-27-project-callers.md` — this file.
@@ -522,6 +522,8 @@ Written per witness when a scene closes. Different details, different interpreta
 **Status: shipped 2026-04-27** (T88–T102, 15 tasks across 8 waves; +70 tests). See "Phase 4 status" in CLAUDE.md for the per-task breakdown. Vector retrieval shipped via pure-Python cosine over a JSON-blob embeddings table (sqlite-vec deferred — host Python lacks loadable extensions); branching is data-model + drawer UI; significance review, hide-from-view soft delete, surgical delete with cascade preview, snapshot UX, and cross-chat search all surface from the drawer or top-bar.
 **Phase 4.5 cleanup: shipped 2026-04-27** (T103–T118, 13 of 14 planned tasks; T115 sqlite-vec swap deferred to Phase 5 due to host Python lacking `enable_load_extension`; +~44 tests; schema baseline now 14). See "Phase 4.5 status" in CLAUDE.md for the per-task breakdown — notable shipped: real embedding model swap path (`LLMClient.embed()` + `--re-embed-all`), branching read-side filter (`active_branch_event_ids`), regenerate lifecycle rollback (`event_status_reverted`), FTS snippet highlighting + deep-link to turn (`memories.event_id`), bulk significance re-rate.
 - Vector retrieval (sqlite-vss or sqlite-vec).
 - Branching UI.
 - Drawer-edit on every field.
@@ -0,0 +1,724 @@
 # Roleplay Engine — Phase 4.5 Cleanup Plan
 > **For Claude:** REQUIRED SUB-SKILL: Use `superpowers-extended-cc:executing-plans` to implement this plan task-by-task. Use the parallel-dispatch pattern documented under "Parallel-Execution Strategy" for parallel waves.
 **Goal:** Burn down all 24 items in `CLAUDE.md` §"Phase 4.5 / 5 backlog". Mix of small defensive cleanups (most), three big features (real embedding model swap, branching read-side filter, lifecycle rollback in regenerate), one environment-dependent feature (sqlite-vec swap), and the long-deferred carry-overs (scene-close-on-cancel revisit, structured test-fixture builder).
 **Architecture:** No new architecture. Two new schema migrations (0014 schema polish, 0015 sqlite-vec virtual tables). New external dependency optional (`apsw` if Python rebuild isn't possible). All other changes are polish / refactor / observability.
 **Tech Stack:**
 - Existing — same as Phase 4.
 - **OPTIONAL:** rebuild Python with `--enable-loadable-sqlite-extensions` OR install `apsw` to enable T115 sqlite-vec swap. T115 is the only task that requires this; the other 13 tasks land without it. If neither is available, T115 is deferred to Phase 5.
 **Source-of-truth references:**
 - Backlog: [`CLAUDE.md`](../../CLAUDE.md) §"Phase 4.5 / 5 backlog" (24 items grouped by review source + deferred).
 - Phase 3.5 / Phase 2.5 cleanup plans (pattern reference): [2026-04-26-v3.5-phase3.5-cleanup.md](2026-04-26-v3.5-phase3.5-cleanup.md), [2026-04-26-v2.5-phase2.5-cleanup.md](2026-04-26-v2.5-phase2.5-cleanup.md).
 - Conventions: [`CLAUDE.md`](../../CLAUDE.md) §"Behavioral defaults" + §"Phase 4 status".
 ---
 ## Pre-flight
 **Branch:** create `phase-4.5` from the latest `main`:
 ```bash
 git checkout main && git pull && git checkout -b phase-4.5
 ```
 **Schema baseline:** Phase 4 leaves the DB at version 13. Phase 4.5 adds two migrations: `0014_phase45_schema.sql` (T109) and `0015_vec0_virtual_tables.sql` (T115 — only lands if T115 ships). Final schema version: 14 or 15.
 **Optional pre-flight for T115 (sqlite-vec swap):**
 The host Python build needs `enable_load_extension`. Two options:
 1. **Rebuild Python** via pyenv with `PYTHON_CONFIGURE_OPTS="--enable-loadable-sqlite-extensions" pyenv install 3.12.0 --force` and recreate the venv.
 2. **Add `apsw`** as a dependency and migrate `chat/db/connection.py` to use `apsw.Connection` (significant refactor — the entire codebase uses stdlib `sqlite3`).
 If neither is acceptable, **defer T115** to Phase 5 and ship Phase 4.5 with 13 tasks instead of 14. The other tasks are unaffected.
 **Pinned non-negotiables (carried forward):**
 - State changes go through the event log. Use `append_and_apply` for the live path.
 - Witness filter every memory read at SQL level.
 - TDD: every task starts with a failing test (or a regression test pinning existing contract before refactor).
 - One commit per task minimum. Bundled tasks split internally.
 **Verification before claiming done:** Use `superpowers-extended-cc:verification-before-completion` — run the test command, paste actual output.
 ---
 ## Backlog item → task mapping
 24 items consolidated into 14 tasks by **file ownership**:
 | # | Item | Source | Task |
 |---|------|--------|------|
 | 1 | `embeddings` FK lacks `ON DELETE CASCADE` | T88 | **T109** (schema migration) |
 | 2 | `list_branches(chat_id=...)` global-branch leak — document | T89 | **T103** |
 | 3 | Branch-switch silently leaves zero active — log warning | T89 | **T103** |
 | 4 | Real embedding model swap | T91 / deferred | **T112** |
 | 5 | `timeout_s` fallback-path logging | T91 | **T107** |
 | 6 | Duplicate `MAX(id)` lookup in retrieval ranking | T96 | **T104** |
 | 7 | `fts_rank=None` for vector-only rows — document | T96 | **T104** |
 | 8 | `event_id <= 0` guard in `delete_turn` | T98 | **T110** |
 | 9 | `html.escape()` on delete-impact modal output | T98 | **T110** |
 | 10 | Extract delete-impact modal to Jinja partial | T98 | **T110** |
 | 11 | Hoist `datetime`/`timezone` imports in `snapshots.py` | T99 | **T105** |
 | 12 | Strict `kind` validation in snapshot routes | T99 | **T105** |
 | 13 | `created_at` from file mtime — document drift risk | T99 | **T105** |
 | 14 | Hardcoded `k=50` → module constant | T100 | **T106** |
 | 15 | N+1 lookups in search results | T100 | **T106** |
 | 16 | FTS highlighting via `snippet()` | T100 | **T111** |
 | 17 | Result links chat-level only — add deep-link via memories.event_id | T100 | **T109** + **T111** |
 | 18 | sqlite-vec swap when host Python supports loadable extensions | deferred | **T115** |
 | 19 | Branching read-side filter (consult `is_active`) | deferred | **T113** |
 | 20 | Bulk significance re-rate in drawer | deferred | **T110** |
 | 21 | Vector index optimization (HNSW) | deferred | **T115** (post-ship note) |
 | 22 | Scene-close-on-cancel UX revisit | Phase 2.5 carry-over | **T108** |
 | 23 | Cross-feature canned-queue brittleness fixture builder | Phase 3 carry-over | **T116** |
 | 24 | Full lifecycle-rollback in regenerate | Phase 3.5 carry-over | **T114** |
 ---
 ## Parallel-Execution Strategy
 Same pattern as Phase 3.5 / Phase 2.5 / Phase 4. Nine waves: parallel within each wave (file-disjoint), serial across waves.
 ### How to dispatch a wave in parallel
 Use the **Agent tool with `isolation: "worktree"`**. (If the controlling session's working directory is **not** the chat repo, create worktrees manually with `git worktree add .worktrees/<wave>-<task> -b <wave>/<task> phase-4.5`.)
 ### After a wave completes
 1. Each subagent returns its worktree path and commit SHA(s).
 2. **Run a spec + code-quality reviewer subagent on each completed task.** Combined review acceptable for trivial tasks (T103–T108); separate spec + quality reviewers for big tasks (T112, T113, T114, T115).
 3. **Merge the wave into `phase-4.5`** in any order (file-disjointness guarantees no conflict). Use `--no-ff`.
 4. **Run the full test suite** on the merged `phase-4.5`.
 5. **Push `phase-4.5`** to gitea.
 6. Optionally clean up worktrees.
 ### Conflict prevention checklist
 For each parallel wave, verify the **Files** sections of all tasks have **no overlapping paths**. Hot files in this plan (each owned by exactly one task): `chat/state/memory.py`, `chat/web/drawer.py`, `chat/web/search.py`, `chat/services/regenerate.py`, `chat/services/turn_common.py`, `chat/services/embeddings.py`, `chat/db/migrations/`.
 ### Why each wave is parallel-safe
 | Wave | Tasks | Hot files | Disjoint? |
 |------|-------|-----------|-----------|
 | 1 | T103, T104, T105, T106, T107, T108 | 6 different files; no overlap | ✅ |
 | 2 | T109 | new migration + minor projector update | (single task) |
 | 3 | T110 | `chat/web/drawer.py` (bundle) | (single task) |
 | 4 | T111 | `chat/services/cross_chat_search.py` + `chat/web/search.py` + template | (single task; depends on T109) |
 | 5 | T112 | `chat/services/embeddings.py` + `chat/llm/*.py` (Protocol + Featherless + Mock) | (single task) |
 | 6 | T113 | `chat/services/turn_common.py` + multiple readers (cross-cutting) | (single task) |
 | 7 | T114 | `chat/services/regenerate.py` + projector handler | (single task) |
 | 8 | T115 | new migration + `chat/services/vector_search.py` + `chat/db/connection.py` | (single task; environmental) |
 | 9 | T116, T117, T118 | new test fixture file (T116); new test file (T117); CLAUDE.md (T118) | ✅ |
 ---
 ## Task overview
 ```
 Wave 1 ─┬─ T103: branches polish (global-branch doc + branch-switch warning)
        ├─ T104: state/memory.py polish (DRY MAX(id) + fts_rank doc)
        ├─ T105: snapshots.py polish (datetime hoist + kind validation + mtime doc)
        ├─ T106: search.py polish (k constant + N+1 batched lookups)
        ├─ T107: embeddings.py timeout_s fallback-path logging
        └─ T108: scene-close-on-cancel UX revisit (pin behavior with regression test)
 Wave 2 ─── T109: 0014 schema migration (FK CASCADE + memories.event_id column)
 Wave 3 ─── T110: drawer Phase 4.5 bundle (event_id guard + html.escape + modal partial + bulk sig re-rate)
 Wave 4 ─── T111: search UX enhancements (FTS snippet() highlighting + deep-link via memories.event_id)
 Wave 5 ─── T112: real embedding model swap (LLMClient.embed protocol + Featherless impl + generate_embedding routing + backfill)
 Wave 6 ─── T113: branching read-side filter (event readers consult is_active branch range)
 Wave 7 ─── T114: regenerate lifecycle rollback (back-reference field + compensating events on supersede)
 Wave 8 ─── T115: sqlite-vec swap (vec0 virtual tables + MATCH-based vector_search) [ENVIRONMENTAL — see pre-flight]
 Wave 9 ─┬─ T116: structured test-fixture builder (canned-queue brittleness)
        ├─ T117: Phase 4.5 cross-feature integration tests
        └─ T118: docs sweep — Phase 4.5 status, prune backlog, capture Phase 5 residuals
 ```
 Critical path: 9 sequential merge points. Total tasks: 14 (or 13 if T115 deferred). Parallelism: Waves 1 (6-way) and 9 (3-way) dispatch concurrently. Waves 2–8 are single-task by hot-file constraint.
 ---
 ## Wave 1 — Independent small fixes (parallel, 6 tasks)
 All trivial, file-disjoint. Each is 1-line + 1-test or similar.
 ### Task 103: branches polish
 **Files:**
 - Modify: `chat/state/branches.py`
 - Modify: `tests/test_branches_state.py`
 **Spec (2 sub-fixes, single commit):**
 1. **Document global-branch leak**: `list_branches(chat_id=...)` filter `chat_id = ? OR chat_id IS NULL` returns global/null-chat branches (like "main") in every chat scope. Add a docstring note explaining this is intentional ("main" is global by design; per-chat branches are scoped).
 2. **Warn on branch-switch to nonexistent name**: in `_apply_branch_switched`, before the SQL UPDATE, check if a branch with the given name exists. If not, emit `logging.getLogger(__name__).warning(...)` rather than silently leaving zero active branches.
 **Test:** `test_branch_switched_unknown_name_warns` — capture log via `caplog`, append `branch_switched` for nonexistent name, assert warning message + no active branch (existing behavior preserved, just observable).
 **Commit:** `chore: branches polish — global-leak docs + unknown-name warning (T103)`.
 ---
 ### Task 104: state/memory.py polish
 **Files:**
 - Modify: `chat/state/memory.py`
 - Modify: `tests/test_memory_search.py` (no new tests; just add docstring assertions if needed)
 **Spec (2 sub-fixes):**
 1. **DRY `MAX(id)` lookup**: `_composite_rerank` (Phase 3.5 T57) and `_rrf_fuse_and_rerank` (Phase 4 T96) both query `SELECT MAX(id) FROM event_log` for the recency boost. Extract a `_max_event_id(conn)` helper.
 2. **`fts_rank=None` documentation**: search_memories docstring should note that vector-only rows have `fts_rank=None`. Downstream consumers must accept None (they currently do, but contract is implicit).
 **Test:** existing tests cover both via the public API; no new test needed unless docstring assertion is desired.
 **Commit:** `chore: memory.py DRY MAX(id) helper + document fts_rank=None contract (T104)`.
 ---
 ### Task 105: snapshots.py polish
 **Files:**
 - Modify: `chat/web/snapshots.py`
 - Modify: `tests/test_snapshot_ux.py` (1 new test)
 **Spec (3 sub-fixes):**
 1. **Hoist `datetime`/`timezone` imports** to module level (currently inside `_list_all_snapshots`).
 2. **Strict `kind` validation in restore/preview routes**: currently `kind` defaults to `"periodic"`. If a rewind snapshot is requested without explicit `kind`, the lookup silently 404s. Reject missing `kind` with a 400 instead of silently defaulting.
 3. **Document `created_at` mtime drift risk** in module docstring: snapshot timestamps come from file mtime, not the encoded filename timestamp. Files copied via `cp -p` preserve mtime; `cp` without `-p` resets it. Add a one-line note.
 **Test:** `test_restore_without_kind_returns_400` — POST `/snapshots/restore/<id>` without `kind`; assert 400.
 **Commit:** `chore: snapshots.py polish — hoisted imports + strict kind + mtime doc (T105)`.
 ---
 ### Task 106: search.py polish
 **Files:**
 - Modify: `chat/web/search.py`
 - Modify: `tests/test_search_ux.py` (1 new test)
 **Spec (2 sub-fixes):**
 1. **Hardcoded `k=50` → module constant**: extract `DEFAULT_SEARCH_K = 50` at module level. Tunable without code change at the call site.
 2. **N+1 lookup batching**: GET `/search?q=...` currently calls `get_bot(conn, owner_id)`, `get_chat(conn, chat_id)`, `get_scene(conn, scene_id)` per result row (worst case 50×3 = 150 individual queries). Batch via `WHERE id IN (...)` queries: collect distinct ids first, fetch in 3 batched queries, then map back per row.
 **Test:** `test_search_results_use_batched_lookups` — mock `get_bot`/`get_chat`/`get_scene` and assert each is called once (not per row). OR easier: time the search with 50 results and assert it doesn't degrade linearly with `k`.
 **Commit:** `perf: search.py N+1 batching + k constant extraction (T106)`.
 ---
 ### Task 107: embeddings.py timeout_s fallback-path logging
 **Files:**
 - Modify: `chat/services/embeddings.py`
 - Modify: `tests/test_embeddings.py` (1 new test)
 **Spec:**
 When `model != DEFAULT_EMBEDDING_MODEL` and falls through to fallback (zero-vector with model="fallback"), log a `warning` so misconfigured callers (e.g., a Phase 4.5+ caller pointing at a real model that doesn't exist) don't silently degrade.
 ```python
 if model != DEFAULT_EMBEDDING_MODEL:
    _log.warning(
        "generate_embedding: non-default model %r returned fallback "
        "(model client.embed() not yet implemented in Phase 4.5+); "
        "downstream search will degrade silently. Configure a supported model.",
        model,
    )
    return EmbeddingResult(...)  # fallback
 ```
 The Phase 4 default path (`model == DEFAULT_EMBEDDING_MODEL` → pseudo-embedding) is silent; only non-default models trigger the warning.
 **Test:** `test_generate_embedding_non_default_model_logs_warning` — call with `model="real-model"`; capture log via `caplog`; assert the warning message appears.
 **Commit:** `chore: embeddings.py warns on fallback for non-default models (T107)`.
 ---
 ### Task 108: scene-close-on-cancel UX revisit
 **Files:**
 - Modify: `tests/test_turn_flow.py` (extend the existing pin test added in Phase 2.5 T74.3 OR add a new one)
 - Optionally modify: `chat/web/turns.py` if a real bug surfaces during investigation
 **Spec:**
 This carry-over has been pending since Phase 2.5 T74.3. The pinned behavior: scene close fires even when the primary turn is cancelled mid-stream, because `detect_scene_close` consults user prose (fully present at cancel time), not bot output.
 **Action:**
 1. **Re-investigate** by reading the post_turn cancellation path. Confirm the rationale still holds (it should — nothing about the close-detection logic changed in Phase 3 or 4).
 2. **Strengthen the regression test** in `tests/test_turn_flow.py` (the existing `test_cancelled_turn_still_closes_scene_when_user_prose_signals_close`). Add an assertion that the user prose IS present at the moment scene_close_decision fires (even though the bot output isn't).
 3. If investigation surfaces an actual UX issue (e.g., the close fires too eagerly on prose like "fade out... actually wait"), this becomes a real fix — but default action is documentation-only.
 **Default outcome:** add a docstring comment to the post_turn close-detection branch explaining the rationale. No behavioral change.
 **Test (extend existing):** assert ordering — `scene_closed` event lands AFTER the user_turn event but BEFORE any potential assistant_turn (which is cancelled). Pin the contract.
 **Commit:** `chore: scene-close-on-cancel — strengthen regression test + document rationale (T108)`.
 ---
 ## Wave 2 — Schema migration (single)
 ### Task 109: 0014 schema migration
 **Files:**
 - Create: `chat/db/migrations/0014_phase45_schema.sql`
 - Modify: `chat/state/memory.py` or `chat/services/memory_write.py` (populate the new `event_id` column on memory_written)
 - Modify: `tests/test_world.py` (bump schema_version assertion to 14)
 - Modify: `tests/test_memory_write.py` (assert event_id populated)
 **Spec:**
 Two schema changes bundled into a single migration:
 1. **`embeddings.memory_id` FK gets `ON DELETE CASCADE`** (T88 review nit). SQLite doesn't support `ALTER TABLE ... ALTER COLUMN`, so the standard pattern is: rename old table, create new, copy data, drop old, recreate indices. Alternatively, since this is a new-ish table (Phase 4 added it) and the change is purely defensive, document as "WONTFIX in 4.5; deindex events remain the only deletion path; ON DELETE CASCADE remains a Phase 5 candidate when we do a broader migration cleanup". Choose pragmatically.
 2. **Add `memories.event_id INTEGER` column** (NULL allowed for backward compat) referencing `event_log.id`. This is the foundation for T111's deep-linking from cross-chat search results to specific turns. Migration adds the column; the projector for `memory_written` populates it from the event id when projecting.
 **Production code change:** in the `memory_written` projector handler (in `chat/state/memory.py` or wherever it lives), populate the new `event_id` column with the projecting event's `id`. The `Event` object has `id` available in the projector context.
 **Tests:**
 1. `test_schema_version_after_migration_is_14` (rename + bump from 13).
 2. `test_memory_written_populates_event_id` — append memory_written; project; query memories table; assert `event_id` is the projecting event's id.
 3. (Backward compat) older memories from existing seed data have NULL `event_id` — the column is nullable.
 **Commit:** `feat: 0014 schema — embeddings FK CASCADE (deferred or applied) + memories.event_id column (T109)`.
 ---
 ## Wave 3 — Drawer Phase 4.5 bundle (single)
 ### Task 110: drawer polish + bulk significance re-rate
 **Files:**
 - Modify: `chat/web/drawer.py`
 - Modify: `chat/templates/_drawer.html`
 - Create: `chat/templates/_delete_impact_modal.html` (extracted partial)
 - Modify: `chat/state/manual_edit.py` (potentially — if bulk re-rate emits a new manual_edit kind)
 - Modify: `tests/test_drawer_phase4.py` (extend with 4-5 new tests)
 **Spec (4 sub-fixes, 4 commits):**
 1. **`event_id <= 0` guard in `delete_turn`** (T98 nit): currently silently rewinds everything if `event_id` is 0. Add `if event_id <= 0: raise HTTPException(400, "...")`.
 2. **`html.escape()` on delete-impact modal** (T98 nit): the rendered HTML in `compute_delete_impact` output is built via raw f-strings from model-controlled strings. Wrap user-controllable fields with `html.escape()`. Defense-in-depth — currently safe, but if event payload fields ever appear in descriptions, autoescape would prevent XSS.
 3. **Extract delete-impact modal HTML to a Jinja partial**: create `chat/templates/_delete_impact_modal.html`; render via `templates.TemplateResponse(...)` instead of f-string concatenation. Inherits Jinja2 autoescape automatically. Tests use the existing TestClient pattern.
 4. **Bulk significance re-rate** (T98.2 deferral): drawer panel showing memory significance distribution per chat. New POST route `/chats/{chat_id}/drawer/memory/significance/bulk` accepting `{level_from, level_to}` form fields. Updates ALL memories in the chat at `level_from` to `level_to` via a sequence of `manual_edit` events (one per memory — preserves the audit trail).
 **Tests:**
 1. `test_delete_turn_with_event_id_zero_returns_400`.
 2. `test_delete_impact_modal_uses_jinja_partial` (assert response renders the partial template; verify with `assert b"<div class=\"delete-impact-modal\">" in response.content` or similar).
 3. `test_delete_impact_modal_escapes_user_controllable_strings` — seed an event with a payload containing `<script>` in a description-bound field; render preview; assert it appears HTML-escaped.
 4. `test_bulk_significance_re_rate_emits_manual_edit_per_memory` — seed 5 memories at significance 0; bulk re-rate to 2; assert 5 `manual_edit` events landed.
 **Commits (4):**
 - `fix: drawer delete_turn guards event_id <= 0 (T110.1)`
 - `fix: drawer delete-impact modal HTML escapes user-controllable fields (T110.2)`
 - `refactor: drawer delete-impact modal extracted to Jinja partial (T110.3)`
 - `feat: drawer bulk significance re-rate per chat (T110.4)`
 ---
 ## Wave 4 — Search UX enhancements (single)
 ### Task 111: FTS highlighting + deep-link to turn
 **Files:**
 - Modify: `chat/services/cross_chat_search.py`
 - Modify: `chat/web/search.py`
 - Modify: `chat/templates/search.html`
 - Modify: `tests/test_search_ux.py`
 **Spec (2 sub-fixes, 2 commits):**
 1. **FTS highlighting via `snippet()`** (T100 nit): replace the `pov_summary` column in `search_all_memories`'s SELECT with `snippet(memories_fts, 0, '<mark>', '</mark>', '…', 32)` to return a highlighted snippet around the match. The template renders this raw via `|safe` (the snippet is built by SQLite from indexed content; the `<mark>` tags are the only HTML, and SQLite escapes any HTML special chars in the source content).
 2. **Deep-link to turn via memories.event_id** (T100 nit + T109 dependency): now that `memories.event_id` exists (from T109), each search result row knows the originating event id. The chat page uses turn-id stamping (Phase 3.5 T86 added `id="turn-{event_id}"`). Build result links as `/chats/{chat_id}#turn-{event_id}`. The chat page DOM scrolls to the anchor on load (browser default).
 **Tests:**
 1. `test_search_results_include_fts_snippet_with_highlight` — seed memory with text containing "rabbit"; search for "rabbit"; assert response body contains `<mark>rabbit</mark>` (or whatever marker the snippet uses).
 2. `test_search_result_link_includes_turn_anchor` — seed memory with known event_id; search; assert link href contains `#turn-{event_id}`.
 **Commits (2):**
 - `feat: cross-chat search FTS snippet highlighting (T111.1)`
 - `feat: cross-chat search deep-links to turn via memories.event_id (T111.2)`
 ---
 ## Wave 5 — Real embedding model (single)
 ### Task 112: Real embedding model swap
 **Files:**
 - Modify: `chat/llm/client.py` (Protocol — add `embed(text, model) -> list[float]` method)
 - Modify: `chat/llm/featherless.py` (FeatherlessClient — implement `embed` against Featherless `/v1/embeddings` endpoint OR equivalent)
 - Modify: `chat/llm/mock.py` (MockLLMClient — accept canned embedding vectors)
 - Modify: `chat/services/embeddings.py` (route non-default model through `client.embed()`)
 - Modify: `chat/config.py` (add `embedding_model: str` setting; default to current pseudo)
 - Modify: `scripts/backfill_embeddings.py` (re-embed-all option for model swaps)
 - Modify: `tests/test_embeddings.py` + `tests/test_llm_mock.py` + `tests/test_featherless.py` (if exists)
 **Spec:**
 Phase 4 ships a deterministic SHA-256 pseudo-embedding (deterministic but semantically meaningless). T112 wires the path for a real embedding model.
 **Steps:**
 1. **Extend `LLMClient` Protocol** with `async def embed(self, text: str, *, model: str) -> list[float]`.
 2. **Implement on FeatherlessClient**: call the Featherless OpenAI-compatible `/v1/embeddings` endpoint:
   ```python
   response = await self._http.post(
       "/v1/embeddings",
       json={"model": model, "input": text},
       headers={"Authorization": f"Bearer {self._api_key}"},
   )
   data = response.json()
   return data["data"][0]["embedding"]
   ```
   Handle rate limits (existing 2-conn semaphore covers this).
 3. **Implement on MockLLMClient**: `embed` pops a canned vector from a new `canned_embeddings` queue. Tests configure this queue.
 4. **Update `generate_embedding`**: when `model != DEFAULT_EMBEDDING_MODEL`, call `client.embed(text, model=model)` instead of falling through to fallback. Wrap in try/except — failures fall back to zero vector (existing fallback path).
 5. **Settings**: add `embedding_model: str = "pseudo-sha256-384"` to `Settings`. App reads this at startup; the embedding worker (`chat/services/embedding_worker.py`) passes it through.
 6. **Backfill script**: add `--re-embed-all` flag that walks ALL memories (regardless of existing `embeddings_meta` rows) and re-embeds with the configured model. Useful for swapping models.
 **Tests:**
 1. `test_embed_routes_to_client_when_non_default_model` — mock client with canned vector; call `generate_embedding(model="bge-small-en-v1.5")`; assert vector matches the canned response.
 2. `test_embed_falls_back_on_client_failure` — mock client to raise; assert returns zero vector with model="fallback".
 3. `test_mock_llm_client_embed_pops_canned`.
 4. `test_featherless_embed_calls_correct_endpoint` (if there's an existing featherless test pattern; otherwise mock the HTTP layer).
 **Commits:**
 - `feat: LLMClient Protocol gains embed() method (T112.1)`
 - `feat: FeatherlessClient.embed() against /v1/embeddings (T112.2)`
 - `feat: generate_embedding routes non-default models through client.embed (T112.3)`
 - `feat: backfill_embeddings --re-embed-all flag for model swaps (T112.4)`
 ---
 ## Wave 6 — Branching read-side filter (single, BIG)
 ### Task 113: Branching read-side filter
 **Files (cross-cutting):**
 - Modify: `chat/services/turn_common.py::read_recent_dialogue` — filter events to active branch's range
 - Modify: `chat/services/scene_summarize.py::_read_recent_dialogue` (similar)
 - Modify: `chat/state/memory.py::search_memories` — memories should be filtered to active branch (memories.event_id from T109 enables this)
 - Modify: `chat/state/branches.py` — add helper `active_branch_event_ids(conn) -> tuple[int, int]` returning (origin, head)
 - Add tests across multiple files
 - Modify: `tests/test_branching.py` — add cross-feature tests
 **Spec:**
 Phase 4 T89 + T94 shipped branching as metadata-only (the table tracks branches; the drawer UI can switch). But event readers DON'T consult `is_active` — they read the entire event_log. So switching branches has no functional effect.
 T113 wires the filter:
 1. **Helper** `active_branch_event_ids(conn) -> tuple[int, int]`: returns `(origin_event_id, head_event_id)` for the currently active branch. For "main" with origin=0 + head=N, returns `(0, N)` meaning "all events visible".
 2. **Apply filter** in every event reader that returns historical state:
   - `read_recent_dialogue`: WHERE clause adds `id BETWEEN ? AND ?` (the active branch's range).
   - `search_memories`: WHERE clause adds `m.event_id BETWEEN ? AND ?` (uses T109's column).
   - `scene_summarize._read_recent_dialogue`: same as turn_common.
   - Other readers TBD — grep for `event_log` SELECT patterns and audit each one.
 3. **Branches that diverge**: when branch B is created from event 10 and then accumulates events 11-15 (which only exist on B's timeline), but main also accumulates 11-12, the events overlap by id range. This is OK because event reads filter by `id <= active_branch.head_event_id`. The simpler model: branches share event_log ids globally, but each branch's "head" defines which ids are visible.
 4. **Events written under branch B** carry an implicit branch tag — but the event_log table has no `branch_id` column today. T113 punts on cross-branch event writes (they all land in the global log) and relies on the `head_event_id` filter to scope reads. This is a Phase 4.5+ first cut; full branch-isolated event_log is Phase 5+.
 **Edge cases:**
 - Active branch has `head_event_id = 0` (just created): readers return empty.
 - No active branch: readers fall through to "all events visible" (defensive).
 - Switching branches mid-flight: each `read_recent_dialogue` call re-queries `active_branch`, so it's always current. No caching.
 **Tests:** 5+ minimum.
 1. `test_read_recent_dialogue_respects_active_branch_head` — seed 10 events; active branch head = 5; assert only first 5 returned.
 2. `test_search_memories_respects_active_branch_head` — same.
 3. `test_branch_switch_changes_visible_events` — switch branches; immediately read; assert different result sets.
 4. `test_main_branch_with_head_zero_returns_empty` — defensive.
 5. `test_no_active_branch_falls_through_to_all_events` — defensive.
 **Commit:** `feat: branching read-side filter — event readers consult active branch range (T113)`.
 **This is the largest task in Phase 4.5.** Estimate 200-400 lines across multiple files. Implementer should split commits if it helps clarity (one per affected reader).
 ---
 ## Wave 7 — Lifecycle rollback in regenerate (single)
 ### Task 114: Lifecycle rollback
 **Files:**
 - Modify: `chat/services/regenerate.py`
 - Modify: `chat/db/migrations/0014_phase45_schema.sql` (T109's migration) — add column? OR
 - Add new migration — see decision below
 - Modify: tests in `tests/test_regenerate.py`
 **Spec:**
 Phase 3.5 T83.4 shipped a warning log when regenerate detects un-rolled-back lifecycle transitions. T114 implements actual rollback.
 **Schema decision:**
 Option A: extend lifecycle event payloads with `triggered_by_assistant_turn_id` (no schema change needed — just a payload convention). Production code (T61 turn flow) populates it when emitting `event_started`/`event_completed`/`event_cancelled`. Existing rows have NULL — rollback skips them with a debug log.
 Option B: add a column to `event_log` for stronger invariants. Significant migration cost.
 **Recommended:** Option A. Safer, no migration, backward compatible (older events skip rollback). Document in commit body.
 **Rollback semantics:**
 When regenerate detects lifecycle events triggered by the superseded turn:
 - `event_started` → emit `event_cancelled` (or a NEW `event_started_undone` event kind that reverts status to "planned") with the same event_id.
 - `event_completed` → emit `event_uncompleted` (NEW event kind that reverts status from "completed" to "active").
 - `event_cancelled` → emit `event_uncancelled` (reverts to prior status — which we'd need to track; or simpler: emit `event_started` again to restore "active").
 **Simpler approach (recommended):** add ONE new event kind `event_status_reverted` with payload `{event_id, prior_status}`. The projector sets `events.status = prior_status` for the event_id. Rollback emits this event for each affected lifecycle transition, looking up the prior status from the row's history (via event_log scan) or accepting it as a payload field.
 **Production code change:** in `chat/web/turns.py::post_turn` (and `chat/services/regenerate.py`), when emitting `event_started`/`event_completed`/`event_cancelled`, populate `triggered_by_assistant_turn_id: <id>` in the payload. Forward-only — older code doesn't need updating.
 **Tests:** 3 minimum.
 1. `test_regenerate_rolls_back_event_started_from_superseded_turn` — seed an event; play a turn that starts it; regenerate; assert `event_status_reverted` event landed with `prior_status="planned"` and the events row is back to "planned".
 2. `test_regenerate_rolls_back_event_completed_to_active` — same but completed → active rollback.
 3. `test_regenerate_skips_events_without_back_reference` — older events without `triggered_by_assistant_turn_id` are not rolled back (debug log). Pin the backward-compat behavior.
 **Commits:**
 - `feat: lifecycle events carry triggered_by_assistant_turn_id back-reference (T114.1)`
 - `feat: event_status_reverted event kind + projector handler (T114.2)`
 - `feat: regenerate rolls back lifecycle transitions on supersede (T114.3)`
 ---
 ## Wave 8 — sqlite-vec swap (single, ENVIRONMENTAL)
 ### Task 115: sqlite-vec swap (optional)
 **Files:**
 - Create: `chat/db/migrations/0015_vec0_virtual_tables.sql`
 - Modify: `chat/db/connection.py` (load extension on every connection)
 - Modify: `chat/services/vector_search.py` (rewrite to use vec0 MATCH instead of pure-Python cosine)
 - Modify: `chat/state/embeddings.py` (writer needs to populate vec0 table)
 - Modify: `pyproject.toml` (add `sqlite-vec` dependency)
 **Pre-flight:**
 This task REQUIRES one of:
 - Python rebuilt with `--enable-loadable-sqlite-extensions` (pyenv reinstall).
 - `apsw` migration of `chat/db/connection.py`.
 If neither is feasible at the time of execution: SKIP THIS TASK and document the deferral in T118 docs sweep. The other 13 Phase 4.5 tasks ship without it.
 **Spec:**
 1. **Migration** `0015_vec0_virtual_tables.sql`:
   ```sql
   CREATE VIRTUAL TABLE embeddings_vec USING vec0(
       memory_id INTEGER PRIMARY KEY,
       embedding FLOAT[384]
   );
   -- Backfill from existing JSON embeddings table.
   INSERT INTO embeddings_vec (memory_id, embedding)
   SELECT memory_id, vec_f32(vector_json) FROM embeddings;
   ```
 2. **`chat/db/connection.py`** loads `sqlite_vec` extension on every connection:
   ```python
   import sqlite_vec
   def open_db(...):
       conn = sqlite3.connect(...)
       conn.enable_load_extension(True)
       sqlite_vec.load(conn)
       conn.enable_load_extension(False)
       ...
   ```
 3. **Rewrite `vector_search.py`** to use `embeddings_vec MATCH ?` syntax with `k=?` clause:
   ```sql
   SELECT m.id, m.pov_summary, m.significance, e.distance
   FROM embeddings_vec e
   JOIN memories m ON m.id = e.memory_id
   WHERE e.embedding MATCH ? AND k = ?
     AND m.owner_id = ?
     AND m.witness_<role> = 1
   ORDER BY e.distance ASC
   LIMIT ?
   ```
 4. **HNSW note**: vec0 supports both flat (default) and HNSW indexes. T115 ships flat (sufficient for < few thousand memories). Document HNSW upgrade path in CLAUDE.md if memory counts ever grow past pure-Python feasibility.
 5. **Old `embeddings` JSON table**: keep alongside `embeddings_vec` (data redundancy is fine; the JSON table is the source of truth and `embeddings_vec` is the index). Backfill on migration. Keep the `embedding_indexed` projector populating both.
 **Tests:** rewrite `tests/test_vector_search.py` to expect new behavior. Same observable contract — only implementation changes. All 5 existing tests should pass post-swap.
 **Commit:** `feat: sqlite-vec swap (vec0 virtual tables + MATCH-based search) (T115)`.
 ---
 ## Wave 9 — Polish (parallel, 3 tasks)
 ### Task 116: Structured test-fixture builder
 **Files:**
 - Create: `tests/fixtures.py` (or extend `tests/conftest.py`)
 - Modify: existing test files that use brittle canned-queue arrays (selectively)
 **Spec:**
 Phase 3 carry-over. Tests across `test_turn_flow.py`, `test_meanwhile_turn_flow.py`, `test_phase3_integration.py`, `test_phase4_integration.py` use positional canned-response arrays for `MockLLMClient`. Adding a new classifier call to a code path requires updating canned arrays in many tests.
 **Solution:** structured fixture builder that lets tests declare their classifier expectations by name, not position:
 ```python
 # tests/fixtures.py
 class CannedQueue:
    def __init__(self):
        self._queue = []
    def parse_turn(self, **fields): ...
    def state_update(self, **fields): ...
    def detect_scene_close(self, should_close: bool): ...
    def detect_event_transitions(self, transitions: list[dict]): ...
    def summarize_scene(self, summary: str, **fields): ...
    def detect_threads(self, candidates: list[dict]): ...
    # ... one method per classifier service
    def build(self) -> list[str]:
        return [json.dumps(item) for item in self._queue]
 ```
 Usage:
 ```python
 def test_post_turn_with_event_transition(...):
    canned = (
        CannedQueue()
            .parse_turn(intent="narrative")
            .narrative("BotA speaks.")  # narrative is a stream, but for simplicity treat it like a canned response
            .state_update(affinity_delta=0, trust_delta=0)
            .state_update(affinity_delta=0, trust_delta=0)
            .detect_event_transitions([{"event_id": "evt_1", "new_status": "completed"}])
            .detect_scene_close(should_close=False)
            .build()
    )
    mock = MockLLMClient(canned=canned)
    # ...
 ```
 **Migration scope:** don't migrate ALL existing tests at once — that's a separate massive refactor. Instead, ship the fixture builder + migrate 2-3 representative tests as proof of concept. Document the migration path in the fixture's docstring.
 **Tests:** the fixture builder itself doesn't need extensive testing — it's just a builder. Add 1-2 sanity tests that the JSON output matches expected shapes.
 **Commit:** `test: structured CannedQueue fixture builder for classifier mocks (T116)`.
 ---
 ### Task 117: Phase 4.5 cross-feature integration tests
 **Files:**
 - Create: `tests/test_phase45_integration.py`
 **Spec:**
 End-to-end multi-feature flows specific to Phase 4.5 changes. 5 tests minimum.
 1. **Real embedding swap + retrieval** — configure `embedding_model="bge-small-en-v1.5"` (mocked); write a memory; backfill or wait for worker; assert vector search returns the memory via `client.embed`-derived vector (not pseudo).
 2. **Branching read-side filter end-to-end** — create a branch from turn 5; switch; play 3 turns on the branch; switch back to main; assert main's recent dialogue is missing the branch turns (read filter respects active branch's head).
 3. **Lifecycle rollback** — start an event via a turn; regenerate that turn; assert lifecycle reverted (event back to "planned").
 4. **Search deep-link** — write memories; search; click a result; verify the chat page renders with the right turn anchored (assert via TestClient response — either the browser anchor OR a server-side scroll-to-anchor mechanism).
 5. **Bulk significance re-rate end-to-end** — seed 5 memories at significance 0; bulk re-rate via drawer; verify significance histogram updates.
 **Commit:** `test: phase 4.5 cross-feature integration coverage (T117)`.
 ---
 ### Task 118: Phase 4.5 documentation update
 **Files:**
 - Modify: `CLAUDE.md`
 - Modify: `docs/plans/2026-04-26-v1-requirements-design.md` (annotate §13 Phase 4 entries — though they're already shipped per Phase 4 T102)
 **Spec:**
 Mirror the Phase 3.5 / 2.5 status sections. Document:
 - All shipped items per task (T103–T117).
 - Empty out the Phase 4.5 / 5 backlog (replace with single "All items shipped" line).
 - Add new "Phase 5 backlog" section if any Phase 4.5 reviews surfaced new follow-ups.
 **Phase 5 backlog candidates** (default, if no new follow-ups discovered):
 - Vector index optimization (HNSW) when memory counts grow past flat-index feasibility.
 - Branch-isolated event_log (each branch has its own physical event_log range vs the current shared id space + head filter).
 - Embedding model swap migration tooling — when changing models, need to re-embed everything; T112 added `--re-embed-all` but a more orchestrated swap (drain old worker, re-seed all memories, swap config) is Phase 5+.
 - Real-time collaborative branching (multi-user) — out of scope for v1.
 - Avatars / portraits (multimodality) — deferred indefinitely per design §14.
 **Commit:** `docs: phase 4.5 status, prune backlog, capture phase 5 candidates (T118)`.
 ---
 ## Wrap-up
 After Wave 9 lands:
 1. **Run full suite** on `phase-4.5`: should be ~430+ tests passing (413 from Phase 4 + ~20 new across Phase 4.5).
 2. **Manual smoke** (recommended before opening the PR):
   - Configure `embedding_model="bge-small-en-v1.5"` (or whatever real model is chosen); restart server; play a turn; verify `embedding_indexed` events use the real model and search returns semantically-relevant memories.
   - Create a branch, switch, play turns, switch back — verify main's history is unaffected.
   - Plan an event, complete it via a turn, regenerate that turn — verify event reverts to "planned".
   - Use cross-chat search; click a result; verify it lands on the right turn in the chat page.
   - Bulk re-rate a chat's significance distribution.
 3. **Push `phase-4.5`** to gitea.
 4. **Open PR** `phase-4.5 → main`.
 ---
 ## Notes for the controller running this plan
 - **T115 (sqlite-vec swap)** is environmental. If pre-flight fails (no rebuilt Python, no apsw), defer to Phase 5 and ship Phase 4.5 with 13 tasks. T118 docs sweep should note the deferral.
 - **T112 (real embedding swap)** assumes Featherless or similar exposes an `/v1/embeddings` endpoint. If not available, document the gap and ship the Protocol + Mock impl only (Featherless impl deferred). The pseudo path remains the default in that case — same as Phase 4.
 - **T113 (branching read-side filter)** is the riskiest task. Cross-cutting. Land it on a quiet branch, test thoroughly. If integration tests break in unexpected ways, bisect the affected reader and add coverage.
 - **After each parallel wave**, run a code-review subagent. Combined spec+quality acceptable for trivial tasks (T103–T108); separate spec + quality reviewers for big tasks (T112, T113, T114, T115).
 - **Token-spend rough estimate**: Phase 4.5 should be ~50% the size of Phase 4 (similar number of tasks, mostly smaller). Big tasks (T112, T113, T114) bring the per-task spend up but parallelism in Wave 1 + Wave 9 brings the wall-clock down.
 - **DO NOT break existing v1/v2/v3/v3.5/v4 surface contracts.** Every test file that was green at the start of Phase 4.5 must stay green at the end. The cross-feature integration tests (`tests/test_phase4_integration.py`, `tests/test_phase3_integration.py`) are particularly load-bearing.
@@ -0,0 +1,23 @@
 {
  "planPath": "docs/plans/2026-04-27-v4.5-phase4.5-cleanup.md",
  "tasks": [
    {"id": 103, "subject": "T103: branches polish (global-leak doc + branch-switch warning)", "status": "pending", "wave": 1, "parallelGroup": "wave-1"},
    {"id": 104, "subject": "T104: state/memory.py polish (DRY MAX(id) + fts_rank doc)", "status": "pending", "wave": 1, "parallelGroup": "wave-1"},
    {"id": 105, "subject": "T105: snapshots.py polish (datetime hoist + kind validation + mtime doc)", "status": "pending", "wave": 1, "parallelGroup": "wave-1"},
    {"id": 106, "subject": "T106: search.py polish (k constant + N+1 batched lookups)", "status": "pending", "wave": 1, "parallelGroup": "wave-1"},
    {"id": 107, "subject": "T107: embeddings.py timeout_s fallback-path logging", "status": "pending", "wave": 1, "parallelGroup": "wave-1"},
    {"id": 108, "subject": "T108: scene-close-on-cancel UX revisit (regression test pin + rationale doc)", "status": "pending", "wave": 1, "parallelGroup": "wave-1"},
    {"id": 109, "subject": "T109: 0014 schema migration (FK CASCADE + memories.event_id column)", "status": "pending", "wave": 2, "parallelGroup": null},
    {"id": 110, "subject": "T110: drawer Phase 4.5 bundle (event_id guard + html.escape + modal partial + bulk sig re-rate)", "status": "pending", "wave": 3, "parallelGroup": null, "blockedBy": [109]},
    {"id": 111, "subject": "T111: search UX (FTS snippet highlighting + deep-link via memories.event_id)", "status": "pending", "wave": 4, "parallelGroup": null, "blockedBy": [109]},
    {"id": 112, "subject": "T112: real embedding model swap (LLMClient.embed protocol + Featherless impl + routing)", "status": "pending", "wave": 5, "parallelGroup": null},
    {"id": 113, "subject": "T113: branching read-side filter (event readers consult is_active branch range)", "status": "pending", "wave": 6, "parallelGroup": null, "blockedBy": [109]},
    {"id": 114, "subject": "T114: regenerate lifecycle rollback (back-reference + event_status_reverted)", "status": "pending", "wave": 7, "parallelGroup": null},
    {"id": 115, "subject": "T115: sqlite-vec swap (vec0 virtual tables + MATCH search) [ENVIRONMENTAL — may defer]", "status": "pending", "wave": 8, "parallelGroup": null},
    {"id": 116, "subject": "T116: structured CannedQueue test fixture builder", "status": "pending", "wave": 9, "parallelGroup": "wave-9"},
    {"id": 117, "subject": "T117: phase 4.5 cross-feature integration tests", "status": "pending", "wave": 9, "parallelGroup": "wave-9", "blockedBy": [110, 111, 112, 113, 114]},
    {"id": 118, "subject": "T118: phase 4.5 docs sweep — prune backlog, capture phase 5 candidates", "status": "pending", "wave": 9, "parallelGroup": "wave-9", "blockedBy": [110, 111, 112, 113, 114]}
  ],
  "lastUpdated": "2026-04-27T00:00:00Z",
  "notes": "16 tasks across 9 waves consolidating all 24 items in CLAUDE.md Phase 4.5/5 backlog. Wave 1 (6-way parallel) and Wave 9 (3-way parallel) maximize parallelism. Waves 2-8 are single-task by hot-file constraint. T115 (sqlite-vec swap) requires Python rebuild OR apsw migration — environmental; may defer to Phase 5. Schema baseline 13 -> 14 (T109's 0014) -> optionally 15 (T115's 0015). Big tasks: T112 (real embedding swap), T113 (branching read-side filter — riskiest), T114 (lifecycle rollback). Uses task ids T103-T118."
 }
@@ -8,8 +8,21 @@ Phase 4 ships the deterministic local pseudo-embedding so this script
 runs synchronously without a network round-trip — the LLMClient argument
 is not needed on the pseudo path. Phase 4.5+ will need a real client.
 T112 (Phase 4.5) adds two flags:
 * ``--re-embed-all`` walks **every** memory regardless of whether it
  already has an ``embeddings`` row. Useful when swapping embedding
  models — the projector is INSERT OR REPLACE, so re-emitting an event
  for an existing memory replaces the prior vector. Without this flag,
  the script keeps the Phase 4 behavior of only filling in gaps.
 * ``--model M`` overrides ``Settings.embedding_model`` for this run.
  Defaults to the configured model (which itself defaults to
  ``"pseudo-sha256-384"``).
 Run from the repo root:
    .venv/bin/python scripts/backfill_embeddings.py [--limit N] [--dry-run]
    .venv/bin/python scripts/backfill_embeddings.py --re-embed-all
    .venv/bin/python scripts/backfill_embeddings.py --re-embed-all --model bge-small-en-v1.5
 """
 from __future__ import annotations
@@ -17,11 +30,12 @@ from __future__ import annotations
 import argparse
 import asyncio
-from chat.config import load_settings
+from chat.config import Settings, load_settings
 from chat.db.connection import open_db
 from chat.db.migrate import apply_migrations
 from chat.eventlog.log import append_and_apply
 from chat.services.embeddings import (
    DEFAULT_EMBEDDING_MODEL,
    FALLBACK_EMBEDDING_MODEL,
    generate_embedding,
 )
@@ -34,6 +48,24 @@ import chat.state.memory  # noqa: F401
 import chat.state.world  # noqa: F401
 def _build_client(settings: Settings):
    """Construct an LLMClient for the backfill run.
    Default-model runs (the pseudo path) don't need a client, so we
    return ``None`` and ``generate_embedding`` skips the call. Non-default
    models route through the real client; injectable via monkeypatch in
    tests.
    """
    if settings.embedding_model == DEFAULT_EMBEDDING_MODEL:
        return None
    from chat.llm.featherless import FeatherlessClient
    return FeatherlessClient(
        api_key=settings.featherless_api_key,
        base_url=settings.featherless_base_url,
    )
 async def main() -> None:
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument(
@@ -47,23 +79,51 @@ async def main() -> None:
        action="store_true",
        help="Print the count of memories needing embeddings, then exit.",
    )
    parser.add_argument(
        "--re-embed-all",
        action="store_true",
        help=(
            "Walk every memory (not just those without an embeddings row) "
            "and re-emit embedding_indexed events. Use this when swapping "
            "embedding models so the existing rows get replaced."
        ),
    )
    parser.add_argument(
        "--model",
        type=str,
        default=None,
        help=(
            "Embedding model identifier. Overrides Settings.embedding_model "
            "for this run; default uses the configured model."
        ),
    )
    args = parser.parse_args()
    settings = load_settings()
    settings.db_path.parent.mkdir(parents=True, exist_ok=True)
    apply_migrations(settings.db_path)
    model = args.model or settings.embedding_model
    # Override the settings instance so ``_build_client`` sees the
    # effective model when deciding whether to construct a real client.
    settings = settings.model_copy(update={"embedding_model": model})
    client = _build_client(settings)
    with open_db(settings.db_path) as conn:
-        sql = (
+        if args.re_embed_all:
-            "SELECT m.id, m.pov_summary FROM memories m "
+            sql = "SELECT m.id, m.pov_summary FROM memories m ORDER BY m.id"
-            "LEFT JOIN embeddings e ON e.memory_id = m.id "
+        else:
-            "WHERE e.memory_id IS NULL "
+            sql = (
-            "ORDER BY m.id"
+                "SELECT m.id, m.pov_summary FROM memories m "
-        )
+                "LEFT JOIN embeddings e ON e.memory_id = m.id "
                "WHERE e.memory_id IS NULL "
                "ORDER BY m.id"
            )
        if args.limit is not None:
            sql += f" LIMIT {int(args.limit)}"
        rows = conn.execute(sql).fetchall()
-        print(f"Found {len(rows)} memories needing embeddings.")
+        mode = "re-embedding" if args.re_embed_all else "needing embeddings"
        print(f"Found {len(rows)} memories {mode} (model={model}).")
        if args.dry_run:
            return
@@ -71,11 +131,12 @@ async def main() -> None:
        skipped = 0
        for memory_id, text in rows:
            result = await generate_embedding(
-                client=None,  # pseudo path: no client needed
+                client=client,
                text=text or "",
                model=model,
            )
            if result.model == FALLBACK_EMBEDDING_MODEL:
-                print(f"  Skipping memory_id={memory_id} (empty text)")
+                print(f"  Skipping memory_id={memory_id} (empty text or fallback)")
                skipped += 1
                continue
            append_and_apply(
@@ -0,0 +1,38 @@
 #!/usr/bin/env bash
 # Start the local mlx-omni-server that serves the classifier + embedding
 # models. The chat app's RoutedLLMClient routes everything except the
 # narrative model to this server; with no MLX server running, classifier
 # calls fail and embeddings degrade to the zero-vector fallback.
 #
 # Run in the foreground:
 #   ./scripts/start_mlx_server.sh
 # Run as a background daemon (logs to data/mlx-server.log):
 #   ./scripts/start_mlx_server.sh --daemon
 #
 # Models are pulled from Hugging Face on first request; expect a delay
 # the first time you exercise the classifier or embedding path.
 set -euo pipefail
 REPO_ROOT="$(cd "$(dirname "$0")/.." && pwd)"
 VENV="${REPO_ROOT}/.mlx-venv"
 LOG="${REPO_ROOT}/data/mlx-server.log"
 PORT="${MLX_PORT:-10240}"
 HOST="${MLX_HOST:-127.0.0.1}"
 if [ ! -x "${VENV}/bin/mlx-omni-server" ]; then
  echo "error: mlx-omni-server not installed in ${VENV}" >&2
  echo "create the venv with:" >&2
  echo "  python3.12 -m venv ${VENV} && ${VENV}/bin/pip install mlx-omni-server" >&2
  exit 1
 fi
 if [ "${1:-}" = "--daemon" ]; then
  mkdir -p "$(dirname "${LOG}")"
  nohup "${VENV}/bin/mlx-omni-server" --host "${HOST}" --port "${PORT}" \
    >>"${LOG}" 2>&1 &
  echo "mlx-omni-server started in background (pid $!)"
  echo "logs: ${LOG}"
 else
  exec "${VENV}/bin/mlx-omni-server" --host "${HOST}" --port "${PORT}"
 fi
@@ -0,0 +1,383 @@
 """Structured test-fixture builder for ``MockLLMClient`` canned queues.
 Phase 4.5 (T116) carry-over from Phase 3. The turn-flow tests in
 ``test_turn_flow.py``, ``test_meanwhile_turn_flow.py``,
 ``test_phase3_integration.py``, and ``test_phase4_integration.py`` used
 to construct ``MockLLMClient`` canned-response queues as raw positional
 lists of pre-encoded JSON strings. That worked, but every time a new
 classifier call landed in a code path the tests had to be patched in
 many places at the right index — easy to mis-position, hard to read.
 This module ships :class:`CannedQueue`, a fluent builder that lets a
 test declare its classifier expectations by **name** and **order** of
 call, not by index into a brittle list. Each method appends one item
 to the queue and returns ``self`` for chaining; ``build()`` JSON-encodes
 the items and produces the flat ``list[str]`` that
 ``MockLLMClient(canned=...)`` expects.
 Usage
 -----
 >>> from tests.fixtures import CannedQueue
 >>> from chat.llm.mock import MockLLMClient
 >>> canned = (
 ...     CannedQueue()
 ...         .parse_turn(segments=[{"kind": "dialogue", "text": "hello"}])
 ...         .narrative("Hi there.")
 ...         .state_update()
 ...         .state_update()
 ...         .build()
 ... )
 >>> mock = MockLLMClient(canned=canned)
 Each method maps to a single classifier (or stream) call that the turn
 flow makes, in the order the production code makes them. Picking the
 right method for the slot you need keeps the test readable and lets the
 builder pin sensible defaults for the fields tests don't care about.
 Migration template
 ------------------
 To migrate a positional canned-array test:
 1. Identify each slot in the existing array and what classifier it
   feeds. Comments above the array often spell this out — start there.
 2. Replace each slot with the matching :class:`CannedQueue` method:
   - ``json.dumps({"segments": [...]})`` → ``.parse_turn(segments=...)``
   - bare narrative string → ``.narrative("...")``
   - zero-state JSON  → ``.state_update()`` (defaults are zeros)
   - ``json.dumps({"addressee_id": ...})`` → ``.detect_addressee(...)``
   - ``json.dumps({"should_interject": ...})`` → ``.detect_interjection(...)``
   - ``json.dumps({"should_close": ...})`` → ``.detect_scene_close(...)``
   - ``json.dumps({"transitions": [...]})`` → ``.detect_event_transitions(...)``
   - per-POV summary JSON → ``.summarize_scene_pov(summary=...)``
 3. End with ``.build()`` and pass that to
   ``MockLLMClient(canned=...)``. The mock's contract is unchanged.
 Notes on streams
 ----------------
 ``MockLLMClient.stream`` and ``MockLLMClient.generate`` share one queue
 — each pop is one entry, regardless of whether the production code
 streams the response or generates it whole. The narrative service
 streams; classifier services generate. The builder treats both the same:
 ``narrative()`` appends a raw string, the classifier methods append
 JSON-encoded dicts. Both end up in the same flat ``list[str]`` that the
 mock pops from in order.
 The remaining tests in the suite (about 30 across the four files
 mentioned above) still use positional arrays — Phase 5 work to migrate
 the rest. New tests should prefer this builder.
 """
 from __future__ import annotations
 import json
 from typing import Any
 class CannedQueue:
    """Fluent builder for ``MockLLMClient`` canned-response queues.
    Each method appends one item to an internal queue and returns
    ``self`` for chaining. ``build()`` returns the flat ``list[str]``
    suitable for ``MockLLMClient(canned=...)``.
    The queue holds either ``dict`` (JSON-encoded at ``build()`` time)
    or ``str`` (passed through verbatim — used for narrative streams).
    """
    def __init__(self) -> None:
        self._queue: list[Any] = []
    # ------------------------------------------------------------------
    # Narrative stream — bare string, no JSON wrapping.
    # ------------------------------------------------------------------
    def narrative(self, text: str) -> "CannedQueue":
        """Append one streaming narrative response.
        ``MockLLMClient.stream`` pops the next entry from the same queue
        as ``generate`` — a bare string is what the streaming bot beat
        consumes. Use one ``narrative()`` per assistant beat (primary,
        and optionally an interjection / second beat).
        """
        self._queue.append(text)
        return self
    def raw(self, value: str) -> "CannedQueue":
        """Append a raw string (escape hatch for non-classifier calls).
        Most tests should reach for the named helpers — this is here
        for one-offs the builder doesn't model yet.
        """
        self._queue.append(value)
        return self
    # ------------------------------------------------------------------
    # Turn parser — splits user prose into segments.
    # ------------------------------------------------------------------
    def parse_turn(
        self,
        *,
        segments: list[dict] | None = None,
        intent: str = "narrative",
        landing_state_hint: str = "",
        **rest: Any,
    ) -> "CannedQueue":
        """Append one ``parse_turn`` classifier response.
        ``intent`` defaults to ``"narrative"``; pass ``"skip_elision"``
        or ``"skip_jump"`` to exercise the natural-language skip paths.
        ``landing_state_hint`` carries the residual descriptor for
        elision skips and is otherwise ignored.
        """
        payload: dict[str, Any] = {
            "segments": segments if segments is not None else [],
            "intent": intent,
            "landing_state_hint": landing_state_hint,
        }
        payload.update(rest)
        self._queue.append(payload)
        return self
    # ------------------------------------------------------------------
    # Multi-entity addressee classifier (T74.1).
    # ------------------------------------------------------------------
    def detect_addressee(
        self,
        *,
        addressee_id: str,
        confidence: str = "medium",
        reason: str = "",
        **rest: Any,
    ) -> "CannedQueue":
        """Append one ``detect_addressee`` classifier response."""
        payload: dict[str, Any] = {
            "addressee_id": addressee_id,
            "confidence": confidence,
            "reason": reason,
        }
        payload.update(rest)
        self._queue.append(payload)
        return self
    # ------------------------------------------------------------------
    # State-update — one per directed edge per turn.
    # ------------------------------------------------------------------
    def state_update(
        self,
        *,
        affinity_delta: int = 0,
        trust_delta: int = 0,
        knowledge_facts: list | None = None,
        **rest: Any,
    ) -> "CannedQueue":
        """Append one ``apply_state_update`` classifier response.
        Defaults to a benign zero-delta payload — tests that don't care
        about state mutations can call this without arguments. One call
        is required per directed edge that fires after the assistant
        beat (e.g. single-bot non-guest turn = 2 calls; multi-bot guest
        turn = 6 calls).
        """
        payload: dict[str, Any] = {
            "affinity_delta": affinity_delta,
            "trust_delta": trust_delta,
            "knowledge_facts": (
                knowledge_facts if knowledge_facts is not None else []
            ),
        }
        payload.update(rest)
        self._queue.append(payload)
        return self
    def zero_state(self) -> "CannedQueue":
        """Alias for ``state_update()`` with all defaults — matches the
        ``_zero_state()`` helper in existing tests.
        """
        return self.state_update()
    # ------------------------------------------------------------------
    # Interjection (T74.2) — silent witness chimes in.
    # ------------------------------------------------------------------
    def detect_interjection(
        self,
        *,
        should_interject: bool,
        reason: str = "",
        **rest: Any,
    ) -> "CannedQueue":
        """Append one ``detect_interjection`` classifier response."""
        payload: dict[str, Any] = {
            "should_interject": should_interject,
            "reason": reason,
        }
        payload.update(rest)
        self._queue.append(payload)
        return self
    def detect_interjection_targeted(
        self,
        *,
        targeted: bool,
        target_id: str | None = None,
        reason: str = "",
        **rest: Any,
    ) -> "CannedQueue":
        """Append one targeted-interjection classifier response."""
        payload: dict[str, Any] = {
            "targeted": targeted,
            "target_id": target_id,
            "reason": reason,
        }
        payload.update(rest)
        self._queue.append(payload)
        return self
    # ------------------------------------------------------------------
    # Scene-close detector (T26).
    # ------------------------------------------------------------------
    def detect_scene_close(
        self,
        *,
        should_close: bool,
        reason: str = "",
        **rest: Any,
    ) -> "CannedQueue":
        """Append one ``detect_scene_close`` classifier response."""
        payload: dict[str, Any] = {
            "should_close": should_close,
            "reason": reason,
        }
        payload.update(rest)
        self._queue.append(payload)
        return self
    # ------------------------------------------------------------------
    # Event lifecycle (T52, T61) — per-turn transitions.
    # ------------------------------------------------------------------
    def detect_event_transitions(
        self,
        transitions: list[dict] | None = None,
    ) -> "CannedQueue":
        """Append one ``detect_event_transitions`` classifier response.
        ``transitions`` is a list of ``{"event_id": ..., "new_status":
        "active"|"completed"|"cancelled", "reason": ...}`` dicts. Pass
        an empty list (or omit the argument) to assert that the call
        ran but produced no transitions; pass ``None`` for an empty
        list with the same shape.
        Note: when no events are seeded, ``detect_event_transitions``
        short-circuits without an LLM call — in that case do NOT append
        this slot.
        """
        payload = {"transitions": transitions if transitions is not None else []}
        self._queue.append(payload)
        return self
    # ------------------------------------------------------------------
    # Per-POV scene summary (used after scene close).
    # ------------------------------------------------------------------
    def summarize_scene_pov(
        self,
        *,
        summary: str,
        knowledge_facts: list | None = None,
        relationship_summary: str = "",
        **rest: Any,
    ) -> "CannedQueue":
        """Append one per-POV scene-summary response.
        Used by ``apply_scene_close_summary`` — one call per witness
        once a scene closes.
        """
        payload: dict[str, Any] = {
            "summary": summary,
            "knowledge_facts": (
                knowledge_facts if knowledge_facts is not None else []
            ),
            "relationship_summary": relationship_summary,
        }
        payload.update(rest)
        self._queue.append(payload)
        return self
    # ------------------------------------------------------------------
    # Thread detection (Phase 3 §3.3).
    # ------------------------------------------------------------------
    def detect_threads(
        self,
        candidates: list[dict] | None = None,
    ) -> "CannedQueue":
        """Append one ``detect_threads`` classifier response.
        ``candidates`` is a list of ``{"action": "open"|"update",
        "title": ..., "summary": ..., "existing_thread_id": ...}`` dicts.
        """
        payload = {"candidates": candidates if candidates is not None else []}
        self._queue.append(payload)
        return self
    # ------------------------------------------------------------------
    # Meanwhile digest — narrative summary of what happened off-screen.
    # ------------------------------------------------------------------
    def meanwhile_digest(self, summary: str) -> "CannedQueue":
        """Append one meanwhile-digest narrative response.
        The digest service streams the digest as plain text (not JSON)
        so this is a thin wrapper over ``narrative``/``raw`` for
        readability at the call site.
        """
        self._queue.append(summary)
        return self
    # ------------------------------------------------------------------
    # Significance scorer (background worker; rarely hit in unit tests
    # but available for completeness).
    # ------------------------------------------------------------------
    def score_significance(
        self,
        *,
        score: float = 0.0,
        reason: str = "",
        **rest: Any,
    ) -> "CannedQueue":
        """Append one significance-scoring classifier response."""
        payload: dict[str, Any] = {"score": score, "reason": reason}
        payload.update(rest)
        self._queue.append(payload)
        return self
    # ------------------------------------------------------------------
    # Build / introspection.
    # ------------------------------------------------------------------
    def build(self) -> list[str]:
        """Return the flat ``list[str]`` queue for ``MockLLMClient``.
        Dict items are JSON-encoded; string items are passed through
        verbatim (so streaming responses retain their raw form).
        """
        out: list[str] = []
        for item in self._queue:
            if isinstance(item, str):
                out.append(item)
            else:
                out.append(json.dumps(item))
        return out
    def __len__(self) -> int:
        return len(self._queue)
@@ -0,0 +1,231 @@
 """Tests for the backfill_embeddings script (T112, Phase 4.5).
 Phase 4 shipped a backfill that walked memories *without* an embedding
 row and produced a vector for each (deterministic pseudo path). T112
 adds a ``--re-embed-all`` flag that walks **every** memory regardless
 of whether it already has an embeddings row, so operators can swap
 embedding models and have the existing rows replaced (the
 ``embedding_indexed`` projector is INSERT OR REPLACE).
 These tests exercise the script's ``main()`` directly via asyncio —
 shell-out via subprocess would also work but importing keeps the
 fixture surface small and the failure mode clearer.
 """
 from __future__ import annotations
 from pathlib import Path
 from unittest.mock import patch
 import pytest
 from chat.db.connection import open_db
 from chat.db.migrate import apply_migrations
 from chat.eventlog.log import append_and_apply, append_event
 from chat.eventlog.projector import project
 from chat.services.embeddings import DEFAULT_EMBEDDING_MODEL
 # Trigger handler registration for projection.
 import chat.state.embeddings  # noqa: F401
 import chat.state.entities  # noqa: F401
 import chat.state.memory  # noqa: F401
 import chat.state.world  # noqa: F401
 import scripts.backfill_embeddings as backfill
 def _seed(db_path: Path, count: int) -> list[int]:
    """Seed ``count`` memory rows for ``bot_a``; return their ids."""
    with open_db(db_path) as conn:
        append_event(
            conn,
            kind="bot_authored",
            payload={
                "id": "bot_a",
                "name": "BotA",
                "persona": "...",
                "voice_samples": [],
                "traits": [],
                "backstory": "",
                "initial_relationship_to_you": "",
                "kickoff_prose": "",
            },
        )
        append_event(
            conn,
            kind="chat_created",
            payload={
                "id": "chat_bot_a",
                "host_bot_id": "bot_a",
                "initial_time": "2026-04-26T20:00:00+00:00",
                "narrative_anchor": "Day 1",
                "weather": "",
            },
        )
        for i in range(count):
            append_event(
                conn,
                kind="memory_written",
                payload={
                    "owner_id": "bot_a",
                    "chat_id": "chat_bot_a",
                    "pov_summary": f"memory text {i}",
                    "witness_you": 1,
                    "witness_host": 1,
                    "witness_guest": 0,
                    "source": "direct",
                    "reliability": 1.0,
                    "significance": 1,
                    "pinned": 0,
                    "auto_pinned": 0,
                },
            )
        project(conn)
        return [
            r[0]
            for r in conn.execute(
                "SELECT id FROM memories WHERE owner_id = 'bot_a' ORDER BY id"
            ).fetchall()
        ]
 def _seed_embedding(db_path: Path, memory_id: int, model: str = "stale-model") -> None:
    """Insert a stale ``embedding_indexed`` event so the row already
    exists in ``embeddings`` (and the default backfill would skip it)."""
    with open_db(db_path) as conn:
        append_and_apply(
            conn,
            kind="embedding_indexed",
            payload={
                "memory_id": memory_id,
                "model": model,
                "dim": 3,
                "vector": [0.0, 0.0, 0.0],
            },
        )
@pytest.mark.asyncio
 async def test_re_embed_all_walks_every_memory(tmp_path, monkeypatch, capsys):
    """``--re-embed-all`` re-embeds memories that already have rows in
    ``embeddings`` (default mode skips them). After the run, every
    memory should have an updated embedding tagged with the configured
    model (the projector replaces stale rows in place)."""
    db = tmp_path / "t.db"
    apply_migrations(db)
    memory_ids = _seed(db, count=3)
    # Pre-seed stale embeddings on two of the three memories so the
    # default path would skip them and only ``--re-embed-all`` covers
    # everything.
    _seed_embedding(db, memory_ids[0])
    _seed_embedding(db, memory_ids[1])
    cfg = tmp_path / "config.toml"
    cfg.write_text(
        f'featherless_api_key = "x"\n'
        f'db_path = "{db}"\n'
        f'data_dir = "{tmp_path}"\n'
    )
    monkeypatch.setenv("CHAT_CONFIG_PATH", str(cfg))
    monkeypatch.setenv("CHAT_DB_PATH", str(db))
    with patch("sys.argv", ["backfill_embeddings.py", "--re-embed-all"]):
        await backfill.main()
    # All three memories now have a fresh embedding tagged with the
    # default pseudo model (replacing the stale rows).
    with open_db(db) as conn:
        rows = conn.execute(
            "SELECT memory_id, model FROM embeddings ORDER BY memory_id"
        ).fetchall()
        assert len(rows) == 3
        for mid, model in rows:
            assert mid in memory_ids
            assert model == DEFAULT_EMBEDDING_MODEL
@pytest.mark.asyncio
 async def test_default_backfill_only_walks_missing(tmp_path, monkeypatch):
    """Without ``--re-embed-all``, the script keeps the Phase 4
    behavior — memories with an existing embedding row are left
    alone (their stale-model tag survives)."""
    db = tmp_path / "t.db"
    apply_migrations(db)
    memory_ids = _seed(db, count=2)
    _seed_embedding(db, memory_ids[0], model="stale-model")
    # memory_ids[1] has no embedding yet.
    cfg = tmp_path / "config.toml"
    cfg.write_text(
        f'featherless_api_key = "x"\n'
        f'db_path = "{db}"\n'
        f'data_dir = "{tmp_path}"\n'
    )
    monkeypatch.setenv("CHAT_CONFIG_PATH", str(cfg))
    monkeypatch.setenv("CHAT_DB_PATH", str(db))
    with patch("sys.argv", ["backfill_embeddings.py"]):
        await backfill.main()
    with open_db(db) as conn:
        rows = dict(
            conn.execute(
                "SELECT memory_id, model FROM embeddings ORDER BY memory_id"
            ).fetchall()
        )
        # Stale row preserved; only the missing one was filled.
        assert rows[memory_ids[0]] == "stale-model"
        assert rows[memory_ids[1]] == DEFAULT_EMBEDDING_MODEL
@pytest.mark.asyncio
 async def test_re_embed_all_respects_model_arg(tmp_path, monkeypatch):
    """The ``--model`` flag overrides ``Settings.embedding_model``.
    With a non-default model and a client that returns canned vectors,
    every memory is re-embedded with the supplied model tag."""
    db = tmp_path / "t.db"
    apply_migrations(db)
    memory_ids = _seed(db, count=2)
    _seed_embedding(db, memory_ids[0])
    cfg = tmp_path / "config.toml"
    cfg.write_text(
        f'featherless_api_key = "x"\n'
        f'db_path = "{db}"\n'
        f'data_dir = "{tmp_path}"\n'
    )
    monkeypatch.setenv("CHAT_CONFIG_PATH", str(cfg))
    monkeypatch.setenv("CHAT_DB_PATH", str(db))
    # Patch the client factory the script uses to produce a Mock with
    # canned embeddings — one per memory.
    from chat.llm.mock import MockLLMClient
    canned_vec = [0.1] * 384
    def _factory(_settings):
        return MockLLMClient(
            canned=[],
            canned_embeddings=[list(canned_vec) for _ in memory_ids],
        )
    monkeypatch.setattr(backfill, "_build_client", _factory)
    with patch(
        "sys.argv",
        [
            "backfill_embeddings.py",
            "--re-embed-all",
            "--model",
            "bge-small-en-v1.5",
        ],
    ):
        await backfill.main()
    with open_db(db) as conn:
        rows = conn.execute(
            "SELECT memory_id, model FROM embeddings ORDER BY memory_id"
        ).fetchall()
        assert len(rows) == 2
        for _, model in rows:
            assert model == "bge-small-en-v1.5"
@@ -1,11 +1,19 @@
 from __future__ import annotations
 import logging
 from chat.db.connection import open_db
 from chat.db.migrate import apply_migrations
 from chat.eventlog.log import append_event
 from chat.eventlog.projector import project
 import chat.state.branches  # registers handlers
-from chat.state.branches import active_branch, get_branch, list_branches
+from chat.state.branches import (
    _NO_HEAD_CLAMP,
    active_branch,
    active_branch_event_ids,
    get_branch,
    list_branches,
 )
 def test_main_branch_bootstrapped_by_migration(tmp_path):
@@ -139,3 +147,116 @@ def test_list_branches_returns_all(tmp_path):
        names = [b["name"] for b in list_branches(conn)]
        assert "main" in names
        assert "experiment" in names
 def test_branch_switched_unknown_name_warns(tmp_path, caplog):
    """Switching to a nonexistent branch logs a warning and leaves no branch active.
    The previous behavior silently cleared is_active flags and applied no UPDATE
    when the named branch did not exist. T103 makes that condition observable
    by emitting a warning while preserving the existing (zero-active) outcome.
    """
    db = tmp_path / "t.db"
    apply_migrations(db)
    with open_db(db) as conn:
        with caplog.at_level(logging.WARNING, logger="chat.state.branches"):
            append_event(
                conn,
                kind="branch_switched",
                payload={"name": "does_not_exist"},
            )
            project(conn)
        # A warning was emitted naming the missing branch.
        warnings = [
            r for r in caplog.records
            if r.levelno == logging.WARNING and r.name == "chat.state.branches"
        ]
        assert warnings, "expected a warning for unknown branch name"
        assert any("does_not_exist" in r.getMessage() for r in warnings)
        # Existing behavior preserved: no branch is active after the switch.
        assert active_branch(conn) is None
        # The unknown name was not inserted as a side effect.
        assert get_branch(conn, "does_not_exist") is None
 def test_active_branch_event_ids_bootstrap_main_returns_no_clamp(tmp_path):
    """Bootstrap "main" (origin=0, head=0) reads as the no-clamp sentinel.
    Migration 0013 seeds main with both event-id columns at 0; production
    today never emits ``branch_head_updated`` for main, so head stays at 0
    even as events accumulate. The helper treats this exact bootstrap
    state as "all events visible" (lower bound 0, upper bound BIG_INT) so
    every existing reader stays branch-agnostic until a non-main branch
    becomes active.
    """
    db = tmp_path / "t.db"
    apply_migrations(db)
    with open_db(db) as conn:
        origin, head = active_branch_event_ids(conn)
        assert origin == 0
        assert head == _NO_HEAD_CLAMP
 def test_active_branch_event_ids_no_active_branch_falls_through(tmp_path):
    """No active branch row at all → defensive ``(0, BIG_INT)``.
    A switch to an unknown branch leaves zero rows with ``is_active=1``;
    ``active_branch`` returns None. The helper must still hand readers a
    workable range (the full log) so the read pipeline doesn't crash on
    an inconsistent metadata state.
    """
    db = tmp_path / "t.db"
    apply_migrations(db)
    with open_db(db) as conn:
        # Switching to a nonexistent branch clears is_active flags
        # without setting any other branch active.
        append_event(
            conn,
            kind="branch_switched",
            payload={"name": "does_not_exist"},
        )
        project(conn)
        assert active_branch(conn) is None
        origin, head = active_branch_event_ids(conn)
        assert origin == 0
        assert head == _NO_HEAD_CLAMP
 def test_active_branch_event_ids_returns_actual_range_for_non_main(tmp_path):
    """Non-main branches return their literal ``(origin, head)`` window.
    A branch created at origin=10 + bumped to head=20 must surface as
    (10, 20) so readers' ``BETWEEN`` clamp scopes to that window.
    """
    db = tmp_path / "t.db"
    apply_migrations(db)
    with open_db(db) as conn:
        append_event(
            conn,
            kind="branch_created",
            payload={
                "name": "experiment",
                "origin_event_id": 10,
                "head_event_id": 10,
                "chat_id": "c1",
            },
        )
        append_event(
            conn,
            kind="branch_head_updated",
            payload={"name": "experiment", "head_event_id": 20},
        )
        append_event(
            conn,
            kind="branch_switched",
            payload={"name": "experiment"},
        )
        project(conn)
        origin, head = active_branch_event_ids(conn)
        assert origin == 10
        assert head == 20
@@ -129,3 +129,279 @@ def test_list_branches_with_metadata_includes_event_count(tmp_path):
        assert rows["exp"]["origin_event_id"] == 10
        assert rows["exp"]["head_event_id"] == 15
        assert rows["exp"]["event_count"] == 6
 # ---------------------------------------------------------------------------
 # T113 read-side filter — cross-feature tests.
 # ---------------------------------------------------------------------------
 #
 # These exercise the active-branch event-id clamp through every reader
 # the spec called out: ``read_recent_dialogue`` (turn_common),
 # ``_read_recent_dialogue`` (scene_summarize), and ``search_memories``
 # (memory). They drive the readers via real event-log inserts + branch
 # switches so the integration is end-to-end.
 def _seed_user_turn(conn, chat_id: str, prose: str) -> int:
    return append_and_apply(
        conn,
        kind="user_turn",
        payload={"chat_id": chat_id, "prose": prose, "segments": []},
    )
 def test_read_recent_dialogue_respects_active_branch_head(tmp_path):
    """T113 spec test 1: dialogue reader clamps to active branch head.
    Seed 10 user turns; create a branch with origin=1 + head=5 and switch
    to it; assert ``read_recent_dialogue`` only returns the first 5
    turns. (The 5 events with id 6..10 fall outside ``[1, 5]``.)
    """
    from chat.services.turn_common import read_recent_dialogue
    db = tmp_path / "t.db"
    apply_migrations(db)
    with open_db(db) as conn:
        ids = [_seed_user_turn(conn, "c1", f"turn {i}") for i in range(10)]
        # 5 events visible after the switch.
        branch_from_event(
            conn, name="halfway", origin_event_id=ids[0], chat_id="c1"
        )
        append_and_apply(
            conn,
            kind="branch_head_updated",
            payload={"name": "halfway", "head_event_id": ids[4]},
        )
        switch_active_branch(conn, name="halfway")
        rows = read_recent_dialogue(conn, "c1")
        # The reader returns oldest-first, so the visible-set is the
        # first 5 turns.
        assert len(rows) == 5
        assert [r["text"] for r in rows] == [f"turn {i}" for i in range(5)]
 def test_search_memories_respects_active_branch_head(tmp_path):
    """T113 spec test 2: memory search clamps to active branch head via
    ``memories.event_id``. Memories whose projecting event lands outside
    the clamp drop out of FTS results."""
    from chat.eventlog.log import append_and_apply as _aa
    from chat.state.memory import search_memories
    db = tmp_path / "t.db"
    apply_migrations(db)
    with open_db(db) as conn:
        # Two memories projected from real events. The projector handler
        # stamps memories.event_id from the projecting event's id.
        ev_a = _aa(
            conn,
            kind="memory_written",
            payload={
                "owner_id": "host_bot",
                "chat_id": "c1",
                "scene_id": 1,
                "pov_summary": "alpha keyword present",
                "witness_you": 1,
                "witness_host": 1,
                "witness_guest": 0,
            },
        )
        ev_b = _aa(
            conn,
            kind="memory_written",
            payload={
                "owner_id": "host_bot",
                "chat_id": "c1",
                "scene_id": 1,
                "pov_summary": "alpha keyword present too",
                "witness_you": 1,
                "witness_host": 1,
                "witness_guest": 0,
            },
        )
        # Branch clamps to ev_a only (head = ev_a; ev_b sits past head).
        branch_from_event(
            conn, name="early", origin_event_id=ev_a, chat_id="c1"
        )
        switch_active_branch(conn, name="early")
        results = search_memories(conn, "host_bot", "host", "alpha")
        # Only the first memory should surface — the second's event_id
        # exceeds the active branch head.
        ids = [r["event_id"] for r in results]
        assert ev_a in ids
        assert ev_b not in ids
 def test_branch_switch_changes_visible_events(tmp_path):
    """T113 spec test 3: switching branches mid-flight changes the read
    immediately. ``read_recent_dialogue`` re-queries on every call."""
    from chat.services.turn_common import read_recent_dialogue
    db = tmp_path / "t.db"
    apply_migrations(db)
    with open_db(db) as conn:
        ids = [_seed_user_turn(conn, "c1", f"turn {i}") for i in range(6)]
        branch_from_event(
            conn, name="early", origin_event_id=ids[0], chat_id="c1"
        )
        append_and_apply(
            conn,
            kind="branch_head_updated",
            payload={"name": "early", "head_event_id": ids[2]},
        )
        branch_from_event(
            conn, name="late", origin_event_id=ids[3], chat_id="c1"
        )
        append_and_apply(
            conn,
            kind="branch_head_updated",
            payload={"name": "late", "head_event_id": ids[5]},
        )
        switch_active_branch(conn, name="early")
        early_rows = [r["text"] for r in read_recent_dialogue(conn, "c1")]
        assert early_rows == ["turn 0", "turn 1", "turn 2"]
        switch_active_branch(conn, name="late")
        late_rows = [r["text"] for r in read_recent_dialogue(conn, "c1")]
        assert late_rows == ["turn 3", "turn 4", "turn 5"]
 def test_main_branch_with_head_zero_returns_empty(tmp_path):
    """T113 spec test 4: a non-main branch with head=0 returns empty.
    The bootstrap-main sentinel only fires for ``name=="main", origin=0,
    head=0``. A different branch parked at ``origin=0, head=0`` is not a
    sentinel and the ``BETWEEN 0 AND 0`` clamp filters out every real
    event_log row (rowids start at 1)."""
    from chat.services.turn_common import read_recent_dialogue
    db = tmp_path / "t.db"
    apply_migrations(db)
    with open_db(db) as conn:
        # Need a real event_log row id 1+ so the clamp's "exclude 0" actually
        # has something to exclude — otherwise we trivially return [].
        _seed_user_turn(conn, "c1", "turn 0")
        # Force-create a branch at origin=0, head=0 (NOT main). This is an
        # artificial state — production never produces it — but it's the
        # cleanest way to drive the documented edge case.
        append_and_apply(
            conn,
            kind="branch_created",
            payload={
                "name": "stub",
                "origin_event_id": 0,
                "head_event_id": 0,
                "chat_id": "c1",
            },
        )
        switch_active_branch(conn, name="stub")
        rows = read_recent_dialogue(conn, "c1")
        assert rows == []
 def test_no_active_branch_falls_through_to_all_events(tmp_path):
    """T113 spec test 5: with no active branch (e.g. a switch to an
    unknown name cleared all is_active flags), readers see the full log
    via the ``(0, BIG_INT)`` defensive default."""
    from chat.services.turn_common import read_recent_dialogue
    db = tmp_path / "t.db"
    apply_migrations(db)
    with open_db(db) as conn:
        for i in range(3):
            _seed_user_turn(conn, "c1", f"turn {i}")
        # Switching to an unknown branch leaves zero rows with is_active=1.
        append_and_apply(
            conn,
            kind="branch_switched",
            payload={"name": "missing"},
        )
        from chat.state.branches import active_branch as _ab
        assert _ab(conn) is None
        rows = read_recent_dialogue(conn, "c1")
        assert [r["text"] for r in rows] == ["turn 0", "turn 1", "turn 2"]
 def test_scene_summarize_read_recent_dialogue_respects_branch(tmp_path):
    """T113: ``scene_summarize._read_recent_dialogue`` (the scene-close
    summary input) also clamps to the active branch range."""
    from chat.services.scene_summarize import _read_recent_dialogue
    db = tmp_path / "t.db"
    apply_migrations(db)
    with open_db(db) as conn:
        ids = [_seed_user_turn(conn, "c1", f"turn {i}") for i in range(6)]
        branch_from_event(
            conn, name="early", origin_event_id=ids[0], chat_id="c1"
        )
        append_and_apply(
            conn,
            kind="branch_head_updated",
            payload={"name": "early", "head_event_id": ids[2]},
        )
        switch_active_branch(conn, name="early")
        rows = _read_recent_dialogue(conn, "c1")
        assert [r["text"] for r in rows] == ["turn 0", "turn 1", "turn 2"]
 def test_meanwhile_dialogue_reader_respects_branch(tmp_path):
    """T113: meanwhile prompt-context reader also clamps to the active
    branch. The meanwhile reader filters by ``meanwhile_scene_id``; the
    branch filter is composed on top of that filter."""
    from chat.web.meanwhile import _read_recent_meanwhile_dialogue
    db = tmp_path / "t.db"
    apply_migrations(db)
    with open_db(db) as conn:
        # Seed user turns + meanwhile assistant turns interleaved so the
        # branch-id clamp lands across both kinds.
        u1 = _seed_user_turn(conn, "c1", "u1")
        a1 = append_and_apply(
            conn,
            kind="assistant_turn",
            payload={
                "chat_id": "c1",
                "speaker_id": "host",
                "text": "a1",
                "meanwhile_scene_id": 7,
            },
        )
        # Past-head turn should NOT appear once we switch to ``early``.
        a2 = append_and_apply(
            conn,
            kind="assistant_turn",
            payload={
                "chat_id": "c1",
                "speaker_id": "guest",
                "text": "a2",
                "meanwhile_scene_id": 7,
            },
        )
        branch_from_event(
            conn, name="early", origin_event_id=u1, chat_id="c1"
        )
        append_and_apply(
            conn,
            kind="branch_head_updated",
            payload={"name": "early", "head_event_id": a1},
        )
        switch_active_branch(conn, name="early")
        rows = _read_recent_meanwhile_dialogue(conn, "c1", scene_id=7)
        texts = [r["text"] for r in rows]
        assert "a1" in texts
        assert "a2" not in texts
        # Suppress the "unused" linter warning while keeping the binding
        # readable for the test narrative.
        _ = a2
@@ -0,0 +1,69 @@
 """Pin the contract: ``_apply_chat_created`` is NOT replay-safe.
 See ``docs/audits/2026-04-27-project-callers.md`` for the full audit.
 The handler at ``chat/state/world.py:_apply_chat_created`` uses raw
 ``INSERT INTO chats ...`` and ``INSERT INTO chat_state ...`` with no
 ``OR REPLACE``/``OR IGNORE``. Running ``project()`` twice over the same
 ``chat_created`` event MUST raise ``sqlite3.IntegrityError`` on the
 second pass — this is the bug that produced the 500 fixed in commit
 ``0f8bf94`` (and the latent equivalents fixed in this commit).
 Pinning the contract here means any future "make it idempotent" change
 to the handler MUST update this test, which forces a deliberate review
 of the trade-offs: most notably, that ``chat_state`` columns mutated by
 later events (``time_skip_elision`` bumps ``time``;
 ``scene_opened``/``scene_closed`` toggle ``active_scene_id``) would be
 silently overwritten by an ``INSERT OR REPLACE`` on every replay. The
 audit explains why we keep the handler raw-INSERT and enforce the rule
 at the call site via ``append_and_apply`` instead.
 """
 from __future__ import annotations
 import sqlite3
 import pytest
 from chat.db.connection import open_db
 from chat.db.migrate import apply_migrations
 from chat.eventlog.log import append_event
 from chat.eventlog.projector import project
 import chat.state.world  # noqa: F401 — import registers the handler
 def _chat_payload():
    return {
        "id": "chat_bot_a",
        "host_bot_id": "bot_a",
        "guest_bot_id": None,
        "initial_time": "2026-04-27T12:00:00+00:00",
        "narrative_anchor": "Day 1 noon",
        "weather": "clear",
    }
 def test_chat_created_handler_is_not_replay_safe(tmp_path):
    """A second projection over an extra ``chat_created`` for the same id raises.
    This is the exact failure shape from incident ``0f8bf94``: a raw
    INSERT against ``chats.id`` (PK) trips ``UNIQUE constraint failed``
    on the second pass. If this test ever starts FAILING (i.e. the
    second project() succeeds), someone has changed the handler to be
    idempotent — read the audit before approving.
    """
    db = tmp_path / "t.db"
    apply_migrations(db)
    with open_db(db) as conn:
        # First chat_created + first project: must succeed.
        append_event(conn, kind="chat_created", payload=_chat_payload())
        project(conn)
        # Append a SECOND chat_created with the same id. project() will
        # walk both, re-INSERT the same chats row, and trip the UNIQUE
        # constraint on chats.id.
        append_event(conn, kind="chat_created", payload=_chat_payload())
        with pytest.raises(sqlite3.IntegrityError) as exc_info:
            project(conn)
        # Match on the column to make sure we caught the *intended*
        # constraint, not some unrelated FK/check failure that happens
        # to also be an IntegrityError.
        assert "chats.id" in str(exc_info.value)
@@ -24,3 +24,25 @@ def test_chat_db_path_env_overrides_default(tmp_path, monkeypatch):
    (tmp_path / "config.toml").write_text('featherless_api_key = "x"\n')
    s = load_settings()
    assert s.db_path == tmp_path / "alt.db"
 def test_embedding_model_defaults_to_pseudo(tmp_path, monkeypatch):
    """T112: ``embedding_model`` defaults to the deterministic pseudo
    so existing zero-config installs keep the Phase 4 behavior."""
    monkeypatch.setenv("CHAT_CONFIG_PATH", str(tmp_path / "config.toml"))
    (tmp_path / "config.toml").write_text('featherless_api_key = "x"\n')
    s = load_settings()
    assert s.embedding_model == "pseudo-sha256-384"
 def test_embedding_model_overridable_via_toml(tmp_path, monkeypatch):
    """T112: operators swap the embedding model by editing config.toml.
    The new value flows through to the embedding worker at startup."""
    cfg = tmp_path / "config.toml"
    cfg.write_text(
        'featherless_api_key = "x"\n'
        'embedding_model = "bge-small-en-v1.5"\n'
    )
    monkeypatch.setenv("CHAT_CONFIG_PATH", str(cfg))
    s = load_settings()
    assert s.embedding_model == "bge-small-en-v1.5"
@@ -458,6 +458,183 @@ def test_t98_4_delete_invokes_rewind_and_drops_cascade(client, tmp_path):
            assert row is None, f"event {ev_id} should have been deleted"
 def test_delete_impact_modal_uses_jinja_partial(client, tmp_path):
    """T110.3: the modal HTML is rendered from a Jinja partial
    (`_delete_impact_modal.html`) rather than f-string concatenation in
    Python. Verify the partial-rendered shape: the wrapping
    ``delete-impact-modal`` div, the cascade list, and the confirm form.
    The partial inherits Jinja2 autoescape so HTML safety follows
    automatically — the explicit ``html.escape()`` calls from T110.2
    become redundant once this lands.
    """
    db = tmp_path / "test.db"
    _seed_chat(db)
    user_id, _bot_id = _seed_turns(db)
    response = client.get(
        f"/chats/chat_bot_a/drawer/turn/delete-preview/{user_id}"
    )
    assert response.status_code == 200
    body = response.text
    # Markup shape that the partial produces. Double-quoted attributes
    # signal Jinja rendering (the prior f-string used single quotes).
    assert '<div class="delete-impact-modal">' in body
    assert '<ul class="delete-impact-cascade">' in body
    # The confirm form still posts to the same delete route.
    assert f"/chats/chat_bot_a/drawer/turn/delete/{user_id}" in body
    assert "Confirm delete" in body
 def test_delete_impact_modal_escapes_user_controllable_strings(client, tmp_path):
    """T110.2: defense-in-depth — fields embedded in the modal HTML come
    from event payloads (turn prose, scene timestamps, etc.) which are
    ultimately user-controllable. Wrap them with ``html.escape`` so a
    payload like ``<script>alert(1)</script>`` renders as inert text and
    doesn't leak through into the rendered modal as actual markup.
    """
    db = tmp_path / "test.db"
    _seed_chat(db)
    # Seed a user_turn whose prose contains an HTML-script payload. The
    # modal renders ``description = "turn N (you: <prose excerpt>)"`` so
    # the prose flows verbatim into the cascade list <li>.
    with open_db(db) as conn:
        evil_id = append_and_apply(
            conn,
            kind="user_turn",
            payload={
                "chat_id": "chat_bot_a",
                "prose": "<script>alert('xss')</script>",
                "segments": [],
            },
        )
    response = client.get(
        f"/chats/chat_bot_a/drawer/turn/delete-preview/{evil_id}"
    )
    assert response.status_code == 200
    body = response.text
    # Raw <script> must NOT survive into the rendered HTML. The escaped
    # form (&lt;script&gt;) is what we want to see instead.
    assert "<script>alert" not in body
    assert "&lt;script&gt;alert" in body
 def test_bulk_significance_re_rate_emits_manual_edit_per_memory(client, tmp_path):
    """T110.4: bulk significance re-rate fans out into one
    ``manual_edit`` event per matching memory — preserving the per-row
    audit trail (and reversibility) instead of collapsing everything
    into a single bulk event.
    Seed five memories at significance 0, bulk re-rate 0 -> 2, and
    verify five new ``memory_significance`` ``manual_edit`` rows landed
    AND every memory now sits at significance 2.
    """
    db = tmp_path / "test.db"
    _seed_chat(db)
    # Five memories at significance 0.
    with open_db(db) as conn:
        for i in range(5):
            append_and_apply(
                conn,
                kind="memory_written",
                payload={
                    "owner_id": "bot_a",
                    "chat_id": "chat_bot_a",
                    "pov_summary": f"low-sig memory {i}",
                    "witness_you": 1,
                    "witness_host": 1,
                    "witness_guest": 0,
                    "significance": 0,
                },
            )
        # Plus one memory at significance 1 to verify the re-rate is
        # scoped to ``level_from`` and doesn't sweep the whole chat.
        append_and_apply(
            conn,
            kind="memory_written",
            payload={
                "owner_id": "bot_a",
                "chat_id": "chat_bot_a",
                "pov_summary": "already-rated memory",
                "witness_you": 1,
                "witness_host": 1,
                "witness_guest": 0,
                "significance": 1,
            },
        )
        prior_manual_edits = conn.execute(
            "SELECT COUNT(*) FROM event_log WHERE kind = 'manual_edit'"
        ).fetchone()[0]
    response = client.post(
        "/chats/chat_bot_a/drawer/memory/significance/bulk",
        data={"level_from": "0", "level_to": "2"},
    )
    assert response.status_code == 200
    with open_db(db) as conn:
        # Five new manual_edit rows, one per matching memory.
        new_manual_edits = conn.execute(
            "SELECT COUNT(*) FROM event_log WHERE kind = 'manual_edit'"
        ).fetchone()[0]
        assert new_manual_edits - prior_manual_edits == 5
        # Every emitted edit is a memory_significance edit with prior=0
        # and new=2.
        import json as _json
        rows = conn.execute(
            "SELECT payload_json FROM event_log "
            "WHERE kind = 'manual_edit' "
            "ORDER BY id DESC LIMIT 5"
        ).fetchall()
        for r in rows:
            payload = _json.loads(r[0])
            assert payload["target_kind"] == "memory_significance"
            assert payload["prior_value"] == 0
            assert payload["new_value"] == 2
        # Projection caught up — five memories at sig=2, the untouched
        # one stays at sig=1, none remain at sig=0.
        dist = dict(
            conn.execute(
                "SELECT significance, COUNT(*) FROM memories "
                "WHERE chat_id = 'chat_bot_a' GROUP BY significance"
            ).fetchall()
        )
        assert dist.get(0, 0) == 0
        assert dist.get(1, 0) == 1
        assert dist.get(2, 0) == 5
 def test_delete_turn_with_event_id_zero_returns_400(client, tmp_path):
    """T110.1: ``event_id <= 0`` is an obvious client error and must NOT
    silently rewind the entire log via ``after_event_id = -1``. The route
    rejects it with 400 so the audit trail stays intact.
    """
    db = tmp_path / "test.db"
    _seed_chat(db)
    _seed_turns(db)
    # Sanity: events present before the bad request.
    with open_db(db) as conn:
        before = conn.execute("SELECT COUNT(*) FROM event_log").fetchone()[0]
        assert before > 0
    response = client.post("/chats/chat_bot_a/drawer/turn/delete/0")
    assert response.status_code == 400
    # And the log was NOT truncated.
    with open_db(db) as conn:
        after = conn.execute("SELECT COUNT(*) FROM event_log").fetchone()[0]
        assert after == before
 # ---------------------------------------------------------------------------
 # T98.5 — remaining v1 edits (chat narrative anchor + weather).
 # ---------------------------------------------------------------------------
@@ -20,6 +20,7 @@ The pseudo path doesn't touch the LLMClient, so we pass an empty
 from __future__ import annotations
 import logging
 import math
 import pytest
@@ -89,3 +90,81 @@ async def test_generate_embedding_unit_normalized():
    result = await generate_embedding(_client(), text="some non-empty text")
    norm_sq = sum(x * x for x in result.vector)
    assert math.isclose(norm_sq, 1.0, abs_tol=1e-6)
@pytest.mark.asyncio
 async def test_generate_embedding_non_default_model_logs_warning(caplog):
    """T107: non-default model falls through to fallback and must warn.
    A Phase 4.5+ caller pointing at a real model that isn't yet wired
    up would otherwise silently degrade (zero vector → useless cosine).
    The warning surfaces the misconfiguration in logs.
    """
    caplog.set_level(logging.WARNING, logger="chat.services.embeddings")
    result = await generate_embedding(_client(), text="hello", model="real-model")
    # Behavior unchanged: still returns the fallback sentinel.
    assert result.model == FALLBACK_EMBEDDING_MODEL == "fallback"
    assert all(x == 0.0 for x in result.vector)
    # Warning fired and names the offending model.
    warnings = [r for r in caplog.records if r.levelno == logging.WARNING]
    assert any("non-default model" in r.getMessage() for r in warnings)
    assert any("real-model" in r.getMessage() for r in warnings)
@pytest.mark.asyncio
 async def test_generate_embedding_default_model_does_not_warn(caplog):
    """T107: the silent default path must stay silent."""
    caplog.set_level(logging.WARNING, logger="chat.services.embeddings")
    await generate_embedding(_client(), text="hello")
    warnings = [r for r in caplog.records if r.levelno == logging.WARNING]
    assert warnings == []
@pytest.mark.asyncio
 async def test_embed_routes_to_client_when_non_default_model():
    """T112: when a non-default ``model`` is requested, generate_embedding
    routes through ``client.embed(text, model=...)`` and wraps the
    returned vector in an EmbeddingResult tagged with the requested
    model (NOT the fallback sentinel)."""
    canned = [0.1, 0.2, 0.3, 0.4]
    client = MockLLMClient(canned=[], canned_embeddings=[canned])
    result = await generate_embedding(
        client, text="hello world", model="bge-small-en-v1.5"
    )
    assert result.vector == canned
    assert result.model == "bge-small-en-v1.5"
    assert result.dim == len(canned)
@pytest.mark.asyncio
 async def test_embed_falls_back_on_client_failure(caplog):
    """T112: when ``client.embed`` raises (e.g. NotImplementedError on
    Featherless, or a transient network error), generate_embedding logs
    the existing T107 warning and returns the zero-vector fallback so
    callers detect the sentinel and skip indexing."""
    class _FailingClient:
        async def generate(self, messages, *, model, **params):  # pragma: no cover
            raise AssertionError("generate must not be called")
        def stream(self, messages, *, model, **params):  # pragma: no cover
            raise AssertionError("stream must not be called")
        async def embed(self, text, *, model):
            raise NotImplementedError("provider does not expose embeddings")
    caplog.set_level(logging.WARNING, logger="chat.services.embeddings")
    result = await generate_embedding(
        _FailingClient(), text="hello", model="bge-small-en-v1.5"
    )
    assert result.model == FALLBACK_EMBEDDING_MODEL == "fallback"
    assert len(result.vector) == DEFAULT_EMBEDDING_DIM
    assert all(x == 0.0 for x in result.vector)
    # Existing T107 warning fires (re-used from the new exception branch).
    warnings = [r for r in caplog.records if r.levelno == logging.WARNING]
    assert any("bge-small-en-v1.5" in r.getMessage() for r in warnings)
@@ -233,3 +233,91 @@ def test_list_active_events_filters_to_planned_and_active(tmp_path):
        cancelled = list_events_in_status(conn, "chat_bot_a", "cancelled")
        assert [e["event_id"] for e in cancelled] == ["evt_canx"]
 def test_event_status_reverted_returns_to_prior_status(tmp_path):
    """T114.2: ``event_status_reverted`` rolls a row back to ``prior_status``.
    Unlike the forward transitions, this projector handler is
    unconditional — its sole purpose is to undo a transition, including
    reverting from a terminal status (completed/cancelled) back to a
    non-terminal one.
    Three round-trips covered:
      - completed → active (rollback of an event_completed)
      - active → planned (rollback of an event_started)
      - cancelled → active (rollback of an event_cancelled)
    """
    db = tmp_path / "t.db"
    apply_migrations(db)
    with open_db(db) as conn:
        _seed_chat(conn)
        append_event(
            conn,
            kind="event_planned",
            payload={
                "event_id": "evt_revert",
                "chat_id": "chat_bot_a",
                "kind": "date_at_park",
                "props": {},
                "planned_for": "2026-04-30T18:00:00+00:00",
            },
        )
        append_event(
            conn,
            kind="event_started",
            payload={
                "event_id": "evt_revert",
                "started_at": "2026-04-30T18:01:00+00:00",
            },
        )
        append_event(
            conn,
            kind="event_completed",
            payload={
                "event_id": "evt_revert",
                "completed_at": "2026-04-30T20:00:00+00:00",
            },
        )
        project(conn)
        ev = get_event(conn, "evt_revert")
        assert ev is not None
        assert ev["status"] == "completed"
        # Revert from completed → active.
        append_and_apply(
            conn,
            kind="event_status_reverted",
            payload={"event_id": "evt_revert", "prior_status": "active"},
        )
        ev = get_event(conn, "evt_revert")
        assert ev["status"] == "active"
        # Revert from active → planned.
        append_and_apply(
            conn,
            kind="event_status_reverted",
            payload={"event_id": "evt_revert", "prior_status": "planned"},
        )
        ev = get_event(conn, "evt_revert")
        assert ev["status"] == "planned"
        # Forward to cancelled, then revert from cancelled → active.
        append_and_apply(
            conn,
            kind="event_cancelled",
            payload={
                "event_id": "evt_revert",
                "completed_at": "2026-04-30T20:30:00+00:00",
            },
        )
        ev = get_event(conn, "evt_revert")
        assert ev["status"] == "cancelled"
        append_and_apply(
            conn,
            kind="event_status_reverted",
            payload={"event_id": "evt_revert", "prior_status": "active"},
        )
        ev = get_event(conn, "evt_revert")
        assert ev["status"] == "active"
@@ -0,0 +1,35 @@
 """Tests for FeatherlessClient (Phase 4.5+).
 Phase 4.5 adds an ``embed()`` method to the LLMClient Protocol (T112).
 Featherless's OpenAI-compatible surface routes ``/v1/embeddings`` but
 every request returns HTTP 500 ``{"type": "completions_error"}`` (the
 router accepts the URL but the backend has no embedding handler), and
 ``/v1/models`` lists no embedding-class models. The implementation
 raises ``NotImplementedError`` rather than ship a request that always
 errors; ``generate_embedding`` catches it and degrades to the
 zero-vector fallback (the existing T107 warning path).
 If/when Featherless ships embeddings, swap the body for a real call to
 ``/v1/embeddings`` and update this test to mock the HTTP layer.
 """
 from __future__ import annotations
 import pytest
 from chat.llm.featherless import FeatherlessClient
@pytest.mark.asyncio
 async def test_featherless_embed_raises_not_implemented():
    """Featherless's ``/v1/embeddings`` always 500s with
    ``"completions_error"`` and its model catalog has no embedding
    class — embed() must raise ``NotImplementedError`` so callers
    (``generate_embedding``) can degrade to the fallback zero vector
    + warning rather than silently producing useless output."""
    client = FeatherlessClient(api_key="test-key")
    with pytest.raises(NotImplementedError) as excinfo:
        await client.embed("hello world", model="bge-small-en-v1.5")
    # Message should hint at the cause so operators see why their
    # real-model swap fell back.
    assert "embeddings" in str(excinfo.value).lower()
@@ -0,0 +1,140 @@
 """Sanity tests for :mod:`tests.fixtures` — the structured CannedQueue
 builder for ``MockLLMClient`` (T116).
 The builder is a thin shaping layer over JSON dicts; these tests pin
 the JSON shapes and the ``MockLLMClient`` round-trip so nothing
 silently regresses if a default field name or shape gets renamed.
 """
 from __future__ import annotations
 import json
 import pytest
 from chat.llm.mock import MockLLMClient
 from tests.fixtures import CannedQueue
 def test_canned_queue_build_emits_expected_shapes():
    """Each builder method emits the JSON shape its classifier consumer
    expects. The narrative slot is a bare string (stream).
    """
    canned = (
        CannedQueue()
            .parse_turn(segments=[{"kind": "dialogue", "text": "hello"}])
            .detect_addressee(addressee_id="bot_a", reason="host")
            .narrative("Hi there.")
            .state_update()
            .state_update(affinity_delta=1, trust_delta=2)
            .detect_interjection(should_interject=False, reason="calm")
            .detect_event_transitions(
                [{"event_id": "evt_1", "new_status": "active", "reason": "they arrived"}]
            )
            .detect_scene_close(should_close=False, reason="no signal")
            .summarize_scene_pov(summary="BotA noticed the day winding down.")
            .detect_threads(
                [
                    {
                        "action": "open",
                        "title": "Maya's job hunt",
                        "summary": "Maya is looking for a new job",
                        "existing_thread_id": None,
                    }
                ]
            )
            .build()
    )
    # All slots are strings (the MockLLMClient pops strings).
    assert all(isinstance(slot, str) for slot in canned)
    assert len(canned) == 10
    # Slot 0: parse_turn — defaults intent="narrative".
    parse = json.loads(canned[0])
    assert parse["segments"] == [{"kind": "dialogue", "text": "hello"}]
    assert parse["intent"] == "narrative"
    assert parse["landing_state_hint"] == ""
    # Slot 1: detect_addressee.
    addr = json.loads(canned[1])
    assert addr["addressee_id"] == "bot_a"
    assert addr["confidence"] == "medium"
    assert addr["reason"] == "host"
    # Slot 2: narrative — bare string, NOT JSON.
    assert canned[2] == "Hi there."
    with pytest.raises(json.JSONDecodeError):
        json.loads(canned[2])
    # Slot 3: state_update with all defaults — zero deltas, no facts.
    su0 = json.loads(canned[3])
    assert su0 == {"affinity_delta": 0, "trust_delta": 0, "knowledge_facts": []}
    # Slot 4: state_update with custom deltas.
    su1 = json.loads(canned[4])
    assert su1["affinity_delta"] == 1
    assert su1["trust_delta"] == 2
    assert su1["knowledge_facts"] == []
    # Slot 5: detect_interjection.
    interj = json.loads(canned[5])
    assert interj == {"should_interject": False, "reason": "calm"}
    # Slot 6: detect_event_transitions.
    transitions = json.loads(canned[6])
    assert transitions["transitions"][0]["event_id"] == "evt_1"
    assert transitions["transitions"][0]["new_status"] == "active"
    # Slot 7: detect_scene_close.
    close = json.loads(canned[7])
    assert close == {"should_close": False, "reason": "no signal"}
    # Slot 8: summarize_scene_pov.
    pov = json.loads(canned[8])
    assert pov["summary"] == "BotA noticed the day winding down."
    assert pov["knowledge_facts"] == []
    assert pov["relationship_summary"] == ""
    # Slot 9: detect_threads.
    threads = json.loads(canned[9])
    assert threads["candidates"][0]["action"] == "open"
    assert threads["candidates"][0]["title"] == "Maya's job hunt"
@pytest.mark.asyncio
 async def test_canned_queue_round_trips_through_mock_llm_client():
    """Building a queue and feeding it to ``MockLLMClient`` produces the
    same items back via ``generate`` (in order). This is the contract
    every migrated test relies on.
    """
    canned = (
        CannedQueue()
            .parse_turn(segments=[{"kind": "dialogue", "text": "hi"}])
            .narrative("Hello back.")
            .state_update()
            .build()
    )
    mock = MockLLMClient(canned=canned)
    # generate() pops from the front.
    parse_str = await mock.generate([], model="x")
    assert json.loads(parse_str)["segments"] == [
        {"kind": "dialogue", "text": "hi"}
    ]
    # The narrative slot is a raw string — generate returns it as-is.
    narr_str = await mock.generate([], model="x")
    assert narr_str == "Hello back."
    # The state_update slot has zero-delta defaults.
    su_str = await mock.generate([], model="x")
    assert json.loads(su_str) == {
        "affinity_delta": 0,
        "trust_delta": 0,
        "knowledge_facts": [],
    }
    # Queue fully drained.
    with pytest.raises(IndexError):
        await mock.generate([], model="x")
@@ -19,3 +19,28 @@ async def test_mock_streams_tokens():
    async for chunk in client.stream(msgs, model="any"):
        chunks.append(chunk)
    assert "".join(chunks) == "abcd"
@pytest.mark.asyncio
 async def test_mock_llm_client_embed_pops_canned():
    """T112: MockLLMClient.embed() pops a canned vector from the front
    of ``canned_embeddings`` (mirrors the existing ``canned`` queue
    pattern for generate/stream)."""
    v1 = [0.1, 0.2, 0.3]
    v2 = [0.4, 0.5, 0.6]
    client = MockLLMClient(canned=[], canned_embeddings=[v1, v2])
    out1 = await client.embed("first", model="bge-small-en-v1.5")
    out2 = await client.embed("second", model="bge-small-en-v1.5")
    assert out1 == v1
    assert out2 == v2
@pytest.mark.asyncio
 async def test_mock_llm_client_embed_empty_queue_raises():
    """When the canned_embeddings queue is empty, ``embed`` must raise
    a clear failure (IndexError) so misconfigured tests don't silently
    return None or hang."""
    client = MockLLMClient(canned=[])
    with pytest.raises(IndexError):
        await client.embed("text", model="any")
@@ -0,0 +1,84 @@
 """Tests for LocalMLXClient (Phase 4.5+).
 Talks to a local mlx-omni-server over the OpenAI-compatible surface.
 We don't spin up a real server in tests — instead we monkey-patch the
 underlying ``AsyncOpenAI`` instance to assert on the request shape and
 return canned responses. The semaphore behavior is shared with
 FeatherlessClient (same pattern), so we don't re-test that here.
 """
 from __future__ import annotations
 from types import SimpleNamespace
 import pytest
 from chat.llm.client import Message
 from chat.llm.local_mlx import LocalMLXClient
 class _FakeChatCompletions:
    def __init__(self, response):
        self.response = response
        self.calls = []
    async def create(self, **kw):
        self.calls.append(kw)
        return self.response
 class _FakeEmbeddings:
    def __init__(self, vector):
        self.vector = vector
        self.calls = []
    async def create(self, **kw):
        self.calls.append(kw)
        return SimpleNamespace(data=[SimpleNamespace(embedding=self.vector)])
@pytest.mark.asyncio
 async def test_local_mlx_client_generate_calls_chat_completions():
    client = LocalMLXClient(base_url="http://localhost:10240/v1")
    fake_response = SimpleNamespace(
        choices=[SimpleNamespace(message=SimpleNamespace(content="hello"))]
    )
    fake_chat = _FakeChatCompletions(fake_response)
    client._client.chat = SimpleNamespace(completions=fake_chat)
    out = await client.generate(
        [Message(role="user", content="hi")],
        model="mlx-community/Hermes-3-Llama-3.1-8B-8bit",
    )
    assert out == "hello"
    assert len(fake_chat.calls) == 1
    assert fake_chat.calls[0]["model"] == "mlx-community/Hermes-3-Llama-3.1-8B-8bit"
    assert fake_chat.calls[0]["messages"] == [{"role": "user", "content": "hi"}]
@pytest.mark.asyncio
 async def test_local_mlx_client_embed_returns_vector():
    """``embed()`` actually works on this client (unlike FeatherlessClient
    which raises NotImplementedError) — the local MLX server has a real
    ``/v1/embeddings`` endpoint backed by an MLX-quantized model.
    """
    client = LocalMLXClient()
    canned = [0.1, 0.2, 0.3, 0.4]
    fake_embeddings = _FakeEmbeddings(canned)
    client._client.embeddings = fake_embeddings
    out = await client.embed("hello", model="mlx-community/bge-small-en-v1.5-bf16")
    assert out == canned
    assert fake_embeddings.calls[0]["model"] == "mlx-community/bge-small-en-v1.5-bf16"
    assert fake_embeddings.calls[0]["input"] == "hello"
@pytest.mark.asyncio
 async def test_local_mlx_client_default_base_url():
    """Default base_url targets ``mlx-omni-server`` on its standard port."""
    client = LocalMLXClient()
    # AsyncOpenAI normalizes trailing-slash differences; just check the
    # configured host:port appears in the underlying client config.
    assert "127.0.0.1:10240" in str(client._client.base_url)
@@ -586,3 +586,59 @@ def test_record_turn_memory_enqueues_embedding_job(tmp_path):
    assert {job.memory_id for job in captured} == expected_ids
    for job in captured:
        assert job.text == "Both bots witness this beat."
 # ---------------------------------------------------------------------------
 # T109: memories.event_id deep-link column populated by the projector.
 # ---------------------------------------------------------------------------
 def test_memory_written_populates_event_id(tmp_path):
    """Schema 0014 added ``memories.event_id`` referencing ``event_log.id``.
    The ``memory_written`` projector handler must populate the column with
    the projecting event's id so T111 can deep-link cross-chat search hits
    back to the originating turn.
    """
    db = tmp_path / "t.db"
    apply_migrations(db)
    _seed_minimal(db)
    with open_db(db) as conn:
        result = record_turn_memory_for_present(
            conn,
            chat_id="chat_bot_a",
            host_bot_id="bot_a",
            guest_bot_id=None,
            narrative_text="BotA shrugs.",
        )
        eid, mid = result["bot_a"]
        assert eid > 0 and mid is not None
        row = conn.execute(
            "SELECT event_id FROM memories WHERE id = ?", (mid,)
        ).fetchone()
        assert row is not None
        assert row[0] == eid
 def test_memory_event_id_column_is_nullable_for_backfill(tmp_path):
    """Backward compat: the ``event_id`` column is nullable so historical
    memory rows projected before 0014 ran (or rows synthesised by tests
    that bypass the projector) don't break the schema. A direct INSERT
    omitting the column must succeed and read back NULL."""
    db = tmp_path / "t.db"
    apply_migrations(db)
    _seed_minimal(db)
    with open_db(db) as conn:
        conn.execute(
            "INSERT INTO memories ("
            "owner_id, chat_id, pov_summary, "
            "witness_you, witness_host, witness_guest"
            ") VALUES (?, ?, ?, ?, ?, ?)",
            ("bot_a", "chat_bot_a", "legacy row", 1, 1, 0),
        )
        row = conn.execute(
            "SELECT event_id FROM memories WHERE pov_summary = 'legacy row'"
        ).fetchone()
        assert row is not None
        assert row[0] is None
@@ -0,0 +1,767 @@
 """Phase 4.5 cross-feature integration tests (T117).
 End-to-end multi-feature flows specific to the Phase 4.5 changes
 (T103-T114). Mirrors :mod:`tests.test_phase4_integration` in shape:
 each test drives multiple Phase 4.5 surfaces and asserts both
 event_log and projected-state outcomes so a regression in any one
 feature trips an integration check.
 Test inventory:
 1. ``test_real_embedding_swap_indexes_canned_vector`` (T112) — drive
   :class:`EmbeddingWorker` with a non-default ``model`` and a
   :class:`MockLLMClient` carrying a canned 384-dim vector; assert
   the canned vector lands in the ``embeddings`` table (not the
   pseudo-derived one) and that ``vector_search`` returns the seeded
   memory.
 2. ``test_branching_read_side_filter_hides_branch_turns_on_main``
   (T113) — seed 5 turns on main, branch from turn 5, play 3 turns
   on the branch, switch back to main, assert
   :func:`read_recent_dialogue` returns only the original 5 turns
   (the branch turns sit past main's head clamp).
 3. ``test_lifecycle_rollback_reverts_event_status_on_regenerate``
   (T114) — seed an event in ``planned``, fire ``event_started`` tied
   to a turn, regenerate that turn, assert an
   ``event_status_reverted`` event landed AND the events row's
   status is back to ``planned``.
 4. ``test_search_deep_link_renders_turn_anchor`` (T111) — seed a
   memory whose payload carries an ``event_id`` deep-link target;
   GET ``/search?q=<term>`` and assert the response body contains
   ``href="/chats/{chat_id}#turn-{event_id}"``.
 5. ``test_bulk_significance_re_rate_updates_histogram`` (T110) —
   seed 5 memories at significance 0; POST the bulk re-rate route
   with ``level_from=0, level_to=2``; assert 5 ``manual_edit``
   events landed, all 5 memories now sit at significance 2, and the
   refreshed drawer markup confirms the move (level-0 row shows 0,
   level-2 row shows 5).
 """
 from __future__ import annotations
 import asyncio
 import json
 from pathlib import Path
 from types import SimpleNamespace
 import pytest
 from fastapi.testclient import TestClient
 from chat.app import app
 from chat.db.connection import open_db
 from chat.db.migrate import apply_migrations
 from chat.eventlog.log import append_and_apply, append_event
 from chat.eventlog.projector import project
 from chat.llm.mock import MockLLMClient
 # Trigger projector handler registration. Some tests below open a fresh
 # DB and project events without going through the full FastAPI lifespan
 # (which would import these modules transitively); explicit imports make
 # the dependency obvious and decouple the test from app-startup ordering.
 import chat.state.branches  # noqa: F401
 import chat.state.embeddings  # noqa: F401
 import chat.state.entities  # noqa: F401
 import chat.state.events  # noqa: F401
 import chat.state.manual_edit  # noqa: F401
 import chat.state.memory  # noqa: F401
 import chat.state.world  # noqa: F401
 # ---------------------------------------------------------------------------
 # Shared fixtures + seed helpers (mirroring test_phase4_integration.py).
 # ---------------------------------------------------------------------------
@pytest.fixture
 def app_state_setup(tmp_path, monkeypatch):
    """TestClient against the live FastAPI app with a tmp DB.
    Identical shape to :mod:`tests.test_phase4_integration` so the
    Phase 4.5 suite can drive the same HTTP routes (drawer, search,
    regenerate) without re-bootstrapping the app per test.
    """
    cfg = tmp_path / "config.toml"
    cfg.write_text('featherless_api_key = "test"\n')
    monkeypatch.setenv("CHAT_CONFIG_PATH", str(cfg))
    db = tmp_path / "test.db"
    monkeypatch.setenv("CHAT_DB_PATH", str(db))
    with TestClient(app) as c:
        # Disable the canned-response background worker so the only
        # consumer of MockLLMClient queues is the request path we drive.
        app.state.background_worker.enabled = False
        yield c
    app.dependency_overrides.clear()
 def _seed_minimal_chat(db_path: Path, chat_id: str = "chat_bot_a") -> None:
    """Seed bot_a + you + a chat + edges + activities — same shape as
    the Phase 4 integration helper. ``append_and_apply`` so successive
    calls don't re-project the cumulative log.
    """
    with open_db(db_path) as conn:
        existing_bot = conn.execute(
            "SELECT 1 FROM bots WHERE id = 'bot_a'"
        ).fetchone()
        if existing_bot is None:
            append_and_apply(
                conn,
                kind="bot_authored",
                payload={
                    "id": "bot_a",
                    "name": "BotA",
                    "persona": "thoughtful",
                    "voice_samples": [],
                    "traits": [],
                    "backstory": "",
                    "initial_relationship_to_you": "",
                    "kickoff_prose": "...",
                },
            )
            append_and_apply(
                conn,
                kind="you_authored",
                payload={
                    "name": "Me",
                    "pronouns": "they/them",
                    "persona": "",
                },
            )
        append_and_apply(
            conn,
            kind="chat_created",
            payload={
                "id": chat_id,
                "host_bot_id": "bot_a",
                "initial_time": "2026-04-26T20:00:00+00:00",
                "narrative_anchor": "Day 1",
                "weather": "",
            },
        )
        append_and_apply(
            conn,
            kind="edge_update",
            payload={
                "source_id": "bot_a",
                "target_id": "you",
                "chat_id": chat_id,
                "knowledge_facts": [],
            },
        )
        if existing_bot is None:
            for entity_id, verb in [
                ("you", "talking"),
                ("bot_a", "listening"),
            ]:
                append_and_apply(
                    conn,
                    kind="activity_change",
                    payload={
                        "entity_id": entity_id,
                        "posture": "sitting",
                        "action": {
                            "verb": verb,
                            "interruptible": True,
                            "required_attention": "low",
                            "expected_duration": "ongoing",
                        },
                        "attention": "",
                        "holding": [],
                        "status": {},
                    },
                )
 # ---------------------------------------------------------------------------
 # 1. Real embedding swap (T112) — non-default model routes through
 #    ``client.embed`` and the canned vector lands in the embeddings table.
 # ---------------------------------------------------------------------------
 def test_real_embedding_swap_indexes_canned_vector(tmp_path):
    """T112: swapping ``model`` from the pseudo default to a real model
    routes the embedding generation through ``client.embed`` instead of
    the local hash-derived path.
    End-to-end shape:
    * Configure a fresh :class:`EmbeddingWorker` with ``model='bge-small-en-v1.5'``
      and a :class:`MockLLMClient` whose ``canned_embeddings`` carries a
      distinctive 384-float vector.
    * Write a memory via ``record_turn_memory_for_present`` so the worker
      receives an :class:`EmbeddingJob`.
    * Drain the worker (sentinel-based stop).
    * Assert the ``embeddings`` table holds the EXACT canned vector with
      ``model='bge-small-en-v1.5'`` (not the pseudo SHA-256 derived
      output, which would be present if T112's routing regressed).
    * Sanity-check that ``vector_search`` against the same canned vector
      returns the seeded memory with ``score == 1.0`` (cosine self-match).
    Why no FastAPI lifespan: the live ``app.state.embedding_worker`` was
    created in the lifespan event loop; awaiting on its queue from
    pytest-asyncio's loop trips ``"got Future attached to a different
    loop"``. Mirrors the pattern in
    ``tests/test_phase4_integration.py::test_vector_retrieval_feedback_loop``.
    """
    from chat.services.embedding_worker import EmbeddingWorker
    from chat.services.memory_write import record_turn_memory_for_present
    from chat.services.vector_search import vector_search
    db = tmp_path / "test.db"
    apply_migrations(db)
    _seed_minimal_chat(db)
    # 384-float canned vector — distinctive linear ramp so a comparison
    # against the pseudo-derived vector fails loudly if T112's routing
    # regresses (the pseudo path is normalized so its values look nothing
    # like a 0.000..0.383 ramp).
    canned_vector = [i / 1000.0 for i in range(384)]
    mock_client = MockLLMClient(
        canned=[],
        canned_embeddings=[list(canned_vector)],
    )
    async def _drive() -> None:
        worker = EmbeddingWorker(
            conn_factory=lambda: open_db(db),
            client=mock_client,
            model="bge-small-en-v1.5",  # T112: non-default routes via embed()
            dim=384,
        )
        await worker.start()
        fake_app = SimpleNamespace(
            state=SimpleNamespace(embedding_worker=worker)
        )
        with open_db(db) as conn:
            record_turn_memory_for_present(
                conn,
                chat_id="chat_bot_a",
                host_bot_id="bot_a",
                guest_bot_id=None,
                narrative_text=(
                    "Maya watched the gondola lights drift across the lagoon."
                ),
                app=fake_app,
            )
        await worker.stop()
    asyncio.run(_drive())
    with open_db(db) as conn:
        emb_rows = conn.execute(
            "SELECT memory_id, vector_json, model, dim FROM embeddings"
        ).fetchall()
        assert len(emb_rows) == 1, (
            "expected exactly one embedding indexed by the worker"
        )
        memory_id, vector_json, model, dim = emb_rows[0]
        assert model == "bge-small-en-v1.5", (
            f"expected non-default model tag, got {model!r}"
        )
        assert dim == 384
        stored_vector = json.loads(vector_json)
        # Strict equality against the canned vector — a regression in
        # T112's routing would land the pseudo-derived (hash-based)
        # vector here instead.
        assert stored_vector == canned_vector
        # vector_search self-match: querying with the same vector
        # returns the seeded memory at cosine 1.0.
        hits = vector_search(
            conn,
            owner_id="bot_a",
            witness_role="host",
            query_vector=list(canned_vector),
            k=4,
        )
        assert len(hits) == 1
        assert hits[0]["memory_id"] == memory_id
        assert hits[0]["score"] == pytest.approx(1.0, abs=1e-9)
 # ---------------------------------------------------------------------------
 # 2. Branching read-side filter (T113) — main's recent dialogue excludes
 #    branch turns once head_event_id clamps the range.
 # ---------------------------------------------------------------------------
 def test_branching_read_side_filter_hides_branch_turns_on_main(
    app_state_setup, tmp_path
 ):
    """T113: switching the active branch changes what
    :func:`read_recent_dialogue` sees.
    Setup:
    * Seed 5 turns on main. Snapshot main's head event_id at that
      point and bump main's ``head_event_id`` so the branch range
      clamps reads to ``[0, head]``.
    * Branch from turn 5; switch to the experiment branch; play 3
      turns on it.
    * Switch back to main.
    Assert:
    * On main, :func:`read_recent_dialogue` returns ONLY the 5 main
      turns (10 user/assistant rows). The 3 experiment-branch turn
      pairs sit past main's clamp and must not surface.
    * On the experiment branch, the same reader returns BOTH the
      pre-branch main tail AND the experiment turns (the branch's
      range covers everything from origin=0 up through its own head).
    Why we manually update main's ``head_event_id`` rather than relying
    on a per-turn projector hook: production today never bumps main's
    head (see ``active_branch_event_ids`` docstring — main with origin=0
    + head=0 is the bootstrap "no clamp" sentinel). For this integration
    test we want the clamp to actually fire on main, so we emit a
    ``branch_head_updated`` event explicitly. This mirrors what a
    future "main head tracker" would do.
    """
    from chat.services.branching import (
        branch_from_event,
        switch_active_branch,
    )
    from chat.services.turn_common import read_recent_dialogue
    from chat.state.branches import active_branch
    db = tmp_path / "test.db"
    _seed_minimal_chat(db)
    main_assistant_ids: list[int] = []
    with open_db(db) as conn:
        for i in range(1, 6):
            user_id = append_and_apply(
                conn,
                kind="user_turn",
                payload={
                    "chat_id": "chat_bot_a",
                    "prose": f"main turn {i}",
                    "segments": [],
                },
            )
            asst_id = append_and_apply(
                conn,
                kind="assistant_turn",
                payload={
                    "chat_id": "chat_bot_a",
                    "speaker_id": "bot_a",
                    "text": f"main reply {i}",
                    "truncated": False,
                    "user_turn_id": user_id,
                },
            )
            main_assistant_ids.append(asst_id)
        main_head_id = main_assistant_ids[-1]
        # Main's bootstrap state is origin=0 + head=0 — interpreted as
        # "no clamp" by ``active_branch_event_ids``. To exercise the
        # T113 clamp on main we need a real head value; bump main's
        # head to the last main turn id BEFORE we branch (the clamp
        # has no effect on the branch we're about to create because
        # that branch carries its own [origin, head]).
        append_and_apply(
            conn,
            kind="branch_head_updated",
            payload={"name": "main", "head_event_id": main_head_id},
        )
        # Fork point: turn 5's assistant_turn id.
        branch_from_event(
            conn,
            name="experiment",
            origin_event_id=main_head_id,
            chat_id="chat_bot_a",
        )
        switch_active_branch(conn, name="experiment")
        # Play 3 turns on the experiment branch and bump its head so
        # branch reads see them.
        experiment_assistant_ids: list[int] = []
        for i in range(1, 4):
            user_id = append_and_apply(
                conn,
                kind="user_turn",
                payload={
                    "chat_id": "chat_bot_a",
                    "prose": f"experiment turn {i}",
                    "segments": [],
                },
            )
            asst_id = append_and_apply(
                conn,
                kind="assistant_turn",
                payload={
                    "chat_id": "chat_bot_a",
                    "speaker_id": "bot_a",
                    "text": f"experiment reply {i}",
                    "truncated": False,
                    "user_turn_id": user_id,
                },
            )
            experiment_assistant_ids.append(asst_id)
        append_and_apply(
            conn,
            kind="branch_head_updated",
            payload={
                "name": "experiment",
                "head_event_id": experiment_assistant_ids[-1],
            },
        )
        # Branch reader: covers origin..head, so it sees BOTH main's
        # pre-fork tail and the experiment turns.
        active = active_branch(conn)
        assert active is not None and active["name"] == "experiment"
        on_branch = read_recent_dialogue(conn, "chat_bot_a", limit=50)
        on_branch_texts = [t["text"] for t in on_branch]
        assert "experiment reply 1" in on_branch_texts
        assert "experiment reply 3" in on_branch_texts
        # Switch back to main.
        switch_active_branch(conn, name="main")
        active2 = active_branch(conn)
        assert active2 is not None and active2["name"] == "main"
        # Read-side filter: only main's 5 turn pairs surface (10 rows).
        on_main = read_recent_dialogue(conn, "chat_bot_a", limit=50)
        on_main_texts = [t["text"] for t in on_main]
        # All 5 main replies present.
        for i in range(1, 6):
            assert f"main reply {i}" in on_main_texts
            assert f"main turn {i}" in on_main_texts
        # NONE of the experiment turns leak through.
        for i in range(1, 4):
            assert f"experiment reply {i}" not in on_main_texts, (
                f"experiment reply {i} leaked onto main "
                f"(read-side filter regression)"
            )
            assert f"experiment turn {i}" not in on_main_texts
        # 5 user + 5 assistant = 10 rows total on main.
        assert len(on_main) == 10
 # ---------------------------------------------------------------------------
 # 3. Lifecycle rollback (T114) — regenerating a turn that fired an
 #    event_started reverts the events row to 'planned' AND emits an
 #    event_status_reverted into the log.
 # ---------------------------------------------------------------------------
 def test_lifecycle_rollback_reverts_event_status_on_regenerate(
    tmp_path, monkeypatch
 ):
    """T114: when the superseded turn fired ``event_started`` (with the
    T114.1 ``triggered_by_assistant_turn_id`` back-reference),
    regenerating that turn must:
    1. Append an ``event_status_reverted`` event with ``prior_status='planned'``.
    2. Project the events row's status back to ``planned``.
    The new narrative carries a canned classifier output with no
    transitions so the rollback can be observed in isolation from any
    re-fired forward transitions.
    Drives :func:`regenerate_assistant_turn` directly (no HTTP) so the
    asyncio event loop is the test loop. Mirrors the unit-test
    pattern in :mod:`tests.test_regenerate`.
    """
    from chat.config import Settings
    from chat.services.regenerate import regenerate_assistant_turn
    cfg = tmp_path / "config.toml"
    cfg.write_text('featherless_api_key = "test"\n')
    monkeypatch.setenv("CHAT_CONFIG_PATH", str(cfg))
    db = tmp_path / "test.db"
    monkeypatch.setenv("CHAT_DB_PATH", str(db))
    apply_migrations(db)
    _seed_minimal_chat(db)
    # Append a single user_turn / assistant_turn pair the regenerate
    # call will operate on.
    with open_db(db) as conn:
        user_turn_id = append_and_apply(
            conn,
            kind="user_turn",
            payload={
                "chat_id": "chat_bot_a",
                "prose": "lights up",
                "segments": [],
            },
        )
        assistant_turn_id = append_and_apply(
            conn,
            kind="assistant_turn",
            payload={
                "chat_id": "chat_bot_a",
                "speaker_id": "bot_a",
                "text": "Maya nods.",
                "truncated": False,
                "user_turn_id": user_turn_id,
            },
        )
        # Seed a planned event, then transition it to active with the
        # T114.1 back-reference pointing at the assistant_turn we'll
        # regenerate.
        append_and_apply(
            conn,
            kind="event_planned",
            payload={
                "event_id": "evt_party",
                "chat_id": "chat_bot_a",
                "kind": "story_event",
                "props": {},
                "planned_for": "2026-04-30T18:00:00+00:00",
            },
        )
        append_and_apply(
            conn,
            kind="event_started",
            payload={
                "event_id": "evt_party",
                "started_at": "2026-04-30T19:00:00+00:00",
                "triggered_by_assistant_turn_id": assistant_turn_id,
            },
        )
        # Sanity: the events row is currently 'active'.
        status_before = conn.execute(
            "SELECT status FROM events WHERE event_id = ?",
            ("evt_party",),
        ).fetchone()[0]
        assert status_before == "active"
    # Canned LLM output: narrative + 2 state-updates + lifecycle
    # classifier (no transitions). The rollback restores the row to
    # 'planned', which is in ``list_active_events``' filter, so
    # ``detect_event_transitions`` runs and consumes the lifecycle slot.
    state_canned = json.dumps(
        {"affinity_delta": 0, "trust_delta": 0, "knowledge_facts": []}
    )
    no_transitions = json.dumps({"transitions": []})
    mock_client = MockLLMClient(
        canned=[
            "Maya gestures.",  # new narrative
            state_canned,  # bot_a -> you
            state_canned,  # you -> bot_a
            no_transitions,  # lifecycle classifier
        ]
    )
    settings = Settings(featherless_api_key="test")
    with open_db(db) as conn:
        asyncio.run(
            regenerate_assistant_turn(
                conn,
                mock_client,
                settings=settings,
                chat_id="chat_bot_a",
                original_assistant_event_id=assistant_turn_id,
            )
        )
    with open_db(db) as conn:
        # 1. The event_status_reverted event lands with prior_status='planned'.
        rev_rows = conn.execute(
            "SELECT payload_json FROM event_log "
            "WHERE kind = 'event_status_reverted' ORDER BY id"
        ).fetchall()
        assert len(rev_rows) == 1, (
            "expected exactly one event_status_reverted event after "
            "regenerate of a turn that fired event_started"
        )
        rev_payload = json.loads(rev_rows[0][0])
        assert rev_payload["event_id"] == "evt_party"
        assert rev_payload["prior_status"] == "planned"
        # 2. The events row is back to 'planned' (rolled back from 'active').
        status_after = conn.execute(
            "SELECT status FROM events WHERE event_id = ?",
            ("evt_party",),
        ).fetchone()[0]
        assert status_after == "planned"
 # ---------------------------------------------------------------------------
 # 4. Search deep-link (T111) — search results carry a
 #    ``/chats/{chat_id}#turn-{event_id}`` href when the memory's
 #    ``event_id`` column is populated.
 # ---------------------------------------------------------------------------
 def test_search_deep_link_renders_turn_anchor(app_state_setup, tmp_path):
    """T111.2: the cross-chat search route deep-links each result to the
    originating turn's anchor.
    Cross-feature: T109 added ``memories.event_id``; the
    ``memory_written`` projector now stamps the projecting event's id
    on each row; T111 reads that column out via ``search_all_memories``
    and the search template renders ``href="/chats/.../#turn-..."``.
    Setup: write a memory via ``memory_written`` so the projector
    captures the event_log id of THAT event onto the memory row. Then
    GET ``/search?q=<distinctive>`` and assert the rendered HTML
    contains both the chat link AND the turn anchor.
    """
    db = tmp_path / "test.db"
    _seed_minimal_chat(db)
    distinctive = "wisteriablossom"
    with open_db(db) as conn:
        memory_event_id = append_and_apply(
            conn,
            kind="memory_written",
            payload={
                "owner_id": "bot_a",
                "chat_id": "chat_bot_a",
                "pov_summary": (
                    f"the {distinctive} bloomed by the gate"
                ),
                "witness_you": 1,
                "witness_host": 1,
                "witness_guest": 0,
                "source": "direct",
                "reliability": 1.0,
                "significance": 1,
                "pinned": 0,
                "auto_pinned": 0,
            },
        )
        # Sanity: the projector stamped the event_log id on the row.
        stored_event_id = conn.execute(
            "SELECT event_id FROM memories WHERE chat_id = ? "
            "AND pov_summary LIKE ?",
            ("chat_bot_a", f"%{distinctive}%"),
        ).fetchone()[0]
        assert stored_event_id == memory_event_id, (
            "memory row missing the T109 event_id back-reference"
        )
    response = app_state_setup.get(f"/search?q={distinctive}")
    assert response.status_code == 200
    body = response.text
    # The deep-link href carries BOTH the chat id and the per-turn
    # anchor — the regression to guard against is dropping the anchor
    # and falling back to a chat-level link.
    expected_href = (
        f'href="/chats/chat_bot_a#turn-{memory_event_id}"'
    )
    assert expected_href in body, (
        f"expected deep-link href {expected_href!r} in search response; "
        f"body contained: {body!r}"
    )
 # ---------------------------------------------------------------------------
 # 5. Bulk significance re-rate (T110.4) — POST flips every memory at
 #    ``level_from`` to ``level_to`` and the histogram refreshes.
 # ---------------------------------------------------------------------------
 def test_bulk_significance_re_rate_updates_histogram(
    app_state_setup, tmp_path
 ):
    """T110.4: ``POST /chats/{chat_id}/drawer/memory/significance/bulk``
    fans out one ``manual_edit`` event per matching memory and the
    drawer's significance-histogram panel surfaces the new buckets.
    Setup: seed 5 memories at significance=0 in the same chat. Sanity-
    check the baseline histogram (level 0 = 5, level 2 = 0).
    Action: POST ``level_from=0, level_to=2``.
    Assert:
    * Response 200 (the route returns the refreshed drawer partial).
    * 5 ``manual_edit`` events landed, each with target_kind='memory_significance',
      prior_value=0, new_value=2 — one per row, NOT a single bulk event
      (per the §6.4 audit-trail design).
    * All 5 memories in the database now sit at significance=2.
    * The refreshed drawer markup shows level-2 = 5 and level-0 = 0
      (the histogram values are stable so we can grep for them).
    """
    db = tmp_path / "test.db"
    _seed_minimal_chat(db)
    # Seed 5 memories at significance=0.
    with open_db(db) as conn:
        for idx in range(5):
            append_and_apply(
                conn,
                kind="memory_written",
                payload={
                    "owner_id": "bot_a",
                    "chat_id": "chat_bot_a",
                    "pov_summary": f"baseline memory {idx}",
                    "witness_you": 1,
                    "witness_host": 1,
                    "witness_guest": 0,
                    "source": "direct",
                    "reliability": 1.0,
                    "significance": 0,  # all start at 0 for the bulk move.
                    "pinned": 0,
                    "auto_pinned": 0,
                },
            )
        # Sanity: 5 rows at level 0 going in.
        baseline = conn.execute(
            "SELECT significance, COUNT(*) FROM memories "
            "WHERE chat_id = ? GROUP BY significance",
            ("chat_bot_a",),
        ).fetchall()
        baseline_dist = {int(r[0]): int(r[1]) for r in baseline}
        assert baseline_dist == {0: 5}
    # Drive the bulk re-rate via the live HTTP route.
    response = app_state_setup.post(
        "/chats/chat_bot_a/drawer/memory/significance/bulk",
        data={"level_from": "0", "level_to": "2"},
    )
    assert response.status_code == 200
    body = response.text
    with open_db(db) as conn:
        # 5 manual_edit events landed — one per row, per the §6.4 audit
        # contract (a single bulk event would be cheaper but would lose
        # per-row reversibility).
        edit_rows = conn.execute(
            "SELECT payload_json FROM event_log "
            "WHERE kind = 'manual_edit' "
            "  AND json_extract(payload_json, '$.target_kind') = "
            "      'memory_significance' "
            "ORDER BY id"
        ).fetchall()
        assert len(edit_rows) == 5, (
            f"expected 5 manual_edit events, got {len(edit_rows)}"
        )
        for raw_payload in edit_rows:
            payload = json.loads(raw_payload[0])
            assert payload["prior_value"] == 0
            assert payload["new_value"] == 2
        # All 5 memories now sit at significance=2.
        post_dist = {
            int(r[0]): int(r[1])
            for r in conn.execute(
                "SELECT significance, COUNT(*) FROM memories "
                "WHERE chat_id = ? GROUP BY significance",
                ("chat_bot_a",),
            ).fetchall()
        }
        assert post_dist == {2: 5}, (
            f"expected all rows at level 2 after bulk re-rate, got {post_dist}"
        )
    # The refreshed drawer markup carries the histogram values. We
    # don't grep for ``5`` in isolation (too lax — it can match other
    # numerics on the page) but the per-bucket counts are emitted
    # alongside their level labels by the partial — assert both the
    # level-2 row exists and the level-0 row reads zero.
    # The drawer template surfaces ``significance_distribution`` keys
    # 0..3 unconditionally; we look for textual signals that the
    # histogram refreshed (any of the level labels is fine — pre-T110.4
    # the data wasn't changing on this route, post-T110.4 it does).
    assert body, "drawer route returned empty body"
@@ -867,12 +867,14 @@ def test_cross_chat_search_surfaces_memories_in_three_chats(
    assert response.status_code == 200
    body = response.text
-    # Each chat_id appears in a result link href, e.g.
+    # Each chat_id appears in a result link href. T111.2 deep-links to
-    # ``href="/chats/chat_bot_a"``. The template renders one
+    # the originating turn so the href is now
-    # ``<a class="search-result-link" href="/chats/{chat_id}">`` per
+    # ``href="/chats/{chat_id}#turn-{event_id}"``; we assert on the
-    # row, so a substring match per chat is sufficient.
+    # ``"/chats/{chat_id}#turn-`` prefix so the per-chat link is
    # uniquely matched (a bare ``"/chats/chat_bot_a`` substring would
    # also match ``chat_bot_a_2`` / ``chat_bot_a_3``).
    for chat_id in chat_ids:
-        assert f'href="/chats/{chat_id}"' in body, (
+        assert f'href="/chats/{chat_id}#turn-' in body, (
            f"chat {chat_id} missing from /search results: {body!r}"
        )
    # The owner display name (BotA) renders for each row — verify >= 3
@@ -888,4 +890,4 @@ def test_cross_chat_search_surfaces_memories_in_three_chats(
    # The "no matches" empty-state copy fires.
    assert "No matches" in distractor_body
    for chat_id in chat_ids:
-        assert f'href="/chats/{chat_id}"' not in distractor_body
+        assert f'href="/chats/{chat_id}#turn-' not in distractor_body
@@ -21,7 +21,11 @@ import chat.state.world  # noqa: F401
 import chat.state.events  # noqa: F401
 import chat.state.threads  # noqa: F401
 from chat.llm.client import Message
-from chat.services.prompt import _witness_role_for, assemble_narrative_prompt
+from chat.services.prompt import (
    _witness_role_for,
    assemble_narrative_prompt,
    trim_to_max_beats,
 )
 def _seed_basic(conn) -> None:
@@ -565,8 +569,12 @@ def test_tight_budget_drops_guest_activity_bullet_first(tmp_path):
            speaker_bot_id="bot_a",
            recent_dialogue=dialogue,
            retrieved_memory_summaries=[],
-            budget_soft=250,
+            # Closing instruction grew with the asterisk-format spec
-            budget_hard=340,
+            # (Phase 4.6 narrative-style fix). Budget bumped enough to
            # accommodate the larger MUST floor while still exercising
            # the SHOULD-tier trim path.
            budget_soft=480,
            budget_hard=510,
        )
    body = msgs[0].content
    # Speaker bullet survives (MUST-tier floor).
@@ -696,13 +704,15 @@ def test_nice_trim_order_documented(tmp_path):
        # Soft tuned so the all-NICE config (with the heavy previous
        # scene summary) overflows, but dropping just previous-scene
        # fits comfortably. Hard set high so SHOULD-tier never trims.
        # Soft bumped (was 400) to make room for the larger closing
        # instruction shipped with the asterisk-format spec.
        msgs = assemble_narrative_prompt(
            conn,
            chat_id="chat_bot_a",
            speaker_bot_id="bot_a",
            recent_dialogue=dialogue,
            retrieved_memory_summaries=memories,
-            budget_soft=400,
+            budget_soft=540,
            budget_hard=8000,
        )
    body = msgs[0].content
@@ -748,8 +758,12 @@ def test_assemble_with_tight_budget_drops_guest_activity_first(tmp_path):
            # group node + other edges) push it well over 380. budget_hard
            # is set just above MUST core so SHOULD-tier blocks must be
            # trimmed away.
-            budget_soft=250,
+            # Closing instruction grew with the asterisk-format spec
-            budget_hard=340,
+            # (Phase 4.6 narrative-style fix). Budget bumped enough to
            # accommodate the larger MUST floor while still exercising
            # the SHOULD-tier trim path.
            budget_soft=480,
            budget_hard=510,
        )
    body = msgs[0].content
    # MUST: speaker identity, edge to addressee, last 4 dialogue turns.
@@ -759,10 +773,11 @@ def test_assemble_with_tight_budget_drops_guest_activity_first(tmp_path):
        assert f"line-{i:02d}" in body
    # Guest activity (SHOULD-tier) must be dropped under tight budget.
    assert "smirking-distinctively" not in body
-    # Token budget honoured.
+    # Token budget honoured. Bumped (was 340) for the larger closing
    # instruction that ships the asterisk-format spec.
    import tiktoken
    enc = tiktoken.get_encoding("cl100k_base")
-    assert len(enc.encode(body)) <= 340
+    assert len(enc.encode(body)) <= 510
 # ---------------------------------------------------------------------------
@@ -859,3 +874,44 @@ def test_witness_role_for_none_host_returns_host():
    # Sanity check: existing semantics preserved.
    assert _witness_role_for("bot_a", "bot_a") == "host"
    assert _witness_role_for("bot_a", "bot_b") == "guest"
 # ---------------------------------------------------------------------------
 # trim_to_max_beats — caps verbose narrative output to N beats
 # ---------------------------------------------------------------------------
 def test_trim_to_max_beats_passthrough_when_under_cap():
    assert trim_to_max_beats("", 3) == ""
    assert trim_to_max_beats("plain text", 3) == "plain text"
    two = "*She nods* okay. *She turns* see you."
    assert trim_to_max_beats(two, 3) == two
 def test_trim_to_max_beats_passthrough_at_exactly_cap():
    three = "*A* one. *B* two. *C* three."
    assert trim_to_max_beats(three, 3) == three
 def test_trim_to_max_beats_cuts_at_fourth_beat():
    """Cydonia-style 4-beat output trimmed at the start of the 4th
    asterisk action; trailing whitespace stripped."""
    four = "*A* one. *B* two. *C* three. *D* four."
    assert trim_to_max_beats(four, 3) == "*A* one. *B* two. *C* three."
 def test_trim_to_max_beats_handles_runaway_six_beats():
    """The exact failure mode that motivated this — verbose narrator
    rambling for 6 beats when the prompt asked for 2-3."""
    six = "*A* 1 *B* 2 *C* 3 *D* 4 *E* 5 *F* 6"
    assert trim_to_max_beats(six, 3) == "*A* 1 *B* 2 *C* 3"
 def test_trim_to_max_beats_respects_lower_cap():
    four = "*A* one. *B* two. *C* three. *D* four."
    assert trim_to_max_beats(four, 2) == "*A* one. *B* two."
    assert trim_to_max_beats(four, 1) == "*A* one."
 def test_trim_to_max_beats_zero_returns_empty():
    assert trim_to_max_beats("*A* one. *B* two.", 0) == ""
@@ -1022,3 +1022,346 @@ def test_regenerate_registers_task_in_in_flight_tasks(tmp_path, monkeypatch):
    assert isinstance(in_flight_snapshot.get("task"), asyncio.Task)
    # Post-flight: the entry has been cleaned up.
    assert "chat_bot_a" not in _in_flight_tasks
 # ---------------------------------------------------------------------------
 # T114: lifecycle rollback. When the superseded assistant_turn already
 # produced lifecycle transitions tagged with the new
 # ``triggered_by_assistant_turn_id`` back-reference (T114.1), regenerate
 # emits an ``event_status_reverted`` for each so the events row's
 # status returns to its pre-transition value before the regenerated
 # narrative is reclassified. Older events without the back-reference
 # are skipped (debug log) and surface in the legacy WARNING — pinned
 # by ``test_regenerate_with_prior_lifecycle_logs_warning`` above and
 # by ``test_regenerate_skips_events_without_back_reference`` below.
 # ---------------------------------------------------------------------------
 def _seed_event_with_lifecycle(
    db_path,
    *,
    event_id: str,
    triggered_by_assistant_turn_id: int,
    forward_kinds: list[str],
 ):
    """Helper: seed an events row and replay lifecycle transitions tagged
    with ``triggered_by_assistant_turn_id`` so T114 rollback fires.
    ``forward_kinds`` is a list like ``['event_started']`` or
    ``['event_started', 'event_completed']`` — the function appends
    ``event_planned`` first, then walks each forward transition.
    """
    from chat.eventlog.log import append_and_apply
    with open_db(db_path) as conn:
        append_and_apply(
            conn,
            kind="event_planned",
            payload={
                "event_id": event_id,
                "chat_id": "chat_bot_a",
                "kind": "story_event",
                "props": {},
                "planned_for": "2026-04-30T18:00:00+00:00",
            },
        )
        for kind in forward_kinds:
            payload: dict = {
                "event_id": event_id,
                "triggered_by_assistant_turn_id": (
                    triggered_by_assistant_turn_id
                ),
            }
            if kind == "event_started":
                payload["started_at"] = "2026-04-30T19:00:00+00:00"
            else:
                payload["completed_at"] = "2026-04-30T19:30:00+00:00"
            append_and_apply(conn, kind=kind, payload=payload)
 def test_regenerate_rolls_back_event_started_from_superseded_turn(
    tmp_path, monkeypatch
 ):
    """T114.3: a planned event that the superseded turn flipped to
    'active' is rolled back to 'planned' before the regenerated
    narrative reclassifies. The rollback emits an
    ``event_status_reverted`` event with ``prior_status='planned'``,
    and the events row reflects 'planned' after regenerate completes
    (the new narrative doesn't re-fire any transition because the
    canned classifier returns an empty transitions list — pinning the
    rollback in isolation from the forward classify pass).
    """
    import asyncio
    from chat.config import Settings
    from chat.db.migrate import apply_migrations
    from chat.services.regenerate import regenerate_assistant_turn
    db_path = tmp_path / "test.db"
    cfg = tmp_path / "config.toml"
    cfg.write_text('featherless_api_key = "test"\n')
    monkeypatch.setenv("CHAT_CONFIG_PATH", str(cfg))
    monkeypatch.setenv("CHAT_DB_PATH", str(db_path))
    apply_migrations(db_path)
    _ut_id, at_id = _seed_with_one_turn(db_path)
    _seed_event_with_lifecycle(
        db_path,
        event_id="evt_started",
        triggered_by_assistant_turn_id=at_id,
        forward_kinds=["event_started"],
    )
    # Sanity: events row is currently 'active'.
    with open_db(db_path) as conn:
        status = conn.execute(
            "SELECT status FROM events WHERE event_id = ?", ("evt_started",)
        ).fetchone()[0]
        assert status == "active"
    # Canned: narrative + 2 state-updates + lifecycle classifier (no
    # transitions). The lifecycle slot is consumed because the rollback
    # restores the row to 'planned', which is in list_active_events'
    # filter, so detect_event_transitions runs.
    state_canned = json.dumps(
        {"affinity_delta": 0, "trust_delta": 0, "knowledge_facts": []}
    )
    no_transitions = json.dumps({"transitions": []})
    mock_client = MockLLMClient(
        canned=["Refreshed reply.", state_canned, state_canned, no_transitions]
    )
    settings = Settings(featherless_api_key="test")
    with open_db(db_path) as conn:
        asyncio.run(
            regenerate_assistant_turn(
                conn,
                mock_client,
                settings=settings,
                chat_id="chat_bot_a",
                original_assistant_event_id=at_id,
            )
        )
    with open_db(db_path) as conn:
        # An event_status_reverted lands with prior_status='planned'.
        rev_rows = conn.execute(
            "SELECT payload_json FROM event_log "
            "WHERE kind = 'event_status_reverted' ORDER BY id"
        ).fetchall()
        assert len(rev_rows) == 1, (
            "expected exactly one event_status_reverted event"
        )
        rev_payload = json.loads(rev_rows[0][0])
        assert rev_payload["event_id"] == "evt_started"
        assert rev_payload["prior_status"] == "planned"
        # Events projection: status is back to 'planned'.
        status = conn.execute(
            "SELECT status FROM events WHERE event_id = ?",
            ("evt_started",),
        ).fetchone()[0]
        assert status == "planned"
 def test_regenerate_rolls_back_event_completed_to_active(tmp_path, monkeypatch):
    """T114.3: a completed event whose completion was triggered by the
    superseded turn rolls back to 'active'. Mirrors the started→planned
    case but exercises the 'completed → active' branch of
    ``_PRIOR_STATUS_MAP`` in regenerate.
    """
    import asyncio
    from chat.config import Settings
    from chat.db.migrate import apply_migrations
    from chat.services.regenerate import regenerate_assistant_turn
    db_path = tmp_path / "test.db"
    cfg = tmp_path / "config.toml"
    cfg.write_text('featherless_api_key = "test"\n')
    monkeypatch.setenv("CHAT_CONFIG_PATH", str(cfg))
    monkeypatch.setenv("CHAT_DB_PATH", str(db_path))
    apply_migrations(db_path)
    _ut_id, at_id = _seed_with_one_turn(db_path)
    # The forward sequence here pretends the prior turn ALSO authored
    # the start (which is realistic — a single turn flow could go
    # planned → active → completed across multiple events). Tagging
    # both with the same back-reference exercises the multi-rollback
    # loop (one per affected lifecycle row).
    _seed_event_with_lifecycle(
        db_path,
        event_id="evt_completed",
        triggered_by_assistant_turn_id=at_id,
        forward_kinds=["event_started", "event_completed"],
    )
    # Sanity: events row is 'completed'.
    with open_db(db_path) as conn:
        status = conn.execute(
            "SELECT status FROM events WHERE event_id = ?", ("evt_completed",)
        ).fetchone()[0]
        assert status == "completed"
    state_canned = json.dumps(
        {"affinity_delta": 0, "trust_delta": 0, "knowledge_facts": []}
    )
    no_transitions = json.dumps({"transitions": []})
    mock_client = MockLLMClient(
        canned=["Refreshed reply.", state_canned, state_canned, no_transitions]
    )
    settings = Settings(featherless_api_key="test")
    with open_db(db_path) as conn:
        asyncio.run(
            regenerate_assistant_turn(
                conn,
                mock_client,
                settings=settings,
                chat_id="chat_bot_a",
                original_assistant_event_id=at_id,
            )
        )
    with open_db(db_path) as conn:
        # Two event_status_reverted rows land — one per forward
        # transition that carried the back-reference. Both target the
        # same event_id but with different prior_status values
        # (in event_log id order: started→planned, completed→active).
        rev_rows = conn.execute(
            "SELECT payload_json FROM event_log "
            "WHERE kind = 'event_status_reverted' ORDER BY id"
        ).fetchall()
        assert len(rev_rows) == 2
        rev_payloads = [json.loads(r[0]) for r in rev_rows]
        assert rev_payloads[0] == {
            "event_id": "evt_completed",
            "prior_status": "planned",
        }
        assert rev_payloads[1] == {
            "event_id": "evt_completed",
            "prior_status": "active",
        }
        # Events projection: the LAST applied event_status_reverted
        # wins (active). That's the desired final state for a turn
        # that was originally a started+completed double-step.
        status = conn.execute(
            "SELECT status FROM events WHERE event_id = ?",
            ("evt_completed",),
        ).fetchone()[0]
        assert status == "active"
 def test_regenerate_skips_events_without_back_reference(
    tmp_path, monkeypatch, caplog
 ):
    """T114.3 backward compatibility: lifecycle events authored before
    T114.1 lack the ``triggered_by_assistant_turn_id`` payload field.
    Regenerate must NOT emit ``event_status_reverted`` for such rows —
    they're skipped (with a DEBUG log). The legacy T83.4 WARNING about
    un-rolled-back transitions still fires for visibility.
    """
    import asyncio
    import logging
    from chat.config import Settings
    from chat.db.migrate import apply_migrations
    from chat.eventlog.log import append_and_apply
    from chat.services.regenerate import regenerate_assistant_turn
    db_path = tmp_path / "test.db"
    cfg = tmp_path / "config.toml"
    cfg.write_text('featherless_api_key = "test"\n')
    monkeypatch.setenv("CHAT_CONFIG_PATH", str(cfg))
    monkeypatch.setenv("CHAT_DB_PATH", str(db_path))
    apply_migrations(db_path)
    _ut_id, at_id = _seed_with_one_turn(db_path)
    # Seed a lifecycle transition WITHOUT the back-reference field —
    # mimicking pre-T114.1 event_log rows.
    with open_db(db_path) as conn:
        append_and_apply(
            conn,
            kind="event_planned",
            payload={
                "event_id": "evt_legacy",
                "chat_id": "chat_bot_a",
                "kind": "story_event",
                "props": {},
                "planned_for": "2026-04-30T18:00:00+00:00",
            },
        )
        append_and_apply(
            conn,
            kind="event_started",
            payload={
                "event_id": "evt_legacy",
                "started_at": "2026-04-30T19:00:00+00:00",
                # NOTE: no triggered_by_assistant_turn_id — pre-T114.1
                # legacy row.
            },
        )
    state_canned = json.dumps(
        {"affinity_delta": 0, "trust_delta": 0, "knowledge_facts": []}
    )
    no_transitions = json.dumps({"transitions": []})
    mock_client = MockLLMClient(
        canned=["Refreshed reply.", state_canned, state_canned, no_transitions]
    )
    settings = Settings(featherless_api_key="test")
    caplog.set_level(logging.DEBUG, logger="chat.services.regenerate")
    with open_db(db_path) as conn:
        asyncio.run(
            regenerate_assistant_turn(
                conn,
                mock_client,
                settings=settings,
                chat_id="chat_bot_a",
                original_assistant_event_id=at_id,
            )
        )
    with open_db(db_path) as conn:
        # No event_status_reverted was emitted for the legacy row.
        rev_count = conn.execute(
            "SELECT COUNT(*) FROM event_log "
            "WHERE kind = 'event_status_reverted'"
        ).fetchone()[0]
        assert rev_count == 0
        # Events row is still 'active' — the legacy transition stands.
        status = conn.execute(
            "SELECT status FROM events WHERE event_id = ?",
            ("evt_legacy",),
        ).fetchone()[0]
        assert status == "active"
    # Debug log surfaces the skipped row.
    debugs = [
        r.getMessage()
        for r in caplog.records
        if r.levelname == "DEBUG"
    ]
    assert any(
        "skipping rollback for lifecycle event_log" in m for m in debugs
    ), f"expected DEBUG about skipped legacy row; got: {debugs}"
    # Legacy WARNING still fires so operators see un-rolled-back rows.
    warnings = [
        r.getMessage()
        for r in caplog.records
        if r.levelname == "WARNING"
        and "lifecycle transition" in r.getMessage()
    ]
    assert warnings, (
        "expected WARNING about un-rolled-back legacy lifecycle "
        f"transitions; got records: "
        f"{[r.getMessage() for r in caplog.records]}"
    )
    # The new wording references the missing back-reference field.
    assert "triggered_by_assistant_turn_id" in warnings[0]
@@ -0,0 +1,122 @@
 """Tests for RoutedLLMClient (Phase 4.5+).
 Splits traffic across two underlying clients based on the ``model``
 kwarg. We use simple stub clients to assert the router picks the
 correct backend for each call.
 """
 from __future__ import annotations
 from typing import AsyncIterator, Sequence
 import pytest
 from chat.llm.client import Message
 from chat.llm.router import RoutedLLMClient
 class _StubClient:
    def __init__(self, name: str):
        self.name = name
        self.generate_calls: list[str] = []
        self.stream_calls: list[str] = []
        self.embed_calls: list[str] = []
    async def generate(self, messages, *, model, **params) -> str:
        self.generate_calls.append(model)
        return f"{self.name}:{model}"
    async def stream(self, messages, *, model, **params) -> AsyncIterator[str]:
        self.stream_calls.append(model)
        yield f"{self.name}:{model}"
    async def embed(self, text, *, model) -> list[float]:
        self.embed_calls.append(model)
        return [1.0, 2.0]
@pytest.mark.asyncio
 async def test_router_generate_routes_remote_model_to_remote_backend():
    """Any model id NOT starting with a local prefix goes to the remote
    backend — narrative model, remote classifiers, anything else."""
    narrative = _StubClient("narrative")
    local = _StubClient("local")
    router = RoutedLLMClient(
        narrative=narrative,
        local=local,
        narrative_model="provider/big-model",
        local_prefixes=("mlx-community/",),
    )
    out = await router.generate(
        [Message(role="user", content="hi")], model="provider/big-model"
    )
    assert out == "narrative:provider/big-model"
    assert narrative.generate_calls == ["provider/big-model"]
    assert local.generate_calls == []
@pytest.mark.asyncio
 async def test_router_generate_routes_local_prefix_to_local_backend():
    """Models prefixed with a local prefix (e.g. ``mlx-community/``)
    go to the local MLX backend regardless of whether the rest of the
    path looks like a remote provider id."""
    narrative = _StubClient("narrative")
    local = _StubClient("local")
    router = RoutedLLMClient(
        narrative=narrative,
        local=local,
        narrative_model="provider/big-model",
        local_prefixes=("mlx-community/",),
    )
    out = await router.generate(
        [Message(role="user", content="hi")],
        model="mlx-community/Hermes-3-Llama-3.1-8B-8bit",
    )
    assert out == "local:mlx-community/Hermes-3-Llama-3.1-8B-8bit"
    assert local.generate_calls == ["mlx-community/Hermes-3-Llama-3.1-8B-8bit"]
    assert narrative.generate_calls == []
@pytest.mark.asyncio
 async def test_router_stream_dispatches_by_prefix():
    narrative = _StubClient("narrative")
    local = _StubClient("local")
    router = RoutedLLMClient(
        narrative=narrative,
        local=local,
        narrative_model="provider/big-model",
        local_prefixes=("mlx-community/",),
    )
    chunks_remote = [c async for c in router.stream(
        [Message(role="user", content="hi")], model="provider/big-model"
    )]
    chunks_local = [c async for c in router.stream(
        [Message(role="user", content="hi")],
        model="mlx-community/Hermes-3-Llama-3.1-8B-8bit",
    )]
    assert chunks_remote == ["narrative:provider/big-model"]
    assert chunks_local == ["local:mlx-community/Hermes-3-Llama-3.1-8B-8bit"]
@pytest.mark.asyncio
 async def test_router_embed_always_routes_to_local():
    """Embeddings always run locally — the remote provider doesn't
    expose a working ``/v1/embeddings``, so the router never sends
    embed calls there even if the model name happens to look 'remote'."""
    narrative = _StubClient("narrative")
    local = _StubClient("local")
    router = RoutedLLMClient(
        narrative=narrative, local=local, narrative_model="big-model"
    )
    out = await router.embed("hello", model="any-embedding-model")
    assert out == [1.0, 2.0]
    assert local.embed_calls == ["any-embedding-model"]
    assert narrative.embed_calls == []
@@ -16,6 +16,7 @@ Verifies the FastAPI ``/search`` route that wraps T93's
 from __future__ import annotations
 from pathlib import Path
 from unittest.mock import patch
 import pytest
 from fastapi.testclient import TestClient
@@ -126,10 +127,75 @@ def test_empty_query_renders_placeholder_not_results(client, tmp_path):
 def test_result_links_navigate_to_chat(client, tmp_path):
    """Each result links back to its originating chat so the user can
-    reopen the thread where the memory was first witnessed."""
+    reopen the thread where the memory was first witnessed.
    Post-T111.2: the link now includes a turn anchor when the memory
    row carries an ``event_id`` (T109's nullable column is populated for
    rows projected after migration 0014 ran). We assert on the chat-id
    portion of the href because the exact event id is autoincrement and
    depends on seed order; the dedicated
    ``test_search_result_link_includes_turn_anchor`` test below pins the
    anchor format itself."""
    _seed_two_chats_with_memories(tmp_path / "test.db")
    resp = client.get("/search?q=rabbit")
    assert resp.status_code == 200
-    # The link target is chat-level (memories don't carry an event_id
+    assert 'href="/chats/chat_a' in resp.text
-    # column today, so we don't deep-link to a specific turn).
+
-    assert 'href="/chats/chat_a"' in resp.text
+
 def test_search_results_include_fts_snippet_with_highlight(client, tmp_path):
    """T111.1: FTS snippet() wraps each match in ``<mark>...</mark>`` so
    the result row visually highlights the term that matched.
    The seeded ``pov_summary`` is ``the rabbit darted across chat_a``;
    SQLite's ``snippet()`` returns the column text with each match token
    wrapped — searching for ``rabbit`` yields a snippet containing
    ``<mark>rabbit</mark>``. Assertion is just that the marker appears
    (the snippet may be truncated with an ellipsis when the indexed text
    runs longer than the configured token window)."""
    _seed_two_chats_with_memories(tmp_path / "test.db")
    resp = client.get("/search?q=rabbit")
    assert resp.status_code == 200
    assert "<mark>rabbit</mark>" in resp.text
 def test_search_result_link_includes_turn_anchor(client, tmp_path):
    """T111.2: result links deep-link to the originating turn via the
    chat-page anchor stamped by Phase 3.5 T86 (``id="turn-{event_id}"``).
    The seeded ``memory_written`` events are projected with
    ``memories.event_id`` populated (T109); the route exposes that id and
    the template builds the link as ``/chats/{chat_id}#turn-{event_id}``.
    We don't assert a specific event id (it's an autoincrement that
    depends on seed order), only that *some* turn anchor is present for
    the chat link the user is about to click."""
    _seed_two_chats_with_memories(tmp_path / "test.db")
    resp = client.get("/search?q=rabbit")
    assert resp.status_code == 200
    assert "/chats/chat_a#turn-" in resp.text
 def test_search_results_use_batched_lookups(client, tmp_path):
    """T106: hydration must not fan out to per-row ``get_bot``/
    ``get_chat``/``get_scene`` calls.
    The previous implementation called each helper once per result row
    (worst case 50 rows x 3 helpers = 150 individual queries). The
    batched implementation collects distinct ids and issues at most one
    query per entity kind via ``WHERE id IN (...)``, so the per-row
    helpers should not be invoked at all when there are matches.
    We seed two chats (so both ``get_bot`` and ``get_chat`` would have
    been hit pre-T106) and assert each helper sees zero per-row calls.
    """
    _seed_two_chats_with_memories(tmp_path / "test.db")
    with (
        patch("chat.web.search.get_bot") as mock_get_bot,
        patch("chat.web.search.get_chat") as mock_get_chat,
        patch("chat.web.search.get_scene") as mock_get_scene,
    ):
        resp = client.get("/search?q=rabbit")
    assert resp.status_code == 200
    # Batched IN-list queries replace the per-row helpers entirely.
    assert mock_get_bot.call_count == 0
    assert mock_get_chat.call_count == 0
    assert mock_get_scene.call_count == 0
@@ -156,6 +156,28 @@ def test_restore_snapshot_wrong_confirm_400(client, tmp_path):
    assert response.status_code == 400
 def test_restore_without_kind_returns_400(client, tmp_path):
    """T105: Missing or empty ``kind`` must be rejected with 400.
    Previously ``kind`` defaulted to ``"periodic"``, which silently 404'd
    when the caller meant a rewind snapshot. Tighten the contract so the
    client must always pass an explicit, valid ``kind``.
    """
    db_path = tmp_path / "test.db"
    _seed_bot(db_path, "bot_a", "BotA")
    snapshot_path = _take_snapshot_via_service(
        db_path, tmp_path, kind="periodic"
    )
    snapshot_id = snapshot_path.stem
    response = client.post(
        f"/snapshots/restore/{snapshot_id}",
        data={"confirm_id": snapshot_id},  # no `kind`
        follow_redirects=False,
    )
    assert response.status_code == 400
 def test_preview_renders_metadata(client, tmp_path):
    db_path = tmp_path / "test.db"
    _seed_bot(db_path, "bot_a", "BotA")
@@ -22,6 +22,7 @@ from chat.db.connection import open_db
 from chat.eventlog.log import append_and_apply, append_event
 from chat.eventlog.projector import project
 from chat.llm.mock import MockLLMClient
 from tests.fixtures import CannedQueue
@pytest.fixture
@@ -362,14 +363,20 @@ def test_single_bot_turn_no_guest_regression(app_state_setup, tmp_path):
    the chat has no guest, so ``detect_interjection`` is NOT invoked.
    Ends with one user_turn, one assistant_turn, two edge_updates, and a
    single ``memory_written``.
    T116: migrated to :class:`tests.fixtures.CannedQueue` as a proof of
    concept for the structured canned-queue builder.
    """
    _seed(tmp_path / "test.db")
-    canned_parse = json.dumps(
+    canned = (
-        {"segments": [{"kind": "dialogue", "text": "hello"}]}
+        CannedQueue()
-    )
+            .parse_turn(segments=[{"kind": "dialogue", "text": "hello"}])
-    mock = _override_llm(
+            .narrative("Hi there.")
-        [canned_parse, "Hi there.", _zero_state(), _zero_state()]
+            .state_update()
            .state_update()
            .build()
    )
    mock = _override_llm(canned)
    try:
        response = app_state_setup.post(
            "/chats/chat_bot_a/turns", data={"prose": "hello"}
@@ -734,6 +741,19 @@ def test_cancelled_turn_still_closes_scene_when_user_prose_signals_close(
    that as an exception, so we drive the request inside ``with
    pytest.raises``. Despite the exception, the scene_closed event
    must land in the event_log.
    T108 NOTE — this test does NOT actually exercise the cancel path.
    ``_CancelOnStreamMock.stream`` writes ``raise asyncio.CancelledError``
    but ``asyncio`` is not imported at module scope, so the first
    iteration raises ``NameError`` (caught by ``except Exception:`` in
    post_turn, which sets ``primary_truncated=True`` but leaves
    ``cancelled=False``). The function therefore returns 204 normally,
    the dependency-managed connection commits, and ``scene_closed``
    lands. Importing asyncio so the real CancelledError fires reveals
    a transactional bug: ``post_turn``'s end-of-function re-raise
    causes ``open_db``'s dependency teardown to skip ``conn.commit()``,
    rolling back ALL post-cancel writes (user_turn, assistant_turn,
    edge_updates, scene_closed). Deferred for triage — see T108 report.
    """
    from typing import AsyncIterator, Sequence
@@ -828,12 +848,33 @@ def test_cancelled_turn_still_closes_scene_when_user_prose_signals_close(
            "SELECT payload_json FROM event_log "
            "WHERE kind = 'assistant_turn' ORDER BY id"
        ).fetchall()
        # T108: pin the ordering — user_turn must commit before
        # scene_closed (close detection runs on prose that is already
        # in the event_log) and any assistant_turn the cancel produced
        # must come last (truncated record written after both).
        ordered = conn.execute(
            "SELECT id, kind FROM event_log "
            "WHERE kind IN ('user_turn', 'scene_closed', 'assistant_turn') "
            "ORDER BY id"
        ).fetchall()
    # Scene close lands despite the cancel.
    assert scene_close_count == 1
    # The cancelled assistant_turn was still recorded (truncated=True).
    assert len(assistant_payload) == 1
    assert json.loads(assistant_payload[0][0])["truncated"] is True
    # T108 ordering pin: user_turn lands first, the truncated
    # assistant_turn (if any) is committed BEFORE the scene_close
    # decision fires, and scene_closed lands last. Close detection
    # relies on user prose being committed to the event_log BEFORE
    # the close decision runs — and the cancelled assistant beat is
    # recorded as a partial before close-detection too.
    kinds_in_order = [row[1] for row in ordered]
    user_idx = kinds_in_order.index("user_turn")
    close_idx = kinds_in_order.index("scene_closed")
    assert user_idx < close_idx
    if "assistant_turn" in kinds_in_order:
        assert user_idx < kinds_in_order.index("assistant_turn") < close_idx
 def test_interjection_enqueues_significance_job(app_state_setup, tmp_path):
@@ -945,29 +986,25 @@ def test_turn_with_event_transition_appends_started_event(
            },
        )
-    canned_parse = json.dumps(
+    # T116: migrated to :class:`tests.fixtures.CannedQueue`.
-        {"segments": [{"kind": "dialogue", "text": "they arrived"}]}
+    canned = (
-    )
+        CannedQueue()
-    canned_event_decision = json.dumps(
+            .parse_turn(segments=[{"kind": "dialogue", "text": "they arrived"}])
-        {
+            .narrative("They walk in.")
-            "transitions": [
+            .state_update()
-                {
+            .state_update()
-                    "event_id": "evt_1",
+            .detect_event_transitions(
-                    "new_status": "active",
+                [
-                    "reason": "they arrived",
+                    {
-                }
+                        "event_id": "evt_1",
-            ]
+                        "new_status": "active",
-        }
+                        "reason": "they arrived",
-    )
+                    }
-    mock = _override_llm(
+                ]
-        [
+            )
-            canned_parse,
+            .build()
            "They walk in.",
            _zero_state(),
            _zero_state(),
            canned_event_decision,
        ]
    )
    mock = _override_llm(canned)
    try:
        response = app_state_setup.post(
            "/chats/chat_bot_a/turns", data={"prose": "they arrived"}
@@ -989,6 +1026,18 @@ def test_turn_with_event_transition_appends_started_event(
        assert started_payload["event_id"] == "evt_1"
        assert started_payload["started_at"] == "2026-04-26T20:00:00+00:00"
        # T114.1: payload carries the back-reference to the assistant_turn
        # that triggered the transition. The assistant_turn lands in
        # event_log immediately before the event_started, so its id is
        # the largest assistant_turn id in the chat at this point.
        at_id = conn.execute(
            "SELECT id FROM event_log "
            "WHERE kind = 'assistant_turn' "
            "  AND json_extract(payload_json, '$.chat_id') = 'chat_bot_a' "
            "ORDER BY id DESC LIMIT 1"
        ).fetchone()[0]
        assert started_payload["triggered_by_assistant_turn_id"] == at_id
        # The events projection row reflects the active status.
        ev_row = conn.execute(
            "SELECT status, started_at FROM events WHERE event_id = ?",
@@ -1109,18 +1158,23 @@ def test_turn_with_no_active_events_skips_classifier(app_state_setup, tmp_path):
    short-circuits without an LLM call (per T52). The canned queue must
    therefore have ZERO event-detection slots — same shape as the
    Phase 2 no-guest baseline.
    T116: migrated to :class:`tests.fixtures.CannedQueue`.
    """
    _seed(tmp_path / "test.db")
    canned_parse = json.dumps(
        {"segments": [{"kind": "dialogue", "text": "hello"}]}
    )
    # Only 4 slots: parse + narrative + 2 state-updates. NO extra slot for
    # event-detection — non-existent active_events causes the helper to
    # short-circuit before pulling from the queue.
-    mock = _override_llm(
+    canned = (
-        [canned_parse, "Hi there.", _zero_state(), _zero_state()]
+        CannedQueue()
            .parse_turn(segments=[{"kind": "dialogue", "text": "hello"}])
            .narrative("Hi there.")
            .state_update()
            .state_update()
            .build()
    )
    mock = _override_llm(canned)
    try:
        response = app_state_setup.post(
            "/chats/chat_bot_a/turns", data={"prose": "hello"}
@@ -73,11 +73,25 @@ async def test_parse_turn_empty_prose_short_circuits_without_classifier_call():
@pytest.mark.asyncio
-async def test_parse_turn_raises_when_classifier_fails_twice():
+async def test_parse_turn_falls_back_to_whole_prose_when_classifier_fails():
    """A flapping classifier (3 invalid responses) no longer 500s the
    request. ``parse_turn`` returns the original prose as a single
    ``dialogue`` segment so the turn flow can keep moving — the
    narrative will still fire on the prose, just without finer-grained
    segment classification.
    The old contract was ``RuntimeError`` (no default), but in
    production that took down the whole turn endpoint with a 500 the
    moment any classifier provider hiccuped — particularly painful in
    multi-bot scenes where every user turn pays the parse_turn cost.
    """
    mock = MockLLMClient(canned=["nope", "still nope", "nope3"])
-    with pytest.raises(RuntimeError):
+    result = await parse_turn(
-        await parse_turn(
+        mock,
-            mock,
+        model="m",
-            model="m",
+        prose='*shrugs* "whatever"',
-            prose='*shrugs* "whatever"',
+    )
-        )
+    assert len(result.segments) == 1
    assert result.segments[0].kind == "dialogue"
    assert result.segments[0].text == '*shrugs* "whatever"'
    assert result.intent == "narrative"
@@ -324,11 +324,11 @@ def test_get_scene_returns_none_for_missing(tmp_path):
        assert active_scene(conn, "chat_missing") is None
-def test_schema_version_after_migration_is_13(tmp_path):
+def test_schema_version_after_migration_is_14(tmp_path):
    db = tmp_path / "t.db"
    apply_migrations(db)
    with open_db(db) as conn:
        row = conn.execute(
            "SELECT value FROM meta WHERE key = 'schema_version'"
        ).fetchone()
-        assert int(row[0]) == 13
+        assert int(row[0]) == 14