Phase 4.5: cleanup — polish, branching, embeddings, lifecycle, deep-link #7

Merged
dohertj2 merged 39 commits from phase-4.5 into main 2026-04-27 11:33:27 -04:00
Owner

Burns down 23 of the 24 items in the Phase 4.5 / 5 backlog (T103–T118). T115 (sqlite-vec swap) deferred to Phase 5 — host Python lacks enable_load_extension.

What shipped

Wave 1 (polish):

  • T103 branches.py — global-leak doc + unknown-name warning
  • T104 memory.py — DRY MAX(id) helper + fts_rank=None contract
  • T105 snapshots.py — datetime hoist + strict kind validation + mtime doc
  • T106 search.py — k constant + N+1 batched lookups
  • T107 embeddings.py — fallback warning for non-default models
  • T108 scene-close-on-cancel — strengthened pin + documented rationale

Wave 2: T109 schema 0014 (memories.event_id for deep-link)

Wave 3: T110 drawer bundle — event_id guard, html.escape, Jinja partial, bulk significance re-rate

Wave 4: T111 search UX — FTS snippet highlighting + turn deep-link

Wave 5: T112 real embedding swap — LLMClient.embed Protocol + Mock impl + routing + --re-embed-all backfill (FeatherlessClient.embed raises NotImplementedError; documented gap)

Wave 6: T113 branching read-side filter — active_branch_event_ids helper applied to read_recent_dialogue, scene_summarize, search_memories, meanwhile

Wave 7: T114 regenerate lifecycle rollback — triggered_by_assistant_turn_id back-ref + event_status_reverted event kind

Wave 9: T116 CannedQueue fixture builder + T117 cross-feature integration tests + T118 docs sweep

Surfaced for Phase 5

  • T115 sqlite-vec swap — environmental
  • T108 cancel-path commit bug — post_turn re-raise causes open_db teardown to skip commit, rolling back post-cancel writes (the existing T74.3 pin only passes because asyncio isn't imported in the test module)
  • embeddings FK CASCADE — defer to broader migration cleanup
  • CannedQueue migration tail — POC migrated 3 tests; ~30 remain
  • HNSW vector index, branch-isolated event_log, embedding swap orchestration tooling, real-time collab branching, avatars

Schema

13 → 14 (one migration: 0014_phase45_schema.sql adds memories.event_id)

Tests

413 (Phase 4 baseline) → 457 passed (44 new, full suite green)

Test plan

  • Full suite green on phase-4.5: `457 passed in 290.23s`
  • Manual smoke: configure non-default embedding_model, write a memory, search results show new vector path
  • Manual smoke: create a branch, switch, play turns, switch back — main history unchanged
  • Manual smoke: plan + complete an event, regenerate, verify revert to planned
  • Manual smoke: cross-chat search, click a result, verify deep-link anchor
  • Manual smoke: drawer bulk significance re-rate
Burns down 23 of the 24 items in the Phase 4.5 / 5 backlog (T103–T118). T115 (sqlite-vec swap) deferred to Phase 5 — host Python lacks `enable_load_extension`. ## What shipped **Wave 1 (polish):** - T103 branches.py — global-leak doc + unknown-name warning - T104 memory.py — DRY MAX(id) helper + fts_rank=None contract - T105 snapshots.py — datetime hoist + strict kind validation + mtime doc - T106 search.py — k constant + N+1 batched lookups - T107 embeddings.py — fallback warning for non-default models - T108 scene-close-on-cancel — strengthened pin + documented rationale **Wave 2:** T109 schema 0014 (memories.event_id for deep-link) **Wave 3:** T110 drawer bundle — event_id guard, html.escape, Jinja partial, bulk significance re-rate **Wave 4:** T111 search UX — FTS snippet highlighting + turn deep-link **Wave 5:** T112 real embedding swap — LLMClient.embed Protocol + Mock impl + routing + --re-embed-all backfill (FeatherlessClient.embed raises NotImplementedError; documented gap) **Wave 6:** T113 branching read-side filter — active_branch_event_ids helper applied to read_recent_dialogue, scene_summarize, search_memories, meanwhile **Wave 7:** T114 regenerate lifecycle rollback — triggered_by_assistant_turn_id back-ref + event_status_reverted event kind **Wave 9:** T116 CannedQueue fixture builder + T117 cross-feature integration tests + T118 docs sweep ## Surfaced for Phase 5 - **T115 sqlite-vec swap** — environmental - **T108 cancel-path commit bug** — post_turn re-raise causes open_db teardown to skip commit, rolling back post-cancel writes (the existing T74.3 pin only passes because asyncio isn't imported in the test module) - **embeddings FK CASCADE** — defer to broader migration cleanup - **CannedQueue migration tail** — POC migrated 3 tests; ~30 remain - HNSW vector index, branch-isolated event_log, embedding swap orchestration tooling, real-time collab branching, avatars ## Schema 13 → **14** (one migration: 0014_phase45_schema.sql adds memories.event_id) ## Tests 413 (Phase 4 baseline) → **457 passed** (44 new, full suite green) ## Test plan - [x] Full suite green on phase-4.5: \`457 passed in 290.23s\` - [ ] Manual smoke: configure non-default embedding_model, write a memory, search results show new vector path - [ ] Manual smoke: create a branch, switch, play turns, switch back — main history unchanged - [ ] Manual smoke: plan + complete an event, regenerate, verify revert to planned - [ ] Manual smoke: cross-chat search, click a result, verify deep-link anchor - [ ] Manual smoke: drawer bulk significance re-rate
dohertj2 added 39 commits 2026-04-27 11:32:11 -04:00
Investigation surfaced a transactional bug in the cancel path: when the
primary stream raises asyncio.CancelledError mid-stream, post_turn
re-raises at end-of-function, and open_db's dependency teardown skips
conn.commit() — rolling back ALL post-cancel writes including the
scene_closed event. The existing T74.3 regression test only passes
because asyncio is not imported at module scope, so CancelledError
becomes NameError (caught by except Exception, leaves cancelled=False).
Documented in turns.py + test docstring; deferred for triage.
A stale tab or hand-crafted request posting event_id=0 to the surgical
delete route would compute after_event_id=-1 and silently truncate the
entire log. Now rejected with 400.

SQLite assigns event_log ids starting at 1, so any legitimate id is
always >= 1 — non-positive values can only indicate a client bug.

Test: tests/test_drawer_phase4.py::test_delete_turn_with_event_id_zero_returns_400.
The delete-impact modal is built via raw f-string concatenation from the
ImpactReport — item.kind / item.description / report.notes ultimately
embed user-controllable content (turn prose, scene timestamps). A turn
with prose like "<script>alert(1)</script>" would reach the rendered
HTML verbatim. Currently safe (the fields embedded today are bounded
strings) but defense-in-depth — wrap with html.escape() so future
description changes can't smuggle markup through.

Test: tests/test_drawer_phase4.py::test_delete_impact_modal_escapes_user_controllable_strings.
The modal HTML was assembled via raw f-string concatenation in
``delete_preview``. Move it to a dedicated Jinja2 partial
(``chat/templates/_delete_impact_modal.html``) and render via
``TEMPLATES.TemplateResponse``. Jinja2 autoescape now handles HTML
safety automatically — the explicit ``html.escape()`` calls added in
T110.2 (and the ``import html``) become redundant and are removed in
this commit.

Net behavioural change: attribute quoting style flips from single to
double quotes (Jinja default) — the existing T98.4 substring-based
assertions are unaffected, and the new T110.3 test pins the
double-quoted shape so future regressions surface.

Test: tests/test_drawer_phase4.py::test_delete_impact_modal_uses_jinja_partial.
The drawer's Significance review panel previously only supported
per-memory edits. Adds a bulk control: pick ``level_from`` and
``level_to``, and every memory in the chat at ``level_from`` is moved
to ``level_to``.

Implementation emits one ``manual_edit`` event per matching memory
(not a single bulk event) so the §6.4 per-row audit trail stays
intact — each affected memory carries its own ``prior_value -> new_value``
snapshot, so an inverse edit can restore an individual row without
needing to inspect a bulk payload's member list. Reuses the existing
``memory_significance`` ``manual_edit`` projector branch (T25), so no
state-layer changes are required.

The route rejects no-op submissions (``level_from == level_to``) with
400 to avoid padding the event log with empty edits, and clamps both
levels to 0..3 (matching ``edit_memory_significance``).

UI: a small ``<details>`` block in the Significance review section
with two number inputs and a submit button.

Test: tests/test_drawer_phase4.py::test_bulk_significance_re_rate_emits_manual_edit_per_memory.
Replace the ``pov_summary`` column in ``search_all_memories``'s SELECT
with ``snippet(memories_fts, 0, '<mark>', '</mark>', '…', 32)`` so each
match in a result row is wrapped in ``<mark>`` for the search-results
UI. The original ``pov_summary`` is still returned alongside as a
non-highlighted fallback. Template renders ``r.snippet|safe`` — the only
HTML in the snippet output is the configured ``<mark>`` markers, so it
is safe to bypass Jinja's auto-escape.
Add ``m.event_id`` (T109's nullable column from migration 0014) to
``search_all_memories``'s SELECT, propagate it through the route's
template context, and have ``search.html`` build result links as
``/chats/{chat_id}#turn-{event_id}`` — matching the ``id="turn-{event_id}"``
anchor that Phase 3.5 T86 stamps on each turn DOM node so the chat page
scrolls to the originating turn on load. Memory rows projected before
the 0014 migration ran read NULL ``event_id``; the template falls back
to a chat-level link in that case so we never emit ``#turn-None``.

Pre-existing tests that asserted on the bare ``href="/chats/{chat_id}"``
contract are updated to assert on the ``href="/chats/{chat_id}#turn-``
prefix to reflect the new deep-link.
Adds async def embed(self, text: str, *, model: str) -> list[float]
to the LLMClient Protocol so Phase 4.5 can wire a real-embedding swap
without changing call sites. Protocol is structural — existing
implementations that don't use it remain compatible; downstream
implementations (FeatherlessClient, MockLLMClient) ship in T112.2 and
T112.3.
Implements embed() on FeatherlessClient. Featherless's OpenAI-
compatible surface does NOT expose /v1/embeddings at the time of
writing, so this implementation raises NotImplementedError rather
than issuing a request that would 404. The
chat.services.embeddings.generate_embedding wrapper (T112.3)
catches the exception and degrades to the zero-vector fallback path
(plus the existing T107 warning) — misconfigured callers fail loudly
in logs while the request path keeps working.

If/when Featherless ships embeddings, swap the body for
self._client.embeddings.create(model=..., input=...) guarded by
the existing 2-conn semaphore (mirrors generate/stream). The Protocol
seam in T112.1 is already wired so no other code needs to change.

Adds tests/test_featherless.py pinning the NotImplementedError
contract.
When model != DEFAULT_EMBEDDING_MODEL, generate_embedding now
calls client.embed(text, model=model) and wraps the returned
vector in an EmbeddingResult tagged with the requested model.
On any exception (NotImplementedError from providers without an
embeddings endpoint, transient network errors, etc.), the existing
T107 warning fires and the function falls back to the zero-vector
sentinel — callers detect model == 'fallback' and skip indexing.

Adds:
- MockLLMClient accepts a canned_embeddings queue mirroring
  the existing canned pattern. embed() pops from the front;
  empty queue raises IndexError so misconfigured tests fail
  loudly.
- Settings.embedding_model defaults to "pseudo-sha256-384"
  so existing zero-config installs keep Phase 4 behavior. The app
  lifespan now passes this through to EmbeddingWorker.model.

The public signature of generate_embedding is unchanged:
(client, *, text, model=DEFAULT_EMBEDDING_MODEL, dim=..., timeout_s=...).
Adds two new flags to the backfill script:

* --re-embed-all walks **every** memory (not just those without
  an existing embeddings row) and re-emits embedding_indexed
  events. The projector is INSERT OR REPLACE, so re-emitting an event
  for an existing memory replaces the prior vector. Use this when
  swapping embedding models — the default mode still keeps the Phase
  4 gap-fill behavior.
* --model M overrides Settings.embedding_model for this run.

The script also gains a small _build_client helper that returns
None for the pseudo path (no client needed) and a FeatherlessClient
otherwise; tests monkeypatch this to inject a Mock with canned
embeddings.

Adds tests/test_backfill_embeddings.py with three integration
tests: re-embed-all walks every memory, default mode skips existing
rows, and --model overrides the configured model end-to-end.
Wire the active branch's [origin_event_id, head_event_id] window into
every user-facing event/memory reader so switching branches actually
changes what dialogue and memories the user sees. Phase 4 T89/T94
shipped branches as metadata-only — this closes the loop.

Helper:
- chat/state/branches.py: add `active_branch_event_ids(conn)` returning
  the active branch's id range, with two defensive fall-throughs to
  `(0, BIG_INT)`: (a) no active branch row at all, and (b) the
  bootstrap "main" sentinel (name="main", origin=0, head=0). Production
  never bumps main's head_event_id today, so this preserves existing
  reader behaviour for every test that doesn't explicitly switch.

Readers updated (all user-facing dialogue / retrieval surfaces):
- chat/services/turn_common.py::read_recent_dialogue — chat-history
  prompt context + the chat-view template path (via web/turns.py +
  web/chat.py).
- chat/services/scene_summarize.py::_read_recent_dialogue — scene-close
  per-POV summary input.
- chat/state/memory.py::search_memories — FTS leg filters via
  m.event_id (T109's column); legacy NULL event_id rows are *included*
  unconditionally so the filter doesn't break pre-0014 retrieval. The
  fused (FTS + RRF + vector) path also drops vector hits whose
  event_id falls outside the branch window.
- chat/web/meanwhile.py::_read_recent_meanwhile_dialogue — meanwhile
  prompt context.

Projector queries (chat/state/world.py et al.) and admin/management
surfaces (drawer hide-panel, cross-chat search, regenerate's row
lookups by id) are intentionally NOT branch-filtered: projection must
see the full log to build state correctly, and the admin surfaces
operate across branches by design.

Tests (10 new, 446 total):
- tests/test_branches_state.py: 3 tests for `active_branch_event_ids`
  itself (bootstrap-main, no-active-branch, non-main literal range).
- tests/test_branching.py: 7 cross-feature tests covering the spec's
  five required scenarios plus scene_summarize and meanwhile readers.
Phase 3.5 T83.4 surfaced un-rolled-back lifecycle transitions on
regenerate; T114 wires up the actual rollback. Step 1 is the back-
reference: every event_started / event_completed / event_cancelled
emitted by post_turn (chat/web/turns.py) and regenerate
(chat/services/regenerate.py) now carries
``triggered_by_assistant_turn_id`` in its payload, set to the id of
the assistant_turn event that produced the transition.

Schema decision (Option A from the plan): no migration. The field is
a payload convention only — older event_log rows lack it and rollback
will skip them with a debug log when T114.3 lands. Forward-only.

The post_turn lifecycle block already runs AFTER the assistant_turn
event is appended (step 8a vs step 7), so ``primary_assistant_event_id``
is in scope. Same for regenerate: the lifecycle classification (step 9a)
runs after step 6's append. **No emission-order reorder was needed**
in either flow.

Updates ``test_turn_with_event_transition_appends_started_event`` to
assert the new field is present in the emitted event_started payload
and points at the assistant_turn id.
Adds the inverse projection used by T114.3's regenerate rollback. The
new ``event_status_reverted`` event kind carries
``{event_id, prior_status}`` and the handler unconditionally sets
``events.status = prior_status`` for the row.

Unlike the forward transitions (event_started / event_completed /
event_cancelled), this handler does NOT guard against terminal
statuses — its entire purpose is to reverse a transition, including
walking back from a terminal status to a non-terminal one. Without
that, rolling back an event_completed (status='completed' is terminal
for the forward handlers) would silently no-op and leave the row in
the post-superseded state.

The handler registers via the existing ``@on(kind)`` decorator pattern
in chat/eventlog/projector.py, so future replays of an event_log that
contains event_status_reverted rows pick it up automatically.

Test exercises completed→active, active→planned, and cancelled→active
round-trips.
Closes the T83.4 gap: when ``regenerate_assistant_turn`` supersedes an
assistant_turn that already produced lifecycle transitions, it now
emits an ``event_status_reverted`` (T114.2) for each transition tagged
with ``triggered_by_assistant_turn_id == original_assistant_event_id``
(T114.1 back-reference) before the regenerated narrative is
reclassified.

Mapping from forward kind to ``prior_status`` lives in
``_PRIOR_STATUS_MAP``:
  - event_started   → planned
  - event_completed → active
  - event_cancelled → active (best-effort default; cancellation can fire
    from either planned or active, but detect_event_transitions only
    surfaces currently-active rows so 'active' is the realistic prior)

Backward compatibility: lifecycle rows authored before T114.1 lack the
back-reference field. Those are skipped (DEBUG log per row) and
collected into a legacy WARNING that preserves the T83.4
observability contract — operators still see un-rolled-back
transitions, just from older logs.

The classify-and-emit pass below the rollback now operates against an
events projection that has already been reverted, so re-firing
``event_started``/``event_completed``/``event_cancelled`` for the
regenerated narrative is safe — no double-emit of promotion artifacts.

Spec tests:
- ``test_regenerate_rolls_back_event_started_from_superseded_turn``
- ``test_regenerate_rolls_back_event_completed_to_active`` (also
  exercises the multi-rollback loop: a turn that fired both a start
  and a completion gets two event_status_reverted rows in id order,
  with active as the final projection — matching the per-row replay
  semantics of the projector)
- ``test_regenerate_skips_events_without_back_reference`` (pins the
  legacy compatibility path with both DEBUG and WARNING expectations)
Phase 4.5 carry-over from Phase 3. Tests across test_turn_flow.py,
test_meanwhile_turn_flow.py, and the phase3/4 integration suites built
positional canned-response arrays for MockLLMClient — adding a new
classifier call to a code path required updating the array index in
many places.

This change ships tests/fixtures.py with a fluent CannedQueue builder
that lets tests declare classifier expectations by name and call order
instead of by index. Each method appends one item to an internal queue
and returns self for chaining; build() emits the flat list[str] queue
that MockLLMClient(canned=...) already consumes. The mock's contract
is unchanged.

Builder methods cover: parse_turn, detect_addressee, state_update
(with zero_state alias), detect_interjection,
detect_interjection_targeted, detect_scene_close,
detect_event_transitions, summarize_scene_pov, detect_threads,
meanwhile_digest, score_significance, and a narrative() helper for
streaming bot beats. raw() is a passthrough escape hatch.

Migration scope: ship the builder + 2 sanity tests + migrate 3
representative tests in test_turn_flow.py as proof of concept
(test_single_bot_turn_no_guest_regression,
test_turn_with_event_transition_appends_started_event,
test_turn_with_no_active_events_skips_classifier). The remaining
positional-array tests stay as-is; the builder docstring documents
the migration template for Phase 5 work.
dohertj2 merged commit a03f664407 into main 2026-04-27 11:33:27 -04:00
Sign in to join this conversation.
No Reviewers
No Label
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: dohertj2/chat#7