Commit Graph

280 Commits

Author SHA1 Message Date
Joseph Doherty 3a81e540a1 chore: audit project() callers and non-idempotent handlers
Same defect class as 0f8bf94: routes that ``append_event`` then
``project(conn)`` 500 once any prior event makes the full-log replay
hit a raw-INSERT handler (chat_created, container_created,
scene_opened, memory_written, meanwhile_scene_started, etc.).

Fixes the two remaining live-path callers:
- chat/web/bots.py (bot_create) — bot_authored
- chat/web/settings.py (settings_post) — you_authored

Both swap ``append_event`` + ``project`` → ``append_and_apply`` so only
the new event is applied through its registered handler. Unused
imports of ``append_event`` and ``project`` removed from each file.

The rewind path (chat/services/rewind.py) intentionally calls
``project()`` after wiping every projected table — that's the
canonical "rebuild from log against an empty DB" entry point and is
left unchanged.

Inventory of every projector handler that uses raw INSERT
(chat_created, container_created, scene_opened, memory_written,
meanwhile_scene_started, meanwhile_digest_created, edge_update) is
documented with the trade-offs of why we don't blindly switch them to
INSERT OR REPLACE — for autoincrement-id rows there is no key to match
on, and for chat_created a lossy overwrite would silently clobber
chat_state mutations from later events. The handler layer stays
correctly non-idempotent under event-sourcing semantics; the rule is
enforced at the call site.

Adds a regression test (tests/test_chat_created_non_idempotent.py)
that pins the contract: appending two chat_created events for the same
id and then ``project()``ing a second time MUST raise
``IntegrityError`` on chats.id. Any future "make it idempotent" change
must update the test, forcing a deliberate review.

Suite: 471 passed in 11.82s (was 470 + this regression test).

Report: docs/audits/2026-04-27-project-callers.md
2026-04-27 14:51:49 -04:00
Joseph Doherty 0f8bf94d29 fix: kickoff_post uses append_and_apply per-event (no full replay)
The route appended 6 events with append_event() and then called
project(conn) to apply them all at once. project() replays the
*entire* event log, but _apply_chat_created in chat/state/world.py
uses raw INSERT (not INSERT OR REPLACE), so the moment a second bot
got kicked off, projecting hit the existing chat_<bot_id> row from
the first kickoff and 500'd with sqlite3.IntegrityError: UNIQUE
constraint failed: chats.id.

Switch to append_and_apply (the live-path pattern in
chat/eventlog/log.py) which appends and applies only the new event
through its registered handler — leaving prior state untouched.
project() / append_event() imports are now unused in this file
and removed.

Suite: 470 passed in 11.8s. Verified manually: a second bot's
kickoff now redirects to /chats/chat_<bot_id> instead of 500ing.
2026-04-27 14:43:16 -04:00
Joseph Doherty 6d57fe88b4 fix: kickoff form accepts loose datetime formats from the classifier
The kickoff classifier emits initial_time_iso as a free-form string
(KickoffParse declares it str, no schema constraint), and Cydonia
in practice produces things like 'Sun 2024-05-12 07:00:00' or
'Tuesday, May 14, 2024 7:00 AM'. The POST handler used
datetime.fromisoformat which only accepts strict ISO, so the
confirm-form submit 400'd.

Adds _coerce_iso_time() that tries fromisoformat first, then a
sequence of common classifier-emitted formats via strptime,
returning a canonical 'YYYY-MM-DDTHH:MM:SS+00:00' string. Naive
datetimes assumed UTC. The POST handler now stores the canonical
form (so downstream code sees consistent ISO regardless of what
the classifier emitted), and only 400s when nothing parses.
2026-04-27 14:39:13 -04:00
Joseph Doherty f775eb7e92 fix: GET /bots/{bot_id} redirects to the bot's chat
bot_list.html linked each bot to /bots/{bot.id} but no GET handler
existed, so clicking a bot 404'd. Add a route that:

- 404s if the bot doesn't exist
- 303 -> /chats/chat_{bot_id} when the bot's chat exists
- 303 -> /bots/{bot_id}/kickoff otherwise (so a freshly-created bot
  links straight to its kickoff form)

Declared AFTER /bots/new and /bots/{bot_id}/kickoff in routing order
so the path-parameter route doesn't swallow the more specific paths
(FastAPI matches in declaration order).
2026-04-27 14:36:04 -04:00
Joseph Doherty e535a0181e fix: load htmx-1.x SSE ext (was 2.x — incompatible) + Enter-to-send
- htmx-ext-sse@2.2.2 is for htmx 2.x; with htmx 1.9.12 the extension
  registers but the SSE attributes silently no-op (different ext API
  generation). The 1.x SSE extension is bundled with htmx itself at
  unpkg.com/htmx.org@1.9.12/dist/ext/sse.js — point at that.

- Enter submits the turn-input form (Shift+Enter for newline). Uses
  form.requestSubmit() so the existing submit listener (optimistic
  user-prose render + fetch + textarea clear) handles both keyboard
  and click submissions identically.
2026-04-27 14:30:07 -04:00
Joseph Doherty f7eec707a9 fix: chat UI — load htmx-ext-sse, render user prose optimistically, AJAX submit
Three coupled bugs reported as: 'click send doesn't clear textarea, my
message isn't displayed, bot response only after refresh'.

Root causes:

1) **htmx-ext-sse was never loaded**. base.html only included
   htmx.org; the hx-ext="sse", sse-connect, and
   sse-swap attributes on the chat shell were no-ops. The
   browser never opened the SSE EventSource, so bot streaming
   never arrived live. Added the extension script tag.

2) **Form was a vanilla POST**. The submit handler in chat.html
   locked the input but didn't preventDefault, so the browser
   POSTed to /chats/X/turns and waited for the 204. The textarea
   only cleared inside unlock(), which fires on the SSE
   turn_html event — which never fired (see #1). Even with
   SSE working, the vanilla POST left the page in a half-loaded
   state. Switched to fetch + e.preventDefault() so the
   page stays fully responsive while the bot streams.

3) **User's prose was never rendered client-side**. The server
   persists user_turn events but doesn't publish a turn_html
   for them — the SSE channel is bot-output-only by design. Added
   appendUserTurn(prose) that creates a turn-you div in
   the timeline immediately on submit, before the fetch fires, so
   the user can see what they sent.

Net flow now: click Send → user prose appears instantly →
textarea clears → fetch POSTs → bot streams in via SSE →
turn_html lands → unlock() re-enables the form. No reload
needed for any of it.
2026-04-27 14:26:29 -04:00
Joseph Doherty 3b83786b8b feat: cap narrative output at 2-3 beats via trim_to_max_beats post-processor
Verbose roleplay-tuned narrators (Cydonia, Magnum, etc.) reliably
ignore prompt-level beat-count instructions and ramble for 6-12
asterisk-action beats per turn — even with HARD CAP language and
worked examples in the closing instruction. The fix is a deterministic
post-stream trimmer:

- New trim_to_max_beats(text, max_beats) in chat/services/prompt.py.
  Counts * characters in the streamed output (each beat = 2
  asterisks: open + close), trims at the start of the (max_beats+1)th
  asterisk action, strips trailing whitespace. Idempotent and safe
  on under-cap input.

- Wired into post_turn for both the primary stream (3-beat cap) and
  the optional interjection stream (2-beat cap — interjections are
  by definition shorter chime-ins).

- Tightened the closing instruction: explicit "HARD CAP: 2-3 beats"
  with "After the third beat, STOP". Helps the well-behaved models
  self-cap; the post-processor catches the rest.

- max_tokens: 250 -> 160 (lets the 3rd beat finish naturally before
  hitting the physical cap; trim_to_max_beats handles 4+ beat
  overflow). temperature: 0.85 -> 0.7 (Cydonia is more compliant
  with format instructions at slightly cooler sampling).

- Test budgets bumped (closing grew ~15 tokens with the new wording).
  6 new tests for trim_to_max_beats covering passthrough, exact-cap,
  4-beat trim, 6-beat runaway, lower caps, zero cap.

Verified live: 4-turn bench against chat_maya, every response is
2-3 beats consistently. Suite: 470 passed in 11.7s.
2026-04-27 14:19:21 -04:00
Joseph Doherty a902d86432 fix: workers retry-on-lock so they don't drop writes under busy_timeout=100ms
The previous commit dropped open_db's busy_timeout from 5s to 100ms
to prevent the embedding worker from GIL-blocking the asyncio event
loop and silently adding 5s to every state_update LLM call. That fixed
the chat path but broke worker durability: any worker write that
collided with the request handler's brief open transaction failed
with 'database is locked' instead of waiting.

Adds append_and_apply_with_retry in chat/eventlog/log.py — same
contract as append_and_apply but runs through a conn_factory and
retries with exponential backoff (50ms..500ms, ~10s total budget) on
'database is locked'. Returns None and logs WARNING if all retries
fail; callers handle that as a no-op.

Wires it into:
- embedding_worker._process for embedding_indexed events
- background._process for memory_significance_set events (auto-pin
  still uses a direct open_db when the score warrants it; that one
  is fast and not racy in practice)

Verified live: ran 4 back-to-back chat turns, zero worker errors,
embeddings + significance landing correctly. Suite: 464 passed in
11.5s.
2026-04-27 14:04:27 -04:00
Joseph Doherty de7f6624f0 perf: 18s/turn -> 2.5s/turn (SQLite busy_timeout, parallel state pairs, OpenRouter Cerebras-pinned classifier)
Four changes that compound:

1) **SQLite busy_timeout 5.0s -> 0.1s** in chat/db/connection.py. Root
   cause of the bulk of the slowness. The embedding worker contends
   for the WAL write lock while the request handler holds an open
   transaction; conn.execute's busy-wait does NOT release the GIL, so
   every state_update LLM call after the narrative was silently
   freezing the asyncio event loop for ~5s. With 0.1s the worker
   fails fast and logs (already handled), the chat keeps moving, and
   any missed embedding can be backfilled out of band. Also takes the
   test suite from ~290s -> 13s as a bonus.

2) **Parallel state-update pairs** in multi_state_update.py. Each
   directed (src, tgt) pair becomes a coroutine in asyncio.gather
   instead of a sequential for-loop. Returned order is preserved.

3) **Classifier on OpenRouter, provider-pinned to Cerebras**. New
   prefix-based router: model id with mlx-community/ -> local MLX,
   model == narrative_model -> narrative remote, else -> classifier
   remote. Settings.classifier_provider_order populates extra_body for
   the classifier client only (FeatherlessClient now accepts
   default_extra_body to merge into every chat.completions.create).
   Llama-3.1-8B on Cerebras runs at ~423 tok/s, ~10x the default
   provider. narrative still routes to mistral-nemo:nitro (Friendli).

4) **Cap classify max_tokens at 512**. A misbehaving classifier
   (response_format=json_object ignored) could otherwise generate
   thousands of tokens of prose before classify's JSON validation
   trips the retry. 512 is generous; usual completions are 50-150.

CHAT_LLM_TIMING=1 env var enables per-call timing logs on stderr;
zero overhead when unset. Useful for finding the slow link.

Suite: 464 passed in 13s (was 290s).
2026-04-27 13:51:27 -04:00
Joseph Doherty d656ee8805 feat: narrative format — third-person asterisk-action style with concrete-beat example
Rewrites the closing instruction in assemble_narrative_prompt to enforce
the asterisk-action / interleaved-beat format: actions wrapped in
*asterisks* in third person, dialogue as plain text between beats (no
quote marks), 2-4 short concrete beats per response, no inner monologue
or stage-direction adverbs. Includes a one-line worked example so the
model has a concrete target.

Was producing first-person prose blocks like 'I stare at you... "Well,
that's direct," I murmur'. Target style is short interleaved beats:
'*She turns with soapy hands to cup your face* That's how I know it's
real... *She kisses you softly* You love me when I'm messy...'

Drops narrative_max_tokens 400 -> 250 so the model can't drift into
multi-paragraph monologue. Bumps three test budgets to fit the larger
closing (closing grew ~80 -> ~200 tokens; tests still exercise the
same trim-order behavior, just with proportionally larger budgets).
2026-04-27 12:21:03 -04:00
Joseph Doherty fe9c497038 feat: split classifier + embeddings to local mlx-omni-server, narrative stays on Featherless
Adds RoutedLLMClient that dispatches by model name: requests matching
Settings.narrative_model go to Featherless, everything else (classifier
calls, embed) goes to a local MLX server. The local server is
mlx-omni-server (separate venv at .mlx-venv) and exposes the standard
OpenAI surface at http://127.0.0.1:10240/v1.

LocalMLXClient mirrors FeatherlessClient (AsyncOpenAI under the hood)
but with a working embed() — Featherless's /v1/embeddings always
returns 500 with completions_error, so the router unconditionally
sends embed traffic to the local backend.

Production deployment overrides via data/config.toml:
- classifier_model = mlx-community/Hermes-3-Llama-3.1-8B-8bit (~8 GB)
- embedding_model = mlx-community/bge-small-en-v1.5-bf16 (~150 MB,
  384 dim — matches existing schema, no migration)

Defaults stay remote / pseudo so fresh installs and tests need no
external infra. Smoke-tested live: classifier returns expected output,
BGE produces correctly-clustering 384-dim vectors (cat-on-mat closer
to cat-on-rug than to quantum-mechanics).

scripts/start_mlx_server.sh starts the daemon (foreground or --daemon).
.mlx-venv/ added to .gitignore.

Suite: 464 passed (was 457 → +7 new across LocalMLXClient + Router).
2026-04-27 12:05:41 -04:00
Joseph Doherty b3d78c1603 docs: clarify FeatherlessClient.embed() rationale (verified 500 + empty embedding catalog)
Updates the docstring + test docstring for the NotImplementedError stub
shipped in T112 (Phase 4.5). Original wording said Featherless 'does
not expose /v1/embeddings'; verified the endpoint actually responds
but always returns HTTP 500 with type='completions_error' for every
model tried (text-embedding-3-small, BAAI/bge-small-en-v1.5,
sentence-transformers/all-MiniLM-L6-v2, etc.) and /v1/models has no
embedding-class entries. Stub behavior unchanged.
2026-04-27 11:39:53 -04:00
dohertj2 a03f664407 Merge pull request 'Phase 4.5: cleanup — polish, branching, embeddings, lifecycle, deep-link' (#7) from phase-4.5 into main 2026-04-27 11:33:26 -04:00
Joseph Doherty cdfacdd0c4 merge: T118 phase 4.5 docs sweep — Phase 4.5 status + Phase 5 backlog 2026-04-27 07:04:01 -04:00
Joseph Doherty 5c4356e4e2 merge: T117 phase 4.5 cross-feature integration tests 2026-04-27 07:04:01 -04:00
Joseph Doherty 969f8963bc merge: T116 CannedQueue test fixture builder 2026-04-27 07:04:01 -04:00
Joseph Doherty f71613786b test: phase 4.5 cross-feature integration coverage (T117) 2026-04-27 07:03:56 -04:00
Joseph Doherty 4afaf01de7 test: structured CannedQueue fixture builder for classifier mocks (T116)
Phase 4.5 carry-over from Phase 3. Tests across test_turn_flow.py,
test_meanwhile_turn_flow.py, and the phase3/4 integration suites built
positional canned-response arrays for MockLLMClient — adding a new
classifier call to a code path required updating the array index in
many places.

This change ships tests/fixtures.py with a fluent CannedQueue builder
that lets tests declare classifier expectations by name and call order
instead of by index. Each method appends one item to an internal queue
and returns self for chaining; build() emits the flat list[str] queue
that MockLLMClient(canned=...) already consumes. The mock's contract
is unchanged.

Builder methods cover: parse_turn, detect_addressee, state_update
(with zero_state alias), detect_interjection,
detect_interjection_targeted, detect_scene_close,
detect_event_transitions, summarize_scene_pov, detect_threads,
meanwhile_digest, score_significance, and a narrative() helper for
streaming bot beats. raw() is a passthrough escape hatch.

Migration scope: ship the builder + 2 sanity tests + migrate 3
representative tests in test_turn_flow.py as proof of concept
(test_single_bot_turn_no_guest_regression,
test_turn_with_event_transition_appends_started_event,
test_turn_with_no_active_events_skips_classifier). The remaining
positional-array tests stay as-is; the builder docstring documents
the migration template for Phase 5 work.
2026-04-27 07:03:20 -04:00
Joseph Doherty 5bc9a94734 docs: phase 4.5 status, prune backlog, capture phase 5 candidates (T118) 2026-04-27 06:56:20 -04:00
Joseph Doherty cfc05a140c merge: T114 regenerate lifecycle rollback (back-ref + event_status_reverted) 2026-04-27 06:51:13 -04:00
Joseph Doherty 80ce891bd8 feat: regenerate rolls back lifecycle transitions on supersede (T114.3)
Closes the T83.4 gap: when ``regenerate_assistant_turn`` supersedes an
assistant_turn that already produced lifecycle transitions, it now
emits an ``event_status_reverted`` (T114.2) for each transition tagged
with ``triggered_by_assistant_turn_id == original_assistant_event_id``
(T114.1 back-reference) before the regenerated narrative is
reclassified.

Mapping from forward kind to ``prior_status`` lives in
``_PRIOR_STATUS_MAP``:
  - event_started   → planned
  - event_completed → active
  - event_cancelled → active (best-effort default; cancellation can fire
    from either planned or active, but detect_event_transitions only
    surfaces currently-active rows so 'active' is the realistic prior)

Backward compatibility: lifecycle rows authored before T114.1 lack the
back-reference field. Those are skipped (DEBUG log per row) and
collected into a legacy WARNING that preserves the T83.4
observability contract — operators still see un-rolled-back
transitions, just from older logs.

The classify-and-emit pass below the rollback now operates against an
events projection that has already been reverted, so re-firing
``event_started``/``event_completed``/``event_cancelled`` for the
regenerated narrative is safe — no double-emit of promotion artifacts.

Spec tests:
- ``test_regenerate_rolls_back_event_started_from_superseded_turn``
- ``test_regenerate_rolls_back_event_completed_to_active`` (also
  exercises the multi-rollback loop: a turn that fired both a start
  and a completion gets two event_status_reverted rows in id order,
  with active as the final projection — matching the per-row replay
  semantics of the projector)
- ``test_regenerate_skips_events_without_back_reference`` (pins the
  legacy compatibility path with both DEBUG and WARNING expectations)
2026-04-27 06:45:43 -04:00
Joseph Doherty 6d4ad86e33 feat: event_status_reverted event kind + projector handler (T114.2)
Adds the inverse projection used by T114.3's regenerate rollback. The
new ``event_status_reverted`` event kind carries
``{event_id, prior_status}`` and the handler unconditionally sets
``events.status = prior_status`` for the row.

Unlike the forward transitions (event_started / event_completed /
event_cancelled), this handler does NOT guard against terminal
statuses — its entire purpose is to reverse a transition, including
walking back from a terminal status to a non-terminal one. Without
that, rolling back an event_completed (status='completed' is terminal
for the forward handlers) would silently no-op and leave the row in
the post-superseded state.

The handler registers via the existing ``@on(kind)`` decorator pattern
in chat/eventlog/projector.py, so future replays of an event_log that
contains event_status_reverted rows pick it up automatically.

Test exercises completed→active, active→planned, and cancelled→active
round-trips.
2026-04-27 06:39:03 -04:00
Joseph Doherty 7370f68bdf feat: lifecycle events carry triggered_by_assistant_turn_id back-reference (T114.1)
Phase 3.5 T83.4 surfaced un-rolled-back lifecycle transitions on
regenerate; T114 wires up the actual rollback. Step 1 is the back-
reference: every event_started / event_completed / event_cancelled
emitted by post_turn (chat/web/turns.py) and regenerate
(chat/services/regenerate.py) now carries
``triggered_by_assistant_turn_id`` in its payload, set to the id of
the assistant_turn event that produced the transition.

Schema decision (Option A from the plan): no migration. The field is
a payload convention only — older event_log rows lack it and rollback
will skip them with a debug log when T114.3 lands. Forward-only.

The post_turn lifecycle block already runs AFTER the assistant_turn
event is appended (step 8a vs step 7), so ``primary_assistant_event_id``
is in scope. Same for regenerate: the lifecycle classification (step 9a)
runs after step 6's append. **No emission-order reorder was needed**
in either flow.

Updates ``test_turn_with_event_transition_appends_started_event`` to
assert the new field is present in the emitted event_started payload
and points at the assistant_turn id.
2026-04-27 06:38:48 -04:00
Joseph Doherty f863cf0158 merge: T113 branching read-side filter (active branch range respected by readers) 2026-04-27 06:25:50 -04:00
Joseph Doherty 456f50d334 feat: branching read-side filter — event readers consult active branch range (T113)
Wire the active branch's [origin_event_id, head_event_id] window into
every user-facing event/memory reader so switching branches actually
changes what dialogue and memories the user sees. Phase 4 T89/T94
shipped branches as metadata-only — this closes the loop.

Helper:
- chat/state/branches.py: add `active_branch_event_ids(conn)` returning
  the active branch's id range, with two defensive fall-throughs to
  `(0, BIG_INT)`: (a) no active branch row at all, and (b) the
  bootstrap "main" sentinel (name="main", origin=0, head=0). Production
  never bumps main's head_event_id today, so this preserves existing
  reader behaviour for every test that doesn't explicitly switch.

Readers updated (all user-facing dialogue / retrieval surfaces):
- chat/services/turn_common.py::read_recent_dialogue — chat-history
  prompt context + the chat-view template path (via web/turns.py +
  web/chat.py).
- chat/services/scene_summarize.py::_read_recent_dialogue — scene-close
  per-POV summary input.
- chat/state/memory.py::search_memories — FTS leg filters via
  m.event_id (T109's column); legacy NULL event_id rows are *included*
  unconditionally so the filter doesn't break pre-0014 retrieval. The
  fused (FTS + RRF + vector) path also drops vector hits whose
  event_id falls outside the branch window.
- chat/web/meanwhile.py::_read_recent_meanwhile_dialogue — meanwhile
  prompt context.

Projector queries (chat/state/world.py et al.) and admin/management
surfaces (drawer hide-panel, cross-chat search, regenerate's row
lookups by id) are intentionally NOT branch-filtered: projection must
see the full log to build state correctly, and the admin surfaces
operate across branches by design.

Tests (10 new, 446 total):
- tests/test_branches_state.py: 3 tests for `active_branch_event_ids`
  itself (bootstrap-main, no-active-branch, non-main literal range).
- tests/test_branching.py: 7 cross-feature tests covering the spec's
  five required scenarios plus scene_summarize and meanwhile readers.
2026-04-27 06:25:22 -04:00
Joseph Doherty 757abf24f8 merge: T112 real embedding model swap (Protocol + Mock + routing + backfill) 2026-04-27 06:08:13 -04:00
Joseph Doherty 9b7a6d459f feat: backfill_embeddings --re-embed-all flag for model swaps (T112.4)
Adds two new flags to the backfill script:

* --re-embed-all walks **every** memory (not just those without
  an existing embeddings row) and re-emits embedding_indexed
  events. The projector is INSERT OR REPLACE, so re-emitting an event
  for an existing memory replaces the prior vector. Use this when
  swapping embedding models — the default mode still keeps the Phase
  4 gap-fill behavior.
* --model M overrides Settings.embedding_model for this run.

The script also gains a small _build_client helper that returns
None for the pseudo path (no client needed) and a FeatherlessClient
otherwise; tests monkeypatch this to inject a Mock with canned
embeddings.

Adds tests/test_backfill_embeddings.py with three integration
tests: re-embed-all walks every memory, default mode skips existing
rows, and --model overrides the configured model end-to-end.
2026-04-27 06:02:23 -04:00
Joseph Doherty e0a28abbcd feat: generate_embedding routes non-default models through client.embed (T112.3)
When model != DEFAULT_EMBEDDING_MODEL, generate_embedding now
calls client.embed(text, model=model) and wraps the returned
vector in an EmbeddingResult tagged with the requested model.
On any exception (NotImplementedError from providers without an
embeddings endpoint, transient network errors, etc.), the existing
T107 warning fires and the function falls back to the zero-vector
sentinel — callers detect model == 'fallback' and skip indexing.

Adds:
- MockLLMClient accepts a canned_embeddings queue mirroring
  the existing canned pattern. embed() pops from the front;
  empty queue raises IndexError so misconfigured tests fail
  loudly.
- Settings.embedding_model defaults to "pseudo-sha256-384"
  so existing zero-config installs keep Phase 4 behavior. The app
  lifespan now passes this through to EmbeddingWorker.model.

The public signature of generate_embedding is unchanged:
(client, *, text, model=DEFAULT_EMBEDDING_MODEL, dim=..., timeout_s=...).
2026-04-27 05:50:29 -04:00
Joseph Doherty ac6e74ab4c feat: FeatherlessClient.embed() against /v1/embeddings (T112.2)
Implements embed() on FeatherlessClient. Featherless's OpenAI-
compatible surface does NOT expose /v1/embeddings at the time of
writing, so this implementation raises NotImplementedError rather
than issuing a request that would 404. The
chat.services.embeddings.generate_embedding wrapper (T112.3)
catches the exception and degrades to the zero-vector fallback path
(plus the existing T107 warning) — misconfigured callers fail loudly
in logs while the request path keeps working.

If/when Featherless ships embeddings, swap the body for
self._client.embeddings.create(model=..., input=...) guarded by
the existing 2-conn semaphore (mirrors generate/stream). The Protocol
seam in T112.1 is already wired so no other code needs to change.

Adds tests/test_featherless.py pinning the NotImplementedError
contract.
2026-04-27 05:48:34 -04:00
Joseph Doherty 5f16bb575a feat: LLMClient Protocol gains embed() method (T112.1)
Adds async def embed(self, text: str, *, model: str) -> list[float]
to the LLMClient Protocol so Phase 4.5 can wire a real-embedding swap
without changing call sites. Protocol is structural — existing
implementations that don't use it remain compatible; downstream
implementations (FeatherlessClient, MockLLMClient) ship in T112.2 and
T112.3.
2026-04-27 05:47:55 -04:00
Joseph Doherty f05d1e0f21 merge: T111 search UX (FTS snippet + turn deep-link) 2026-04-27 05:42:48 -04:00
Joseph Doherty 9987da2c07 feat: cross-chat search deep-links to turn via memories.event_id (T111.2)
Add ``m.event_id`` (T109's nullable column from migration 0014) to
``search_all_memories``'s SELECT, propagate it through the route's
template context, and have ``search.html`` build result links as
``/chats/{chat_id}#turn-{event_id}`` — matching the ``id="turn-{event_id}"``
anchor that Phase 3.5 T86 stamps on each turn DOM node so the chat page
scrolls to the originating turn on load. Memory rows projected before
the 0014 migration ran read NULL ``event_id``; the template falls back
to a chat-level link in that case so we never emit ``#turn-None``.

Pre-existing tests that asserted on the bare ``href="/chats/{chat_id}"``
contract are updated to assert on the ``href="/chats/{chat_id}#turn-``
prefix to reflect the new deep-link.
2026-04-27 05:42:17 -04:00
Joseph Doherty fa87ab8c55 feat: cross-chat search FTS snippet highlighting (T111.1)
Replace the ``pov_summary`` column in ``search_all_memories``'s SELECT
with ``snippet(memories_fts, 0, '<mark>', '</mark>', '…', 32)`` so each
match in a result row is wrapped in ``<mark>`` for the search-results
UI. The original ``pov_summary`` is still returned alongside as a
non-highlighted fallback. Template renders ``r.snippet|safe`` — the only
HTML in the snippet output is the configured ``<mark>`` markers, so it
is safe to bypass Jinja's auto-escape.
2026-04-27 05:30:32 -04:00
Joseph Doherty fae6edef6b merge: T110 drawer Phase 4.5 bundle (event_id guard + html.escape + Jinja partial + bulk re-rate) 2026-04-27 05:26:03 -04:00
Joseph Doherty 2ab8fcbdf0 feat: drawer bulk significance re-rate per chat (T110.4)
The drawer's Significance review panel previously only supported
per-memory edits. Adds a bulk control: pick ``level_from`` and
``level_to``, and every memory in the chat at ``level_from`` is moved
to ``level_to``.

Implementation emits one ``manual_edit`` event per matching memory
(not a single bulk event) so the §6.4 per-row audit trail stays
intact — each affected memory carries its own ``prior_value -> new_value``
snapshot, so an inverse edit can restore an individual row without
needing to inspect a bulk payload's member list. Reuses the existing
``memory_significance`` ``manual_edit`` projector branch (T25), so no
state-layer changes are required.

The route rejects no-op submissions (``level_from == level_to``) with
400 to avoid padding the event log with empty edits, and clamps both
levels to 0..3 (matching ``edit_memory_significance``).

UI: a small ``<details>`` block in the Significance review section
with two number inputs and a submit button.

Test: tests/test_drawer_phase4.py::test_bulk_significance_re_rate_emits_manual_edit_per_memory.
2026-04-27 05:14:59 -04:00
Joseph Doherty 5d5c888acf refactor: drawer delete-impact modal extracted to Jinja partial (T110.3)
The modal HTML was assembled via raw f-string concatenation in
``delete_preview``. Move it to a dedicated Jinja2 partial
(``chat/templates/_delete_impact_modal.html``) and render via
``TEMPLATES.TemplateResponse``. Jinja2 autoescape now handles HTML
safety automatically — the explicit ``html.escape()`` calls added in
T110.2 (and the ``import html``) become redundant and are removed in
this commit.

Net behavioural change: attribute quoting style flips from single to
double quotes (Jinja default) — the existing T98.4 substring-based
assertions are unaffected, and the new T110.3 test pins the
double-quoted shape so future regressions surface.

Test: tests/test_drawer_phase4.py::test_delete_impact_modal_uses_jinja_partial.
2026-04-27 05:13:36 -04:00
Joseph Doherty a45a33534f fix: drawer delete-impact modal HTML escapes user-controllable fields (T110.2)
The delete-impact modal is built via raw f-string concatenation from the
ImpactReport — item.kind / item.description / report.notes ultimately
embed user-controllable content (turn prose, scene timestamps). A turn
with prose like "<script>alert(1)</script>" would reach the rendered
HTML verbatim. Currently safe (the fields embedded today are bounded
strings) but defense-in-depth — wrap with html.escape() so future
description changes can't smuggle markup through.

Test: tests/test_drawer_phase4.py::test_delete_impact_modal_escapes_user_controllable_strings.
2026-04-27 05:12:28 -04:00
Joseph Doherty f3827706df fix: drawer delete_turn guards event_id <= 0 (T110.1)
A stale tab or hand-crafted request posting event_id=0 to the surgical
delete route would compute after_event_id=-1 and silently truncate the
entire log. Now rejected with 400.

SQLite assigns event_log ids starting at 1, so any legitimate id is
always >= 1 — non-positive values can only indicate a client bug.

Test: tests/test_drawer_phase4.py::test_delete_turn_with_event_id_zero_returns_400.
2026-04-27 05:11:39 -04:00
Joseph Doherty 2afbb9fefc merge: T109 schema 0014 — memories.event_id column 2026-04-27 05:01:17 -04:00
Joseph Doherty 1f8b4d2078 feat: 0014 schema — embeddings FK CASCADE (deferred or applied) + memories.event_id column (T109) 2026-04-27 05:00:57 -04:00
Joseph Doherty 3f1a284acb merge: T108 scene-close-on-cancel strengthen test + rationale 2026-04-27 04:48:00 -04:00
Joseph Doherty 87f93f00b5 merge: T107 embeddings.py fallback warning 2026-04-27 04:48:00 -04:00
Joseph Doherty d1e2902655 merge: T106 search.py N+1 batching + k constant 2026-04-27 04:48:00 -04:00
Joseph Doherty 54dfa8d611 merge: T105 snapshots.py polish 2026-04-27 04:48:00 -04:00
Joseph Doherty 5d36d3456f merge: T104 memory.py DRY MAX(id) + fts_rank doc 2026-04-27 04:48:00 -04:00
Joseph Doherty 0e9421dcf7 merge: T103 branches polish (global-leak doc + unknown-name warning) 2026-04-27 04:48:00 -04:00
Joseph Doherty baffeb3a44 chore: scene-close-on-cancel — strengthen regression test + document rationale (T108)
Investigation surfaced a transactional bug in the cancel path: when the
primary stream raises asyncio.CancelledError mid-stream, post_turn
re-raises at end-of-function, and open_db's dependency teardown skips
conn.commit() — rolling back ALL post-cancel writes including the
scene_closed event. The existing T74.3 regression test only passes
because asyncio is not imported at module scope, so CancelledError
becomes NameError (caught by except Exception, leaves cancelled=False).
Documented in turns.py + test docstring; deferred for triage.
2026-04-27 04:47:26 -04:00
Joseph Doherty 29b7c90b29 chore: embeddings.py warns on fallback for non-default models (T107) 2026-04-27 04:47:17 -04:00
Joseph Doherty 64c9ca634a chore: snapshots.py polish — hoisted imports + strict kind + mtime doc (T105) 2026-04-27 04:47:14 -04:00
Joseph Doherty 374a76c867 chore: branches polish — global-leak docs + unknown-name warning (T103) 2026-04-27 04:34:32 -04:00