# Roleplay Engine

Local-first roleplay chat app that treats fiction as a **simulation**, not a chat log. The LLM is a renderer for structured world state — it does not hold state.

See [rp-engine-design.md](rp-engine-design.md) for the architectural design and [docs/plans/2026-04-26-v1-requirements-design.md](docs/plans/2026-04-26-v1-requirements-design.md) for the v1 product requirements & behavioral spec. This file is the working summary.

## Why this exists

Fixes three failure modes of conventional RP chatbots:

1. **Memory loss** — old context drops as history grows
2. **Quality decay** — bots get terse and generic over long conversations
3. **Stale state pollution** — bots fixate on past props (the "picnic basket" problem)

## Hard scope constraints

- **Single user, single machine** (the user's Mac)
- **Max 3 entities per scene**: `you` + up to 2 bots (`botA`, `botB`)
- **Chat-only** — no voice, no real-time

The 3-entity cap is load-bearing: it makes the relationship graph fully enumerable (6 directed edges + 1 group node). Don't design for N entities.

## Architecture

- **Mac (always-on)**: web UI, orchestrator, persistence, event queue, retrieval, prompt construction, all state.
- **Inference endpoint**: stateless `generate(prompt, params) -> text`. Swap implementations behind one interface. The orchestrator never knows which.
- Streaming required for UX.

## Runtime stack (locked for v1)

- **Backend**: Python 3.11+ with **FastAPI**.
- **Frontend**: server-rendered HTML + **HTMX** + minimal vanilla JS/CSS. No JS build chain.
- **Live updates**: SSE per chat. Per-chat `asyncio.Queue` pub/sub. Multi-tab sync is a Phase 1 requirement — two browser tabs on the same chat must mirror each other live (streamed tokens, drawer state, edge updates).
- **Inference backend**: **Featherless** (OpenAI-compatible API).
  - `narrative_model` = `dphn/Dolphin-Mistral-24B-Venice-Edition` (32K ctx, uncensored).
  - `classifier_model` = `NousResearch/Hermes-3-Llama-3.1-8B` (128K ctx, uncensored, structured-output reliable). Fallbacks: `cognitivecomputations/dolphin-2.9.4-llama3-8b` → `mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated`.
- **Token budgets**: narrative 8K hard / 6K soft; classifier 4K hard. Trim tiers must / should / nice — never trim must-include.
- **OOC marker**: `((double parens))` (configurable).
- **Data layout**: everything under `/data/` — `chat.db`, `backups/`, `snapshots/`, `exports/`, `config.toml`. The whole tree is `.gitignore`d. `CHAT_DB_PATH` env var honored as override.
- **Auth**: bind to `127.0.0.1` only in v1. No auth.

## Behavioral defaults (locked in v1 brainstorm round 2)

- **Significance scale**: 0=Routine, 1=Notable, 2=Significant, 3=Pivotal. Score-3 turns auto-pin per witness. Drives retrieval ranking, compression, JSON exports.
- **Edge updates**: per-turn deltas (`affinity_delta`, `trust_delta`, `knowledge_facts`, `last_interaction`); per-scene-close summary rewrite. Every mutation goes through the event log as `edge_update`.
- **Classifier failure handling**: Pydantic-constrained → 1 retry with stricter reminder → schema-default fallback. 10s timeout. Never block the play loop. Refusals trigger fallback-model swap for that one call. Failures logged to `classifier_failures` table.
- **Activity verbs**: open string + classifier-extracted `interruptible`, `required_attention`, `expected_duration`. Attention is optional free-form; omit from prompt when empty.
- **Containers**: parse-and-extend. Per-chat scoped. Kickoff parse seeds initial; transitions create new.
- **Pinning**: soft cap 8 / bot. Pivotal (score 3) = auto-pin. Manual pins never auto-evicted.
- **Snapshots**: periodic every 100 events / 30 min; pre-rewind always. 5 periodic retained; pre-rewind retained 14 days.
- **Streaming**: Stop button on streaming row; mid-stream disconnect commits partial with `truncated: true`; Send disabled mid-stream; multi-tab streaming via per-chat SSE channel.
- **Display**: lightweight markdown; `*action*` italic; OOC `((parens))` shown dimmed/italic, never sent to bot.
- **Multi-entity defaults (Phase 2)**: when `chat.guest_bot_id is None`, behavior matches Phase 1 single-bot 1:1. With a guest, all 3 entities are present in the prompt, witness writes, and state-update fan-out (6 directed pairs).
- **Addressee detection**: simple substring match (whole-word, case-insensitive) over the user turn's body. If both bot names match or neither does, the host gets the floor.
- **Interjection**: classifier-driven, conservative bias (default false on classifier failure / refusal / parse error). When the classifier returns true, the addressee speaks first, then the non-addressee may interject in a follow-up turn.
- **Per-POV summaries (multi-entity)**: each present witness with a memory store gets their own per-POV summary on scene close. The summary differs per bot based on persona + their edge to "you". The group node summary is updated alongside.

## Core concepts (vocabulary)

- **Entity**: `you | botA | botB`. Has identity (immutable), state (mood/goals/status), activity, per-POV memory.
- **Container**: anything with slots that holds entities (car, booth, room). Has properties (moving, public, audible range). Spatial grounding lives here, separate from the relationship graph.
- **Activity record**: per-entity live struct — position (container+slot), posture, current action (verb, duration, interruptible, required_attention), holding, attention, status. Always in the prompt as a small structured block.
- **Relationship graph**: 6 **directed** edges (asymmetric feelings matter — never collapse to a single shared field) + 1 group node. Edges hold affinity, trust, summary, knowledge-known-about-target, private moments, last-interaction.
- **Scene configurations**: exactly 4 — solo with botA, solo with botB, all three present, botA+botB without you ("meanwhile…"). Each has a fixed prompt-loading rule.
- **Witnessed-by flag**: every memory has a 3-bit `[you, botA, botB]` mask. A speaker only sees memories where their bit is set. This is the mechanism that prevents bots referencing things they can't know.
- **Event**: scoped lifecycle (`planned | active | completed | cancelled | expired`) with its own props, preconditions, on_start/on_complete hooks, significance. Solves the picnic-basket problem — props live and die with the event, only narrative gist promotes to memory.
- **Active threads**: unresolved plot tensions. Sticky in context until resolved/dropped. Cheap, anchor continuity across compressed scenes.
- **Scene**: closes when container changes meaningfully or significant time passes. Compression boundary.
- **Per-POV summary**: every witness gets their own record of a closed scene, written from their POV. Different details, different interpretations. This is what gives bots inner lives — never write omniscient narration into per-POV stores.
- **Time skip**: `elision` (skip the boring middle of an in-progress activity) vs `jump` (next morning, a week later). Skips run intervening events forward, compress, reset landing activity.

## What promotes out of an event (and what doesn't)

- Object acquired → inventory
- Knowledge gained → edge `knowledge` field
- Relationship change → edge summary
- **Everything else stays in the closed event record.** The blanket, the basket, the specific sandwich do **not** become memories. This rule is the whole point — don't bypass it.

## Persistence

- **SQLite** (single file) for everything structured. WAL mode, foreign keys on, each turn in a transaction.
- **sqlite-vss** or **sqlite-vec** for embeddings (same DB file). Decide at Phase 4.
- **JSON** for snapshots, character templates, scene exports.
- **No** Postgres, Redis, Pinecone, Docker.
  Single-user; don't over-engineer.

Schema is event-sourced. See design doc § "Persistence Layer" for the full sketch.

## Event sourcing — non-negotiable

State is a **projection** of an append-only event log. State is **never mutated directly** — append an event, the projector applies it.

Event kinds: `user_turn`, `assistant_turn`, `time_skip`, `event_triggered`, `edge_update`, `scene_transition`, `entity_state_change`, `activity_change`.

This buys: free rewind, trivial replay-debugging, schema migrations against the same log, branching ("what if BotA had said yes").

**Determinism on replay**: LLM calls are nondeterministic. Store the *outcome* in the event payload — on replay, use the stored outcome. Never re-call the LLM during replay.

**Snapshots** every N events / M minutes so we don't replay everything on load. Log is source of truth.

## Prompt construction

A speaker's prompt is assembled from **their** edges and **their** witnessed memories — never the global state. BotA and BotB are effectively two separate agents who happen to share a scene.

Order (for speaker BotA, with you and BotB present):

1. BotA identity + current state
2. BotA → You edge
3. BotA → BotB edge
4. Group node (only if all three present)
5. World state (time, weather, location)
6. Active scene description
7. Activity snapshot for **all** present entities
8. Active threads
9. Recent dialogue window
10. Retrieved memories (top-K, witness-filtered, BotA-owned)
11. Currently active events + their props

After every utterance, run a state-update pass on **every present entity**, not just the speaker. Silent witnesses still update edges.

## Memory retrieval

- Always-loaded: pinned, current scene, active threads, recent N scenes (no retrieval).
- Retrieved: top-K vector search over **the speaker's** memory store, filtered by witness flag, with recency + significance boosts.
- Keep K small. Bloated retrieval poisons the prompt.
- Phase 1: SQLite FTS5 is enough. Vector search comes at Phase 4.
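
The retrieval rule above (witness filter first, then rank with recency and significance boosts, then keep a small K) can be sketched in a few lines. This is a minimal illustration, not the repo's implementation: the `Memory` dataclass, the bit ordering in `WITNESS_BIT`, the `similarity` field, and the boost weights are all assumptions made for the example.

```python
from dataclasses import dataclass

# Assumed bit layout for the 3-bit [you, botA, botB] witness mask.
WITNESS_BIT = {"you": 0b100, "botA": 0b010, "botB": 0b001}


@dataclass(frozen=True)
class Memory:
    """Hypothetical memory row; field names are illustrative only."""
    text: str
    witness_mask: int   # which entities witnessed this memory
    significance: int   # 0=Routine .. 3=Pivotal
    age_scenes: int     # scenes elapsed since the memory was written
    similarity: float   # search score (FTS5 or vector), assumed in 0..1


def visible_to(memory: Memory, speaker: str) -> bool:
    """A speaker only sees memories where their witness bit is set."""
    return bool(memory.witness_mask & WITNESS_BIT[speaker])


def score(memory: Memory, recency_weight: float = 0.3,
          significance_weight: float = 0.2) -> float:
    """Search score plus recency and significance boosts (weights assumed)."""
    recency = 1.0 / (1 + memory.age_scenes)
    return (memory.similarity
            + recency_weight * recency
            + significance_weight * memory.significance / 3)


def retrieve(memories: list[Memory], speaker: str, k: int = 3) -> list[Memory]:
    """Filter by witness bit before ranking, then keep only the top K."""
    candidates = [m for m in memories if visible_to(m, speaker)]
    return sorted(candidates, key=score, reverse=True)[:k]
```

Filtering before ranking matters here: a highly relevant memory the speaker never witnessed is dropped before it can compete for a top-K slot, so it cannot leak into the prompt.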

## Implementation phases

1. **Core loop**: schema, entities + edges, single container, event log + projector, single-bot conversation, one LLM backend, streaming UI, manual rollback.
2. **Multi-entity**: second bot, group node, scene configs, witness filtering, per-POV memories, activity/containers, scene transitions with compression.
3. **Events & skips**: event queue with triggers, time skips, active threads, significance classifier.
4. **Polish**: vector retrieval, branching, surgical delete + regenerate, snapshots, backups, impact-preview UI for rewinds.

Don't jump phases. Phase 1 must work end-to-end before Phase 2 lands.

## Conventions for working in this repo

- **Don't bypass the event log.** Any state change goes through an event. If you're tempted to UPDATE a row directly, you're doing it wrong.
- **Don't collapse directed edges.** `botA → botB` and `botB → botA` are independent. Asymmetry is the point.
- **Don't promote event props to memory.** Only the promotion categories listed above survive an event closing.
- **Per-POV, not omniscient.** When writing scene summaries, write one per witness, from their angle.
- **Witness filter every memory read.** A bot must never see a memory their bit isn't set on.
- **Activity block is always in the prompt.** It's the spatial anchor that prevents "leaning on the kitchen counter while in a car" failures.
- **Streaming on the inference path; non-blocking bookkeeping** (significance classification, embeddings, snapshots) runs while the LLM streams.
- **No Docker, no extra services.** SQLite + a process. Push back on suggestions to add infrastructure.
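
As an illustration of the first two conventions, here is a minimal sketch of appending an `edge_update` event and letting a projector write the directed edge row. The table layout, the payload keys beyond `affinity_delta` and `trust_delta`, and the helper names `open_memory_db`, `append_event`, and `rebuild` are assumptions for this example; the real schema lives in the design doc.

```python
import json
import sqlite3


def open_memory_db() -> sqlite3.Connection:
    """In-memory database with a simplified, hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE events (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            kind TEXT NOT NULL,
            payload TEXT NOT NULL
        );
        CREATE TABLE edges (
            source TEXT NOT NULL,
            target TEXT NOT NULL,
            affinity INTEGER NOT NULL DEFAULT 0,
            trust INTEGER NOT NULL DEFAULT 0,
            PRIMARY KEY (source, target)  -- one row per directed pair
        );
    """)
    return conn


def apply_event(conn: sqlite3.Connection, kind: str, payload: dict) -> None:
    """Projector: the only code path that writes projected state."""
    if kind == "edge_update":
        conn.execute(
            """
            INSERT INTO edges (source, target, affinity, trust)
            VALUES (:source, :target, :affinity_delta, :trust_delta)
            ON CONFLICT (source, target) DO UPDATE SET
                affinity = affinity + excluded.affinity,
                trust = trust + excluded.trust
            """,
            payload,
        )
    # Other event kinds are ignored in this sketch.


def append_event(conn: sqlite3.Connection, kind: str, payload: dict) -> int:
    """Append to the log and project the new row in one transaction."""
    with conn:
        cur = conn.execute(
            "INSERT INTO events (kind, payload) VALUES (?, ?)",
            (kind, json.dumps(payload)),
        )
        apply_event(conn, kind, payload)
    return cur.lastrowid


def rebuild(conn: sqlite3.Connection) -> None:
    """Replay the whole log into an emptied projection."""
    with conn:
        conn.execute("DELETE FROM edges")
        rows = conn.execute(
            "SELECT kind, payload FROM events ORDER BY id"
        ).fetchall()
        for kind, payload in rows:
            apply_event(conn, kind, json.loads(payload))
```

Because `(source, target)` is the primary key, `botA → botB` and `botB → botA` are separate rows and never share a field. Replaying the log with `rebuild` should produce the same edge values as the incremental path, which is the property that makes rewind and replay-debugging cheap.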

## Open decisions (deferred — don't pre-decide)

- Token budget strategy (during Phase 1, with real prompts)
- Embedding model (Phase 4)
- `sqlite-vss` vs `sqlite-vec` (Phase 4)
- UI framework (local web app / Tauri / Electron / native — TBD)
- Inference hosting (start with a cloud API, re-evaluate later)
- Character template format (during Phase 1)
- Multi-session / multi-character casts: **out of scope for v1**. Leave cheap schema hooks only.

## Phase 1 status

Phase 1 shipped end-to-end across **35 tasks** (T0–T35). The single-bot core loop is functional: event log + projector, schema + migrations, settings/bot authoring, kickoff confirm, streaming turns, drawer rendering, regenerate/rewind, scene close + per-POV summaries, significance classifier, snapshots/backups, first-run navigation, and friendly 404/500 pages. **168 tests passing.**

Deferred to Phase 2: second bot, group node, scene configurations, witness filtering across multi-entity scenes, activity/containers, scene-transition compression. Phase 3: event queue + triggers, time skips, active threads. Phase 4: vector retrieval, branching, surgical delete + regenerate, impact-preview UI.

### Known v1 limitations (read before extending)

- **Drawer edits scope**: only affinity, significance, and pin can be hand-edited from the drawer. Other v1 fields (knowledge, summary text, traits) are deferred to Phase 1.5.
- **Cold-load snapshot path** is wired and unit-tested but rarely exercised in dev — long-running sessions are the only realistic trigger.
- **WAL sidecar files** (`-wal`, `-shm`) are not captured in nightly backups; the nightly snapshot is a fresh `.backup()` so this is fine for restore but worth knowing if you copy the db file by hand.
- **HTMX SSE event names** may need a version check if you bump the htmx CDN URL in `base.html` — the swap targets are name-coupled.
- **"You" activity rows** can linger after `bot_reset` (the reset purges the bot's chats and the bot's own activity row but not the "you" row that was associated with those chats). Cosmetic, fixed in Phase 1.5.
- **Projector replay is non-idempotent** for plain `INSERT` events. After appending, call `apply_event(conn, event)` for the new row only — calling `project(conn)` re-runs every handler from scratch and will trip uniqueness or duplicate inserts.
- **8-pin auto-cap eviction** is FIFO over the auto-pinned set only. Manual pins survive the eviction; this is by design (manual intent > auto-pin signal).
- **Regenerate (T29) does not broadcast `turn_html` over SSE** — the page must refresh to show the regenerated turn. Acceptable for v1 single-tab usage; Phase 1.5 should wire the SSE event.
- **First-run middleware** fires only on bare `/` and `/chats`. Sub-paths like `/chats/` and `/chats//drawer` pass through (correct: HTMX partials should not page-redirect, and a deep-link to a missing chat should 404, not redirect mid-setup).

### Phase 1.5 cleanup backlog

All items shipped — see Phase 2.5 status below.

## Phase 2 status

Phase 2 shipped end-to-end across **13 tasks** (T36–T48 wave). The multi-entity surface is functional: chats can host a guest bot, the prompt assembly is guest-aware, post-turn fans out across all directed pairs, and scene close writes a per-POV summary per present witness plus a group_node summary.

- **Multi-entity scene support**: chats can now have a guest bot (you + host + guest). The 3-entity cap holds. New event kinds: `guest_added`, `guest_removed`, `group_node_initialized`, `group_node_updated`. New table: `group_node` (members, summary, dynamic, threads).
- **Drawer guest UX**: add/remove guest from the drawer side panel. The "have they met?" prose seed is parsed by the `relationship_seed` classifier into inter-bot directed edges (host↔guest).
- **Multi-entity turn flow**: `post_turn` assembles narrative with the guest-aware prompt; writes memories for **all** present bot witnesses; runs state updates for **all** directed pairs (6 with 3 entities); detects interjections via classifier (default false; the addressee gets the floor first).
- **Per-POV scene close summaries**: each present witness with a memory store gets their own per-POV summary on close; `group_node` summary updated alongside.
- **Bot reset cascade**: resetting a bot now also clears `chats.guest_bot_id` references in other chats (root-cause fix for stale-guest references after T47).

### Phase 2.5 / 3 backlog

All items shipped — see Phase 2.5 status below.

## Phase 2.5 status

Phase 2.5 cleanup shipped end-to-end across 8 tasks (T68–T75). Two CLAUDE.md backlogs (Phase 1.5 cleanup, Phase 2.5/3) are now empty; deferred follow-ups discovered during execution are tracked in a new "Phase 2.6 / 3 backlog" section below.

- **`open_db` with check_same_thread parameter (T68)**: refactored `chat/db/connection.py` so `chat/web/bots.py:get_conn` no longer duplicates the PRAGMA setup. Default behavior preserved.
- **`bot_reset` cross-chat cleanup (T69)**: now purges orphaned "you" activity rows. Note: this also fixed a latent FK constraint crash that was lurking in the projector — `activity.container_id` is FK-referenced and the prior code would have crashed on any reset of a bot whose chat had a non-NULL `container_id` "you" activity row. The bug was masked because no prior test seeded such a row.
- **LLM-merged group meta-summary (T70)**: replaces Phase 2 T45's naive concat with a classifier merge call. Falls back to the naive concat on classifier failure.
- **`prompt.py` polish (T71)**: witness role parametric (`host` vs `guest` derived from chat membership); single `ACTIVITIES:` block with bullet-level trim; NICE trim order kept with documented rationale (greedy cheapest-impact-first beats spec-listing order in practice).
- **Drawer polish (T72)**: deferred v1 edits (edge_trust slider, edge_summary textarea, memory pov_summary textarea, knowledge_facts add/remove) + first-meeting gate (Add-guest form disables prose textarea when host→guest edge already exists; "re-seed anyway" toggle re-enables) + witness flag inline-edit (per-memory checkboxes for [you, host, guest] flags). Two new `manual_edit` projector branches: `edge_knowledge_fact` and `memory_witness`.
- **Regenerate polish (T73)**: regenerate now broadcasts `turn_html_replace` over SSE (NEW event distinct from `turn_html` to avoid breaking the existing append-semantic consumer); regenerate covers interjection turns (re-detects + re-streams or supersedes); defensive stale-guest degrade removed.
- **Turn-flow polish + addressee service (T74)**: classifier-based addressee detection (substring helper kept as no-guest fast path); SignificanceJob enqueued for interjection memories; scene-close-on-cancel pinned with comment + regression test (close detection is genuinely user-prose-only); defensive stale-guest degrade removed.

### Phase 2.6 / 3 backlog

New follow-ups discovered during Phase 2.5 execution. None are blocking; pick up at any time.

- **Frontend handler for `turn_html_replace` SSE event (from T73.1 review)**: regenerate's backend broadcast lands, but no live tab swaps the regenerated turn until a JS handler is wired. The existing `turn_html` event uses HTMX `sse-swap` to append; `turn_html_replace` ships JSON with `supersedes_id` for replacement semantics. Phase 2.6 should wire the JS to swap the prior turn's DOM node in place.
- **Cancel/stop hook for in-flight regenerate streams (from T73 review)**: `post_turn` registers stream tasks in `_in_flight_tasks` so the user can stop them. Regenerate doesn't. A user clicking "Stop" mid-regenerate has no cancel hook today.
- **DRY: regenerate vs post_turn (from T73 review)**: recent-dialogue assembly and prior-edges block are duplicated between `chat/services/regenerate.py` and `chat/web/turns.py`. Extract to shared helpers analogous to `_gather_state_update_inputs`.
- **Sibling-discovery query optimization (from T73 review)**: `regenerate.py`'s sibling-assistant-turn lookup scans all non-superseded `assistant_turn` rows globally. Adding a `chat_id` predicate via JSON extraction (or a denormalized column) bounds the cost to per-chat scale.
- **`_witness_role_for` defensive coding (from T71 review)**: helper returns `"guest"` when `host_bot_id is None`, which is wrong for Phase-1 chats. Defensive: `return "host" if host_bot_id is None or speaker_bot_id == host_bot_id else "guest"`. Not exercised by current tests; harden as a precaution.
- **Confidence type tightening (from T74 review)**: `chat/services/addressee.py::AddresseeDecision.confidence` could be typed as `Literal["high","medium","low"]` for stricter validation. Currently `str` with a comment.
- **Scene-close-on-cancel UX revisit**: T74.3 pinned the existing behavior (close fires even on cancel). If real play-testing surfaces a regression, revisit.
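
Two of the backlog items above (the `_witness_role_for` hardening and the confidence type tightening) are small enough to sketch. The following is an illustration only: the function signature, the `AddresseeDecisionSketch` class, and its `addressee` field are hypothetical stand-ins, and a plain dataclass is used here so the example runs with the standard library alone. The real `AddresseeDecision` may be defined differently.

```python
from dataclasses import dataclass
from typing import Literal, Optional, get_args

Role = Literal["host", "guest"]
Confidence = Literal["high", "medium", "low"]


def witness_role_for(speaker_bot_id: str, host_bot_id: Optional[str]) -> Role:
    """Hardened variant: a chat with no recorded host treats the speaker as host.

    The signature is an assumption; only the return expression comes from
    the backlog item.
    """
    return "host" if host_bot_id is None or speaker_bot_id == host_bot_id else "guest"


@dataclass(frozen=True)
class AddresseeDecisionSketch:
    """Hypothetical stand-in for AddresseeDecision with a tightened confidence."""
    addressee: str
    confidence: Confidence

    def __post_init__(self) -> None:
        # Dataclasses do not enforce Literal at runtime, so check explicitly.
        if self.confidence not in get_args(Confidence):
            raise ValueError(f"invalid confidence: {self.confidence!r}")
```

With a validating model library the explicit `__post_init__` check would likely be unnecessary, since a `Literal` annotation is usually enough there; the manual check only exists because this sketch avoids third-party imports.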