Roleplay Engine
Local-first roleplay chat app that treats fiction as a simulation, not a chat log. The LLM is a renderer for structured world state — it does not hold state.
See rp-engine-design.md for the architectural design and docs/plans/2026-04-26-v1-requirements-design.md for the v1 product requirements & behavioral spec. This file is the working summary.
Why this exists
Fixes three failure modes of conventional RP chatbots:
- Memory loss — old context drops as history grows
- Quality decay — bots get terse and generic over long conversations
- Stale state pollution — bots fixate on past props (the "picnic basket" problem)
Hard scope constraints
- Single user, single machine (the user's Mac)
- Max 3 entities per scene: you + up to 2 bots (botA, botB)
- Chat-only: no voice, no real-time
The 3-entity cap is load-bearing: it makes the relationship graph fully enumerable (6 directed edges + 1 group node). Don't design for N entities.
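To make the enumerability claim concrete, here is a minimal sketch. The names ENTITIES, directed_edges, and GROUP_NODE are illustrative and not taken from the codebase; only the three entity ids come from this document.

```python
from itertools import permutations

# Hypothetical names for illustration; the entity ids come from the spec.
ENTITIES = ("you", "botA", "botB")

# Every ordered pair of distinct entities is one directed edge.
directed_edges = list(permutations(ENTITIES, 2))

# The single group node covers the case where all three are present.
GROUP_NODE = frozenset(ENTITIES)

for source, target in directed_edges:
    print(f"{source} -> {target}")
```

With three entities the loop prints six lines, one per directed edge, which is why the graph can be stored as a fixed set of rows rather than a general graph structure.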
Architecture
- Mac (always-on): web UI, orchestrator, persistence, event queue, retrieval, prompt construction, all state.
- Inference endpoint: stateless generate(prompt, params) -> text. Swap implementations behind one interface. The orchestrator never knows which.
- Streaming required for UX.
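One way to express the single-interface idea is a structural Protocol. This is a sketch under assumptions: InferenceBackend, EchoBackend, render_turn, and the max_chars parameter are hypothetical names, and streaming is left out to keep the example short.

```python
from typing import Any, Protocol


class InferenceBackend(Protocol):
    """Hypothetical interface: stateless, prompt and params in, text out."""

    def generate(self, prompt: str, params: dict[str, Any]) -> str: ...


class EchoBackend:
    """Stand-in backend for tests; a real one would call a remote API."""

    def generate(self, prompt: str, params: dict[str, Any]) -> str:
        limit = params.get("max_chars", len(prompt))
        return prompt[:limit]


def render_turn(backend: InferenceBackend, prompt: str) -> str:
    # The orchestrator depends only on the interface, never on the concrete class.
    return backend.generate(prompt, {"max_chars": 5})


print(render_turn(EchoBackend(), "hello world"))
```

Because the backend holds no state between calls, swapping EchoBackend for a hosted model should not require any change to the caller.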
Runtime stack (locked for v1)
- Backend: Python 3.11+ with FastAPI.
- Frontend: server-rendered HTML + HTMX + minimal vanilla JS/CSS. No JS build chain.
- Live updates: SSE per chat. Per-chat asyncio.Queue pub/sub. Multi-tab sync is a Phase 1 requirement: two browser tabs on the same chat must mirror each other live (streamed tokens, drawer state, edge updates).
- Inference backend: Featherless (OpenAI-compatible API). narrative_model = dphn/Dolphin-Mistral-24B-Venice-Edition (32K ctx, uncensored). classifier_model = NousResearch/Hermes-3-Llama-3.1-8B (128K ctx, uncensored, structured-output reliable). Fallbacks: cognitivecomputations/dolphin-2.9.4-llama3-8b → mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated.
- Token budgets: narrative 8K hard / 6K soft; classifier 4K hard. Trim tiers must / should / nice — never trim must-include.
- OOC marker: ((double parens)) (configurable).
- Data layout: everything under <repo>/data/: chat.db, backups/, snapshots/, exports/, config.toml. The whole tree is .gitignore'd. CHAT_DB_PATH env var honored as override.
- Auth: bind to 127.0.0.1 only in v1. No auth.
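The per-chat asyncio.Queue pub/sub behind multi-tab sync could look roughly like the following. ChatHub and its methods are hypothetical names for this sketch; the real module layout and the SSE wiring are not shown here.

```python
import asyncio
from collections import defaultdict


class ChatHub:
    """Hypothetical per-chat fan-out: one asyncio.Queue per connected tab."""

    def __init__(self) -> None:
        self._subscribers: dict[str, set[asyncio.Queue]] = defaultdict(set)

    def subscribe(self, chat_id: str) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        self._subscribers[chat_id].add(queue)
        return queue

    def unsubscribe(self, chat_id: str, queue: asyncio.Queue) -> None:
        self._subscribers[chat_id].discard(queue)

    def publish(self, chat_id: str, message: str) -> None:
        # Every tab subscribed to this chat receives the same message.
        for queue in self._subscribers[chat_id]:
            queue.put_nowait(message)


async def demo() -> list[str]:
    hub = ChatHub()
    tab_one = hub.subscribe("chat-1")
    tab_two = hub.subscribe("chat-1")
    other_chat = hub.subscribe("chat-2")
    hub.publish("chat-1", "token: Hello")
    received = [await tab_one.get(), await tab_two.get()]
    assert other_chat.empty()  # tabs on a different chat see nothing
    return received


print(asyncio.run(demo()))
```

Each SSE handler would own one queue for the lifetime of its connection and call unsubscribe on disconnect, so a closed tab does not accumulate messages.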
Behavioral defaults (locked in v1 brainstorm round 2)
- Significance scale: 0=Routine, 1=Notable, 2=Significant, 3=Pivotal. Score-3 turns auto-pin per witness. Drives retrieval ranking, compression, JSON exports.
- Edge updates: per-turn deltas (affinity_delta, trust_delta, knowledge_facts, last_interaction); per-scene-close summary rewrite. Every mutation goes through the event log as edge_update.
- Classifier failure handling: Pydantic-constrained → 1 retry with stricter reminder → schema-default fallback. 10s timeout. Never block the play loop. Refusals trigger fallback-model swap for that one call. Failures logged to classifier_failures table.
- Activity verbs: open string + classifier-extracted interruptible, required_attention, expected_duration. Attention is optional free-form; omit from prompt when empty.
- Containers: parse-and-extend. Per-chat scoped. Kickoff parse seeds initial; transitions create new.
- Pinning: soft cap 8 / bot. Pivotal (score 3) = auto-pin. Manual pins never auto-evicted.
- Snapshots: periodic every 100 events / 30 min; pre-rewind always. 5 periodic retained; pre-rewind retained 14 days.
- Streaming: Stop button on streaming row; mid-stream disconnect commits partial with truncated: true; Send disabled mid-stream; multi-tab streaming via per-chat SSE channel.
- Display: lightweight markdown; *action* italic; OOC ((parens)) shown dimmed/italic, never sent to bot.
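The classifier failure chain (one attempt, one stricter retry, then the schema default) can be sketched with the standard library alone. Assumptions in this sketch: a plain-Python parse function stands in for the Pydantic validation, the 10s timeout is applied per attempt, and SCHEMA_DEFAULT, STRICT_REMINDER, classify, flaky, and broken are hypothetical names. The fallback-model swap on refusal and the classifier_failures logging are omitted.

```python
import asyncio
import json
from typing import Awaitable, Callable

# Hypothetical schema default; the real schema lives in the codebase.
SCHEMA_DEFAULT = {"significance": 0}
STRICT_REMINDER = "\nReturn only JSON with an integer 'significance' from 0 to 3."


def parse(raw: str) -> dict:
    """Plain-Python stand-in for the Pydantic validation step."""
    data = json.loads(raw)
    score = data["significance"]
    if not isinstance(score, int) or not 0 <= score <= 3:
        raise ValueError("significance out of range")
    return {"significance": score}


async def classify(
    call: Callable[[str], Awaitable[str]], prompt: str, timeout: float = 10.0
) -> dict:
    # First attempt, then one retry with a stricter reminder, then the default.
    for attempt_prompt in (prompt, prompt + STRICT_REMINDER):
        try:
            raw = await asyncio.wait_for(call(attempt_prompt), timeout)
            return parse(raw)
        except (asyncio.TimeoutError, ValueError, KeyError, TypeError):
            continue
    return dict(SCHEMA_DEFAULT)


async def flaky(prompt: str) -> str:
    # Fake model: fails on the plain prompt, succeeds once reminded.
    return '{"significance": 2}' if STRICT_REMINDER in prompt else "Sure! Here you go"


async def broken(prompt: str) -> str:
    # Fake model that never produces valid output.
    return "not json"


print(asyncio.run(classify(flaky, "rate this turn")))
print(asyncio.run(classify(broken, "rate this turn")))
```

The function always returns a usable value, which is what lets the play loop continue regardless of classifier health.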
Core concepts (vocabulary)
- Entity: you | botA | botB. Has identity (immutable), state (mood/goals/status), activity, per-POV memory.
- Container: anything with slots that holds entities (car, booth, room). Has properties (moving, public, audible range). Spatial grounding lives here, separate from the relationship graph.
- Activity record: per-entity live struct — position (container+slot), posture, current action (verb, duration, interruptible, required_attention), holding, attention, status. Always in the prompt as a small structured block.
- Relationship graph: 6 directed edges (asymmetric feelings matter — never collapse to a single shared field) + 1 group node. Edges hold affinity, trust, summary, knowledge-known-about-target, private moments, last-interaction.
- Scene configurations: exactly 4 — solo with botA, solo with botB, all three present, botA+botB without you ("meanwhile…"). Each has a fixed prompt-loading rule.
- Witnessed-by flag: every memory has a 3-bit [you, botA, botB] mask. A speaker only sees memories where their bit is set. This is the mechanism that prevents bots referencing things they can't know.
- Event: scoped lifecycle (planned | active | completed | cancelled | expired) with its own props, preconditions, on_start/on_complete hooks, significance. Solves the picnic-basket problem: props live and die with the event, only narrative gist promotes to memory.
- Active threads: unresolved plot tensions. Sticky in context until resolved/dropped. Cheap, anchor continuity across compressed scenes.
- Scene: closes when container changes meaningfully or significant time passes. Compression boundary.
- Per-POV summary: every witness gets their own record of a closed scene, written from their POV. Different details, different interpretations. This is what gives bots inner lives — never write omniscient narration into per-POV stores.
- Time skip: elision (skip the boring middle of an in-progress activity) vs jump (next morning, a week later). Skips run intervening events forward, compress, reset landing activity.
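The witnessed-by mask from the vocabulary above lends itself to a short illustration. The bit order follows the [you, botA, botB] listing; the concrete integer encoding, the Memory dataclass, and visible_to are assumptions of this sketch rather than the actual schema.

```python
from dataclasses import dataclass

# Bit positions follow the [you, botA, botB] order from the spec;
# the integer encoding itself is an assumption of this sketch.
WITNESS_BIT = {"you": 0b100, "botA": 0b010, "botB": 0b001}


@dataclass(frozen=True)
class Memory:
    text: str
    witnessed_by: int  # 3-bit mask


def visible_to(speaker: str, memories: list[Memory]) -> list[Memory]:
    """Keep only memories whose mask has the speaker's bit set."""
    bit = WITNESS_BIT[speaker]
    return [m for m in memories if m.witnessed_by & bit]


memories = [
    Memory("You and botA argued in the car", 0b110),
    Memory("botA confided in botB at the diner", 0b011),
    Memory("All three watched the storm roll in", 0b111),
]

for memory in visible_to("botB", memories):
    print(memory.text)
```

Here botB never sees the argument in the car, because its bit is not set on that memory.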
What promotes out of an event (and what doesn't)
- Object acquired → inventory
- Knowledge gained → edge knowledge field
- Relationship change → edge summary
- Everything else stays in the closed event record. The blanket, the basket, the specific sandwich do not become memories. This rule is the whole point — don't bypass it.
Persistence
- SQLite (single file) for everything structured. WAL mode, foreign keys on, each turn in a transaction.
- sqlite-vss or sqlite-vec for embeddings (same DB file). Decide at Phase 4.
- JSON for snapshots, character templates, scene exports.
- No Postgres, Redis, Pinecone, Docker. Single-user; don't over-engineer.
Schema is event-sourced. See design doc § "Persistence Layer" for the full sketch.
Event sourcing — non-negotiable
State is a projection of an append-only event log. State is never mutated directly — append an event, the projector applies it.
Event kinds: user_turn, assistant_turn, time_skip, event_triggered, edge_update, scene_transition, entity_state_change, activity_change.
This buys: free rewind, trivial replay-debugging, schema migrations against the same log, branching ("what if BotA had said yes").
Determinism on replay: LLM calls are nondeterministic. Store the outcome in the event payload — on replay, use the stored outcome. Never re-call the LLM during replay.
Snapshots every N events / M minutes so we don't replay everything on load. Log is source of truth.
Prompt construction
A speaker's prompt is assembled from their edges and their witnessed memories — never the global state. BotA and BotB are effectively two separate agents who happen to share a scene.
Order (for speaker BotA, with you and BotB present):
- BotA identity + current state
- BotA → You edge
- BotA → BotB edge
- Group node (only if all three present)
- World state (time, weather, location)
- Active scene description
- Activity snapshot for all present entities
- Active threads
- Recent dialogue window
- Retrieved memories (top-K, witness-filtered, BotA-owned)
- Currently active events + their props
After every utterance, run a state-update pass on every present entity, not just the speaker. Silent witnesses still update edges.
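The fixed ordering above can be sketched as a list of section keys joined in sequence. The key names, PROMPT_ORDER, and build_prompt are hypothetical; only the order and the group-node condition come from this document.

```python
# Section keys mirror the order listed above; the key names are hypothetical.
PROMPT_ORDER = [
    "identity_and_state",
    "edge_to_you",
    "edge_to_other_bot",
    "group_node",
    "world_state",
    "scene_description",
    "activity_snapshot",
    "active_threads",
    "recent_dialogue",
    "retrieved_memories",
    "active_events",
]

ALL_THREE = {"you", "botA", "botB"}


def build_prompt(sections: dict[str, str], present: set[str]) -> str:
    """Join non-empty sections in the fixed order for one speaker."""
    parts = []
    for key in PROMPT_ORDER:
        if key == "group_node" and present != ALL_THREE:
            continue  # group node only when all three are present
        text = sections.get(key, "").strip()
        if text:
            parts.append(text)
    return "\n\n".join(parts)


sections = {
    "identity_and_state": "BotA: wry, tired.",
    "group_node": "The trio is tense.",
    "recent_dialogue": "You: Ready?",
}
print(build_prompt(sections, {"you", "botA"}))
```

The caller is expected to pass sections already filtered to the speaker's own edges and witnessed memories; this function only handles ordering.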
Memory retrieval
- Always-loaded: pinned, current scene, active threads, recent N scenes (no retrieval).
- Retrieved: top-K vector search over the speaker's memory store, filtered by witness flag, with recency + significance boosts.
- Keep K small. Bloated retrieval poisons the prompt.
- Phase 1: SQLite FTS5 is enough. Vector search comes at Phase 4.
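As a rough illustration of witness-filtered top-K ranking with recency and significance boosts, consider the sketch below. The Candidate fields, the decay shape, and the 0.25 weight are all assumptions; this document does not specify a scoring formula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    text: str
    similarity: float   # from FTS5 or vector search, higher is better
    age_in_scenes: int  # 0 = current scene
    significance: int   # 0..3 per the significance scale
    witnessed: bool     # True when the speaker's witness bit is set


def rank(candidates: list[Candidate], k: int = 3) -> list[Candidate]:
    """Witness-filter first, then score and keep the top k."""

    def score(c: Candidate) -> float:
        recency_boost = 1.0 / (1 + c.age_in_scenes)  # assumed decay shape
        significance_boost = 0.25 * c.significance   # assumed weight
        return c.similarity + recency_boost + significance_boost

    visible = [c for c in candidates if c.witnessed]
    return sorted(visible, key=score, reverse=True)[:k]


pool = [
    Candidate("old small talk", 0.9, 10, 0, True),
    Candidate("the confession", 0.5, 0, 3, True),
    Candidate("unwitnessed secret", 0.99, 0, 3, False),
    Candidate("yesterday's plan", 0.4, 1, 1, True),
]
print([c.text for c in rank(pool, k=2)])
```

Filtering before scoring means an unwitnessed memory can never reach the prompt, however well it matches.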
Implementation phases
- Phase 1 (core loop): schema, entities + edges, single container, event log + projector, single-bot conversation, one LLM backend, streaming UI, manual rollback.
- Phase 2 (multi-entity): second bot, group node, scene configs, witness filtering, per-POV memories, activity/containers, scene transitions with compression.
- Phase 3 (events & skips): event queue with triggers, time skips, active threads, significance classifier.
- Phase 4 (polish): vector retrieval, branching, surgical delete + regenerate, snapshots, backups, impact-preview UI for rewinds.
Don't jump phases. Phase 1 must work end-to-end before Phase 2 lands.
Conventions for working in this repo
- Don't bypass the event log. Any state change goes through an event. If you're tempted to UPDATE a row directly, you're doing it wrong.
- Don't collapse directed edges. botA → botB and botB → botA are independent. Asymmetry is the point.
- Don't promote event props to memory. Only the promotion categories listed above survive an event closing.
- Per-POV, not omniscient. When writing scene summaries, write one per witness, from their angle.
- Witness filter every memory read. A bot must never see a memory their bit isn't set on.
- Activity block is always in the prompt. It's the spatial anchor that prevents "leaning on the kitchen counter while in a car" failures.
- Streaming on the inference path; non-blocking bookkeeping (significance classification, embeddings, snapshots) runs while the LLM streams.
- No Docker, no extra services. SQLite + a process. Push back on suggestions to add infrastructure.
Open decisions (deferred — don't pre-decide)
- Token budget strategy (during Phase 1, with real prompts)
- Embedding model (Phase 4)
- sqlite-vss vs sqlite-vec (Phase 4)
- UI framework (local web app / Tauri / Electron / native; TBD)
- Inference hosting (start with a cloud API, re-evaluate later)
- Character template format (during Phase 1)
- Multi-session / multi-character casts: out of scope for v1. Leave cheap schema hooks only.
Phase 1 status
Phase 1 shipped end-to-end across 35 tasks (T0–T35). The single-bot core loop is functional: event log + projector, schema + migrations, settings/bot authoring, kickoff confirm, streaming turns, drawer rendering, regenerate/rewind, scene close + per-POV summaries, significance classifier, snapshots/backups, first-run navigation, and friendly 404/500 pages. 168 tests passing.
Deferred to Phase 2: second bot, group node, scene configurations, witness filtering across multi-entity scenes, activity/containers, scene-transition compression. Phase 3: event queue + triggers, time skips, active threads. Phase 4: vector retrieval, branching, surgical delete + regenerate, impact-preview UI.
Known v1 limitations (read before extending)
- Drawer edits scope: only affinity, significance, and pin can be hand-edited from the drawer. Other v1 fields (knowledge, summary text, traits) are deferred to Phase 1.5.
- Cold-load snapshot path is wired and unit-tested but rarely exercised in dev — long-running sessions are the only realistic trigger.
- WAL sidecar files (-wal, -shm) are not captured in nightly backups; the nightly snapshot is a fresh .backup(), so this is fine for restore but worth knowing if you copy the db file by hand.
- HTMX SSE event names may need a version check if you bump the htmx CDN URL in base.html; the swap targets are name-coupled.
- "You" activity rows can linger after bot_reset (the reset purges the bot's chats and the bot's own activity row but not the "you" row that was associated with those chats). Cosmetic; slated for a fix in Phase 1.5.
- Projector replay is non-idempotent for plain INSERT events. After appending, call apply_event(conn, event) for the new row only; calling project(conn) re-runs every handler from scratch and will trip uniqueness constraints or duplicate inserts.
- 8-pin auto-cap eviction is FIFO over the auto-pinned set only. Manual pins survive the eviction; this is by design (manual intent > auto-pin signal).
- Regenerate (T29) does not broadcast turn_html over SSE, so the page must be refreshed to show the regenerated turn. Acceptable for v1 single-tab usage; Phase 1.5 should wire the SSE event.
- First-run middleware fires only on bare / and /chats. Sub-paths like /chats/<id> and /chats/<id>/drawer pass through (correct: HTMX partials should not page-redirect, and a deep-link to a missing chat should 404, not redirect mid-setup).
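The replay limitation is easy to reproduce in miniature. In this sketch, apply_one and project_all are hypothetical stand-ins for apply_event and project, and the turns table is invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE turns (event_id INTEGER PRIMARY KEY, text TEXT NOT NULL)")

events = [
    {"id": 1, "kind": "user_turn", "text": "Hi"},
    {"id": 2, "kind": "assistant_turn", "text": "Hello."},
]


def apply_one(conn: sqlite3.Connection, event: dict) -> None:
    # Plain INSERT: applying the same event twice violates the primary key.
    conn.execute(
        "INSERT INTO turns (event_id, text) VALUES (?, ?)",
        (event["id"], event["text"]),
    )


def project_all(conn: sqlite3.Connection) -> None:
    for event in events:
        apply_one(conn, event)


project_all(conn)  # initial build of the projection

new_event = {"id": 3, "kind": "user_turn", "text": "How are you?"}
events.append(new_event)
apply_one(conn, new_event)  # correct: apply only the newly appended row

try:
    project_all(conn)  # wrong: re-runs every handler against existing rows
    replay_failed = False
except sqlite3.IntegrityError:
    replay_failed = True

print("full replay failed:", replay_failed)
```

A full replay is only safe against an empty projection, for example after a rewind has cleared the projected tables.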
Phase 1.5 cleanup backlog
Small follow-ups identified during Phase 1 reviews. Pick up at any time; none are blocking.
- open_db refactor. chat/web/bots.py:get_conn() duplicates the context-manager body to add check_same_thread=False. Extend open_db(path, *, check_same_thread=True) and have get_conn call it directly; this eliminates the duplicated PRAGMA setup and ensures any future PRAGMA tweak only happens in one place.
- Regenerate broadcasts turn_html over SSE. Currently a refresh is needed (see T29 limitation above). Mirror the broadcast logic from chat/web/turns.py:post_turn after the new assistant_turn lands.
- bot_reset purges orphaned "you" activity rows (see limitation above). Either delete activity rows by chat-membership or accept the noise indefinitely; the projection-layer fix is one extra DELETE FROM activity WHERE entity_id='you' AND container_id IN (SELECT id FROM containers WHERE chat_id IN (...)) clause inside _apply_bot_reset.
- Drawer edits for the deferred v1 fields: edge_trust slider, edge_summary textarea, memory pov_summary textarea, knowledge_facts add/remove. The manual_edit projector already supports edge_trust / edge_summary / memory_pov_summary target_kinds; only the routes are missing. Knowledge_facts needs a new dispatch branch.
- NICE trim order in prompt assembly drops previous-scene first instead of last (T18 review). Greedy-cuts heuristic vs spec listing order; revisit if v1 play surfaces a real regression.
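The open_db refactor from the first backlog item might look roughly like this. The signature open_db(path, *, check_same_thread=True) is taken from the item itself; the PRAGMA list, the function bodies, and the path parameter on get_conn are assumptions, since the existing implementations are not shown in this file.

```python
import sqlite3
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def open_db(path: str, *, check_same_thread: bool = True) -> Iterator[sqlite3.Connection]:
    """Single place for connection setup; the PRAGMA list is an assumption."""
    conn = sqlite3.connect(path, check_same_thread=check_same_thread)
    try:
        conn.execute("PRAGMA journal_mode=WAL")
        conn.execute("PRAGMA foreign_keys=ON")
        yield conn
    finally:
        conn.close()


@contextmanager
def get_conn(path: str) -> Iterator[sqlite3.Connection]:
    # The web layer differs only in the threading flag, so it delegates.
    with open_db(path, check_same_thread=False) as conn:
        yield conn


with get_conn(":memory:") as conn:
    print(conn.execute("PRAGMA foreign_keys").fetchone()[0])
```

With the keyword-only flag defaulting to True, existing callers of open_db keep their current behavior and only the web layer opts out of the thread check.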