Resolves the deferred operational and UX semantics from round 2 of the brainstorm. Decisions 22–43 in the appendix decisions log.
New / expanded sections:
- §3.3 Classifier failure handling (Pydantic-constrained + retry + schema-default fallback, 10s timeout, refusal triggers fallback-model swap).
- §3.4 Edge update granularity (per-turn deltas + per-scene-close summary rewrite; all mutations go through edge_update events).
- §4.3 Chat clock format (stored ISO 8601 UTC; displayed friendly relative).
- §5.1 Authoring expanded (voice samples format, trait list as free-form phrases, backstory length target).
- §5.4 "You" entity authoring (one-time, shared).
- §6.4 Drawer expanded (v1 editable fields cut: activity, edges, memory; read-only: container, identity, witness, structural; manual_edit events).
- §6.5 Activity record specifics (open verb + classifier-extracted props).
- §7.4 Container authoring (parse-and-extend, per-chat scoped).
- §7.5 Guest leaves mid-scene (auto close + new scene with you+host).
- §8.5 Pinning (soft cap 8, score-3 auto-pin, manual pins never auto-evict).
- §10 Rollback expanded with full impact-preview modal, snapshot frequency (100 events / 30 min periodic, pre-rewind always), inline regenerate UX with edit-then-regenerate.
- §11.1 Significance rubric (0=Routine, 1=Notable, 2=Significant, 3=Pivotal) with usage and tie-breakers.
- §16 UI Shape & Flow (top-level nav, first-run experience, display formatting, streaming UX, error UX).
CLAUDE.md adds a "Behavioral defaults (round 2)" section flagging the load-bearing rules for future Claude sessions.
§14 Open / Deferred Decisions trimmed to the genuinely-still-open list (embedding model, vector index choice, prompt templates, search, etc.).
Roleplay Engine
Local-first roleplay chat app that treats fiction as a simulation, not a chat log. The LLM is a renderer for structured world state — it does not hold state.
See rp-engine-design.md for the architectural design and docs/plans/2026-04-26-v1-requirements-design.md for the v1 product requirements & behavioral spec. This file is the working summary.
Why this exists
Fixes three failure modes of conventional RP chatbots:
- Memory loss — old context drops as history grows
- Quality decay — bots get terse and generic over long conversations
- Stale state pollution — bots fixate on past props (the "picnic basket" problem)
Hard scope constraints
- Single user, single machine (the user's Mac)
- Max 3 entities per scene: you + up to 2 bots (botA, botB)
- Chat-only: no voice, no real-time
The 3-entity cap is load-bearing: it makes the relationship graph fully enumerable (6 directed edges + 1 group node). Don't design for N entities.
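To see why the cap keeps the graph enumerable, here is a minimal sketch that lists every directed edge for three entities. The constant names are illustrative, not taken from the codebase.

```python
from itertools import permutations

ENTITIES = ("you", "botA", "botB")  # the hard 3-entity cap

# Every ordered (source, target) pair is its own edge:
# botA -> botB is stored separately from botB -> botA.
DIRECTED_EDGES = list(permutations(ENTITIES, 2))

# The single group node covers all three entities at once.
GROUP_NODE = frozenset(ENTITIES)

for source, target in DIRECTED_EDGES:
    print(f"{source} -> {target}")
```

With three entities this yields 3 × 2 = 6 ordered pairs, which is small enough to load or inspect exhaustively.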
Architecture
- Mac (always-on): web UI, orchestrator, persistence, event queue, retrieval, prompt construction, all state.
- Inference endpoint: stateless generate(prompt, params) -> text. Swap implementations behind one interface. The orchestrator never knows which.
- Streaming required for UX.
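One way the single-interface idea could look in Python is a typing.Protocol. Only the generate(prompt, params) -> text shape comes from this document; InferenceBackend, EchoBackend, stream, and render_turn are hypothetical names used for illustration.

```python
from typing import Any, Dict, Iterator, Protocol


class InferenceBackend(Protocol):
    """Hypothetical interface; implementations hold no state between calls."""

    def generate(self, prompt: str, params: Dict[str, Any]) -> str: ...

    def stream(self, prompt: str, params: Dict[str, Any]) -> Iterator[str]: ...


class EchoBackend:
    """Stand-in backend for local testing: echoes the prompt word by word."""

    def stream(self, prompt: str, params: Dict[str, Any]) -> Iterator[str]:
        for word in prompt.split():
            yield word + " "

    def generate(self, prompt: str, params: Dict[str, Any]) -> str:
        return "".join(self.stream(prompt, params)).strip()


def render_turn(backend: InferenceBackend, prompt: str) -> str:
    # The orchestrator depends only on the interface, never on a concrete backend.
    return backend.generate(prompt, {"max_tokens": 64})


print(render_turn(EchoBackend(), "hello there"))
```

Swapping in a real backend would then mean writing one more class with the same two methods; nothing in the orchestrator changes.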
Runtime stack (locked for v1)
- Backend: Python 3.11+ with FastAPI.
- Frontend: server-rendered HTML + HTMX + minimal vanilla JS/CSS. No JS build chain.
- Live updates: SSE per chat. Per-chat asyncio.Queue pub/sub. Multi-tab sync is a Phase 1 requirement: two browser tabs on the same chat must mirror each other live (streamed tokens, drawer state, edge updates).
- Inference backend: Featherless (OpenAI-compatible API).
  - narrative_model = dphn/Dolphin-Mistral-24B-Venice-Edition (32K ctx, uncensored).
  - classifier_model = NousResearch/Hermes-3-Llama-3.1-8B (128K ctx, uncensored, structured-output reliable).
  - Fallbacks: cognitivecomputations/dolphin-2.9.4-llama3-8b → mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated.
- Token budgets: narrative 8K hard / 6K soft; classifier 4K hard. Trim tiers must / should / nice — never trim must-include.
- OOC marker: ((double parens)) (configurable).
- Data layout: everything under <repo>/data/: chat.db, backups/, snapshots/, exports/, config.toml. The whole tree is .gitignored. CHAT_DB_PATH env var honored as override.
- Auth: bind to 127.0.0.1 only in v1. No auth.
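The per-chat asyncio.Queue pub/sub mentioned above could be sketched as follows. ChatBroker and its methods are hypothetical names. The sketch gives each connected tab its own queue so that every tab on a chat receives every message, which is one way to meet the multi-tab mirroring requirement.

```python
import asyncio
from collections import defaultdict
from typing import Dict, List, Set


class ChatBroker:
    """Hypothetical per-chat fan-out: one asyncio.Queue per connected tab."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, Set[asyncio.Queue]] = defaultdict(set)

    def subscribe(self, chat_id: str) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        self._subscribers[chat_id].add(queue)
        return queue

    def unsubscribe(self, chat_id: str, queue: asyncio.Queue) -> None:
        self._subscribers[chat_id].discard(queue)

    def publish(self, chat_id: str, message: dict) -> None:
        # Every tab subscribed to this chat gets the same message.
        for queue in self._subscribers[chat_id]:
            queue.put_nowait(message)


async def demo() -> List[dict]:
    broker = ChatBroker()
    tab_a = broker.subscribe("chat-1")
    tab_b = broker.subscribe("chat-1")
    broker.publish("chat-1", {"type": "token", "text": "Hi"})
    return [await tab_a.get(), await tab_b.get()]


print(asyncio.run(demo()))
```

An SSE handler would call subscribe when a tab connects, forward queue items as events, and call unsubscribe when the connection closes.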
Behavioral defaults (locked in v1 brainstorm round 2)
- Significance scale: 0=Routine, 1=Notable, 2=Significant, 3=Pivotal. Score-3 turns auto-pin per witness. Drives retrieval ranking, compression, JSON exports.
- Edge updates: per-turn deltas (affinity_delta, trust_delta, knowledge_facts, last_interaction); per-scene-close summary rewrite. Every mutation goes through the event log as edge_update.
- Classifier failure handling: Pydantic-constrained → 1 retry with stricter reminder → schema-default fallback. 10s timeout. Never block the play loop. Refusals trigger fallback-model swap for that one call. Failures logged to classifier_failures table.
- Activity verbs: open string + classifier-extracted interruptible, required_attention, expected_duration. Attention is optional free-form; omit from prompt when empty.
- Containers: parse-and-extend. Per-chat scoped. Kickoff parse seeds initial; transitions create new.
- Pinning: soft cap 8 / bot. Pivotal (score 3) = auto-pin. Manual pins never auto-evicted.
- Snapshots: periodic every 100 events / 30 min; pre-rewind always. 5 periodic retained; pre-rewind retained 14 days.
- Streaming: Stop button on streaming row; mid-stream disconnect commits partial with truncated: true; Send disabled mid-stream; multi-tab streaming via per-chat SSE channel.
- Display: lightweight markdown; *action* italic; OOC ((parens)) shown dimmed/italic, never sent to bot.
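The classifier failure chain above (validate, retry once with a stricter reminder, fall back to the schema default, 10s timeout) can be sketched with the standard library alone. This is an illustration under assumptions: a hand-written parse function stands in for the Pydantic model, the model functions are fakes, treating a timeout like any other failed attempt is a guess at the intended behaviour, and the refusal-triggered model swap and classifier_failures logging are left out.

```python
import asyncio
import json
from dataclasses import dataclass


@dataclass
class Significance:
    score: int = 0  # schema default: 0 = Routine


def parse(raw: str) -> Significance:
    # Stand-in for Pydantic validation (stdlib used to stay self-contained).
    data = json.loads(raw)
    score = data["score"]
    if isinstance(score, bool) or not isinstance(score, int) or not 0 <= score <= 3:
        raise ValueError(f"bad score: {score!r}")
    return Significance(score=score)


async def classify(call_model, prompt: str, timeout_s: float = 10.0) -> Significance:
    stricter = prompt + '\nReturn ONLY JSON such as {"score": 0}.'
    for attempt in (prompt, stricter):  # first try, then exactly one retry
        try:
            raw = await asyncio.wait_for(call_model(attempt), timeout=timeout_s)
            return parse(raw)
        except (asyncio.TimeoutError, ValueError, KeyError, TypeError):
            continue  # a fuller version would log to classifier_failures here
    return Significance()  # schema-default fallback, so the play loop never blocks


async def chatty_model(prompt: str) -> str:
    # Hypothetical model that only behaves after the stricter reminder.
    return '{"score": 3}' if "ONLY JSON" in prompt else "Sure! I'd say three."


async def broken_model(prompt: str) -> str:
    return "no JSON here"


print(asyncio.run(classify(chatty_model, "Rate this turn.")))
print(asyncio.run(classify(broken_model, "Rate this turn.")))
```

The point of the shape is that classify always returns a usable value, so a bad classifier response degrades bookkeeping quality rather than stalling a turn.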
Core concepts (vocabulary)
- Entity: you | botA | botB. Has identity (immutable), state (mood/goals/status), activity, per-POV memory.
- Container: anything with slots that holds entities (car, booth, room). Has properties (moving, public, audible range). Spatial grounding lives here, separate from the relationship graph.
- Activity record: per-entity live struct — position (container+slot), posture, current action (verb, duration, interruptible, required_attention), holding, attention, status. Always in the prompt as a small structured block.
- Relationship graph: 6 directed edges (asymmetric feelings matter — never collapse to a single shared field) + 1 group node. Edges hold affinity, trust, summary, knowledge-known-about-target, private moments, last-interaction.
- Scene configurations: exactly 4 — solo with botA, solo with botB, all three present, botA+botB without you ("meanwhile…"). Each has a fixed prompt-loading rule.
- Witnessed-by flag: every memory has a 3-bit [you, botA, botB] mask. A speaker only sees memories where their bit is set. This is the mechanism that prevents bots referencing things they can't know.
- Event: scoped lifecycle (planned | active | completed | cancelled | expired) with its own props, preconditions, on_start/on_complete hooks, significance. Solves the picnic-basket problem: props live and die with the event, only narrative gist promotes to memory.
- Active threads: unresolved plot tensions. Sticky in context until resolved/dropped. Cheap, anchor continuity across compressed scenes.
- Scene: closes when container changes meaningfully or significant time passes. Compression boundary.
- Per-POV summary: every witness gets their own record of a closed scene, written from their POV. Different details, different interpretations. This is what gives bots inner lives — never write omniscient narration into per-POV stores.
- Time skip: elision (skip the boring middle of an in-progress activity) vs jump (next morning, a week later). Skips run intervening events forward, compress, reset landing activity.
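The witnessed-by mask from the list above lends itself to a short sketch. The bit assignment (you as the high bit) and the Memory class are assumptions made for illustration; the document only fixes the [you, botA, botB] ordering.

```python
from dataclasses import dataclass
from typing import List

# Assumed bit layout, following the [you, botA, botB] order.
BIT = {"you": 0b100, "botA": 0b010, "botB": 0b001}


@dataclass(frozen=True)
class Memory:
    text: str
    witnessed_by: int  # 3-bit mask


def visible_to(speaker: str, memories: List[Memory]) -> List[Memory]:
    # A speaker only sees memories where their own bit is set.
    return [m for m in memories if m.witnessed_by & BIT[speaker]]


memories = [
    Memory("You and botA argued in the car.", 0b110),
    Memory("botB waited alone at the booth.", 0b001),
]

for memory in visible_to("botA", memories):
    print(memory.text)
```

Here botA sees only the argument; the booth memory belongs to botB alone, so botA cannot reference it.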
What promotes out of an event (and what doesn't)
- Object acquired → inventory
- Knowledge gained → edge knowledge field
- Relationship change → edge summary
- Everything else stays in the closed event record. The blanket, the basket, the specific sandwich do not become memories. This rule is the whole point — don't bypass it.
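A small sketch of the promotion rule, assuming each event outcome is tagged with a kind. The kind strings, target names, and sample data are all made up for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical outcome kinds mapped to where they are promoted.
PROMOTION_TARGETS: Dict[str, str] = {
    "object_acquired": "inventory",
    "knowledge_gained": "edge.knowledge",
    "relationship_change": "edge.summary",
}


def promote(outcomes: List[dict]) -> List[Tuple[str, str]]:
    """Return (target, value) pairs; anything else stays in the closed event."""
    return [
        (PROMOTION_TARGETS[o["kind"]], o["value"])
        for o in outcomes
        if o["kind"] in PROMOTION_TARGETS
    ]


picnic = [
    {"kind": "prop", "value": "picnic basket"},
    {"kind": "knowledge_gained", "value": "botA is allergic to strawberries"},
    {"kind": "object_acquired", "value": "pressed flower"},
]

print(promote(picnic))
```

The basket is a prop, has no promotion target, and therefore never leaves the closed event record.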
Persistence
- SQLite (single file) for everything structured. WAL mode, foreign keys on, each turn in a transaction.
- sqlite-vss or sqlite-vec for embeddings (same DB file). Decide at Phase 4.
- JSON for snapshots, character templates, scene exports.
- No Postgres, Redis, Pinecone, Docker. Single-user; don't over-engineer.
Schema is event-sourced. See design doc § "Persistence Layer" for the full sketch.
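The SQLite settings above map onto the standard sqlite3 module roughly as follows. The events table here is a placeholder, not the schema from the design doc, and the temporary path is for the demo only.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_db(path: Path) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode = WAL")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default; set per connection
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        " id INTEGER PRIMARY KEY, chat_id TEXT NOT NULL,"
        " kind TEXT NOT NULL, payload TEXT NOT NULL)"
    )
    return conn


def record_turn(conn: sqlite3.Connection, chat_id: str, events: list) -> None:
    # "with conn" commits on success and rolls back on an exception,
    # so all events of one turn land together or not at all.
    with conn:
        conn.executemany(
            "INSERT INTO events (chat_id, kind, payload) VALUES (?, ?, ?)",
            [(chat_id, kind, payload) for kind, payload in events],
        )


def demo():
    conn = open_db(Path(tempfile.mkdtemp()) / "chat.db")
    record_turn(
        conn,
        "chat-1",
        [("user_turn", '{"text": "hi"}'), ("edge_update", '{"affinity_delta": 1}')],
    )
    mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
    fk = conn.execute("PRAGMA foreign_keys").fetchone()[0]
    count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
    conn.close()
    return mode, fk, count


print(demo())
```

Note that foreign_keys is a per-connection setting in SQLite, so it has to be issued every time a connection is opened, while WAL mode persists in the database file.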
Event sourcing — non-negotiable
State is a projection of an append-only event log. State is never mutated directly — append an event, the projector applies it.
Event kinds: user_turn, assistant_turn, time_skip, event_triggered, edge_update, scene_transition, entity_state_change, activity_change.
This buys: free rewind, trivial replay-debugging, schema migrations against the same log, branching ("what if BotA had said yes").
Determinism on replay: LLM calls are nondeterministic. Store the outcome in the event payload — on replay, use the stored outcome. Never re-call the LLM during replay.
Snapshots every N events / M minutes so we don't replay everything on load. Log is source of truth.
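A minimal projector sketch may make the rule concrete. The State fields and payload keys below are illustrative; only the event kind names come from this document. Assistant text is read from the stored payload, so replay makes no LLM call.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class State:
    affinity: Dict[Tuple[str, str], int] = field(default_factory=dict)
    transcript: List[str] = field(default_factory=list)


def apply(state: State, event: dict) -> State:
    kind, payload = event["kind"], event["payload"]
    if kind == "edge_update":
        edge = (payload["source"], payload["target"])
        state.affinity[edge] = state.affinity.get(edge, 0) + payload["affinity_delta"]
    elif kind in ("user_turn", "assistant_turn"):
        # The generated text was stored at append time; replay reuses it.
        state.transcript.append(payload["text"])
    return state


def project(log: List[dict], upto: Optional[int] = None) -> State:
    """Rebuild state from the log; pass upto=n to rewind to the first n events."""
    state = State()
    for event in log[:upto]:
        apply(state, event)
    return state


log = [
    {"kind": "user_turn", "payload": {"text": "hi"}},
    {"kind": "assistant_turn", "payload": {"text": "Hello, stranger."}},
    {"kind": "edge_update", "payload": {"source": "botA", "target": "you", "affinity_delta": 2}},
]

print(project(log).affinity)
print(project(log, upto=1).transcript)
```

Rewind is just projecting a shorter prefix of the log, and a snapshot would be a saved State plus the index it was taken at.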
Prompt construction
A speaker's prompt is assembled from their edges and their witnessed memories — never the global state. BotA and BotB are effectively two separate agents who happen to share a scene.
Order (for speaker BotA, with you and BotB present):
- BotA identity + current state
- BotA → You edge
- BotA → BotB edge
- Group node (only if all three present)
- World state (time, weather, location)
- Active scene description
- Activity snapshot for all present entities
- Active threads
- Recent dialogue window
- Retrieved memories (top-K, witness-filtered, BotA-owned)
- Currently active events + their props
After every utterance, run a state-update pass on every present entity, not just the speaker. Silent witnesses still update edges.
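The section order above can be sketched as a small assembly function. The ctx layout, the section titles, and the sample data are assumptions. The sketch also assumes ctx["memories"] has already been witness-filtered and restricted to the speaker's own store.

```python
from typing import List, Sequence, Tuple

TAIL = ("world", "scene", "activity", "threads", "dialogue", "memories", "events")


def build_sections(speaker: str, present: Sequence[str], ctx: dict) -> List[Tuple[str, str]]:
    sections = [("identity", ctx["identity"][speaker])]
    # Only the speaker's own outgoing edges are loaded, never global state.
    for other in present:
        if other != speaker:
            sections.append((f"edge {speaker}->{other}", ctx["edges"][(speaker, other)]))
    if len(present) == 3:  # group node only when all three are present
        sections.append(("group", ctx["group"]))
    sections.extend((key, ctx[key]) for key in TAIL)
    return sections


def build_prompt(speaker: str, present: Sequence[str], ctx: dict) -> str:
    return "\n\n".join(f"[{title}]\n{body}" for title, body in build_sections(speaker, present, ctx))


SAMPLE_CTX = {
    "identity": {"botA": "BotA: dry, guarded."},
    "edges": {("botA", "you"): "warming", ("botA", "botB"): "wary"},
    "group": "three old friends",
    "world": "dusk, light rain",
    "scene": "parked car",
    "activity": "botA: driver seat, hands on wheel",
    "threads": "the unanswered letter",
    "dialogue": "(recent lines)",
    "memories": "(witness-filtered memories)",
    "events": "road trip",
}

print(build_prompt("botA", ("you", "botA", "botB"), SAMPLE_CTX))
```

Calling build_sections with only ("you", "botA") present drops both the botA->botB edge and the group section, matching the solo scene configuration.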
Memory retrieval
- Always-loaded: pinned, current scene, active threads, recent N scenes (no retrieval).
- Retrieved: top-K vector search over the speaker's memory store, filtered by witness flag, with recency + significance boosts.
- Keep K small. Bloated retrieval poisons the prompt.
- Phase 1: SQLite FTS5 is enough. Vector search comes at Phase 4.
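A sketch of witness-filtered top-K ranking with recency and significance boosts. The scoring formula, the weights, and the Candidate fields are assumptions chosen for illustration; the document does not specify them. The similarity value is assumed to come from whichever search backend is in use (FTS5 or vectors), normalized to 0..1.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Candidate:
    text: str
    similarity: float  # assumed normalized to 0..1 by the search backend
    significance: int  # 0..3 on the significance scale
    age_scenes: int  # scenes elapsed since the memory was written
    witnessed_by: FrozenSet[str]


def score(c: Candidate) -> float:
    # Hypothetical weights: recency halves every 10 scenes.
    recency = 0.5 ** (c.age_scenes / 10.0)
    return c.similarity + 0.2 * recency + 0.15 * c.significance


def retrieve(speaker: str, candidates: List[Candidate], k: int = 4) -> List[Candidate]:
    # Witness filter first, then rank; K stays small on purpose.
    visible = [c for c in candidates if speaker in c.witnessed_by]
    return sorted(visible, key=score, reverse=True)[:k]


candidates = [
    Candidate("old close match", 0.9, 0, 20, frozenset({"you", "botA"})),
    Candidate("fresh pivotal moment", 0.6, 3, 0, frozenset({"botA"})),
    Candidate("botB's secret", 0.99, 3, 0, frozenset({"botB"})),
]

for c in retrieve("botA", candidates, k=2):
    print(c.text)
```

In this example the fresh pivotal memory outranks the older, closer text match, and botB's secret never reaches botA regardless of its score.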
Implementation phases
- Phase 1, Core loop: schema, entities + edges, single container, event log + projector, single-bot conversation, one LLM backend, streaming UI, manual rollback.
- Phase 2, Multi-entity: second bot, group node, scene configs, witness filtering, per-POV memories, activity/containers, scene transitions with compression.
- Phase 3, Events & skips: event queue with triggers, time skips, active threads, significance classifier.
- Phase 4, Polish: vector retrieval, branching, surgical delete + regenerate, snapshots, backups, impact-preview UI for rewinds.
Don't jump phases. Phase 1 must work end-to-end before Phase 2 lands.
Conventions for working in this repo
- Don't bypass the event log. Any state change goes through an event. If you're tempted to UPDATE a row directly, you're doing it wrong.
- Don't collapse directed edges. botA → botB and botB → botA are independent. Asymmetry is the point.
- Don't promote event props to memory. Only the promotion categories listed above survive an event closing.
- Per-POV, not omniscient. When writing scene summaries, write one per witness, from their angle.
- Witness filter every memory read. A bot must never see a memory their bit isn't set on.
- Activity block is always in the prompt. It's the spatial anchor that prevents "leaning on the kitchen counter while in a car" failures.
- Streaming on the inference path; non-blocking bookkeeping (significance classification, embeddings, snapshots) runs while the LLM streams.
- No Docker, no extra services. SQLite + a process. Push back on suggestions to add infrastructure.
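The "bookkeeping runs while the LLM streams" convention could look like this with asyncio. The token source and the bookkeeping task are fakes; a real version would consume the inference stream and run embeddings or snapshot work in the background task.

```python
import asyncio
from typing import AsyncIterator, List, Tuple


async def fake_stream() -> AsyncIterator[str]:
    # Stand-in for the streaming inference call.
    for token in ("The ", "car ", "slows."):
        await asyncio.sleep(0)
        yield token


async def bookkeeping(done: List[str]) -> None:
    # Stand-in for embeddings, snapshots, or other non-blocking work.
    await asyncio.sleep(0)
    done.append("bookkeeping finished")


async def play_turn() -> Tuple[str, List[str]]:
    done: List[str] = []
    # Start bookkeeping first so it runs concurrently with the stream.
    task = asyncio.create_task(bookkeeping(done))
    tokens = [token async for token in fake_stream()]
    await task  # make sure bookkeeping landed before the turn is committed
    return "".join(tokens), done


print(asyncio.run(play_turn()))
```

Keeping a reference to the task and awaiting it before the turn closes avoids losing bookkeeping work if the stream ends first.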
Open decisions (deferred — don't pre-decide)
- Token budget strategy (during Phase 1, with real prompts)
- Embedding model (Phase 4)
- sqlite-vss vs sqlite-vec (Phase 4)
- UI framework (local web app / Tauri / Electron / native; TBD)
- Inference hosting (start with a cloud API, re-evaluate later)
- Character template format (during Phase 1)
- Multi-session / multi-character casts: out of scope for v1. Leave cheap schema hooks only.