# Roleplay Engine
Local-first roleplay chat app that treats fiction as a **simulation**, not a chat log. The LLM is a renderer for structured world state — it does not hold state.

See [rp-engine-design.md](rp-engine-design.md) for the full design. This file is the working summary.
## Why this exists
Fixes three failure modes of conventional RP chatbots:

1. **Memory loss** — old context drops as history grows
2. **Quality decay** — bots get terse and generic over long conversations
3. **Stale state pollution** — bots fixate on past props (the "picnic basket" problem)
## Hard scope constraints
- **Single user, single machine** (the user's Mac)
- **Max 3 entities per scene**: `you` + up to 2 bots (`botA`, `botB`)
- **Chat-only** — no voice, no real-time

The 3-entity cap is load-bearing: it makes the relationship graph fully enumerable (6 directed edges + 1 group node). Don't design for N entities.
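Because the cap is fixed, the whole edge set can be listed up front instead of discovered at runtime. A minimal Python sketch, assuming entity ids are plain strings; `directed_edges` and `GROUP_NODE` are hypothetical names, not taken from the design doc.

```python
from itertools import permutations

# The three entity ids allowed by the scope constraints.
ENTITIES = ("you", "botA", "botB")


def directed_edges(entities: tuple[str, ...] = ENTITIES) -> list[tuple[str, str]]:
    """Every ordered (source, target) pair with source != target.

    Order matters: ("botA", "botB") and ("botB", "botA") are distinct
    edges, which is what allows asymmetric feelings.
    """
    return list(permutations(entities, 2))


EDGES = directed_edges()          # 3 * 2 = 6 directed edges
GROUP_NODE = frozenset(ENTITIES)  # hypothetical key for the single group node

if __name__ == "__main__":
    for src, dst in EDGES:
        print(f"{src} -> {dst}")
```

With N entities the edge count grows as N(N-1), which is why the cap keeps the graph small enough to load whole.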
## Architecture
- **Mac (always-on)**: web UI, orchestrator, persistence, event queue, retrieval, prompt construction, all state.
- **Inference endpoint**: stateless `generate(prompt, params) -> text`. Swap implementations (cloud API, rented GPU, local MLX/llama.cpp) behind one interface. The orchestrator never knows which.
- Streaming required for UX.
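One way to keep backends swappable is a structural interface that the orchestrator depends on. A sketch, assuming streaming is exposed as an iterator of text chunks (this summary only specifies `generate(prompt, params) -> text`); `InferenceBackend`, `EchoBackend`, and `render_turn` are hypothetical names.

```python
from typing import Iterator, Protocol


class InferenceBackend(Protocol):
    """Hypothetical interface: stateless, prompt in, streamed text out."""

    def generate(self, prompt: str, params: dict) -> Iterator[str]:
        """Yield text chunks as the model produces them."""
        ...


class EchoBackend:
    """Offline stand-in; a real backend would call a cloud API, a rented GPU, or a local runtime."""

    def generate(self, prompt: str, params: dict) -> Iterator[str]:
        text = f"[echo:{params.get('max_tokens', 0)}] {prompt}"
        for word in text.split(" "):
            yield word + " "


def render_turn(backend: InferenceBackend, prompt: str, params: dict) -> str:
    """Orchestrator side: consume the stream, return the full text."""
    chunks: list[str] = []
    for chunk in backend.generate(prompt, params):
        chunks.append(chunk)  # a real UI would flush each chunk to the client here
    return "".join(chunks).rstrip()
```

The orchestrator only sees `InferenceBackend`, so swapping `EchoBackend` for a real client changes no calling code. The full text returned by `render_turn` is what would be stored in the `assistant_turn` event.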
## Core concepts (vocabulary)
- **Entity**: `you | botA | botB`. Has identity (immutable), state (mood/goals/status), activity, per-POV memory.
- **Container**: anything with slots that holds entities (car, booth, room). Has properties (moving, public, audible range). Spatial grounding lives here, separate from the relationship graph.
- **Activity record**: per-entity live struct — position (container+slot), posture, current action (verb, duration, interruptible, required_attention), holding, attention, status. Always in the prompt as a small structured block.
- **Relationship graph**: 6 **directed** edges (asymmetric feelings matter — never collapse to a single shared field) + 1 group node. Edges hold affinity, trust, summary, knowledge-known-about-target, private moments, last-interaction.
- **Scene configurations**: exactly 4 — solo with botA, solo with botB, all three present, botA+botB without you ("meanwhile…"). Each has a fixed prompt-loading rule.
- **Witnessed-by flag**: every memory has a 3-bit `[you, botA, botB]` mask. A speaker only sees memories where their bit is set. This is the mechanism that prevents bots from referencing things they can't know.
- **Event**: scoped lifecycle (`planned | active | completed | cancelled | expired`) with its own props, preconditions, on_start/on_complete hooks, significance. Solves the picnic-basket problem — props live and die with the event, only narrative gist promotes to memory.
- **Active threads**: unresolved plot tensions. Sticky in context until resolved or dropped. Cheap, and they anchor continuity across compressed scenes.
- **Scene**: closes when the container changes meaningfully or significant time passes. Each scene is a compression boundary.
- **Per-POV summary**: every witness gets their own record of a closed scene, written from their POV. Different details, different interpretations. This is what gives bots inner lives — never write omniscient narration into per-POV stores.
- **Time skip**: `elision` (skip the boring middle of an in-progress activity) vs `jump` (next morning, a week later). Skips run intervening events forward, compress what was skipped, and reset activity at the landing point.
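The witnessed-by check reduces to a bitwise AND. A minimal sketch, assuming the bit assignment `you`=1, `botA`=2, `botB`=4; `Memory`, `mask_for`, and `visible_to` are illustrative names.

```python
from dataclasses import dataclass

# Assumed bit assignment for the [you, botA, botB] mask.
BIT = {"you": 0b001, "botA": 0b010, "botB": 0b100}


def mask_for(*witnesses: str) -> int:
    """Build a witnessed-by mask from the entities who were present."""
    mask = 0
    for name in witnesses:
        mask |= BIT[name]
    return mask


@dataclass(frozen=True)
class Memory:
    text: str
    witnessed_by: int  # 3-bit mask


def visible_to(speaker: str, memories: list[Memory]) -> list[Memory]:
    """Keep only memories where the speaker's bit is set."""
    return [m for m in memories if m.witnessed_by & BIT[speaker]]


MEMORIES = [
    Memory("argued in the car", mask_for("you", "botA")),
    Memory("talked while you slept", mask_for("botA", "botB")),  # a "meanwhile" scene
]
```

Here `visible_to("you", MEMORIES)` returns only the car argument, so `you` never sees the "meanwhile" scene.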
## What promotes out of an event (and what doesn't)
- Object acquired → inventory
- Knowledge gained → edge `knowledge` field
- Relationship change → edge summary
- **Everything else stays in the closed event record.** The blanket, the basket, the specific sandwich do **not** become memories. This rule is the whole point — don't bypass it.
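Since state only changes through the log, closing an event can be modeled as emitting log entries for the three listed categories and nothing else. A sketch under assumptions: the `Event` fields are hypothetical, and mapping inventory changes onto the `entity_state_change` kind is a guess, not something this summary specifies.

```python
from dataclasses import dataclass, field

Edge = tuple[str, str]  # (source, target), directed


@dataclass
class Event:
    """Hypothetical event record; field names are illustrative."""
    name: str
    props: list[str] = field(default_factory=list)               # scoped to the event
    acquired: list[str] = field(default_factory=list)            # objects someone keeps
    learned: dict[Edge, str] = field(default_factory=dict)       # edge -> fact learned
    relationship: dict[Edge, str] = field(default_factory=dict)  # edge -> new summary


def promotions(event: Event) -> list[dict]:
    """Log entries to append when `event` closes.

    Only inventory, edge knowledge, and edge summary produce entries.
    `event.props` is never read, so props stay in the closed event record.
    """
    entries: list[dict] = []
    for item in event.acquired:
        entries.append({"kind": "entity_state_change", "field": "inventory", "add": item})
    for edge, fact in event.learned.items():
        entries.append({"kind": "edge_update", "edge": edge, "field": "knowledge", "add": fact})
    for edge, summary in event.relationship.items():
        entries.append({"kind": "edge_update", "edge": edge, "field": "summary", "value": summary})
    return entries


picnic = Event(
    name="picnic",
    props=["blanket", "basket", "sandwich"],
    acquired=["pressed flower"],
    learned={("botA", "you"): "you are afraid of wasps"},
    relationship={("botA", "you"): "closer after the picnic"},
)
```

`promotions(picnic)` yields three entries; none of them mention the blanket, basket, or sandwich.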
## Persistence
- **SQLite** (single file) for everything structured. WAL mode, foreign keys on, each turn in a transaction.
- **sqlite-vss** or **sqlite-vec** for embeddings (same DB file). Decide in Phase 4.
- **JSON** for snapshots, character templates, scene exports.
- **No** Postgres, Redis, Pinecone, Docker. Single-user; don't over-engineer.

Schema is event-sourced. See design doc § "Persistence Layer" for the full sketch.
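A minimal sketch of the connection setup with Python's standard `sqlite3` module. The single `events` table is a simplified stand-in for the schema in the design doc; `open_db` and `append_turn` are hypothetical helpers.

```python
import json
import sqlite3

# Simplified stand-in for the event-sourced schema: one append-only table.
SCHEMA = """
CREATE TABLE IF NOT EXISTS events (
    seq     INTEGER PRIMARY KEY AUTOINCREMENT,
    kind    TEXT NOT NULL,
    payload TEXT NOT NULL  -- JSON
);
"""


def open_db(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")  # persists in the database file
    conn.execute("PRAGMA foreign_keys=ON")   # per-connection, so set it on every open
    conn.executescript(SCHEMA)
    return conn


def append_turn(conn: sqlite3.Connection, events: list[tuple[str, dict]]) -> None:
    """Write all events for one turn atomically."""
    rows = [(kind, json.dumps(payload)) for kind, payload in events]
    with conn:  # commits on success, rolls back if any insert fails
        conn.executemany("INSERT INTO events (kind, payload) VALUES (?, ?)", rows)
```

WAL needs a file-backed database; an in-memory database reports `memory` as its journal mode.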
## Event sourcing — non-negotiable
State is a **projection** of an append-only event log. State is **never mutated directly** — append an event, the projector applies it.

Event kinds: `user_turn`, `assistant_turn`, `time_skip`, `event_triggered`, `edge_update`, `scene_transition`, `entity_state_change`, `activity_change`.

This buys: free rewind, trivial replay-debugging, schema migrations against the same log, branching ("what if BotA had said yes").

**Determinism on replay**: LLM calls are nondeterministic. Store the *outcome* in the event payload — on replay, use the stored outcome. Never re-call the LLM during replay.

**Snapshots** every N events or M minutes, so loading doesn't replay the whole log. The log remains the source of truth.
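The projector can be a pure function folded over the log. A sketch under simplifying assumptions: state is a plain dict, only three event kinds are handled, and a snapshot is just a projected state tagged with the last applied `seq`. `LogEvent`, `apply`, and `replay` are hypothetical names.

```python
import copy
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LogEvent:
    seq: int
    kind: str
    payload: dict


def apply(state: dict, ev: LogEvent) -> dict:
    """Project one event onto a copy of the state; the input is never mutated."""
    new = copy.deepcopy(state)
    if ev.kind in ("user_turn", "assistant_turn"):
        # assistant_turn carries the stored LLM output, so replay reads it
        # instead of calling the model again.
        new.setdefault("dialogue", []).append(ev.payload["text"])
    elif ev.kind == "edge_update":
        edge = new.setdefault("edges", {}).setdefault(ev.payload["edge"], {})
        edge.update(ev.payload["fields"])
    # Other event kinds are omitted from this sketch.
    new["last_seq"] = ev.seq
    return new


def replay(log: list[LogEvent], snapshot: Optional[dict] = None) -> dict:
    """Rebuild state from a snapshot (or from empty), applying only newer events."""
    state = copy.deepcopy(snapshot) if snapshot else {"last_seq": 0}
    for ev in log:
        if ev.seq > state["last_seq"]:
            state = apply(state, ev)
    return state


LOG = [
    LogEvent(1, "user_turn", {"text": "Want to get food?"}),
    LogEvent(2, "assistant_turn", {"text": "Sure."}),
    LogEvent(3, "edge_update", {"edge": "botA->you", "fields": {"affinity": 1}}),
]
```

Rewind is replaying a prefix of the log; branching is replaying a prefix and then appending different events.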
## Prompt construction
A speaker's prompt is assembled from **their** edges and **their** witnessed memories — never the global state. BotA and BotB are effectively two separate agents who happen to share a scene.

Order (for speaker BotA, with you and BotB present):

1. BotA identity + current state
2. BotA → You edge
3. BotA → BotB edge
4. Group node (only if all three present)
5. World state (time, weather, location)
6. Active scene description
7. Activity snapshot for **all** present entities
8. Active threads
9. Recent dialogue window
10. Retrieved memories (top-K, witness-filtered, BotA-owned)
11. Currently active events + their props

After every utterance, run a state-update pass on **every present entity**, not just the speaker. Silent witnesses still update edges.
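The load order above maps onto a straightforward builder. A sketch, assuming the world state arrives as a plain dict with hypothetical keys (`identity`, `edges`, `group`, `world`, `scene`, `activity`, `threads`, `recent`, `memories`, `events`) and that `memories` is already ranked by retrieval, so only filtering happens here.

```python
ORDER = ("you", "botA", "botB")
BIT = {"you": 0b001, "botA": 0b010, "botB": 0b100}  # assumed witness-mask bits


def build_prompt(speaker: str, present: set[str], w: dict, k: int = 5) -> str:
    """Assemble the speaker's prompt from their own edges and witnessed memories."""
    others = [e for e in ORDER if e in present and e != speaker]
    parts = [f"## {speaker}\n{w['identity'][speaker]}"]  # identity + current state
    # Only the speaker's outgoing edges, never the reverse direction.
    parts += [f"## {speaker} -> {o}\n{w['edges'][(speaker, o)]}" for o in others]
    if set(ORDER) <= present:  # group node only when all three are present
        parts.append(f"## Group\n{w['group']}")
    parts.append(f"## World\n{w['world']}")
    parts.append(f"## Scene\n{w['scene']}")
    activity = [f"{e}: {w['activity'][e]}" for e in ORDER if e in present]
    parts.append("## Activity\n" + "\n".join(activity))  # all present entities
    parts.append("## Threads\n" + "\n".join(w["threads"]))
    parts.append("## Recent\n" + "\n".join(w["recent"]))
    # Speaker-owned AND witnessed by the speaker, then capped at k.
    mems = [text for owner, mask, text in w["memories"]
            if owner == speaker and mask & BIT[speaker]]
    parts.append("## Memories\n" + "\n".join(mems[:k]))
    parts.append("## Events\n" + "\n".join(w["events"]))
    return "\n\n".join(parts)


W = {
    "identity": {"botA": "wry, tired"},
    "edges": {("botA", "you"): "trusts you", ("botA", "botB"): "wary"},
    "group": "road trip crew",
    "world": "night, rain",
    "scene": "diner booth",
    "activity": {"you": "seated, eating", "botA": "seated, reading menu",
                 "botB": "standing at counter"},
    "threads": ["who took the map?"],
    "recent": ["you: hungry?"],
    "memories": [("botA", 0b011, "argued in the car"),
                 ("botB", 0b110, "private to the other bot"),
                 ("botA", 0b100, "leaked memory")],
    "events": ["late dinner (active)"],
}
```

Calling `build_prompt("botA", {"you", "botA"}, W)` drops the group node and the `botA -> botB` edge, matching the solo-with-botA configuration.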
## Memory retrieval
- Always-loaded: pinned, current scene, active threads, recent N scenes (no retrieval).
- Retrieved: top-K vector search over **the speaker's** memory store, filtered by witness flag, with recency + significance boosts.
- Keep K small. Bloated retrieval poisons the prompt.
- Phase 1: SQLite FTS5 is enough. Vector search comes in Phase 4.
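A sketch of the ranking step, assuming each candidate already carries a relevance score normalized to [0, 1] (from FTS5 in Phase 1 or vector search later). The exponential recency decay, the weights, and `half_life` are illustrative assumptions, not values from the design doc; `Candidate` and `retrieve` are hypothetical names.

```python
from dataclasses import dataclass

BIT = {"you": 0b001, "botA": 0b010, "botB": 0b100}  # assumed witness-mask bits


@dataclass(frozen=True)
class Candidate:
    text: str
    owner: str           # whose memory store this lives in
    witnessed_by: int    # 3-bit mask
    similarity: float    # search relevance, assumed normalized to [0, 1]
    age_scenes: int      # scenes since the memory formed
    significance: float  # 0..1, e.g. from a significance classifier


def retrieve(speaker: str, candidates: list[Candidate], k: int = 3,
             half_life: float = 10.0, w_recency: float = 0.3,
             w_significance: float = 0.2) -> list[Candidate]:
    """Top-k from the speaker's own store, witness-filtered, with boosts."""
    allowed = [c for c in candidates
               if c.owner == speaker and c.witnessed_by & BIT[speaker]]

    def score(c: Candidate) -> float:
        recency = 0.5 ** (c.age_scenes / half_life)  # halves every `half_life` scenes
        return c.similarity + w_recency * recency + w_significance * c.significance

    return sorted(allowed, key=score, reverse=True)[:k]
```

Filtering happens before scoring, so a highly relevant memory from another store or an unwitnessed scene can never outrank an allowed one. The small default `k` follows the "keep K small" rule.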
## Implementation phases
1. **Core loop**: schema, entities + edges, single container, event log + projector, single-bot conversation, one LLM backend, streaming UI, manual rollback.
2. **Multi-entity**: second bot, group node, scene configs, witness filtering, per-POV memories, activity/containers, scene transitions with compression.
3. **Events & skips**: event queue with triggers, time skips, active threads, significance classifier.
4. **Polish**: vector retrieval, branching, surgical delete + regenerate, snapshots, backups, impact-preview UI for rewinds.

Don't jump phases. Phase 1 must work end-to-end before Phase 2 lands.
## Conventions for working in this repo
- **Don't bypass the event log.** Any state change goes through an event. If you're tempted to UPDATE a row directly, you're doing it wrong.
- **Don't collapse directed edges.** `botA → botB` and `botB → botA` are independent. Asymmetry is the point.
- **Don't promote event props to memory.** Only the promotion categories above survive an event closing.
- **Per-POV, not omniscient.** When writing scene summaries, write one per witness, from their angle.
- **Witness filter every memory read.** A bot must never see a memory their bit isn't set on.
- **Activity block is always in the prompt.** It's the spatial anchor that prevents "leaning on the kitchen counter while in a car" failures.
- **Stream on the inference path; keep bookkeeping non-blocking.** Significance classification, embeddings, and snapshots run while the LLM streams.
- **No Docker, no extra services.** SQLite + a process. Push back on suggestions to add infrastructure.
## Open decisions (deferred — don't pre-decide)
- Token budget strategy (during Phase 1, with real prompts)
- Embedding model (Phase 4)
- `sqlite-vss` vs `sqlite-vec` (Phase 4)
- UI framework (local web app / Tauri / Electron / native — TBD)
- Inference hosting (start with a cloud API, re-evaluate later)
- Character template format (during Phase 1)
- Multi-session / multi-character casts: **out of scope for v1**. Leave cheap schema hooks only.