From eb2f814f25a323d608117bd4b7ab92dfa6be33b0 Mon Sep 17 00:00:00 2001
From: Joseph Doherty <dohejw01@gmail.com>
Date: Sun, 26 Apr 2026 10:08:33 -0400
Subject: [PATCH] Initial commit: roleplay engine design and CLAUDE.md

- rp-engine-design.md: full design for the simulation-based roleplay engine
  (entities, containers, directed relationship graph, witnessed-by memory,
  scoped events, scene compression, event-sourced state, time skips).
- CLAUDE.md: working summary and conventions for development.
---
 CLAUDE.md           | 126 +++++++++++
 rp-engine-design.md | 525 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 651 insertions(+)
 create mode 100644 CLAUDE.md
 create mode 100644 rp-engine-design.md
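
Note (below the `---`, so not part of the commit): a minimal Python sketch of the witnessed-by rule summarized in CLAUDE.md, where a speaker only sees memories whose witness bit is set for them. The names `WITNESS_BITS`, `Memory`, `mask`, and `visible_to` are hypothetical, and the bit layout is an assumption; the design only fixes the `[you, botA, botB]` ordering.

```python
from dataclasses import dataclass

# Assumed bit layout for the [you, botA, botB] witness flag (illustrative only).
WITNESS_BITS = {"you": 0b100, "botA": 0b010, "botB": 0b001}


@dataclass(frozen=True)
class Memory:
    owner: str      # whose per-POV store this record lives in
    summary: str    # written from the owner's point of view
    witnessed: int  # 3-bit mask built from WITNESS_BITS


def mask(*entities: str) -> int:
    """Combine entity names into a single witness mask."""
    combined = 0
    for entity in entities:
        combined |= WITNESS_BITS[entity]
    return combined


def visible_to(speaker: str, memories: list) -> list:
    """Keep only memories the speaker owns and whose witness bit is set."""
    bit = WITNESS_BITS[speaker]
    return [m for m in memories if m.owner == speaker and m.witnessed & bit]


memories = [
    Memory("botA", "Told you about my sister at the park.", mask("you", "botA")),
    Memory("botB", "BotA said she has a sister (heard secondhand).", mask("botB")),
    Memory("botA", "Group drive to the park.", mask("you", "botA", "botB")),
]

for m in visible_to("botB", memories):
    print(m.summary)
```

In this sketch the secondhand record lives in BotB's own store with only BotB's bit set, which mirrors the `[0,0,1]` example in the design doc; the private `[1,1,0]` moment never reaches BotB's prompt.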

diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 100644
index 0000000..65e9b4f
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1,126 @@
+# Roleplay Engine
+
+Local-first roleplay chat app that treats fiction as a **simulation**, not a chat log. The LLM is a renderer for structured world state — it does not hold state.
+
+See [rp-engine-design.md](rp-engine-design.md) for the full design. This file is the working summary.
+
+## Why this exists
+
+Fixes three failure modes of conventional RP chatbots:
+
+1. **Memory loss** — old context drops as history grows
+2. **Quality decay** — bots get terse and generic over long conversations
+3. **Stale state pollution** — bots fixate on past props (the "picnic basket" problem)
+
+## Hard scope constraints
+
+- **Single user, single machine** (the user's Mac)
+- **Max 3 entities per scene**: `you` + up to 2 bots (`botA`, `botB`)
+- **Chat-only** — no voice, no real-time
+
+The 3-entity cap is load-bearing: it makes the relationship graph fully enumerable (6 directed edges + 1 group node). Don't design for N entities.
+
+## Architecture
+
+- **Mac (always-on)**: web UI, orchestrator, persistence, event queue, retrieval, prompt construction, all state.
+- **Inference endpoint**: stateless `generate(prompt, params) -> text`. Swap implementations (cloud API, rented GPU, local MLX/llama.cpp) behind one interface. The orchestrator never knows which.
+- Streaming required for UX.
+
+## Core concepts (vocabulary)
+
+- **Entity**: `you | botA | botB`. Has identity (immutable), state (mood/goals/status), activity, per-POV memory.
+- **Container**: anything with slots that holds entities (car, booth, room). Has properties (moving, public, audible range). Spatial grounding lives here, separate from the relationship graph.
+- **Activity record**: per-entity live struct — position (container+slot), posture, current action (verb, duration, interruptible, required_attention), holding, attention, status. Always in the prompt as a small structured block.
+- **Relationship graph**: 6 **directed** edges (asymmetric feelings matter — never collapse to a single shared field) + 1 group node. Edges hold affinity, trust, summary, knowledge-known-about-target, private moments, last-interaction.
+- **Scene configurations**: exactly 4 — solo with botA, solo with botB, all three present, botA+botB without you ("meanwhile…"). Each has a fixed prompt-loading rule.
+- **Witnessed-by flag**: every memory has a 3-bit `[you, botA, botB]` mask. A speaker only sees memories where their bit is set. This is the mechanism that prevents bots referencing things they can't know.
+- **Event**: scoped lifecycle (`planned | active | completed | cancelled | expired`) with its own props, preconditions, on_start/on_complete hooks, significance. Solves the picnic-basket problem — props live and die with the event, only narrative gist promotes to memory.
+- **Active threads**: unresolved plot tensions. Sticky in context until resolved/dropped. Cheap, anchor continuity across compressed scenes.
+- **Scene**: closes when container changes meaningfully or significant time passes. Compression boundary.
+- **Per-POV summary**: every witness gets their own record of a closed scene, written from their POV. Different details, different interpretations. This is what gives bots inner lives — never write omniscient narration into per-POV stores.
+- **Time skip**: `elision` (skip the boring middle of an in-progress activity) vs `jump` (next morning, a week later). Skips run intervening events forward, compress, reset landing activity.
+
+## What promotes out of an event (and what doesn't)
+
+- Object acquired → inventory
+- Knowledge gained → edge `knowledge` field
+- Relationship change → edge summary
+- **Everything else stays in the closed event record.** The blanket, the basket, the specific sandwich do **not** become memories. This rule is the whole point — don't bypass it.
+
+## Persistence
+
+- **SQLite** (single file) for everything structured. WAL mode, foreign keys on, each turn in a transaction.
+- **sqlite-vss** or **sqlite-vec** for embeddings (same DB file). Decide at Phase 4.
+- **JSON** for snapshots, character templates, scene exports.
+- **No** Postgres, Redis, Pinecone, Docker. Single-user; don't over-engineer.
+
+Schema is event-sourced. See design doc § "Persistence Layer" for the full sketch.
+
+## Event sourcing — non-negotiable
+
+State is a **projection** of an append-only event log. State is **never mutated directly**: append an event and let the projector apply it.
+
+Event kinds: `user_turn`, `assistant_turn`, `time_skip`, `event_triggered`, `edge_update`, `scene_transition`, `entity_state_change`, `activity_change`.
+
+This buys: free rewind, trivial replay-debugging, schema migrations against the same log, branching ("what if BotA had said yes").
+
+**Determinism on replay**: LLM calls are nondeterministic. Store the *outcome* in the event payload — on replay, use the stored outcome. Never re-call the LLM during replay.
+
+**Snapshots** every N events / M minutes so we don't replay everything on load. Log is source of truth.
+
+## Prompt construction
+
+A speaker's prompt is assembled from **their** edges and **their** witnessed memories — never the global state. BotA and BotB are effectively two separate agents who happen to share a scene.
+
+Order (for speaker BotA, with you and BotB present):
+
+1. BotA identity + current state
+2. BotA → You edge
+3. BotA → BotB edge
+4. Group node (only if all three present)
+5. World state (time, weather, location)
+6. Active scene description
+7. Activity snapshot for **all** present entities
+8. Active threads
+9. Recent dialogue window
+10. Retrieved memories (top-K, witness-filtered, BotA-owned)
+11. Currently active events + their props
+
+After every utterance, run a state-update pass on **every present entity**, not just the speaker. Silent witnesses still update edges.
+
+## Memory retrieval
+
+- Always-loaded: pinned, current scene, active threads, recent N scenes (no retrieval).
+- Retrieved: top-K vector search over **the speaker's** memory store, filtered by witness flag, with recency + significance boosts.
+- Keep K small. Bloated retrieval poisons the prompt.
+- Phase 1: SQLite FTS5 is enough. Vector search comes at Phase 4.
+
+## Implementation phases
+
+1. **Core loop**: schema, entities + edges, single container, event log + projector, single-bot conversation, one LLM backend, streaming UI, manual rollback.
+2. **Multi-entity**: second bot, group node, scene configs, witness filtering, per-POV memories, activity/containers, scene transitions with compression.
+3. **Events & skips**: event queue with triggers, time skips, active threads, significance classifier.
+4. **Polish**: vector retrieval, branching, surgical delete + regenerate, snapshots, backups, impact-preview UI for rewinds.
+
+Don't jump phases. Phase 1 must work end-to-end before Phase 2 lands.
+
+## Conventions for working in this repo
+
+- **Don't bypass the event log.** Any state change goes through an event. If you're tempted to UPDATE a row directly, you're doing it wrong.
+- **Don't collapse directed edges.** `botA → botB` and `botB → botA` are independent. Asymmetry is the point.
+- **Don't promote event props to memory.** Only the three promotion categories above (inventory, edge knowledge, edge summary) and the narrative gist survive an event closing.
+- **Per-POV, not omniscient.** When writing scene summaries, write one per witness, from their angle.
+- **Witness filter every memory read.** A bot must never see a memory their bit isn't set on.
+- **Activity block is always in the prompt.** It's the spatial anchor that prevents "leaning on the kitchen counter while in a car" failures.
+- **Streaming on the inference path; non-blocking bookkeeping** (significance classification, embeddings, snapshots) runs while the LLM streams.
+- **No Docker, no extra services.** SQLite + a process. Push back on suggestions to add infrastructure.
+
+## Open decisions (deferred — don't pre-decide)
+
+- Token budget strategy (during Phase 1, with real prompts)
+- Embedding model (Phase 4)
+- `sqlite-vss` vs `sqlite-vec` (Phase 4)
+- UI framework (local web app / Tauri / Electron / native — TBD)
+- Inference hosting (start with a cloud API, re-evaluate later)
+- Character template format (during Phase 1)
+- Multi-session / multi-character casts: **out of scope for v1**. Leave cheap schema hooks only.
diff --git a/rp-engine-design.md b/rp-engine-design.md
new file mode 100644
index 0000000..8c3a2ac
--- /dev/null
+++ b/rp-engine-design.md
@@ -0,0 +1,525 @@
+# Roleplay Engine: Design Document
+
+## Project Overview
+
+A roleplay chat application that treats fiction as a simulation rather than a chat log. The goal is to fix three persistent problems with conventional RP chatbots:
+
+1. **Memory loss over time** — old context gets dropped as history grows
+2. **Response quality decay** — bots become terse and generic as conversations lengthen
+3. **Stale state pollution** — bots fixate on past scene details (e.g., bringing a picnic basket to every subsequent scene after one picnic)
+
+The fix is to model the world as structured state (locations, time, who's present, what they're doing, what they remember, how they feel about each other) and use the LLM as a renderer for that state, not as the state-holder itself.
+
+## System Architecture
+
+### Topology
+
+- **Mac (always-on, local hardware)**: hosts the web UI, orchestrator, persistence layer, event queue, retrieval, prompt construction, and all state management
+- **Inference endpoint (remote or local)**: the LLM, called as a stateless service. Options: cloud APIs (Anthropic, OpenAI, OpenRouter), rented GPU (Runpod, Vast.ai), or local on-Mac (MLX, llama.cpp). The orchestrator doesn't care which.
+
+### Why this separation
+
+- Inference is the only piece that needs a GPU; everything else is light orchestration
+- Decoupling lets us swap models freely (different models for different tasks: small/fast for classifiers, larger for narrative)
+- State survives model outages
+- Debugging the orchestrator doesn't require burning inference
+
+### LLM client interface
+
+Define a single function: `generate(prompt, params) -> text`. Behind it, swap implementations (`AnthropicClient`, `OpenAIClient`, `LocalMLXClient`, `MockClient` for testing, etc.). The orchestrator never knows which is in use.
+
+Streaming is required for usable UX. Total latency isn't a hard constraint (chat-only, no voice), but a response should start appearing as soon as tokens are available.
+
+## Scope Constraints
+
+- **Single user** (you), single machine
+- **Maximum 3 entities in any scene**: you + up to 2 bots
+- **Chat-only**, no voice or real-time requirements
+
+The 3-entity cap simplifies the relationship graph dramatically — every relationship combination is enumerable.
+
+## Core Concepts
+
+### Entities
+
+Three possible entities: `you`, `botA`, `botB`. Each has:
+
+- **Identity**: name, persona, voice, core traits (immutable per session)
+- **State**: mood, current goals, status (conscious, sober, injured, etc.)
+- **Activity**: what they're doing right now (see below)
+- **Memory**: per-POV log of what they've witnessed/learned
+
+### Containers and Activity
+
+Spatial/embodied state, separate from the relationship graph.
+
+A **container** is anything that holds entities with defined slots: car (driver, front passenger, rear seats), restaurant booth, living room, hiking trail. Containers have properties (moving/stationary, public/private, audible range, who-can-see-whom constraints).
+
+Each entity has a live **activity record**:
+
+```
+EntityActivity {
+  position: { container, slot }
+  posture: standing | sitting | lying | etc.
+  current_action: { verb, started_at, expected_duration, interruptible, required_attention }
+  holding: [ items ]
+  attention: where their focus is directed
+  status: { conscious, sober, injured, ... }
+}
+```
+
+Containers create implicit constraints the LLM can reason over. "You are driving, BotB is in the back seat" is enough for the model to know BotB can't suddenly grab your hand. We don't enumerate every constraint — we surface the spatial context and let the model handle it.
+
+**Required attention** matters: high-attention actions (driving) create natural friction with conversation, which the LLM can dramatize. Low-attention actions don't.
+
+Activity state always lives in the prompt as a small structured block. This anchors spatial grounding and prevents the "BotA leaning against the kitchen counter when you're actually in a car" failure mode.
+
+### Relationship Graph (fixed cardinality)
+
+For 3 entities, there are exactly 6 directed edges + 1 group node:
+
+```
+Edges (directed):
+  you → botA, botA → you
+  you → botB, botB → you
+  botA → botB, botB → botA
+
+Group node:
+  {you, botA, botB}
+```
+
+Edges are directed because asymmetric feelings exist (BotA may secretly resent BotB while BotB thinks they're best friends). A single shared "relationship" field flattens this — bad.
+
+**Edge contents**: affinity, trust, summary, knowledge known about target, private moments shared, last interaction time.
+
+**Group node contents**: shared history summary, group dynamic, inside jokes, active threads (unresolved plot tensions).
+
+### Scene Configurations
+
+Exactly 4 social configurations, each with a defined prompt-loading rule:
+
+1. **Solo with BotA**: load `you ↔ botA` edges only
+2. **Solo with BotB**: load `you ↔ botB` edges only
+3. **All three present**: load all 6 edges + group node (each speaker's prompt still draws only on that speaker's outgoing edges)
+4. **BotA and BotB without you** ("meanwhile..."): load `botA ↔ botB` edges only, no group node
+
+Configuration 4 is optional but powerful — it lets BotA and BotB develop their relationship offscreen, creating dramatic possibilities.
+
+### Witnessed-By Tracking
+
+Every memory has a 3-bit witness flag: `[you, botA, botB]`.
+
+- Private moment between you and BotA: `[1,1,0]`
+- Group scene: `[1,1,1]`
+- BotA tells BotB about it secondhand: creates a new memory in BotB's store flagged `[0,0,1]` with `source: botA, reliability: ?`
+
+A bot speaking only has access to memories where their bit is set. This is the mechanism that prevents bots from referencing things they can't possibly know.
+
+### Events
+
+Events solve the "picnic basket" problem by scoping props to event lifecycles instead of promoting them to permanent memory.
+
+```
+Event {
+  id, name
+  status: planned | active | completed | cancelled | expired
+  trigger: { type, condition }
+    type: time-based | location-based | condition-based | manual
+  scope: { participants, container, duration }
+  props: [ items, topics, plot_threads ]   // scoped to this event
+  preconditions: [ ... ]
+  on_start: [ state_changes ]
+  on_complete: [ state_changes, memory_writes ]
+  significance: low | medium | high
+}
+```
+
+**Lifecycle example (picnic basket)**:
+
+1. Event `picnic` created, status `planned`, props include `picnic_basket`, trigger "arrive at park"
+2. Arrive at park → event activates, basket is real in the active scene
+3. Picnic ends → event closes. Basket goes back into the closed event record. Does NOT leak into general memory.
+4. Memory written: "Had a picnic with BotA at the park, talked about her sister" (narrative gist, not prop list).
+
+When a "go to war" event happens later, it has its own scope and props. The basket isn't in active context, so the bot can't reach for it.
+
+**Promotion rules (what survives an event)**:
+
+- Object acquired (BotA gives you a locket) → promotes to inventory
+- Knowledge gained (BotA mentions she has a sister) → promotes to `botA → you` edge knowledge
+- Relationship change (first kiss) → promotes to relationship summary
+- Everything else (the basket, the blanket, the specific sandwich) stays in the closed event record, surfaces only on explicit recall
+
+**Plan vs. expectation**: planned events shouldn't pollute current bot behavior. Bots see them only as anticipated knowledge with low salience until the event is near (within a few hours of in-fiction time, or one scene before).
+
+### Active Threads
+
+Unresolved plot threads (promises, mysteries, tensions, plans), maintained at the group/relationship level. They are sticky: they stay loaded in context until resolved or explicitly dropped. They are cheap to keep loaded and anchor continuity across many compressed scenes.
+
+Example: "BotA still hasn't told you why she left town" — one line, costs almost nothing, preserves arc continuity.
+
+### Scenes and Compression
+
+A scene closes when the container changes meaningfully or significant in-fiction time passes. Scene transitions are the natural compression boundary.
+
+**Transition pipeline**:
+
+1. Detect transition (container change, time skip, explicit user signal)
+2. Run significance pass on closing scene (small LLM call as classifier)
+3. Generate per-POV summaries (one per witness, written from their POV)
+4. Update relationship edges and group node
+5. Close ended events; activate newly triggered events
+6. Archive raw dialogue to cold storage
+7. Build new active context
+
+**Per-POV summaries** (not omniscient narration). Same scene, different records:
+
+- *You*: "Drove to the park with BotA and BotB. BotA mentioned her sister for the first time. BotB seemed distracted."
+- *BotA*: "Rode with [you] and BotB. Told [you] about my sister — felt like the right moment. BotB on her phone the whole time, annoyed me."
+- *BotB*: "Got a ride to the park. Was texting Mom. Heard BotA mention a sister, didn't know she had one."
+
+Each captures different details and interpretations. This is what makes bots feel like they have inner lives.
+
+**Compression tiers**:
+
+- Last scene: full dialogue retained
+- Recent scenes: per-POV summary + key quotes
+- Older scenes: per-POV summary only
+- Distant past: rolled up into edge summaries and "era" descriptions
+
+### Time Skips
+
+Two flavors:
+
+1. **Elision**: "we arrived at the park" — skip the boring middle of an in-progress activity, activity completes normally
+2. **Jump**: "next morning", "a week later" — fast-forward through arbitrary time, possibly across multiple scenes
+
+**Skip pipeline**:
+
+```
+on skip_request(target):
+  1. Resolve target (when/where landing?)
+  2. Run intervening time forward, firing events in [now, target]
+  3. Compress the skipped period (auto or user-prompted)
+  4. Update activity/container for landing state
+  5. Generate transition narration
+  6. Resume play
+```
+
+**Target resolution**:
+
+- "Skip to when we arrive" → end of current travel action
+- "Next morning" → next 7am (or wake time)
+- "A week later" → arithmetic on in-fiction time
+- "Skip to dinner" → next scheduled dinner event
+
+**Compression strategies for skipped intervals**:
+
+- *Auto-generate, then confirm*: orchestrator drafts an interval summary, user approves/edits before commit. Default for short skips (< 1 day).
+- *Prompt for highlights*: ask user "anything notable happen during the week?" before jumping. Default for long skips. User can answer "uneventful" or describe events that become synthesized memories.
+
+**Landing state**: at target time, reset activity coherently. Don't keep pre-skip activity. For "skip to arrival" → standing in park, attention on surroundings, drive completed. For "next morning" → at home, in bed/kitchen, possibly with dream-memories, hungry/rested per skipped state.
+
+**Edge cases**:
+
+- *Stale plans*: when a skip passes a planned event without firing it, decide whether it happened off-screen, was cancelled, or was rescheduled. Default: if the user wasn't required, it happened off-screen and they may hear about it; if the user was required, they missed it and there are consequences.
+- *Compounding skips*: each commits before the next runs.
+- *Bot-initiated skips*: bots can propose skips ("let's pick this up tomorrow") but user confirms. Don't let bots silently advance time.
+
+**Bot reactions to skips**: after a skip, bots should behave as if time passed. Track "days since last interaction" on edges; fire any events affecting them during skip; apply off-screen memories. Without this, time skips feel like nothing happened.
+
+## Persistence Layer
+
+### Stack
+
+- **SQLite** for everything structured. Single file, zero ops. Plenty for single-user.
+- **sqlite-vss or sqlite-vec** for embeddings. Vector search as extension to the same SQLite DB. One file, one connection, one backup.
+- **Plain JSON** for snapshots, character templates, scene exports.
+
+That's the whole data layer. No Postgres, Redis, Pinecone, Docker.
+
+### Schema sketch
+
+```sql
+-- Core entities and state
+entities         (id, name, type, identity_json, created_at)
+entity_state     (entity_id, mood, goals_json, status_json, updated_at)
+activity         (entity_id, container_id, slot, posture, action_json,
+                  attention, holding_json, updated_at)
+containers       (id, name, type, properties_json, parent_id)
+world_state      (singleton: time, weather, active_scene_id, ...)
+
+-- Relationship graph
+edges            (id, source_id, target_id, affinity, trust, summary,
+                  knowledge_json, updated_at)
+group_node       (id, members_json, summary, dynamic, threads_json, updated_at)
+
+-- Events
+events           (id, name, status, trigger_json, scope_json, props_json,
+                  preconditions_json, on_start_json, on_complete_json,
+                  significance, scheduled_for, fired_at)
+active_threads   (id, scope_id, scope_type, description, created_at, resolved_at)
+
+-- Scenes and memories
+scenes           (id, container_id, started_at, ended_at, participants_json,
+                  summary, significance)
+memories         (id, scene_id, owner_id, pov_summary, witnessed_flags,
+                  source, reliability, significance, created_at)
+dialogue         (id, scene_id, speaker_id, content, ts, beat_or_ambient)
+
+-- Vector index
+memory_embeddings (memory_id, embedding BLOB)   -- via sqlite-vss or sqlite-vec (decided at Phase 4)
+
+-- Event sourcing log
+event_log        (id, branch_id, ts, kind, payload_json,
+                  resulting_state_hash, hidden_flag)
+snapshots        (id, branch_id, ts, event_log_id, full_state_json)
+branches         (id, name, parent_branch_id, parent_event_id, created_at)
+```
+
+### SQLite settings
+
+- `PRAGMA journal_mode=WAL` (better concurrency)
+- `PRAGMA foreign_keys=ON` (catch orchestration bugs)
+- Wrap each turn in a transaction (atomic commits)
+
+### Backups
+
+- Auto-backup the DB file on schedule (cron, launchd, or in-orchestrator timer)
+- Export significant scenes to JSON as they close (human-readable folder)
+- Version snapshots with timestamps for rollback
+
+### Migration story
+
+- Number schema versions in a `meta` table
+- Migrations as ordered SQL scripts
+- Apply pending migrations on startup
+
+## Event Sourcing and Time Travel
+
+### Core principle
+
+The structured state is a **projection** built by replaying an append-only event log. State is never mutated directly: events are appended, then the projector applies them.
+
+### What's an event
+
+- `user_turn`: user types something
+- `assistant_turn`: LLM generates a response
+- `time_skip`: user advances time
+- `event_triggered`: scheduled event fires
+- `edge_update`: relationship state changes
+- `scene_transition`: scene closes/opens
+- `entity_state_change`: mood/goal/status update
+- `activity_change`: posture, position, action update
+
+Every state change goes through an event. Discipline matters here.
+
+### What this buys us
+
+- **Free rewind**: truncate log, rebuild state
+- **Trivial debugging**: replay log step-by-step, find the bug-introducing event
+- **Survivable schema changes**: re-run projector with new schema against same log
+- **Branching**: fork the log to explore alternate timelines
+
+### Snapshots
+
+Periodic full-state snapshots mean we don't replay thousands of events on every load. Take a new snapshot every N events or M minutes. Old snapshots are prunable; the log is the source of truth.
+
+### Determinism on replay
+
+LLM calls are nondeterministic. Solution: store the *outcomes* of nondeterministic operations in the event payload, not just inputs. On replay, use the stored outcome rather than re-calling the LLM.
+
+### Rollback
+
+```
+rollback_to(event_id):
+  truncate event_log where id > event_id
+  invalidate cached state
+  rebuild state by replaying events up to event_id
+```
+
+UX: offer rollback at natural granularities (last turn, last scene, last skip, custom). Always snapshot pre-rollback for "undo rollback" recovery.
+
+### Branching
+
+```
+branch(event_id, branch_name):
+  create new branch_id
+  events copy-on-write from parent up to event_id
+  switch active branch
+  parent branch preserved
+```
+
+Allows "what if BotA had said yes" exploration without losing main canon.
+
+### Surgical delete
+
+Three approaches:
+
+1. **Soft delete**: set `hidden_flag` on the event. The message is hidden from the UI but still influences state. Useful for censoring, not for "make it never have happened."
+2. **Delete + cascade**: remove event, also remove dependent events. Effectively "rollback to before this message." Predictable, recommended default.
+3. **Delete + regenerate**: remove event, regenerate dependent LLM responses against new context. Right answer for some cases but complex (LLM calls, cost, nondeterminism).
+
+UX should be explicit. When user clicks delete:
+
+- **Delete and rewind**: remove this and everything after (option 2)
+- **Delete and regenerate**: remove this, regenerate downstream (option 3, only for user messages with one bot response after)
+- **Hide from view**: soft delete (option 1)
+
+Show user what will be affected before confirming:
+
+```
+Rewind to turn 47?
+This will remove:
+  - 12 messages (turns 48–59)
+  - 1 scene transition (drive to park)
+  - 2 edge updates (BotA → You, Group)
+  - 1 fired event (arrived at park)
+[Cancel] [Rewind] [Rewind, keep current as branch]
+```
+
+An event-sourced log makes this impact preview cheap to compute: the affected events are exactly the ones after the rewind point.
+
+## Memory Retrieval
+
+When constructing a prompt, surface memories relevant to the current moment.
+
+**Always-loaded** (no retrieval needed):
+- Pinned items
+- Current scene
+- Active threads
+- Recent N scenes
+
+**Retrieved** (top-K from vector search):
+- Embed recent dialogue + active scene description as query
+- Search the speaker's memory embeddings
+- Filter by witness flag (never return memories speaker didn't witness)
+- Return top 3–5
+
+**Ranking refinements**:
+- Recency boost: weight recent memories higher (decay function on age)
+- Significance boost: high-significance memories surface more readily
+
+Keep K small. Too many retrieved memories pollute the prompt and confuse the model.
+
+For early implementation: SQLite FTS5 (`WHERE owner_id=? AND pov_summary MATCH ?`) is enough for the first few dozen scenes. Add vector search once the corpus grows.
+
+## Prompt Construction
+
+When the speaker is BotA in a scene with you and BotB present:
+
+1. BotA's identity, current state (mood, goals)
+2. BotA → You edge
+3. BotA → BotB edge
+4. Group {You, BotA, BotB} node summary (only if all 3 present)
+5. World state (time, weather, location)
+6. Active scene description
+7. Activity snapshot for all present entities (always small structured block)
+8. Active threads (relevant to scene)
+9. Recent dialogue (small window)
+10. Retrieved memories (top-K, witnessed by BotA, relevant to topic)
+11. Currently active events and their props
+
+BotB gets a completely different prompt assembled from BotB's edges and knowledge. They're effectively two separate agents who happen to share a scene.
+
+After each utterance, run a state-update pass on **every present entity**, not just the speaker. Silent witnessing should still update edges (BotB watching BotA say something cruel updates `botB → botA`).
+
+### Token budget management
+
+Deferred — to be designed during implementation. Likely approach: budget tiers (must-include, should-include, nice-to-include) and trim from the bottom when over budget. Active threads and current activity always must-include.
+
+## Orchestrator Tick
+
+```
+on user_input:
+  parse intent + actions
+  log user_turn event
+  update activity state
+  check event triggers (any conditions met?)
+  if scene transition detected:
+    run significance pass on closing scene
+    write per-POV memories
+    update edges and threads
+    close/open events
+  build prompt for next speaker
+  call LLM (stream)
+  log assistant_turn event with response
+  update edges (silent witnesses too)
+  advance in-fiction time
+  persist transaction
+```
+
+This maps cleanly to actor-model patterns: each entity is an actor with private state, the world is a coordinator, events are messages with TTLs. Optional implementation choice, not required.
+
+## Background Tasks
+
+While the LLM is generating the next response, the orchestrator can do non-blocking work:
+- Run significance classifier on the previous turn
+- Update embeddings for new memories
+- Persist state
+- Take snapshots when due
+
+By the time the user finishes reading the response, the bookkeeping is done.
+
+## Implementation Plan (suggested phases)
+
+### Phase 1: Core loop
+- SQLite schema, basic entities + edges, single scene container
+- Event log + projector + replay
+- Single-bot conversation (you + 1 bot, no skips, no events)
+- LLM client abstraction with one backend
+- Streaming UI, basic web chat
+- Manual rollback (truncate log)
+
+### Phase 2: Multi-entity
+- Second bot, group node, scene configurations
+- Witnessed-by filtering
+- Per-POV memories
+- Activity state and containers
+- Scene transition with compression
+
+### Phase 3: Events and skips
+- Event queue with triggers
+- Time skips (elision and jump)
+- Active threads
+- Significance classifier (small LLM call)
+
+### Phase 4: Polish
+- Vector retrieval
+- Branching
+- Surgical delete + regenerate
+- Snapshot management
+- Backup automation
+- UI affordances (impact preview, rollback granularities)
+
+## Open Questions / Decisions Deferred
+
+- **Token budget exact strategy**: design during Phase 1 with real prompts
+- **Embedding model choice**: defer until Phase 4; pick whatever's cheap and good enough
+- **Vector index implementation**: sqlite-vss vs sqlite-vec — pick at Phase 4 based on current state of those projects
+- **UI framework**: TBD; whatever's pleasant on Mac (could be local web app, Tauri, Electron, native)
+- **Hosting for inference**: start with cloud API (lowest setup), evaluate alternatives once core works
+- **Character template format**: defer until building Phase 1
+- **Multi-session / multi-character casts**: out of scope for v1, leave hooks in schema if cheap
+
+## Reference Patterns
+
+This system aligns with several well-known patterns:
+
+- **Event sourcing**: log of events as source of truth, state as projection
+- **CQRS**: write path (append events) separate from read path (projected state)
+- **Actor model**: entities as actors with private state, message-passing semantics
+- **State machine**: world state transitions on events
+- **CRDT-adjacent**: branches as concurrent histories that can be inspected side by side (merging branches is not part of this design)
+
+For someone with a SCADA / industrial automation background:
+
+- Event log = process historian
+- State projection = live SCADA tags
+- Snapshots = periodic dumps
+- Rollback = restore from checkpoint and replay alarms
+- Significance classifier = condition-based event filtering
+- Required attention = interlock
+- Container constraints = equipment context
+
+Same patterns, different domain.
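
Note (trailing text after the last hunk, so not part of either file): the event-sourcing rules in rp-engine-design.md (state as a projection of an append-only log, rollback by truncating and replaying, stored outcomes instead of re-calling the LLM) can be sketched roughly as follows. This is a minimal in-memory Python illustration under stated assumptions, not the planned SQLite implementation; `Event`, `State`, `apply_event`, `project`, and `rollback_to` are hypothetical names, and only two of the listed event kinds are handled.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    id: int
    kind: str      # e.g. "user_turn", "assistant_turn", "edge_update"
    payload: dict  # holds stored outcomes, so replay never re-calls the LLM


@dataclass
class State:
    dialogue: list = field(default_factory=list)
    affinity: dict = field(default_factory=dict)  # (source, target) -> value


def apply_event(state: State, ev: Event) -> None:
    """Projector step: the only place where state is changed."""
    if ev.kind in ("user_turn", "assistant_turn"):
        state.dialogue.append((ev.payload["speaker"], ev.payload["text"]))
    elif ev.kind == "edge_update":
        # Directed edge: (source, target) and (target, source) stay independent.
        key = (ev.payload["source"], ev.payload["target"])
        state.affinity[key] = state.affinity.get(key, 0) + ev.payload["delta"]


def project(log: list) -> State:
    """Rebuild state from scratch by replaying the log in order."""
    state = State()
    for ev in log:
        apply_event(state, ev)
    return state


def rollback_to(log: list, event_id: int) -> list:
    """Return the log truncated after event_id; the caller re-projects it."""
    return [ev for ev in log if ev.id <= event_id]


log = [
    Event(1, "user_turn", {"speaker": "you", "text": "Hi."}),
    Event(2, "assistant_turn", {"speaker": "botA", "text": "Hello!"}),
    Event(3, "edge_update", {"source": "botA", "target": "you", "delta": 1}),
    Event(4, "edge_update", {"source": "you", "target": "botA", "delta": 2}),
]

current = project(log)
rewound = project(rollback_to(log, 2))
print(current.affinity, rewound.affinity)
```

Because the assistant's text is stored in the payload, replaying the log yields the same dialogue every time, and rewinding to event 2 simply drops both edge updates without any special undo logic.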