docs: add v1 requirements design + project README

- docs/plans/2026-04-26-v1-requirements-design.md captures the v1 product requirements and behavioral spec from the initial brainstorm (use case, scope, data model, authoring, play loop, memory, time, rollback, phase cut, non-negotiable rules).
- README.md introduces the project for the gitea repo.
- CLAUDE.md updated to reference the requirements doc.
- .gitignore added for macOS metadata.
2026-04-26 10:46:03 -04:00
parent eb2f814f25
commit 2f94ba7291
4 changed files with 413 additions and 1 deletions
@@ -0,0 +1 @@
+.DS_Store
@@ -2,7 +2,7 @@

 Local-first roleplay chat app that treats fiction as a **simulation**, not a chat log. The LLM is a renderer for structured world state — it does not hold state.

-See [rp-engine-design.md](rp-engine-design.md) for the full design. This file is the working summary.
+See [rp-engine-design.md](rp-engine-design.md) for the architectural design and [docs/plans/2026-04-26-v1-requirements-design.md](docs/plans/2026-04-26-v1-requirements-design.md) for the v1 product requirements & behavioral spec. This file is the working summary.

 ## Why this exists

@@ -0,0 +1,78 @@
+# chat
+
+A local-first roleplay chat engine that treats fiction as a **simulation**, not a chat log.
+
+The LLM is a renderer for structured world state — it does not hold the state. State lives in an event-sourced SQLite database and is projected on demand. Models can be swapped freely behind a stateless `generate(prompt, params) -> text` interface.
+
+> **Status:** design phase. No code yet. See [rp-engine-design.md](rp-engine-design.md) for the full design and [CLAUDE.md](CLAUDE.md) for the working summary and conventions.
+
+## Why
+
+Conventional RP chatbots have three persistent failure modes:
+
+1. **Memory loss** — old context drops as history grows.
+2. **Quality decay** — bots get terse and generic over long conversations.
+3. **Stale state pollution** — bots fixate on past props (the "picnic basket" problem: bring a basket to one scene, the bot reaches for it forever).
+
+The fix is to model the world as structured state — locations, time, who's present, what they're doing, what they remember, how they feel about each other — and use the LLM only to render that state into prose.
+
+## Scope
+
+Deliberately small, so the design can be made to actually work:
+
+- **Single user, single machine.**
+- **Maximum 3 entities per scene**: `you` + up to 2 bots. The 3-entity cap is load-bearing — it makes the relationship graph fully enumerable (6 directed edges + 1 group node).
+- **Chat-only.** No voice, no real-time.
+
+Multi-session casts and N-entity scenes are explicit non-goals for v1.
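The "6 directed edges" figure follows from counting ordered pairs of distinct entities: with n = 3 there are n(n-1) = 6. A minimal sketch of that enumeration, assuming the entity names used in this README (the helper name `directed_edges` is hypothetical):

```python
from itertools import permutations

# Entity names as used in the README; the group node is tracked separately.
ENTITIES = ("you", "botA", "botB")


def directed_edges(entities):
    """Return every ordered (source, target) pair of distinct entities."""
    return list(permutations(entities, 2))


edges = directed_edges(ENTITIES)
for src, dst in edges:
    print(f"{src} -> {dst}")
print(len(edges))  # 6
```

With a fourth entity the same function yields 12 edges, which illustrates why the cap keeps the graph small enough to enumerate in full.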
+
+## How it works (at a glance)
+
+- **Entities** (`you`, `botA`, `botB`) have identity, state (mood/goals/status), an activity record (where they are, what they're doing, what they're holding, where their attention is), and per-POV memory.
+- **Containers** (car, restaurant booth, room) hold entities in defined slots and provide spatial constraints the model can reason over.
+- **Relationship graph**: 6 **directed** edges + 1 group node. Asymmetric feelings are first-class — BotA can secretly resent BotB while BotB thinks they're best friends.
+- **Witnessed-by flags**: every memory carries a 3-bit `[you, botA, botB]` mask. A speaker can only retrieve memories their bit is set on. This is what stops bots referencing things they couldn't possibly know.
+- **Events** have lifecycles (`planned → active → completed`) and own their own props. When the picnic ends, the basket goes back into the closed event record. Only narrative gist, acquired objects, learned facts, and relationship changes promote to permanent memory.
+- **Per-POV scene summaries**: every witness gets their own version of a closed scene, written from their angle. Different details, different interpretations. This is what gives bots inner lives.
+- **Event sourcing**: state is a projection of an append-only event log. Rewind, branching ("what if BotA had said yes"), surgical delete with impact preview, and survivable schema changes all come for free.
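To make the event-sourcing bullet concrete, here is a minimal sketch of a log, a projector, and a rewind. The event kinds (`moved`, `mood_set`) and the `Event` fields are assumptions made for illustration, not the project's schema:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    seq: int       # position in the append-only log
    kind: str      # hypothetical kinds: "moved", "mood_set"
    entity: str
    value: str


def project(events):
    """Fold the log into current state; state is derived, never edited in place."""
    state = {}
    for ev in events:
        entity_state = state.setdefault(ev.entity, {})
        if ev.kind == "moved":
            entity_state["location"] = ev.value
        elif ev.kind == "mood_set":
            entity_state["mood"] = ev.value
    return state


def rewind(events, to_seq):
    """Return a truncated copy of the log; the original list is left untouched."""
    return [ev for ev in events if ev.seq <= to_seq]


log = [
    Event(1, "moved", "botA", "office"),
    Event(2, "mood_set", "botA", "nervous"),
    Event(3, "moved", "botA", "park"),
]

print(project(log))             # {'botA': {'location': 'park', 'mood': 'nervous'}}
print(project(rewind(log, 2)))  # {'botA': {'location': 'office', 'mood': 'nervous'}}
```

Because `rewind` returns a copy, the untruncated log can serve as the starting point for a branch.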
+
+## Architecture
+
+```
+┌──────────────────────────────────────────────┐    ┌────────────────────────┐
+│ Mac (always-on)                              │    │ Inference endpoint     │
+│                                              │    │ (stateless)            │
+│  Web UI                                      │    │                        │
+│  Orchestrator                                │ →  │  Anthropic API         │
+│  Event log + projector  ← SQLite (one file)  │    │  OpenAI / OpenRouter   │
+│  Persistence + retrieval + prompt builder    │    │  Local MLX / llama.cpp │
+│                                              │    │  Rented GPU            │
+└──────────────────────────────────────────────┘    └────────────────────────┘
+```
+
+The Mac side holds everything that survives — state, history, retrieval, orchestration. Inference is a swappable, stateless service. State outlives any one model.
+
+## Stack
+
+- **SQLite** (single file) for everything structured. WAL mode, foreign keys on, each turn in a transaction.
+- **sqlite-vss / sqlite-vec** for embedding search in the same DB file (Phase 4).
+- **JSON** for snapshots, character templates, scene exports.
+- No Postgres. No Redis. No Pinecone. No Docker.
+
+## Roadmap
+
+1. **Core loop** — schema, entities + edges, single container, event log + projector, single-bot conversation, one LLM backend, streaming UI, manual rollback.
+2. **Multi-entity** — second bot, group node, scene configurations, witness filtering, per-POV memories, activity/containers, scene transitions with compression.
+3. **Events & skips** — event queue with triggers, time skips (elision and jump), active threads, significance classifier.
+4. **Polish** — vector retrieval, branching, surgical delete + regenerate, snapshots, backup automation, impact-preview UI for rewinds.
+
+Each phase must work end-to-end before the next begins.
+
+## Repository
+
+- [rp-engine-design.md](rp-engine-design.md) — full design document.
+- [CLAUDE.md](CLAUDE.md) — working summary and conventions for development with Claude Code.
+
+## License
+
+TBD.
@@ -0,0 +1,333 @@
+# v1 Requirements & Behavioral Design
+
+**Date:** 2026-04-26
+**Status:** approved (brainstorming complete; pending writing-plans)
+**Companion document:** [rp-engine-design.md](../../rp-engine-design.md) — architectural design
+
+This document captures the product- and behavior-level requirements for v1, derived from a structured brainstorm. The architectural design doc covers the *how* (event sourcing, schema sketch, prompt construction). This doc covers the *what* (user experience, scope cuts, and the contract between the user and the system).
+
+---
+
+## 1. Product Vision
+
+A local-first, single-user roleplay engine for **relational and companion-style RP**. The goal is bots that feel persistent, consistent, and like they have inner lives — durable across long horizons, immune to the three classic failure modes (memory loss, quality decay, prop pollution).
+
+The LLM is treated as a **renderer** for structured world state, not as the state-holder. State lives in an event-sourced SQLite database that survives any model outage and is replayable for free.
+
+## 2. Scope
+
+### 2.1 In scope (v1)
+
+- Single user, single Mac (always-on).
+- A library of bots the user has authored. Each bot is a persistent entity with its own identity, memory, edges, and clock.
+- **One chat per bot.** A second bot can be added as a *guest* into any chat. Hard cap: **2 bots in any scene**.
+- Explicit / mature content allowed.
+- **Featherless** as the LLM backend over its OpenAI-compatible API. Two model slots:
+  - `narrative_model` — Dolphin-Mistral-24B-Venice (uncensored, narrative-grade).
+  - `classifier_model` — small (~3B-class), TBD at Phase 1 start. Used for parsing, significance, interjection, scene-close detection, state-update passes.
+
+### 2.2 Out of scope (v1)
+
+- Multi-user / multi-device.
+- More than 2 bots in a scene.
+- Proactive bot contact / out-of-session messages / push notifications.
+- Background world ticks ("meanwhile…" scenes that play while the user is idle) — deferred to Phase 2-3.
+- Branching timelines with UI (mechanically supported by event sourcing; not exposed yet).
+- Multimodality: no portraits, voice, or images. Defer indefinitely.
+
+### 2.3 Non-functional constraints
+
+- All state survives any model or network outage.
+- Each turn writes one transactional event-log batch — no half-applied state.
+- Streaming required for narrative output.
+- One SQLite file is the entire data layer. No Postgres, Redis, Pinecone, Docker.
+
+## 3. Architecture & Backend
+
+```
+[ Mac, always-on ]                              [ Featherless, stateless ]
+  Web UI  ──► Orchestrator ──► LLM client ──►   narrative_model    (Dolphin-Mistral-24B-Venice)
+                  │                              classifier_model  (small, TBD)
+                  ├── Event log + projector
+                  ├── SQLite (one file)
+                  └── Retrieval + prompt builder
+```
+
+The orchestrator never knows which model is in use — only `generate(prompt, params) -> text` (streamed). The Featherless client is one implementation; mocks and other backends can drop in for tests or future migration.
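One way to express that contract is a structural interface plus a mock. This is a sketch under assumptions: the names `Backend`, `MockBackend`, and `run_turn` are hypothetical, and streaming is modeled as an iterator of text chunks.

```python
from typing import Iterator, Mapping, Protocol


class Backend(Protocol):
    """Anything with a stateless, streaming generate() satisfies this."""

    def generate(self, prompt: str, params: Mapping[str, object]) -> Iterator[str]:
        ...


class MockBackend:
    """Test double that streams a canned reply word by word."""

    def __init__(self, canned: str) -> None:
        self.canned = canned

    def generate(self, prompt: str, params: Mapping[str, object]) -> Iterator[str]:
        for word in self.canned.split():
            yield word + " "


def run_turn(backend: Backend, prompt: str) -> str:
    """Orchestrator side: consume the stream without knowing the model behind it."""
    chunks = []
    for chunk in backend.generate(prompt, {"temperature": 0.8}):
        chunks.append(chunk)  # a real UI would forward each chunk as it arrives
    return "".join(chunks).strip()


print(run_turn(MockBackend("She looks up."), "prompt text"))  # She looks up.
```

A Featherless client would implement the same `generate` method, so the orchestrator code would not change when backends are swapped.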
+
+API key handling: keys live in a local config file outside the repository. **Never** commit a key to the repo, paste in chat logs, or include in exports.
+
+## 4. Data Model (top-level entities)
+
+- **Bot** — top-level persistent unit. Has identity (immutable per session), state (mood/goals/status), per-bot clock, kickoff spec.
+- **Chat** — exactly one per bot, with that bot as host. A chat carries an optional current guest bot, its own clock, and its own active scene. Chats do **not** own memories or identity — those are bot-owned.
+- **You** — singleton entity with a light identity (name, voice/persona summary).
+- **Edges** — per-pair, persistent across chats:
+  - `bot → you`, `you → bot` for every authored bot
+  - `botA → botB`, `botB → botA` initialized the first time two bots co-appear
+  - Edges hold: affinity, trust, summary, knowledge known about target, last interaction (chat-id + chat-clock), shared private moments. **Asymmetric — never collapsed into a single shared "relationship" field.**
+- **Memory** — bot-owned. Each memory carries: a 3-bit `[you, host, guest]` witnessed-by mask, the chat it occurred in, that chat's clock at the time, source/reliability if secondhand, significance, embedding (Phase 4).
+- **Events / threads** — scoped to the chat where they exist. (Phase 3.)
+- **Scenes** — recorded per chat, with per-POV summaries written for each present witness on close.
+
+### 4.1 Per-chat clocks
+
+Each chat has its own `chat_state.time`, initialized from its host bot's kickoff scene and advanced **only** by explicit user time skips within that chat. Two clocks for two chats are **independent** — you may spend in-fiction days with BotA without BotB's clock advancing.
+
+When BotB guests in BotA's chat, the scene runs on **BotA's** chat clock; memories written to both bots are timestamped at that clock value. BotB's chat clock is unchanged. Cross-chat time arithmetic is intentionally fuzzy — bots can reference cross-chat events ("when I came over that night") but the system does not claim precise "X days ago" math across chats. Within a single chat, time math is precise.
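The clock rules above can be sketched as follows. The types and the minute-based time unit are assumptions for illustration; the spec does not fix a representation for `chat_state.time`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Chat:
    host: str
    time: int = 0                # in-fiction minutes since kickoff (assumed unit)
    guest: Optional[str] = None


@dataclass
class Memory:
    owner: str
    chat_host: str               # identifies the chat the memory occurred in
    chat_time: int               # that chat's clock when the memory was written


def skip(chat: Chat, minutes: int) -> None:
    """Only an explicit user skip advances a chat's clock."""
    chat.time += minutes


def record_scene(chat: Chat) -> List[Memory]:
    """Write one memory per present bot, stamped with the host chat's clock."""
    bots = [chat.host] + ([chat.guest] if chat.guest else [])
    return [Memory(owner=b, chat_host=chat.host, chat_time=chat.time) for b in bots]


chat_a = Chat(host="botA")
chat_b = Chat(host="botB")

skip(chat_a, 3 * 24 * 60)        # three in-fiction days pass in BotA's chat
chat_a.guest = "botB"            # BotB guests in BotA's chat
memories = record_scene(chat_a)

print([(m.owner, m.chat_time) for m in memories])  # [('botA', 4320), ('botB', 4320)]
print(chat_b.time)                                 # 0
```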
+
+### 4.2 Bot library
+
+Bots are top-level. Bot count in the system is unbounded. Per-scene cap is 2 bots: you + 1 host + at most 1 guest.
+
+## 5. Authoring
+
+Authoring is structured and done in a form-based UI. A bot is created once; after that it changes only through reset (full wipe) or by directly amending its identity fields, which are immutable within a session.
+
+### 5.1 Authored fields per bot
+
+- **Identity (immutable per session):** name, persona paragraph, voice samples (1–3 short prose samples in the bot's voice), trait list, backstory.
+- **Initial relationship to you:** free-form prose ("BotA is my coworker; we've worked together for two years; she has a crush on me she hasn't admitted"). Parsed once into seeded `you ↔ bot` edge content on first run.
+- **Kickoff scene:** free-form prose describing the first encounter ("you stay late at the office; only you and BotA are there; she's at her desk pretending to work"). Parsed on first init into structured container, activity, seed edge content, and initial scene state. The user confirms or edits the parsed result before play begins.
+
+### 5.2 First co-appearance: "have they met?"
+
+The first time two bots appear in the same scene, the orchestrator prompts: *Have BotA and BotB met before?*
+
+- **Yes** → user writes a short prose seed describing how they know each other. Parsed into initial `botA ↔ botB` edge content.
+- **No** → edges initialize empty. The bots actually meet on-screen.
+
+### 5.3 Reset
+
+Reset is a per-bot action with hard confirmation (type the bot's name).
+
+- Wipes: chat history, memories, scenes, edges involving the bot, current scene state.
+- Preserves: identity, initial edge seed, kickoff scene.
+- After reset, the chat **sits ready** — kickoff does not auto-play. The next user message triggers kickoff.
+
+## 6. Play Loop
+
+### 6.1 Input convention (mixed prose, novel-style)
+
+A turn is free-form prose with conventional markers:
+
+- `*walks over*` — action.
+- Quoted or bare text — dialogue.
+- `((double parens))` — out-of-character commentary or meta-instruction. Flagged but not sent to the bot. (Default; configurable before play begins.)
+
+A small classifier call splits the turn into segments tagged `dialogue | action | ooc`. Action segments update the user's activity record.
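The spec assigns this split to a classifier model. As a rough stand-in for tests, a regex can handle the unambiguous markers. The sketch below is that stand-in, not the classifier: it treats anything outside `*...*` and `((...))` as dialogue and does not handle nested or unbalanced markers.

```python
import re

# Order matters: OOC first, then action, then everything else as dialogue.
SEGMENT = re.compile(
    r"\(\((?P<ooc>.+?)\)\)"
    r"|\*(?P<action>[^*]+)\*"
    r"|(?P<dialogue>(?:(?!\(\()[^*])+)",
    re.DOTALL,
)


def split_turn(text):
    """Split a user turn into (tag, text) segments."""
    segments = []
    for match in SEGMENT.finditer(text):
        kind = match.lastgroup
        body = match.group(kind).strip().strip('"').strip()
        if body:  # drop whitespace-only gaps between markers
            segments.append((kind, body))
    return segments


def bot_visible(segments):
    """OOC segments are flagged but never sent to the bot."""
    return [seg for seg in segments if seg[0] != "ooc"]


turn = '*walks over* "Hey, you busy?" ((keep her shy))'
print(split_turn(turn))
# [('action', 'walks over'), ('dialogue', 'Hey, you busy?'), ('ooc', 'keep her shy')]
print(bot_visible(split_turn(turn)))
# [('action', 'walks over'), ('dialogue', 'Hey, you busy?')]
```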
+
+### 6.2 Turn-taking (scene with you + host, optional guest)
+
+- **Addressee gets the floor.** With one bot present, the bot replies. With two bots, the addressee bot replies (inferred from prose: name mention, gaze, context).
+- **Interjection allowance** (only when guest is present): a classifier call decides whether the non-addressee bot interjects this turn. If yes, it produces a short reaction beat after the addressee's reply. If no, it silently witnesses.
+- **State-update pass on every present entity** after every utterance, not just the speaker. Silent witnesses still update edges (BotB watching BotA say something cruel updates `botB → botA`).
+
+### 6.3 Speaker prompt assembly
+
+For the speaking bot, the prompt is assembled from **their own** state and **their own** witnessed memories — never from a global view:
+
+1. Speaker identity + current state (mood, goals).
+2. Speaker → you edge (and speaker → other-bot edge, if guest is present).
+3. Group node (Phase 2+, only if all 3 present).
+4. Chat-state snapshot: time, weather, location.
+5. Active scene description.
+6. **Activity snapshot for all present entities** (always a small structured block — anchors spatial grounding).
+7. Active threads (Phase 3).
+8. Recent dialogue window.
+9. Retrieved memories (top-K, witness-filtered, speaker-owned).
+10. Currently active events + props (Phase 3).
+
+### 6.4 Drawer (state visibility)
+
+A collapsible right-side drawer. **Closed by default.** When open, shows for the current chat:
+
+- Current scene + container.
+- Activity record per present entity.
+- Edges (host ↔ you, host ↔ guest if any).
+- Recent witnessed memories from the host's POV.
+- Active threads and currently active events (Phase 3).
+
+Read-only by default; each row has an edit affordance for surgically fixing things the LLM got wrong (full edit surface lands progressively across phases).
+
+## 7. Scene Lifecycle
+
+### 7.1 Starts
+
+- **First-ever scene with a bot** → kickoff plays after the kickoff prose is parsed and the user confirms the structured form.
+- **Returning to an existing chat** → resume in-place. Same container, same activity, same active scene. **No auto time advancement.**
+- **Adding a guest bot** → guest just appears in the current scene. The user narrates any in-fiction justification in prose. `botA ↔ botB` edges initialize per the per-pair "have they met?" answer if it's their first co-appearance.
+
+### 7.2 Closes (hybrid auto + manual)
+
+A scene closes when one of the following fires:
+
+- **Auto, hard signals:**
+  - Container change (parsed from prose — "we drove to the park", "we stepped outside").
+  - Declared time skip (Phase 3).
+  - Explicit user pattern ("we're done here", "fade out", etc.) recognized by classifier.
+- **Manual:** "Close scene" button in the drawer for soft transitions the user wants bookmarked.
+
+False positives are a bigger risk than false negatives, so the auto-detector errs on the conservative side; manual close is always available.
+
+### 7.3 On close
+
+1. Significance classifier pass on the closing scene.
+2. Per-POV summaries written — one per witness, from their angle. **No omniscient narration.**
+3. Edge updates applied (affinity, trust, summary, knowledge, last-interaction).
+4. Closed events finalize; their props remain in the closed event record (do **not** promote to memory).
+5. Raw dialogue archived to cold storage.
+6. New active scene is opened (resume or fresh).
+
+## 8. Memory Retrieval
+
+### 8.1 Always-loaded (no retrieval cost)
+
+- Pinned memories (user pins from the drawer).
+- Current scene's running dialogue window.
+- Active threads (Phase 3).
+- Last N scenes' per-POV summaries from the current chat (`N` tunable; default **3**).
+
+### 8.2 Retrieved (top-K)
+
+- **Phase 1:** SQLite FTS5 over `memories.pov_summary` for the speaker. Filter `WHERE owner_id = speaker AND witness_bit_for_speaker = 1` (hard SQL constraint, not a soft signal).
+- **Phase 4:** vector search via sqlite-vss/sqlite-vec, same filter.
+- Default `K = 4`. Recency boost + significance boost in ranking.
+
+### 8.3 Witness rule (non-negotiable)
+
+A bot **cannot** retrieve memories whose witness bit for them is `0`. Period. This is the mechanism preventing bots from referencing things they couldn't possibly know.
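A sketch of the witness rule as a hard SQL constraint, using a plain table and a bitwise test instead of FTS5 so it runs on any SQLite build. The table layout, the `witness_mask` column, and the bit assignment are assumptions for illustration.

```python
import sqlite3

# Assumed bit layout for the 3-bit mask [you, botA, botB].
BIT = {"you": 0b100, "botA": 0b010, "botB": 0b001}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE memories ("
    " id INTEGER PRIMARY KEY,"
    " owner_id TEXT NOT NULL,"
    " witness_mask INTEGER NOT NULL,"
    " pov_summary TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO memories (owner_id, witness_mask, pov_summary) VALUES (?, ?, ?)",
    [
        ("botA", 0b110, "You told botA about the promotion."),
        ("botA", 0b101, "You and botB planned a surprise."),  # botA bit is 0
        ("botB", 0b101, "You and botB planned a surprise."),
        ("botA", 0b111, "All three shared lunch."),
    ],
)


def retrieve(conn, speaker, k=4):
    """Return up to k summaries the speaker owns AND witnessed, newest first."""
    rows = conn.execute(
        "SELECT pov_summary FROM memories"
        " WHERE owner_id = ? AND (witness_mask & ?) != 0"
        " ORDER BY id DESC LIMIT ?",
        (speaker, BIT[speaker], k),
    ).fetchall()
    return [row[0] for row in rows]


print(retrieve(conn, "botA"))
# ['All three shared lunch.', 'You told botA about the promotion.']
```

The second row sits in botA's store but never surfaces for botA, because the filter lives in the `WHERE` clause and not in the ranking.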
+
+### 8.4 Cross-chat memory
+
+A bot's memory store contains memories from any chat the bot has been in (host or guest). All are retrievable. Bots may reference cross-chat events naturally; precise cross-chat time arithmetic is not attempted.
+
+## 9. Time, Skips, Events (Phase 3 surface)
+
+- Each chat has its own clock; advances **only** on explicit user skip commands within that chat.
+- **Elision skip** — "skip to when we arrive". Resolves to end-of-current-action; activity completes; landing state set; brief transition narration generated.
+- **Jump skip** — "next morning", "a week later". User is prompted: *"anything notable happen?"* The answer becomes one or more synthesized memories for the speaker bot via a classifier call. Chat clock advances. Activity is reset to a coherent landing state.
+- **Events** — lifecycles (`planned | active | completed | cancelled | expired`) with their own scoped props. On close, only four categories promote: object acquired → inventory, knowledge gained → edge knowledge, relationship change → edge summary, narrative gist → per-POV summary. Everything else stays in the closed record.
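The event lifecycle can be modeled as a small state machine. The five statuses come from the list above; the allowed transitions below are an assumption, since the spec names the states but not the edges between them.

```python
from enum import Enum


class Status(Enum):
    PLANNED = "planned"
    ACTIVE = "active"
    COMPLETED = "completed"
    CANCELLED = "cancelled"
    EXPIRED = "expired"


# Assumed transition table: terminal states have no outgoing edges.
ALLOWED = {
    Status.PLANNED: {Status.ACTIVE, Status.CANCELLED, Status.EXPIRED},
    Status.ACTIVE: {Status.COMPLETED, Status.CANCELLED},
    Status.COMPLETED: set(),
    Status.CANCELLED: set(),
    Status.EXPIRED: set(),
}


def advance(current: Status, target: Status) -> Status:
    """Return the new status, or raise if the transition is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target


status = advance(Status.PLANNED, Status.ACTIVE)
status = advance(status, Status.COMPLETED)
print(status.value)  # completed
```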
+
+Phase 1 has no skips and no events. Time is set at kickoff and stays put unless the bot is reset.
+
+## 10. Rollback, Regenerate, Reset
+
+- **Rewind to here** — button on every turn. Truncates event log past that turn; rebuilds projection. Confirmation modal shows turn count + scene transitions affected. Always snapshots pre-rewind for "undo rewind".
+- **Regenerate this turn** — button on the latest bot turn. **Edit-then-regenerate**: the user may edit their preceding turn before re-running. Replaces the old `assistant_turn` event with a new one carrying the new outcome. Downstream classifier passes (state updates, significance) re-run on the new output.
+- **Reset bot** — full wipe with hard confirm (type bot name). Behavior detailed in §5.3.
+- **Branching, hide-from-view, surgical delete + cascade with impact preview** — Phase 4. Mechanically supported by event sourcing already; no UI yet.
+
+## 11. Compression & Promotion
+
+- **Significance pass** — classifier call after each turn (queued, async) tags the turn 0–3.
+- **Per-POV summaries** — written per witness when a scene closes. Different details, different interpretations. No omniscient narration.
+- **Promotion rules (the "picnic basket" rule):**
+  - Object acquired → entity inventory.
+  - Knowledge gained → relevant edge's `knowledge`.
+  - Relationship change → edge `summary`.
+  - **Everything else stays in the closed event/scene record.** Surfaces only on explicit recall.
+- **Compression tiers:**
+  - Last scene: full dialogue retained.
+  - Recent scenes: per-POV summary + key quotes.
+  - Older scenes: per-POV summary only.
+  - Distant past: rolled into edge summaries.
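The promotion rules above amount to a routing table with a default. A minimal sketch, assuming hypothetical outcome kinds (`object_acquired`, `knowledge_gained`, `relationship_change`, `prop`) that a classifier pass might emit:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    kind: str     # hypothetical classifier label
    detail: str


# Explicit routes; anything not listed stays in the closed record.
ROUTES = {
    "object_acquired": "inventory",
    "knowledge_gained": "edge_knowledge",
    "relationship_change": "edge_summary",
}


def promote(outcomes):
    """Sort scene outcomes into their destinations when the scene closes."""
    result = {"inventory": [], "edge_knowledge": [], "edge_summary": [], "closed_record": []}
    for outcome in outcomes:
        destination = ROUTES.get(outcome.kind, "closed_record")
        result[destination].append(outcome.detail)
    return result


picnic = [
    Outcome("prop", "picnic basket"),
    Outcome("object_acquired", "pressed flower"),
    Outcome("knowledge_gained", "botA is allergic to bees"),
    Outcome("relationship_change", "botA trusts you more"),
]
result = promote(picnic)
print(result["inventory"])      # ['pressed flower']
print(result["closed_record"])  # ['picnic basket']
```

The basket lands in `closed_record` by default, so it cannot leak into later prompts unless something recalls it explicitly.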
+
+## 12. Persistence & Ops (v1 defaults)
+
+- SQLite WAL mode, foreign keys on, transactional turns.
+- Single DB file. Default path TBD (likely `~/Library/Application Support/chat/chat.db`).
+- **Auto-backup** nightly via launchd. Timestamped copies. Last 14 retained. Pre-rewind snapshots are separate and not pruned.
+- **Significant-scene JSON exports** written to a sibling folder when scenes close at significance ≥ 2.
+- Schema versioned in a `meta` table; migrations applied on startup.
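A sketch of these defaults using Python's standard `sqlite3` module. The table names, the `schema_version` key, and the helper names are assumptions; a temporary file stands in for the real DB path, since WAL mode needs an on-disk database.

```python
import os
import sqlite3
import tempfile


def open_db(path):
    """Open the single DB file with WAL and foreign keys enabled."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA foreign_keys=ON")
    conn.execute("CREATE TABLE IF NOT EXISTS meta (key TEXT PRIMARY KEY, value TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        " seq INTEGER PRIMARY KEY, turn INTEGER NOT NULL, payload TEXT NOT NULL)"
    )
    conn.execute("INSERT OR IGNORE INTO meta VALUES ('schema_version', '1')")
    conn.commit()
    return conn


def write_turn(conn, turn, payloads):
    """Append one turn's events atomically: all rows land, or none do."""
    with conn:  # commits on success, rolls back if an exception escapes
        for payload in payloads:
            conn.execute("INSERT INTO events (turn, payload) VALUES (?, ?)", (turn, payload))


db_path = os.path.join(tempfile.mkdtemp(), "chat.db")
conn = open_db(db_path)
write_turn(conn, 1, ["user_turn", "assistant_turn"])
try:
    write_turn(conn, 2, ["user_turn", None])  # NOT NULL violation aborts the batch
except sqlite3.IntegrityError:
    pass

print(conn.execute("SELECT COUNT(*) FROM events").fetchone()[0])  # 2
```

The failed second turn leaves no partial rows behind, which matches the "no half-applied state" constraint.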
+
+## 13. Phase Cut
+
+### Phase 1 (v1) — must build end-to-end before any Phase 2 work
+
+- Featherless client (OpenAI-compatible) with `narrative_model` and `classifier_model` configured.
+- Schema, event log, projector, replay.
+- Bot authoring UI (form-based) including kickoff prose + parse-and-confirm.
+- Single-bot chat (host only, no guest yet):
+  - Mixed-prose input with `((parens))` OOC marker.
+  - Addressee = host. No interjection logic yet.
+  - Narrative streaming.
+  - Post-turn state-update pass.
+- Drawer: read-only first; edit-on-demand may land in Phase 1.5.
+- Scene close (hard-signal auto + manual button) with per-POV summary for the host bot only.
+- Memory: witness flag stored, FTS5 retrieval (K=4), recency + significance boost.
+- Rewind + regenerate (with edit-then-regenerate).
+- Reset bot (hard confirm).
+- Per-chat clock (set at kickoff; no skips yet).
+- Nightly backups + pre-rewind snapshots.
+
+### Phase 2 — multi-entity
+
+- Guest bot in chat (3-entity scene config).
+- Interjection classifier call.
+- Witness filtering across multiple owners.
+- Group node (when all three present).
+- Per-pair "have they met?" prompt.
+- `botA ↔ botB` edges.
+
+### Phase 3 — events, skips, threads
+
+- Events with lifecycles and scoped props.
+- Time skips: elision and jump.
+- Active threads.
+- Significance classifier improvements.
+- "Meanwhile…" scenes (scene config 4) — autonomous.
+
+### Phase 4 — polish
+
+- Vector retrieval (sqlite-vss or sqlite-vec).
+- Branching UI.
+- Drawer-edit on every field.
+- Backup tooling improvements.
+- Significance review UI.
+- Surgical delete + cascade with impact preview.
+- Hide-from-view soft delete.
+
+## 14. Open / Deferred Decisions
+
+- Exact small classifier model name on Featherless (pick at start of Phase 1: cheapest model that's good enough at structured-output classification).
+- Token budget tier strategy (must-include / should-include / nice-to-include) — designed against real prompts during Phase 1.
+- UI framework — TBD; local web app is the default direction.
+- OOC marker (`((parens))` proposed as default; user may change before play begins).
+- DB file location.
+- Embedding model choice (Phase 4).
+- sqlite-vss vs sqlite-vec (Phase 4).
+
+## 15. Non-Negotiables (rules every implementer must respect)
+
+1. **State changes go through the event log.** Never `UPDATE` a state row directly; append an event, let the projector apply it.
+2. **Witness filter every memory read.** A bot must not retrieve memories whose witness bit for them is 0.
+3. **Edges are directed.** `botA → botB` and `botB → botA` are independent.
+4. **Don't promote event props to memory.** Only the four promotion categories (object, knowledge, relationship change, narrative gist via summary).
+5. **Per-POV, not omniscient.** Scene summaries are written per witness, from their angle.
+6. **Activity block is always in the prompt.** Spatial grounding prevents "leaning on the kitchen counter while in a car" failures.
+7. **Streaming on the inference path; non-blocking bookkeeping** runs while the LLM streams.
+8. **No extra services.** SQLite + a process. Push back on suggestions to add infrastructure.
+
+---
+
+## Appendix A — Decisions Log (this brainstorm)
+
+| # | Decision | Choice |
+|---|----------|--------|
+| 1 | Primary experience | Relational + companion RP |
+| 2 | Content scope | Explicit content allowed |
+| 3 | Backend | Featherless (OpenAI-compatible API), Dolphin-Mistral-24B-Venice for narrative |
+| 4 | Authoring | Rich structured fields |
+| 5 | Input style | Mixed prose, novel-style |
+| 6 | Multi-bot turn-taking | Addressee + interjection allowance |
+| 7 | Bot autonomy | Strictly turn-based for v1; "meanwhile…" deferred to P2-3 |
+| 8 | Sessions/campaigns | Bots are persistent units; one chat per bot; guests addable; bot-owned memory |
+| 9 | First co-appearance | Per-pair "have they met?" prompt |
+| 10 | Scene starts | Kickoff for new; resume in-place for returning; just-appear for guests |
+| 11 | State visibility | Optional drawer, closed by default, read-only with edit-on-demand |
+| 12 | Scene close | Hybrid (hard-signal auto + manual button) |
+| 13 | Time | Per-chat clock, advanced only by explicit user skips |
+| 14 | Model strategy | Small classifier model + large narrative model |
+| 15 | Reset | Full wipe + hard confirm; chat sits ready for kickoff |
+| 16 | Rollback | Rewind + regenerate (with edit-then-regenerate) |