Joseph Doherty 5869f1c5ce docs: lock remaining v1 design decisions

Resolves the open/deferred decisions from the v1 requirements brainstorm:
runtime stack, classifier model, token budgets, OOC marker, data layout.

- Runtime: FastAPI + HTMX + SSE (multi-tab sync is a Phase 1 requirement,
  not a polish item). 127.0.0.1 only, no auth in v1.
- Classifier model: NousResearch/Hermes-3-Llama-3.1-8B with documented
  fallback chain (dolphin-2.9.4-llama3-8b, Meta-Llama-3.1-8B-abliterated).
- Token budgets: 8K hard / 6K soft for narrative, 4K hard for classifier;
  Must/Should/Nice trimming tiers spelled out in §3.2.
- OOC marker locked to ((double parens)), configurable.
- All runtime data lives under <repo>/data/ (DB, backups, snapshots,
  exports, config). Tree is gitignored. CHAT_DB_PATH env var honored.

CLAUDE.md and the requirements doc updated to match. Decisions log in
the requirements doc appendix extended with the new locks (#17–21).

2026-04-26 10:56:51 -04:00


v1 Requirements & Behavioral Design

Date: 2026-04-26
Status: approved (brainstorming complete; pending writing-plans)
Companion document: rp-engine-design.md (architectural design)

This document captures the product- and behavior-level requirements for v1, derived from a structured brainstorm. The architectural design doc covers the how (event sourcing, schema sketch, prompt construction). This doc covers the what (user experience, scope cuts, and the contract between the user and the system).

1. Product Vision

A local-first, single-user roleplay engine for relational and companion-style RP. The goal is bots that feel persistent, consistent, and like they have inner lives — durable across long horizons, immune to the three classic failure modes (memory loss, quality decay, prop pollution).

The LLM is treated as a renderer for structured world state, not as the state-holder. State lives in an event-sourced SQLite database that survives any model outage and is replayable for free.

2. Scope

2.1 In scope (v1)

Single user, single Mac (always-on).
A library of bots the user has authored. Each bot is a persistent entity with its own identity, memory, edges, and clock.
One chat per bot. A second bot can be added as a guest into any chat. Hard cap: 2 bots in any scene.
Explicit / mature content allowed.
Featherless as the LLM backend over its OpenAI-compatible API. Two model slots:
- narrative_model — dphn/Dolphin-Mistral-24B-Venice-Edition (uncensored, narrative-grade). 32K context.
- classifier_model — NousResearch/Hermes-3-Llama-3.1-8B (uncensored, tuned for tool use / structured output). 128K context. Fallback chain if it underperforms on JSON: cognitivecomputations/dolphin-2.9.4-llama3-8b → mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated.
- Classifier is used for: turn parsing (dialogue/action/ooc), kickoff prose parsing, scene-close detection, interjection decisions, significance scoring, state-update extraction, jump-skip memory synthesis.

2.2 Out of scope (v1)

Multi-user / multi-device.
More than 2 bots in a scene.
Proactive bot contact / out-of-session messages / push notifications.
Background world ticks ("meanwhile…" scenes that play while the user is idle) — deferred to Phase 2-3.
Branching timelines with UI (mechanically supported by event sourcing; not exposed yet).
Multimodality: no portraits, voice, or images. Defer indefinitely.

2.3 Non-functional constraints

All state survives any model or network outage.
Each turn writes one transactional event-log batch — no half-applied state.
Streaming required for narrative output.
One SQLite file is the entire data layer. No Postgres, Redis, Pinecone, Docker.

3. Architecture & Backend

[ Mac, always-on ]                              [ Featherless, stateless ]
  Web UI  ──► Orchestrator ──► LLM client ──►   narrative_model    (Dolphin-Mistral-24B-Venice)
                  │                              classifier_model  (Hermes-3-Llama-3.1-8B)
                  ├── Event log + projector
                  ├── SQLite (one file)
                  └── Retrieval + prompt builder

The orchestrator never knows which model is in use — only generate(prompt, params) -> text (streamed). The Featherless client is one implementation; mocks and other backends can drop in for tests or future migration.
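The backend seam described above can be sketched as a small protocol plus a test double. This is a minimal illustration, not the project's actual interface: only the `generate(prompt, params)` shape comes from the text, while `LLMBackend`, `MockBackend`, and `run_turn` are hypothetical names, and the synchronous iterator stands in for whatever streaming mechanism the real client uses.

```python
from typing import Iterator, Protocol


class LLMBackend(Protocol):
    """The only surface the orchestrator depends on (hypothetical name)."""

    def generate(self, prompt: str, params: dict) -> Iterator[str]:
        """Yield the completion as a stream of text chunks."""
        ...


class MockBackend:
    """Test double: streams a canned reply word by word."""

    def __init__(self, reply: str) -> None:
        self.reply = reply

    def generate(self, prompt: str, params: dict) -> Iterator[str]:
        for word in self.reply.split(" "):
            yield word + " "


def run_turn(backend: LLMBackend, prompt: str) -> str:
    """Orchestrator-side helper: consume the stream and return the full text."""
    chunks = []
    for chunk in backend.generate(prompt, {"max_tokens": 256}):
        chunks.append(chunk)  # a real UI would forward each chunk to the tab here
    return "".join(chunks).strip()
```

Because `run_turn` only sees the protocol, a Featherless-backed client and the mock are interchangeable from the orchestrator's point of view.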

API key handling: keys live in a local, gitignored config file (<repo>/data/config.toml, see §12). Never commit a key to the repo, paste it into chat logs, or include it in exports.

3.1 Runtime stack

Backend: Python 3.11+ with FastAPI as the HTTP server.
Frontend: server-rendered HTML + HTMX + minimal vanilla JS/CSS. No JS build chain.
Live updates: Server-Sent Events (SSE) per chat. Server keeps a per-chat in-process pub/sub channel (an asyncio.Queue per chat_id). Every browser tab on /chats/<id> opens an SSE connection to /chats/<id>/events. State changes (new turn, streamed tokens, drawer state, edge updates, scene close) publish to the channel; all subscribed tabs receive the event and HTMX swaps the relevant DOM region.
Multi-tab sync is a Phase 1 requirement, not a polish item. Two browser tabs open to the same chat must mirror each other in real time. Implications:
- In-progress typing is tab-local until submit (no collaborative input in v1).
- On reconnect/refresh, the server first sends a "current state" snapshot, then resumes streaming.
- The same architecture trivially supports a phone or tablet on the LAN later — bind to 0.0.0.0 + add a shared-secret token if/when desired. Default is 127.0.0.1, no auth.

3.2 Token budgets and trimming tiers

Token accounting via tiktoken with the closest cl100k approximation. Mistral and Llama tokenizers diverge ~5%; we accept the drift.

Narrative prompt: 8K hard ceiling, 6K soft target. Leaves ~2-4K headroom for streamed output and avoids long-context performance cliffs. Plenty for our prompt shape.
Classifier prompt: 4K hard ceiling. Most calls are well under 1K.

When the assembled prompt exceeds the soft target, trim in this order — never trim must-include:

MUST-include (always present):
- System message + speaker identity
- Speaker's edge to the addressee
- Activity snapshot for all present entities
- Current scene description
- Last 4 turns of dialogue
SHOULD-include (trim when over budget):
- Other edges of the speaker (e.g. speaker → other present)
- Group node summary (when applicable)
- Active threads
- Currently active events + props
NICE-include (trim first):
- Retrieved memories beyond top-2 (drop K=4 to K=2)
- Dialogue turns beyond the last 4 (replace older turns with a one-line summary)
- Per-POV summary of the previous scene
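The tier ordering above can be sketched as a simple trimming pass. This is a simplified illustration under stated assumptions: it drops whole blocks, whereas the tiers above also allow softer degradations (K=4 to K=2, older turns replaced by a one-line summary); token counts are assumed to be precomputed; and `Block` and `trim_to_budget` are hypothetical names.

```python
from dataclasses import dataclass

HARD_CEILING = 8_000  # narrative hard ceiling (tokens)
SOFT_TARGET = 6_000   # narrative soft target (tokens)

# Trim order: NICE blocks go first, then SHOULD; MUST is never trimmed.
TRIM_ORDER = ("nice", "should")


@dataclass
class Block:
    name: str
    tier: str    # "must" | "should" | "nice"
    tokens: int  # precomputed token count (tiktoken in the real system)


def trim_to_budget(blocks: list[Block], target: int = SOFT_TARGET) -> list[Block]:
    """Drop whole blocks, lowest tier first, until the total fits the target."""
    kept = list(blocks)
    for tier in TRIM_ORDER:
        # Walk from the end so blocks listed later within a tier are dropped first.
        for block in reversed([b for b in kept if b.tier == tier]):
            if sum(b.tokens for b in kept) <= target:
                return kept
            kept.remove(block)
    return kept
```

If the MUST blocks alone exceed the target, this sketch returns them untouched; how the real system handles that case against the hard ceiling is not specified here.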

4. Data Model (top-level entities)

Bot — top-level persistent unit. Has identity (immutable per session), state (mood/goals/status), per-bot clock, kickoff spec.
Chat — exactly one per bot, with that bot as host. A chat carries an optional current guest bot, its own clock, and its own active scene. Chats do not own memories or identity — those are bot-owned.
You — singleton entity with a light identity (name, voice/persona summary).
Edges — per-pair, persistent across chats:
- bot → you, you → bot for every authored bot
- botA → botB, botB → botA initialized the first time two bots co-appear
- Edges hold: affinity, trust, summary, knowledge known about target, last interaction (chat-id + chat-clock), shared private moments. Asymmetric — never collapsed into a single shared "relationship" field.
Memory — bot-owned. Each memory carries: a 3-bit [you, host, guest] witnessed-by mask, the chat it occurred in, that chat's clock at the time, source/reliability if secondhand, significance, embedding (Phase 4).
Events / threads — scoped to the chat where they exist. (Phase 3.)
Scenes — recorded per chat, with per-POV summaries written for each present witness on close.
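The 3-bit witnessed-by mask on each memory could be represented as a flag enum. The bit order below is an assumption (the text gives only the `[you, host, guest]` ordering, not the encoding), and `Witness` and `witnessed_by` are hypothetical names.

```python
from enum import IntFlag


class Witness(IntFlag):
    """Assumed bit positions for the [you, host, guest] witnessed-by mask."""
    YOU = 0b100
    HOST = 0b010
    GUEST = 0b001


def witnessed_by(mask: int, who: Witness) -> bool:
    """True if the given participant was present for the memory."""
    return bool(mask & who)


# Example: the user and the host were present; the guest had left the room.
mask = Witness.YOU | Witness.HOST
```

Stored as a small integer, the mask stays cheap to filter on in SQL, which matters because the witness check is applied to every memory read.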

4.1 Per-chat clocks

Each chat has its own chat_state.time, initialized from its host bot's kickoff scene and advanced only by explicit user time skips within that chat. Two clocks for two chats are independent — you may spend in-fiction days with BotA without BotB's clock advancing.

When BotB guests in BotA's chat, the scene runs on BotA's chat clock; memories written to both bots are timestamped at that clock value. BotB's chat clock is unchanged. Cross-chat time arithmetic is intentionally fuzzy — bots can reference cross-chat events ("when I came over that night") but the system does not claim precise "X days ago" math across chats. Within a single chat, time math is precise.

4.2 Bot library

Bots are top-level, and the number of bots in the system is unbounded. The per-scene cap is 2 bots: you + 1 host + at most 1 guest.

5. Authoring

Authoring is structured and done in a form-based UI. A bot is created once, then changed either through reset (full wipe) or by amending its identity fields directly; identity is immutable only within a session.

5.1 Authored fields per bot

Identity (immutable per session): name, persona paragraph, voice samples (1–3 short prose samples in the bot's voice), trait list, backstory.
Initial relationship to you: free-form prose ("BotA is my coworker; we've worked together for two years; she has a crush on me she hasn't admitted"). Parsed once into seeded you ↔ bot edge content on first run.
Kickoff scene: free-form prose describing the first encounter ("you stay late at the office; only you and BotA are there; she's at her desk pretending to work"). Parsed on first init into structured container, activity, seed edge content, and initial scene state. The user confirms or edits the parsed result before play begins.

5.2 First co-appearance: "have they met?"

The first time two bots appear in the same scene, the orchestrator prompts: Have BotA and BotB met before?

Yes → user writes a short prose seed describing how they know each other. Parsed into initial botA ↔ botB edge content.
No → edges initialize empty. The bots actually meet on-screen.

5.3 Reset

Reset is a per-bot action with hard confirmation (type the bot's name).

Wipes: chat history, memories, scenes, edges involving the bot, current scene state.
Preserves: identity, initial edge seed, kickoff scene.
After reset, the chat sits ready — kickoff does not auto-play. The next user message triggers kickoff.

6. Play Loop

6.1 Input convention (mixed prose, novel-style)

A turn is free-form prose with conventional markers:

*walks over* — action.
Quoted or bare text — dialogue.
((double parens)) — out-of-character commentary or meta-instruction. Flagged but not sent to the bot. (Default; stored as a config field; the user may change it before play begins.)

A small classifier call splits the turn into segments tagged dialogue | action | ooc. Action segments update the user's activity record.
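To make the marker convention concrete, here is a deterministic regex splitter. It is not the classifier call described above, only a stand-in that handles the unambiguous cases with the default markers; `split_turn` and `SEGMENT_RE` are hypothetical names, and a configurable OOC marker would need the pattern built at runtime.

```python
import re

# Assumed default markers: *action* and ((ooc)); everything else is dialogue.
SEGMENT_RE = re.compile(r"\(\((?P<ooc>.+?)\)\)|\*(?P<action>[^*]+)\*", re.DOTALL)


def split_turn(text: str) -> list[tuple[str, str]]:
    """Return (tag, text) pairs tagged dialogue | action | ooc, in input order."""
    segments = []
    pos = 0
    for match in SEGMENT_RE.finditer(text):
        before = text[pos:match.start()].strip()
        if before:
            segments.append(("dialogue", before))
        tag = match.lastgroup  # name of the alternative that matched
        segments.append((tag, match.group(tag).strip()))
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        segments.append(("dialogue", tail))
    return segments
```

A splitter like this could serve as a cheap pre-pass or as a fallback when the classifier is unreachable; whether v1 uses one is not stated in this document.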

6.2 Turn-taking (scene with you + host, optional guest)

Addressee gets the floor. With one bot present, the bot replies. With two bots, the addressee bot replies (inferred from prose: name mention, gaze, context).
Interjection allowance (only when guest is present): a classifier call decides whether the non-addressee bot interjects this turn. If yes, it produces a short reaction beat after the addressee's reply. If no, it silently witnesses.
State-update pass on every present entity after every utterance, not just the speaker. Silent witnesses still update edges (BotB watching BotA say something cruel updates botB → botA).

6.3 Speaker prompt assembly

For the speaking bot, the prompt is assembled from their own state and their own witnessed memories — never from a global view:

Speaker identity + current state (mood, goals).
Speaker → you edge (and speaker → other-bot edge, if guest is present).
Group node (Phase 2+, only if all 3 present).
Chat-state snapshot: time, weather, location.
Active scene description.
Activity snapshot for all present entities (always a small structured block — anchors spatial grounding).
Active threads (Phase 3).
Recent dialogue window.
Retrieved memories (top-K, witness-filtered, speaker-owned).
Currently active events + props (Phase 3).

6.4 Drawer (state visibility)

A collapsible right-side drawer. Closed by default. When open, shows for the current chat:

Current scene + container.
Activity record per present entity.
Edges (host ↔ you, host ↔ guest if any).
Recent witnessed memories from the host's POV.
Active threads and currently active events (Phase 3).

Read-only by default; each row has an edit affordance for surgically fixing things the LLM got wrong (full edit surface lands progressively across phases).

7. Scene Lifecycle

7.1 Starts

First-ever scene with a bot → kickoff plays after the kickoff prose is parsed and the user confirms the structured form.
Returning to an existing chat → resume in-place. Same container, same activity, same active scene. No auto time advancement.
Adding a guest bot → guest just appears in the current scene. The user narrates any in-fiction justification in prose. botA ↔ botB edges initialize per the per-pair "have they met?" answer if it's their first co-appearance.

7.2 Closes (hybrid auto + manual)

A scene closes when one of the following fires:

Auto, hard signals:
- Container change (parsed from prose — "we drove to the park", "we stepped outside").
- Declared time skip (Phase 3).
- Explicit user pattern ("we're done here", "fade out", etc.) recognized by classifier.
Manual: "Close scene" button in the drawer for soft transitions the user wants bookmarked.

False positives are a bigger risk than false negatives, so the auto-detector errs on the conservative side; manual close is always available.

7.3 On close

Significance classifier pass on the closing scene.
Per-POV summaries written — one per witness, from their angle. No omniscient narration.
Edge updates applied (affinity, trust, summary, knowledge, last-interaction).
Closed events finalize; their props remain in the closed event record (do not promote to memory).
Raw dialogue archived to cold storage.
New active scene is opened (resume or fresh).

8. Memory Retrieval

8.1 Always-loaded (no retrieval cost)

Pinned memories (user pins from the drawer).
Current scene's running dialogue window.
Active threads (Phase 3).
Last N scenes' per-POV summaries from the current chat (N tunable; default 3).

8.2 Retrieved (top-K)

Phase 1: SQLite FTS5 over memories.pov_summary for the speaker. Filter WHERE owner_id = speaker AND witness_bit_for_speaker = 1 (hard SQL constraint, not a soft signal).
Phase 4: vector search via sqlite-vss/sqlite-vec, same filter.
Default K = 4. Recency boost + significance boost in ranking.
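The Phase 1 retrieval path can be sketched with Python's built-in sqlite3 module, assuming the SQLite build includes FTS5. The schema is hypothetical: a single `witnessed` column stands in for the owner's witness bit, and only the significance boost is shown (the recency boost is omitted for brevity).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE memories (
    id INTEGER PRIMARY KEY,
    owner_id TEXT NOT NULL,
    witnessed INTEGER NOT NULL,    -- 1 if the owning bot witnessed it (assumed column)
    significance INTEGER NOT NULL, -- 0..3
    pov_summary TEXT NOT NULL
);
CREATE VIRTUAL TABLE memories_fts USING fts5(pov_summary);
""")


def add_memory(owner_id: str, witnessed: int, significance: int, pov_summary: str) -> None:
    """Insert a memory and index its summary under the same rowid."""
    cur = conn.execute(
        "INSERT INTO memories (owner_id, witnessed, significance, pov_summary) "
        "VALUES (?, ?, ?, ?)",
        (owner_id, witnessed, significance, pov_summary),
    )
    conn.execute(
        "INSERT INTO memories_fts (rowid, pov_summary) VALUES (?, ?)",
        (cur.lastrowid, pov_summary),
    )


def retrieve(speaker: str, query: str, k: int = 4) -> list[str]:
    """Top-K full-text matches, hard-filtered to the speaker's witnessed memories."""
    rows = conn.execute(
        """
        SELECT m.pov_summary
        FROM memories_fts
        JOIN memories AS m ON m.id = memories_fts.rowid
        WHERE memories_fts MATCH ?
          AND m.owner_id = ?
          AND m.witnessed = 1
        ORDER BY bm25(memories_fts) - 0.5 * m.significance
        LIMIT ?
        """,
        (query, speaker, k),
    ).fetchall()
    return [row[0] for row in rows]
```

The witness and owner conditions sit in the WHERE clause rather than in the ranking expression, so an unwitnessed memory can never surface no matter how well it matches. Lower `bm25` values rank better, which is why the significance boost is subtracted; the 0.5 weight is an arbitrary placeholder.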

8.3 Witness rule (non-negotiable)

A bot cannot retrieve memories whose witness bit for them is 0. Period. This is the mechanism preventing bots from referencing things they couldn't possibly know.

8.4 Cross-chat memory

A bot's memory store contains memories from any chat the bot has been in (host or guest). All are retrievable. Bots may reference cross-chat events naturally; precise cross-chat time arithmetic is not attempted.

9. Time, Skips, Events (Phase 3 surface)

Each chat has its own clock; advances only on explicit user skip commands within that chat.
Elision skip — "skip to when we arrive". Resolves to end-of-current-action; activity completes; landing state set; brief transition narration generated.
Jump skip — "next morning", "a week later". User is prompted: "anything notable happen?" Answer becomes synthesized memory(ies) for the speaker bot via classifier call. Chat clock advances. Activity is reset to a coherent landing state.
Events — lifecycles (planned | active | completed | cancelled | expired) with their own scoped props. Promotion rules on close (only the four categories: object acquired → inventory, knowledge gained → edge knowledge, relationship change → edge summary, everything else stays in closed record).

Phase 1 has no skips and no events. Time is set at kickoff and stays put unless the bot is reset.

10. Rollback, Regenerate, Reset

Rewind to here — button on every turn. Truncates event log past that turn; rebuilds projection. Confirmation modal shows turn count + scene transitions affected. Always snapshots pre-rewind for "undo rewind".
Regenerate this turn — button on the latest bot turn. Edit-then-regenerate: the user may edit their preceding turn before re-running. Replaces the old assistant_turn event with a new one carrying the new outcome. Downstream classifier passes (state updates, significance) re-run on the new output.
Reset bot — full wipe with hard confirm (type bot name). Behavior detailed in §5.3.
Branching, hide-from-view, surgical delete + cascade with impact preview — Phase 4. Mechanically supported by event sourcing already; no UI yet.
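Rewind falls out of event sourcing almost for free, as this in-memory sketch suggests. The event shapes and the `project` and `rewind` functions are hypothetical; the real log lives in SQLite and the projection covers far more state than shown here.

```python
import copy


def project(events: list[dict]) -> dict:
    """Rebuild the read model by replaying every event from the start."""
    state = {"turns": [], "mood": "neutral"}
    for event in events:
        if event["type"] in ("user_turn", "assistant_turn"):
            state["turns"].append(event["text"])
        elif event["type"] == "mood_changed":
            state["mood"] = event["mood"]
    return state


def rewind(events: list[dict], keep_through: int) -> tuple[list[dict], list[dict]]:
    """Truncate the log after index keep_through; also return a pre-rewind snapshot."""
    snapshot = copy.deepcopy(events)  # kept so "undo rewind" stays possible
    return events[: keep_through + 1], snapshot
```

After a rewind, the projection is simply recomputed from the shortened log, so any state change that followed the chosen turn disappears without special-case cleanup.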

11. Compression & Promotion

Significance pass — classifier call after each turn (queued, async) tags the turn 0–3.
Per-POV summaries — written per witness when a scene closes. Different details, different interpretations. No omniscient narration.
Promotion rules (the "picnic basket" rule):
- Object acquired → entity inventory.
- Knowledge gained → relevant edge's knowledge.
- Relationship change → edge summary.
- Everything else stays in the closed event/scene record. Surfaces only on explicit recall.
Compression tiers:
- Last scene: full dialogue retained.
- Recent scenes: per-POV summary + key quotes.
- Older scenes: per-POV summary only.
- Distant past: rolled into edge summaries.
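The promotion rules amount to a small routing table with a default. In this sketch the `kind` labels and destination names are hypothetical; only the four-way split comes from the rules above.

```python
def promote(prop: dict) -> str:
    """Return where a closed-event prop ends up, per the four promotion categories."""
    destinations = {
        "object_acquired": "entity_inventory",
        "knowledge_gained": "edge_knowledge",
        "relationship_change": "edge_summary",
    }
    # Anything unrecognized stays in the closed event/scene record.
    return destinations.get(prop["kind"], "closed_record")
```

Making the closed record the default means a new or misclassified prop kind cannot leak into memory by accident.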

12. Persistence & Ops (v1 defaults)

SQLite WAL mode, foreign keys on, transactional turns.
Project-folder layout (DB lives inside the repo, gitignored):
- DB: <repo>/data/chat.db
- Backups: <repo>/data/backups/ (timestamped copies)
- Pre-rewind snapshots: <repo>/data/snapshots/
- Significant-scene JSON exports: <repo>/data/exports/
- Config: <repo>/data/config.toml (holds Featherless API key, model names, OOC marker, K, budget, etc. Gitignored.)
- The entire data/ tree is in .gitignore so secrets and state never get committed.
- CHAT_DB_PATH env var honored as an override if you want to point at a different file (e.g., a backup or a sibling repo's data).
Auto-backup nightly via launchd. Timestamped copies. Last 14 retained. Pre-rewind snapshots are separate and not pruned.
Significant-scene JSON exports written to data/exports/ when scenes close at significance ≥ 2.
Schema versioned in a meta table; migrations applied on startup.
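The path override and connection defaults above could be wired up roughly as follows. `resolve_db_path` and `open_db` are hypothetical helper names; the pragmas and the `CHAT_DB_PATH` variable are the ones named in this section.

```python
import os
import sqlite3
from pathlib import Path


def resolve_db_path(repo_root: Path) -> Path:
    """CHAT_DB_PATH wins if set; otherwise use <repo>/data/chat.db."""
    override = os.environ.get("CHAT_DB_PATH")
    return Path(override) if override else repo_root / "data" / "chat.db"


def open_db(path: Path) -> sqlite3.Connection:
    """Open the single SQLite file with WAL mode and foreign keys enabled."""
    path.parent.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA foreign_keys=ON")
    return conn
```

Note that `foreign_keys` is a per-connection setting in SQLite, so it has to be applied every time a connection is opened, while WAL mode persists in the database file once set.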

13. Phase Cut

Phase 1 (v1) — must build end-to-end before any Phase 2 work

Featherless client (OpenAI-compatible) with narrative_model and classifier_model configured.
Schema, event log, projector, replay.
Bot authoring UI (form-based) including kickoff prose + parse-and-confirm.
Single-bot chat (host only, no guest yet):
- Mixed-prose input with ((parens)) OOC marker.
- Addressee = host. No interjection logic yet.
- Narrative streaming.
- Post-turn state-update pass.
Drawer: read-only first; edit-on-demand may land in Phase 1.5.
Scene close (hard-signal auto + manual button) with per-POV summary for the host bot only.
Memory: witness flag stored, FTS5 retrieval (K=4), recency + significance boost.
Rewind + regenerate (with edit-then-regenerate).
Reset bot (hard confirm).
Per-chat clock (set at kickoff; no skips yet).
Nightly backups + pre-rewind snapshots.

Phase 2 — multi-entity

Guest bot in chat (3-entity scene config).
Interjection classifier call.
Witness filtering across multiple owners.
Group node (when all three present).
Per-pair "have they met?" prompt.
botA ↔ botB edges.

Phase 3 — events, skips, threads

Events with lifecycles and scoped props.
Time skips: elision and jump.
Active threads.
Significance classifier improvements.
"Meanwhile…" scenes (scene config 4) — autonomous.

Phase 4 — polish

Vector retrieval (sqlite-vss or sqlite-vec).
Branching UI.
Drawer-edit on every field.
Backup tooling improvements.
Significance review UI.
Surgical delete + cascade with impact preview.
Hide-from-view soft delete.

14. Open / Deferred Decisions

Resolved by this brainstorm (now reflected in §3 / §6 / §12 above):

~~Classifier model name~~ → NousResearch/Hermes-3-Llama-3.1-8B, with documented fallback chain.
~~Token budget tier strategy~~ → §3.2 (8K / 6K narrative, 4K classifier; must / should / nice tiers).
~~UI framework~~ → FastAPI + HTMX + SSE, multi-tab sync as a Phase 1 requirement (§3.1).
~~OOC marker~~ → ((double parens)), configurable.
~~DB file location~~ → project-folder <repo>/data/ tree (§12).

Still deferred:

Embedding model (Phase 4 — pick whatever's cheap and good enough on Featherless or local at the time).
sqlite-vss vs sqlite-vec (Phase 4 — pick based on the projects' state at the time).
Significance scoring rubric — what does 0/1/2/3 mean? Drafted during Phase 1 against real scenes.
Activity-record action verbs — open vocabulary or constrained list? Decided during Phase 1 implementation.
Drawer edit-affordance UX — which fields editable in v1, which slip to Phase 1.5 / Phase 4.

15. Non-Negotiables (rules every implementer must respect)

State changes go through the event log. Never UPDATE a state row directly; append an event, let the projector apply it.
Witness filter every memory read. A bot must not retrieve memories whose witness bit for them is 0.
Edges are directed. botA → botB and botB → botA are independent.
Don't promote event props to memory. Only the four promotion categories (object, knowledge, relationship change, narrative gist via summary).
Per-POV, not omniscient. Scene summaries are written per witness, from their angle.
Activity block is always in the prompt. Spatial grounding prevents "leaning on the kitchen counter while in a car" failures.
Streaming on the inference path; non-blocking bookkeeping runs while the LLM streams.
No extra services. SQLite + a process. Push back on suggestions to add infrastructure.

Appendix A — Decisions Log (this brainstorm)

#	Decision	Choice
1	Primary experience	Relational + companion RP
2	Content scope	Explicit content allowed
3	Backend	Featherless (OpenAI-compatible API), Dolphin-Mistral-24B-Venice for narrative
4	Authoring	Rich structured fields
5	Input style	Mixed prose, novel-style
6	Multi-bot turn-taking	Addressee + interjection allowance
7	Bot autonomy	Strictly turn-based for v1; "meanwhile…" deferred to P2-3
8	Sessions/campaigns	Bots are persistent units; one chat per bot; guests addable; bot-owned memory
9	First co-appearance	Per-pair "have they met?" prompt
10	Scene starts	Kickoff for new; resume in-place for returning; just-appear for guests
11	State visibility	Optional drawer, closed by default, read-only with edit-on-demand
12	Scene close	Hybrid (hard-signal auto + manual button)
13	Time	Per-chat clock, advanced only by explicit user skips
14	Model strategy	Small classifier model + large narrative model
15	Reset	Full wipe + hard confirm; chat sits ready for kickoff
16	Rollback	Rewind + regenerate (with edit-then-regenerate)
17	UI framework	FastAPI + HTMX + SSE; multi-tab sync as a Phase 1 requirement
18	Classifier model	`NousResearch/Hermes-3-Llama-3.1-8B` (fallbacks: `dolphin-2.9.4-llama3-8b`, `Meta-Llama-3.1-8B-Instruct-abliterated`)
19	Token budgets	Narrative 8K hard / 6K soft; classifier 4K hard. Must/Should/Nice tiers per §3.2
20	OOC marker	`((double parens))`, configurable
21	DB location	Project-folder `<repo>/data/` tree (DB, backups, snapshots, exports, config). Gitignored. `CHAT_DB_PATH` env var honored
