v1 Requirements & Behavioral Design
Date: 2026-04-26
Status: approved (brainstorming complete; pending writing-plans)
Companion document: rp-engine-design.md (architectural design)
This document captures the product- and behavior-level requirements for v1, derived from a structured brainstorm. The architectural design doc covers the how (event sourcing, schema sketch, prompt construction). This doc covers the what (user experience, scope cuts, and the contract between the user and the system).
1. Product Vision
A local-first, single-user roleplay engine for relational and companion-style RP. The goal is bots that feel persistent, consistent, and like they have inner lives — durable across long horizons, immune to the three classic failure modes (memory loss, quality decay, prop pollution).
The LLM is treated as a renderer for structured world state, not as the state-holder. State lives in an event-sourced SQLite database that survives any model outage and is replayable for free.
2. Scope
2.1 In scope (v1)
- Single user, single Mac (always-on).
- A library of bots the user has authored. Each bot is a persistent entity with its own identity, memory, edges, and clock.
- One chat per bot. A second bot can be added as a guest into any chat. Hard cap: 2 bots in any scene.
- Explicit / mature content allowed.
- Featherless as the LLM backend over its OpenAI-compatible API. Two model slots:
  - narrative_model: dphn/Dolphin-Mistral-24B-Venice-Edition (uncensored, narrative-grade). 32K context.
  - classifier_model: NousResearch/Hermes-3-Llama-3.1-8B (uncensored, tuned for tool use / structured output). 128K context. Fallback chain if it underperforms on JSON: cognitivecomputations/dolphin-2.9.4-llama3-8b → mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated.
  - Classifier is used for: turn parsing (dialogue/action/ooc), kickoff prose parsing, scene-close detection, interjection decisions, significance scoring, state-update extraction, jump-skip memory synthesis.
2.2 Out of scope (v1)
- Multi-user / multi-device.
- More than 2 bots in a scene.
- Proactive bot contact / out-of-session messages / push notifications.
- Background world ticks ("meanwhile…" scenes that play while the user is idle) — deferred to Phase 2-3.
- Branching timelines with UI (mechanically supported by event sourcing; not exposed yet).
- Multimodality: no portraits, voice, or images. Defer indefinitely.
2.3 Non-functional constraints
- All state survives any model or network outage.
- Each turn writes one transactional event-log batch — no half-applied state.
- Streaming required for narrative output.
- One SQLite file is the entire data layer. No Postgres, Redis, Pinecone, Docker.
3. Architecture & Backend
    [ Mac, always-on ]                         [ Featherless, stateless ]
    Web UI ──► Orchestrator ──► LLM client ──► narrative_model (Dolphin-Mistral-24B-Venice)
                    │                          classifier_model (Hermes-3-Llama-3.1-8B)
                    ├── Event log + projector
                    ├── SQLite (one file)
                    └── Retrieval + prompt builder
The orchestrator never knows which model is in use — only generate(prompt, params) -> text (streamed). The Featherless client is one implementation; mocks and other backends can drop in for tests or future migration.
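A minimal sketch of that boundary, assuming a Python Protocol is used to express it. The names LLMClient, MockClient, and run_turn, and the shape of params, are hypothetical and only illustrate how a mock can stand in for the Featherless client.

```python
import asyncio
from typing import AsyncIterator, Protocol


class LLMClient(Protocol):
    """Hypothetical interface: the only surface the orchestrator sees."""

    def generate(self, prompt: str, params: dict) -> AsyncIterator[str]:
        """Yield text chunks as they stream in."""
        ...


class MockClient:
    """Test double that streams a canned reply word by word."""

    def __init__(self, reply: str) -> None:
        self.reply = reply

    async def generate(self, prompt: str, params: dict) -> AsyncIterator[str]:
        for word in self.reply.split(" "):
            yield word + " "


async def run_turn(client: LLMClient, prompt: str) -> str:
    # The orchestrator never inspects which backend it was handed.
    chunks = [chunk async for chunk in client.generate(prompt, {"max_tokens": 64})]
    return "".join(chunks).strip()


if __name__ == "__main__":
    print(asyncio.run(run_turn(MockClient("hello there"), "any prompt")))
```

A real Featherless-backed client would implement the same generate method, so tests and a future backend migration would not need to touch the orchestrator.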
API key handling: keys live in a local config file that is never tracked by git (§12 places it under the gitignored data/ tree). Never commit a key to the repo, paste it into chat logs, or include it in exports.
3.1 Runtime stack
- Backend: Python 3.11+ with FastAPI as the HTTP server.
- Frontend: server-rendered HTML + HTMX + minimal vanilla JS/CSS. No JS build chain.
- Live updates: Server-Sent Events (SSE) per chat. Server keeps a per-chat in-process pub/sub channel (an asyncio.Queue per chat_id). Every browser tab on /chats/<id> opens an SSE connection to /chats/<id>/events. State changes (new turn, streamed tokens, drawer state, edge updates, scene close) publish to the channel; all subscribed tabs receive the event and HTMX swaps the relevant DOM region.
- Multi-tab sync is a Phase 1 requirement, not a polish item. Two browser tabs open to the same chat must mirror each other in real time. Implications:
  - In-progress typing is tab-local until submit (no collaborative input in v1).
  - On reconnect/refresh, the server first sends a "current state" snapshot, then resumes streaming.
- The same architecture trivially supports a phone or tablet on the LAN later: bind to 0.0.0.0 and add a shared-secret token if/when desired. Default is 127.0.0.1, no auth.
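The fan-out described above can be sketched as follows. This sketch assumes one queue per connected tab, grouped by chat_id, rather than a single queue per chat, because an asyncio.Queue hands each item to exactly one consumer. ChatChannels and its method names are hypothetical, and the FastAPI/SSE endpoint that would drain each queue is omitted.

```python
import asyncio
from collections import defaultdict


class ChatChannels:
    """Hypothetical in-process pub/sub: one queue per subscribed tab, grouped by chat_id."""

    def __init__(self) -> None:
        self._subs = defaultdict(set)  # chat_id -> set of asyncio.Queue

    def subscribe(self, chat_id: str, snapshot: dict) -> asyncio.Queue:
        q = asyncio.Queue()
        # On connect or reconnect the tab first receives the current state.
        q.put_nowait({"type": "snapshot", "data": snapshot})
        self._subs[chat_id].add(q)
        return q

    def unsubscribe(self, chat_id: str, q: asyncio.Queue) -> None:
        self._subs[chat_id].discard(q)

    def publish(self, chat_id: str, event: dict) -> None:
        # Fan out to every tab currently subscribed to this chat.
        for q in self._subs[chat_id]:
            q.put_nowait(event)


async def demo() -> list:
    channels = ChatChannels()
    tab_a = channels.subscribe("chat-1", {"turns": 0})
    tab_b = channels.subscribe("chat-1", {"turns": 0})
    channels.publish("chat-1", {"type": "turn", "data": "hi"})
    # Each tab sees the snapshot first, then the same published event.
    return [await tab_a.get(), await tab_a.get(), await tab_b.get(), await tab_b.get()]
```

Each SSE handler would call subscribe on connect, forward whatever it reads from its queue, and call unsubscribe when the browser disconnects.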
3.2 Token budgets and trimming tiers
Token accounting via tiktoken with the closest cl100k approximation. Mistral and Llama tokenizers diverge ~5%; we accept the drift.
- Narrative prompt: 8K hard ceiling, 6K soft target. Leaves ~2-4K headroom for streamed output and avoids long-context performance cliffs. Plenty for our prompt shape.
- Classifier prompt: 4K hard ceiling. Most calls are well under 1K.
When the assembled prompt exceeds the soft target, trim in this order — never trim must-include:
- MUST-include (always present):
- System message + speaker identity
- Speaker's edge to the addressee
- Activity snapshot for all present entities
- Current scene description
- Last 4 turns of dialogue
- SHOULD-include (trim when over budget):
- Other edges of the speaker (e.g. speaker → other present)
- Group node summary (when applicable)
- Active threads
- Currently active events + props
- NICE-include (trim first):
- Retrieved memories beyond top-2 (drop K=4 to K=2)
- Dialogue turns beyond the last 4 (replace older turns with a one-line summary)
- Per-POV summary of the previous scene
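The trim order above can be sketched as a small function. This is a simplification under stated assumptions: it drops whole blocks instead of shrinking them (the source shrinks K from 4 to 2 and summarizes older turns), token counts are precomputed, and Block, DROP_ORDER, and trim are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Block:
    name: str
    tier: str    # "must", "should", or "nice"
    tokens: int  # precomputed token count for this block


# Lower number means dropped earlier; "must" is deliberately absent.
DROP_ORDER = {"nice": 0, "should": 1}


def trim(blocks: list, soft_target: int = 6000) -> list:
    kept = list(blocks)
    total = sum(b.tokens for b in kept)
    # All NICE blocks are candidates first, then SHOULD blocks; MUST is never a candidate.
    candidates = sorted(
        (b for b in kept if b.tier in DROP_ORDER),
        key=lambda b: DROP_ORDER[b.tier],
    )
    for block in candidates:
        if total <= soft_target:
            break
        kept.remove(block)
        total -= block.tokens
    return kept
```

If the must-include blocks alone exceed the soft target, the function returns them untouched; handling that case against the 8K hard ceiling is left to the caller.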
3.3 Classifier failure handling
The classifier does ~5 different jobs (turn parse, scene-close detection, interjection, significance, state-update). Any can fail (malformed JSON, refusal, timeout). The play loop must never block on classifier errors.
- Constrained output first. Use Pydantic models for every classifier call, passed through instructor (or Featherless-native JSON-schema mode if available).
- One retry on parse failure with a stricter "respond with JSON only" reminder appended. Same prompt, same model.
- Schema-default fallback on second failure:
  - turn-parse default: treat the whole turn as one dialogue segment.
  - scene-close default: don't close.
  - interjection default: don't interject.
  - significance default: 1 (Notable). Conservative.
  - state-update default: no deltas, no facts; just bump last_interaction.
- Log every fallback to a classifier_failures table (event_id, kind, raw_text, attempt_count). Drawer dev panel surfaces a count.
- Timeout: 10s per classifier call. Beyond that, fall back. Narrative model has no orchestrator-imposed timeout (let it stream).
- Refusal detection: if the response starts with a refusal pattern ("I can't", "I'm sorry, but…") and isn't valid JSON, treat as a parse failure and retry with the JSON-only reminder. If it refuses again, swap to the next model in the fallback chain (dolphin-2.9.4-llama3-8b first), automatically, for that one call. Log it.
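The retry-then-default path can be sketched as below. To stay dependency-free, the sketch uses plain json.loads in place of the Pydantic / instructor validation, and it leaves out the 10s timeout and the fallback-chain model swap. classify_with_fallback, call_model, and log_failure are hypothetical names.

```python
import json
from typing import Callable

JSON_REMINDER = "\n\nRespond with JSON only."


def classify_with_fallback(
    call_model: Callable[[str], str],         # hypothetical: prompt -> raw model text
    prompt: str,
    default: dict,                            # schema default for this classifier job
    log_failure: Callable[[str, int], None],  # hypothetical: (raw_text, attempt_count)
) -> dict:
    attempt_prompt = prompt
    raw = ""
    for _attempt in (1, 2):
        try:
            raw = call_model(attempt_prompt)
            parsed = json.loads(raw)
            if isinstance(parsed, dict):
                return parsed
        except (json.JSONDecodeError, TimeoutError):
            pass
        # One retry: same prompt, same model, stricter reminder appended.
        attempt_prompt = prompt + JSON_REMINDER
    # Second failure: record it and return the schema default so play never blocks.
    log_failure(raw, 2)
    return dict(default)
```

For example, the scene-close job would pass a default that means "don't close", so a refusal or malformed reply can never end a scene by accident.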
3.4 Edge update granularity
Edges drift smoothly turn-to-turn so retrieval and prompt assembly always see current values; long-form summary only churns at compression boundaries.
Per turn (cheap, small): the post-utterance state-update classifier on each present entity produces a delta:
- affinity_delta (signed integer, -3 to +3 on a 0–100 scale; conservative).
- trust_delta (same shape).
- knowledge_facts: [string], new things this entity now knows about the target.
- last_interaction is bumped to the current chat-id + chat-clock unconditionally.
Per scene close (expensive, summarizing):
- Edge summary rewritten by the classifier from the per-POV summary of the closing scene plus the prior summary. Aggregates the meaningful arc.
- "Shared private moments" list gains one entry per closed scene at significance ≥ 2.
Every edge mutation goes into the event log as an edge_update event so rollback works.
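A minimal sketch of the per-turn path, assuming the delta is clamped to ±3 and the result to 0–100. Edge, make_edge_update, project, and the starting value of 50 are hypothetical; the point is that the event is built first and only the projector step mutates state, with the prior values recorded so rollback has what it needs.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Edge:
    affinity: int = 50  # starting value is an assumption, not from the source
    trust: int = 50
    knowledge_facts: list = field(default_factory=list)
    last_interaction: Optional[tuple] = None  # (chat_id, chat clock)


def clamp(value: int, lo: int, hi: int) -> int:
    return max(lo, min(hi, value))


def make_edge_update(edge: Edge, delta: dict, chat_id: str, clock: str) -> dict:
    """Build the edge_update event for one per-turn delta; nothing is mutated here."""
    a = clamp(delta.get("affinity_delta", 0), -3, 3)
    t = clamp(delta.get("trust_delta", 0), -3, 3)
    new_facts = [f for f in delta.get("knowledge_facts", []) if f not in edge.knowledge_facts]
    return {
        "type": "edge_update",
        "prior": {"affinity": edge.affinity, "trust": edge.trust},
        "affinity": clamp(edge.affinity + a, 0, 100),
        "trust": clamp(edge.trust + t, 0, 100),
        "new_facts": new_facts,
        "last_interaction": (chat_id, clock),
    }


def project(edge: Edge, event: dict) -> None:
    """Projector step: the only place edge state changes."""
    edge.affinity = event["affinity"]
    edge.trust = event["trust"]
    edge.knowledge_facts.extend(event["new_facts"])
    edge.last_interaction = event["last_interaction"]
```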
4. Data Model (top-level entities)
- Bot: top-level persistent unit. Has identity (immutable per session), state (mood/goals/status), per-bot clock, kickoff spec.
- Chat: exactly one per bot, with that bot as host. A chat carries an optional current guest bot, its own clock, and its own active scene. Chats do not own memories or identity; those are bot-owned.
- You: singleton entity with a light identity (name, voice/persona summary).
- Edges (per-pair, persistent across chats):
  - bot → you, you → bot for every authored bot
  - botA → botB, botB → botA initialized the first time two bots co-appear
  - Edges hold: affinity, trust, summary, knowledge known about target, last interaction (chat-id + chat-clock), shared private moments. Asymmetric; never collapsed into a single shared "relationship" field.
- Memory: bot-owned. Each memory carries: a 3-bit [you, host, guest] witnessed-by mask, the chat it occurred in, that chat's clock at the time, source/reliability if secondhand, significance, embedding (Phase 4).
- Events / threads: scoped to the chat where they exist. (Phase 3.)
- Scenes: recorded per chat, with per-POV summaries written for each present witness on close.
4.1 Per-chat clocks
Each chat has its own chat_state.time, initialized from its host bot's kickoff scene and advanced only by explicit user time skips within that chat. Two clocks for two chats are independent — you may spend in-fiction days with BotA without BotB's clock advancing.
When BotB guests in BotA's chat, the scene runs on BotA's chat clock; memories written to both bots are timestamped at that clock value. BotB's chat clock is unchanged. Cross-chat time arithmetic is intentionally fuzzy — bots can reference cross-chat events ("when I came over that night") but the system does not claim precise "X days ago" math across chats. Within a single chat, time math is precise.
4.2 Bot library
Bots are top-level. Bot count in the system is unbounded. Per-scene cap is 2 bots (you + 1 host + optional 1 guest, or you + 1 host with no guest).
4.3 Chat clock format (storage vs display)
- Stored as a precise UTC datetime (ISO 8601). Initialized at kickoff to a sensible default (the kickoff time mentioned in prose, or a "now in-fiction" fallback).
- Displayed in the chat header as a friendly relative format: "Tuesday evening, 9:14pm" or "Day 3 — late afternoon". Helps preserve fictional feel. Computed from the stored datetime + the chat's narrative anchor (the in-fiction date corresponding to "Day 1").
- The drawer shows the precise stored datetime alongside the friendly form.
- Time skips advance the stored datetime; the friendly format re-renders.
- This split lets the system do precise time arithmetic (last-interaction calculations, Phase 3 event triggers) while showing humane time to the user.
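The storage/display split can be sketched as a pure function from the stored datetime and the narrative anchor to a friendly label. The part-of-day boundaries and the function names are assumptions; the source does not define them.

```python
from datetime import datetime, timezone


def part_of_day(hour: int) -> str:
    # Hypothetical bucket boundaries; the source does not specify them.
    if 5 <= hour < 12:
        return "morning"
    if 12 <= hour < 17:
        return "afternoon"
    if 17 <= hour < 21:
        return "evening"
    return "night"


def friendly(stored: datetime, anchor: datetime) -> str:
    """Render the precise stored clock as a 'Day N' label relative to the Day 1 anchor."""
    day_number = (stored.date() - anchor.date()).days + 1
    return f"Day {day_number}, {part_of_day(stored.hour)}"


anchor = datetime(2026, 4, 26, 9, 0, tzinfo=timezone.utc)   # in-fiction "Day 1"
stored = datetime(2026, 4, 28, 16, 30, tzinfo=timezone.utc)  # precise stored clock
label = friendly(stored, anchor)
```

Because the label is derived on every render, a time skip only has to advance the stored datetime.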
5. Authoring
Authoring is structured, done in a form-based UI. A bot is created once, then edited through reset (full wipe) or by amending immutable identity fields directly.
5.1 Authored fields per bot
Identity (immutable per session):
- Name (required).
- Persona paragraph: free-form prose; goes into must-include identity block.
- Voice samples: 1–3 short prose passages in the bot's voice. Stored joined with --- separators. Injected into the speaker prompt as a must-include block: "Voice reference. Match this register, vocabulary, and rhythm: {samples}". Always sent in full in v1 (rotation strategy is Phase 2 fallback if budget pressure forces it).
- Trait list: free-form list of comma- or newline-separated phrases ("introverted, quick to anger, terrible at small talk, loves cats"). 3–15 items typical, no hard cap. Stored as traits: list[str]. Rendered in prompt as Traits: …. Not Big Five / MBTI; free-form anchors voice without biasing the model.
- Backstory: free-form prose, single string. Soft target 100–500 words; no hard cap, but trim tier downgrades long backstories under budget pressure.
Per-bot relationship & first encounter:
- Initial relationship to you: free-form prose ("BotA is my coworker; we've worked together for two years; she has a crush on me she hasn't admitted"). Parsed once into seeded you ↔ bot edge content on first run.
- Kickoff scene: free-form prose describing the first encounter ("you stay late at the office; only you and BotA are there; she's at her desk pretending to work"). Parsed on first init into structured container, activity, seed edge content, and initial scene state. The user confirms or edits the parsed result before play begins.
5.2 First co-appearance: "have they met?"
The first time two bots appear in the same scene, the orchestrator prompts: Have BotA and BotB met before?
- Yes → user writes a short prose seed describing how they know each other. Parsed into initial botA ↔ botB edge content.
- No → edges initialize empty. The bots actually meet on-screen.
5.3 Reset
Reset is a per-bot action with hard confirmation (type the bot's name).
- Wipes: chat history, memories, scenes, edges involving the bot, current scene state.
- Preserves: identity, initial edge seed, kickoff scene.
- After reset, the chat sits ready — kickoff does not auto-play. The next user message triggers kickoff.
5.4 "You" entity authoring (one-time, shared across all chats)
- Name (required).
- Pronouns (optional, freeform string; if empty, model infers).
- Persona blob (optional, recommended) — short paragraph for context ("30, software engineer, lives alone, dry sense of humor"). Provides backdrop the bots can use without overriding per-bot framing.
No voice samples for "you" — the user provides their own dialogue, the bot doesn't need to mimic the user's voice. Per-bot specifics about how each bot knows you stay in the per-bot "initial relationship to you" field (§5.1).
6. Play Loop
6.1 Input convention (mixed prose, novel-style)
A turn is free-form prose with conventional markers:
- *walks over*: action.
- Quoted or bare text: dialogue.
- ((double parens)): out-of-character commentary or meta-instruction. Flagged but not sent to the bot. (Default; stored as a config field; the user may change it before play begins.)
A small classifier call splits the turn into segments tagged dialogue | action | ooc. Action segments update the user's activity record.
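The marker convention can be illustrated with a deterministic splitter. This is not the classifier the section describes; it is a regex sketch that could serve as a test fixture or a pre-pass, and it assumes the default ((double parens)) OOC marker. MARKER, split_turn, and _add_dialogue are hypothetical names.

```python
import re

# ((ooc)) is tried before *action* at each position.
MARKER = re.compile(r"\(\((?P<ooc>.+?)\)\)|\*(?P<action>.+?)\*", re.DOTALL)


def _add_dialogue(segments: list, chunk: str) -> None:
    # Everything outside a marker counts as dialogue, quoted or bare.
    cleaned = chunk.strip().strip('"').strip()
    if cleaned:
        segments.append(("dialogue", cleaned))


def split_turn(text: str) -> list:
    """Split one user turn into (kind, text) segments: dialogue | action | ooc."""
    segments = []
    pos = 0
    for m in MARKER.finditer(text):
        _add_dialogue(segments, text[pos:m.start()])
        kind = "ooc" if m.group("ooc") is not None else "action"
        segments.append((kind, m.group(kind).strip()))
        pos = m.end()
    _add_dialogue(segments, text[pos:])
    return segments


segments = split_turn('*walks over* "Hey, you busy?" ((keep it short))')
```

A regex cannot infer unmarked actions or the addressee, which is why the design leaves the real split to a classifier call.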
6.2 Turn-taking (scene with you + host, optional guest)
- Addressee gets the floor. With one bot present, the bot replies. With two bots, the addressee bot replies (inferred from prose: name mention, gaze, context).
- Interjection allowance (only when guest is present): a classifier call decides whether the non-addressee bot interjects this turn. If yes, it produces a short reaction beat after the addressee's reply. If no, it silently witnesses.
- State-update pass on every present entity after every utterance, not just the speaker. Silent witnesses still update edges (BotB watching BotA say something cruel updates botB → botA).
6.3 Speaker prompt assembly
For the speaking bot, the prompt is assembled from their own state and their own witnessed memories — never from a global view:
- Speaker identity + current state (mood, goals).
- Speaker → you edge (and speaker → other-bot edge, if guest is present).
- Group node (Phase 2+, only if all 3 present).
- Chat-state snapshot: time, weather, location.
- Active scene description.
- Activity snapshot for all present entities (always a small structured block — anchors spatial grounding).
- Active threads (Phase 3).
- Recent dialogue window.
- Retrieved memories (top-K, witness-filtered, speaker-owned).
- Currently active events + props (Phase 3).
6.4 Drawer (state visibility)
A collapsible right-side drawer. Closed by default. When open, shows for the current chat:
- Current scene + container.
- Activity record per present entity.
- Edges (host ↔ you, host ↔ guest if any).
- Recent witnessed memories from the host's POV (with significance markers ·, •, ★, ★★).
- Pinned memories in their own section with an n / 8 counter and an unpin affordance per row.
- Active threads and currently active events (Phase 3).
- Snapshots panel (Phase 4 surface; data exists from Phase 1).
v1 editable fields (highest-value rescues for LLM errors):
- Activity record: action.verb, attention, posture (text fields). Fixes "leaning on the kitchen counter while in a car".
- Edges: affinity, trust (sliders 0–100), summary (textarea), knowledge_facts (add/remove list). Fixes misread moments.
- Memory: pov_summary (textarea), significance (dropdown 0–3), pin toggle. Fixes wrong summaries.
Read-only in v1 (Phase 4 makes editable):
- Container properties.
- Identity fields (immutable per session by design — change via reset).
- Witness flags (rewriting these silently changes continuity logic).
- Structural fields (owner_id, scene_id, chat_id).
Every drawer edit goes through the event log as a manual_edit event capturing the prior value, so it is fully reversible.
6.5 Activity record specifics
The activity record per entity carries:
- current_action.verb: open string ("driving", "writing an email", "lying in bed reading"). No enum.
- current_action.interruptible: bool. Classifier-extracted alongside the verb on each activity update.
- current_action.required_attention: low | medium | high. Classifier-extracted.
- current_action.expected_duration: text estimate ("a few minutes", "an hour", "ongoing"). Classifier-extracted.
- current_action.started_at: chat-clock snapshot when the action begins.
- posture: standing | sitting | lying | … (open string; classifier-extracted).
- position: {container_id, slot_name}.
- holding: list of items (open strings).
- attention: optional free-form string ("the road", "her phone", "you", "the document"). Set only when prose makes attention explicit; left empty otherwise. When empty, prompt assembly omits the field rather than rendering "attention: unknown".
- status: {conscious, sober, injured, …} (open dict).
The activity block in the prompt renders verb + structured properties + posture + attention (if set) so the narrative model has both the natural-language verb and the structured constraints to dramatize ("BotA is driving — high attention, not interruptible, attention on the road").
7. Scene Lifecycle
7.1 Starts
- First-ever scene with a bot → kickoff plays after the kickoff prose is parsed and the user confirms the structured form.
- Returning to an existing chat → resume in-place. Same container, same activity, same active scene. No auto time advancement.
- Adding a guest bot → guest just appears in the current scene. The user narrates any in-fiction justification in prose. botA ↔ botB edges initialize per the per-pair "have they met?" answer if it's their first co-appearance.
7.2 Closes (hybrid auto + manual)
A scene closes when one of the following fires:
- Auto, hard signals:
- Container change (parsed from prose — "we drove to the park", "we stepped outside").
- Declared time skip (Phase 3).
- Explicit user pattern ("we're done here", "fade out", etc.) recognized by classifier.
- Manual: "Close scene" button in the drawer for soft transitions the user wants bookmarked.
False positives are a bigger risk than false negatives, so the auto-detector errs on the conservative side; manual close is always available.
7.3 On close
- Significance classifier pass on the closing scene.
- Per-POV summaries written — one per witness, from their angle. No omniscient narration.
- Edge updates applied (affinity, trust, summary, knowledge, last-interaction).
- Closed events finalize; their props remain in the closed event record (do not promote to memory).
- Raw dialogue archived to cold storage.
- New active scene is opened (resume or fresh).
7.4 Container authoring (parse-and-extend)
- Kickoff parse creates the initial container for a bot's chat from the kickoff prose ("you stay late at the office" → container office, type workplace, public, slots auto-named from context).
- Transitions during play ("we drove to the park", "let me grab a drink from the kitchen") are detected by the scene-close classifier; if a container with that name doesn't exist yet in this chat, it's created on the fly with classifier-inferred defaults; if it does, it's reused.
- Drawer shows the current container with all fields editable (Phase 4) or read-only with manual creation (v1). New containers can be pre-authored in the drawer if you want.
- Schema: containers(id, chat_id, name, type, properties_json, parent_id) where properties_json holds {moving: bool, public: bool, audible_range: text, slots: [{name, occupant_id?}]}.
- Containers are scoped per chat. BotA's "apartment" and BotC's "apartment" are distinct records, so names never collide.
7.5 Guest leaves mid-scene
Adding a guest then having them leave is a normal flow. What happens:
- The user removes the guest via the drawer ("Remove BotB from scene"), or BotB exits in prose ("BotB grabs her coat and heads out") — the classifier detects exit on the next turn and prompts for confirmation.
- The current scene closes at the moment of guest exit (this counts as a hard close signal). Per-POV summaries are written for all three witnesses including BotB. Edges update.
- A new scene immediately opens with just you + host bot, in the same container, with carry-over activity. No need to re-narrate "now we're alone".
- Witness flags from the closed scene stay [1, host, guest] for the period BotB was present; that data is permanent and travels with the memory.
- The group node persists for the chat (its content is updated on close, not deleted), available for future co-presences in this chat. If BotB never returns, the group node just sits unused.
- BotB's chat clock is unchanged — it remains wherever it was when last visited (per §4.1).
- BotB's memories of the scene are written to BotB's memory store, available next time you talk to BotB alone.
8. Memory Retrieval
8.1 Always-loaded (no retrieval cost)
- Pinned memories (user pins from the drawer).
- Current scene's running dialogue window.
- Active threads (Phase 3).
- Last N scenes' per-POV summaries from the current chat (N tunable; default 3).
8.2 Retrieved (top-K)
- Phase 1: SQLite FTS5 over memories.pov_summary for the speaker. Filter WHERE owner_id = speaker AND witness_bit_for_speaker = 1 (hard SQL constraint, not a soft signal).
- Phase 4: vector search via sqlite-vss/sqlite-vec, same filter.
- Default K = 4. Recency boost + significance boost in ranking.
8.3 Witness rule (non-negotiable)
A bot cannot retrieve memories whose witness bit for them is 0. Period. This is the mechanism preventing bots from referencing things they couldn't possibly know.
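A sketch of the witness rule as a hard SQL constraint, using the standard-library sqlite3 module. The table layout is hypothetical: owner_witnessed stands in for the speaker's witness bit, LIKE stands in for the FTS5 MATCH clause so the example runs on any SQLite build, and the ORDER BY is only a crude proxy for the recency and significance boosts.

```python
import sqlite3


def retrieve(conn: sqlite3.Connection, speaker_id: str, term: str, k: int = 4) -> list:
    rows = conn.execute(
        """
        SELECT pov_summary FROM memories
        WHERE owner_id = ?              -- speaker-owned only
          AND owner_witnessed = 1       -- hard witness constraint, applied in SQL
          AND pov_summary LIKE ?        -- stand-in for the FTS5 MATCH clause
        ORDER BY significance DESC, id DESC
        LIMIT ?
        """,
        (speaker_id, f"%{term}%", k),
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE memories (id INTEGER PRIMARY KEY, owner_id TEXT, "
    "owner_witnessed INTEGER, pov_summary TEXT, significance INTEGER)"
)
conn.executemany(
    "INSERT INTO memories (owner_id, owner_witnessed, pov_summary, significance) VALUES (?, ?, ?, ?)",
    [
        ("botA", 1, "We walked in the park.", 2),
        ("botA", 0, "They argued in the park without me.", 3),  # not witnessed: never returned
        ("botB", 1, "The park was cold.", 1),                   # other owner: never returned
    ],
)
results = retrieve(conn, "botA", "park")
```

Putting the filter in the WHERE clause means an unwitnessed memory cannot surface even when it is the most significant match.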
8.4 Cross-chat memory
A bot's memory store contains memories from any chat the bot has been in (host or guest). All are retrievable. Bots may reference cross-chat events naturally; precise cross-chat time arithmetic is not attempted.
8.5 Pinning
- User pins via the drawer (pin icon next to a memory row).
- Pinned memories are always-loaded for the speaker (no retrieval cost).
- Soft cap: 8 pins per bot.
- Pivotal-significance (score 3) memories are auto-pinned for the witness whose POV they're in.
- When over cap, the oldest auto-pin is unpinned (not deleted — just removed from the pinned set; still findable via retrieval). Manual pins are never auto-evicted — user must unpin manually.
- Drawer shows pinned memories in their own section with an n / 8 counter and an unpin affordance per row.
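The eviction rule can be sketched as follows. Pin, PIN_CAP, and add_pin are hypothetical names, and pinned_at is assumed to be a monotonically increasing sequence number.

```python
from dataclasses import dataclass

PIN_CAP = 8  # soft cap per bot


@dataclass
class Pin:
    memory_id: int
    auto: bool      # True for auto-pins created by significance 3
    pinned_at: int  # assumed monotonically increasing sequence number


def add_pin(pins: list, new: Pin) -> list:
    """Return the new pinned set after adding `new` and applying the eviction rule."""
    result = pins + [new]
    while len(result) > PIN_CAP:
        autos = [p for p in result if p.auto]
        if not autos:
            break  # soft cap: manual pins are never auto-evicted
        # Unpin (not delete) the oldest auto-pin; it stays findable via retrieval.
        result.remove(min(autos, key=lambda p: p.pinned_at))
    return result
```

One consequence of reading the rule literally: if eight manual pins already exist, a fresh auto-pin is itself the oldest auto-pin and is evicted immediately. Whether that is the intended behavior is a question for the design, not something this sketch settles.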
9. Time, Skips, Events (Phase 3 surface)
- Each chat has its own clock; advances only on explicit user skip commands within that chat.
- Elision skip — "skip to when we arrive". Resolves to end-of-current-action; activity completes; landing state set; brief transition narration generated.
- Jump skip — "next morning", "a week later". User is prompted: "anything notable happen?" Answer becomes synthesized memory(ies) for the speaker bot via classifier call. Chat clock advances. Activity is reset to a coherent landing state.
- Events: lifecycles (planned | active | completed | cancelled | expired) with their own scoped props. Promotion rules on close (only the four categories: object acquired → inventory, knowledge gained → edge knowledge, relationship change → edge summary, everything else stays in closed record).
Phase 1 has no skips and no events. Time is set at kickoff and stays put unless the bot is reset.
10. Rollback, Regenerate, Reset
10.1 Rewind
- Button on every turn. Truncates event log past that turn; rebuilds projection.
- Pre-rewind snapshot is always taken automatically before truncating. Stored in data/snapshots/rewind/. Retained 14 days then pruned.
- Confirmation modal shows an impact preview:

      Rewind to turn 47?
      This will remove:
        • 12 messages (turns 48–59)
        • 1 scene transition (drive to park)
        • 2 edge updates (BotA → You, Group)
        • 3 memories from BotA's store
        • 1 fired event (arrived at park)
        • 1 manual edit (BotA affinity)
      A pre-rewind snapshot will be saved automatically.
      [ Cancel ] [ Rewind ]

- After successful rewind, a 30-second toast appears: "Rewound 12 turns. [Undo]". Clicking Undo restores from the pre-rewind snapshot. After the toast dismisses, the snapshot is still on disk and reachable from the drawer's snapshots panel (Phase 4 UI; data is there from Phase 1).
- "Rewind, keep current as branch" is Phase 4.
10.2 Regenerate (inline, not modal)
Clicking Regenerate on the latest bot turn:
- Scrolls to your last user turn and puts it into an inline edit state (textarea pre-filled with your prose).
- The bot's response below shows a faded "regenerating…" placeholder.
- Submit button is Regenerate (not Send). Hitting it:
  - If you edited: appends a user_turn_edit event capturing the new prose, then a new assistant_turn event with the new generated response.
  - If you didn't edit: appends only a new assistant_turn event.
- The previous assistant_turn event is superseded, not deleted: it is kept in the log with a superseded_by pointer so it's recoverable. Display hides it.
- Downstream classifier passes (state-update, significance) re-run on the new response.
- [ Cancel ] reverts to the original (no event written).
10.3 Reset bot
Full wipe with hard confirm (type bot name). Behavior detailed in §5.3.
10.4 Snapshot frequency & retention
- Periodic snapshot: every 100 events OR every 30 minutes of activity, whichever first. Stored in data/snapshots/periodic/.
- Retention: keep last 5 periodic snapshots. Pre-rewind snapshots retained 14 days.
- Cold-load behavior: replay starts from the most recent periodic snapshot, then applies events forward. Bounds replay cost on app start.
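The snapshot trigger and retention rule reduce to two small functions. This sketch treats "30 minutes of activity" as 30 minutes of wall-clock time since the last snapshot, and it assumes snapshot file names sort chronologically (for example, a timestamp prefix); both are assumptions, as are the function names.

```python
from datetime import datetime, timedelta


def snapshot_due(events_since: int, last_snapshot_at: datetime, now: datetime) -> bool:
    """True when either threshold is reached, whichever comes first."""
    return events_since >= 100 or now - last_snapshot_at >= timedelta(minutes=30)


def to_prune(snapshot_names: list, keep: int = 5) -> list:
    """Return the periodic snapshots to delete, keeping the newest `keep`."""
    ordered = sorted(snapshot_names)  # assumes names sort oldest to newest
    return ordered[:-keep] if len(ordered) > keep else []
```

On cold load, the newest surviving name is the replay starting point, with later events applied forward from there.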
10.5 Deferred to Phase 4
Branching with UI, hide-from-view soft delete, surgical delete + cascade with impact preview. Mechanically supported by event sourcing already; no v1 UI surface.
11. Compression & Promotion
11.1 Significance rubric
Classifier call after each turn (queued, async) tags the turn 0–3. Scene significance is the max turn-significance within the scene.
| Score | Name | Definition | Examples |
|---|---|---|---|
| 0 | Routine | Banter, small talk, ordinary action. Forgettable on its own; aggregates only via edge stats. | "Hi, how was your day?" / "Fine, you?" |
| 1 | Notable | A specific detail, opinion, or beat worth remembering but not arc-changing. Default for non-trivial dialogue. | BotA mentions a band she likes; you discover BotB hates a food. |
| 2 | Significant | A scene-level moment — meaningful confession, real disagreement, a date, a confided secret. | First date; BotA tells you about her sister; an argument. |
| 3 | Pivotal | A relationship-altering event. Updates edge summary and (often) affinity substantially. Always auto-pinned. | First kiss; betrayal; "I love you"; learning a defining secret. |
Where each level is used:
- Retrieval ranking: significance multiplier applied as score × constant to FTS / vector rank.
- Compression: scenes with max-turn-significance ≥ 2 retain key quotes; ≤ 1 collapse fully into the per-POV summary.
- Exports: scene-close JSON written to data/exports/ when scene significance ≥ 2.
- Auto-pin: turns scored 3 are auto-pinned for each witness whose POV they're in.
- UI hint: drawer renders score as ·, •, ★, ★★.
Tie-breakers: turn significance is the max across emotional, factual, and relational facets; the classifier returns the max. Conservative bias — when uncertain, score lower.
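The rubric's derived values are simple enough to sketch directly. The function names are hypothetical, and treating an empty scene as significance 0 is an assumption.

```python
MARKERS = {0: "·", 1: "•", 2: "★", 3: "★★"}  # drawer rendering per score


def scene_significance(turn_scores: list) -> int:
    # Scene significance is the max turn significance; an empty scene is treated as 0.
    return max(turn_scores, default=0)


def keeps_key_quotes(turn_scores: list) -> bool:
    # Scenes at 2 or above retain key quotes (and are exported); others collapse into the summary.
    return scene_significance(turn_scores) >= 2


def auto_pinned_turns(turn_scores: list) -> list:
    # Indices of turns scored 3, which are auto-pinned for each witness.
    return [i for i, score in enumerate(turn_scores) if score == 3]
```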
11.2 Per-POV summaries
Written per witness when a scene closes. Different details, different interpretations. No omniscient narration.
11.3 Promotion rules ("picnic basket" rule)
- Object acquired → entity inventory.
- Knowledge gained → relevant edge's knowledge.
- Relationship change → edge summary.
- Everything else stays in the closed event/scene record. Surfaces only on explicit recall.
11.4 Compression tiers
- Last scene: full dialogue retained.
- Recent scenes: per-POV summary + key quotes (only if significance ≥ 2; else summary only).
- Older scenes: per-POV summary only.
- Distant past: rolled into edge summaries.
12. Persistence & Ops (v1 defaults)
- SQLite WAL mode, foreign keys on, transactional turns.
- Project-folder layout (DB lives inside the repo, gitignored):
  - DB: <repo>/data/chat.db
  - Backups: <repo>/data/backups/ (timestamped copies)
  - Pre-rewind snapshots: <repo>/data/snapshots/
  - Significant-scene JSON exports: <repo>/data/exports/
  - Config: <repo>/data/config.toml (holds Featherless API key, model names, OOC marker, K, budget, etc. Gitignored.)
  - The entire data/ tree is in .gitignore so secrets and state never get committed.
  - CHAT_DB_PATH env var honored as an override if you want to point at a different file (e.g., a backup or a sibling repo's data).
- Auto-backup nightly via launchd. Timestamped copies. Last 14 retained. Pre-rewind snapshots are stored separately and are not affected by backup pruning; they follow their own 14-day retention (§10.4).
- Significant-scene JSON exports written to data/exports/ when scenes close at significance ≥ 2.
- Schema versioned in a meta table; migrations applied on startup.
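A minimal sketch of the connection setup and the one-batch-per-turn write, using the standard-library sqlite3 module. The events table layout and the function names are hypothetical; the pragmas and the explicit BEGIN / COMMIT / ROLLBACK are standard SQLite.

```python
import json
import sqlite3


def open_db(path: str) -> sqlite3.Connection:
    # isolation_level=None puts the connection in autocommit mode,
    # so transactions are opened and closed explicitly below.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA foreign_keys=ON")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "id INTEGER PRIMARY KEY, turn_id INTEGER NOT NULL, "
        "kind TEXT NOT NULL, payload TEXT NOT NULL)"
    )
    return conn


def append_turn(conn: sqlite3.Connection, turn_id: int, events: list) -> None:
    """Write all events for one turn in a single transaction: all or nothing."""
    conn.execute("BEGIN IMMEDIATE")
    try:
        for kind, payload in events:
            conn.execute(
                "INSERT INTO events (turn_id, kind, payload) VALUES (?, ?, ?)",
                (turn_id, kind, json.dumps(payload)),
            )
        conn.execute("COMMIT")
    except Exception:
        # Any failure mid-batch leaves no half-applied state.
        conn.execute("ROLLBACK")
        raise
```

If any event in the batch fails to serialize or insert, the whole turn is rolled back and the log ends at the previous turn.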
13. Phase Cut
Phase 1 (v1) — must build end-to-end before any Phase 2 work
- Featherless client (OpenAI-compatible) with narrative_model and classifier_model configured.
- Schema, event log, projector, replay.
- Bot authoring UI (form-based) including kickoff prose + parse-and-confirm.
- Single-bot chat (host only, no guest yet):
  - Mixed-prose input with ((parens)) OOC marker.
  - Addressee = host. No interjection logic yet.
  - Narrative streaming.
  - Post-turn state-update pass.
- Drawer: read-only first; edit-on-demand may land in Phase 1.5.
- Scene close (hard-signal auto + manual button) with per-POV summary for the host bot only.
- Memory: witness flag stored, FTS5 retrieval (K=4), recency + significance boost.
- Rewind + regenerate (with edit-then-regenerate).
- Reset bot (hard confirm).
- Per-chat clock (set at kickoff; no skips yet).
- Nightly backups + pre-rewind snapshots.
Phase 2 — multi-entity
Status: shipped 2026-04-26 — multi-entity scene support, guest add/remove drawer UX, guest-aware prompt assembly, multi-entity turn flow with interjection classifier, per-POV scene close summaries for every present witness, group_node initialization/update, and bot reset cascade clearing stale chats.guest_bot_id references all landed across the wave5 task series (see CLAUDE.md § "Phase 2 status" for the deliverable summary and follow-ups).
- Guest bot in chat (3-entity scene config).
- Interjection classifier call.
- Witness filtering across multiple owners.
- Group node (when all three present).
- Per-pair "have they met?" prompt.
- botA ↔ botB edges.
Phase 3 — events, skips, threads
Status: shipped 2026-04-26 (T49–T67, 19 tasks across 8 waves; schema baseline now version 11; +68 tests). See "Phase 3 status" in CLAUDE.md for the per-task breakdown.
- Events with lifecycles and scoped props.
- Time skips: elision and jump.
- Active threads.
- Significance classifier improvements.
- "Meanwhile…" scenes (scene config 4) — autonomous.
Phase 4 — polish
Status: shipped 2026-04-27 (T88–T102, 15 tasks across 8 waves; +70 tests). See "Phase 4 status" in CLAUDE.md for the per-task breakdown. Vector retrieval shipped via pure-Python cosine over a JSON-blob embeddings table (sqlite-vec deferred — host Python lacks loadable extensions); branching is data-model + drawer UI; significance review, hide-from-view soft delete, surgical delete with cascade preview, snapshot UX, and cross-chat search all surface from the drawer or top-bar.
- Vector retrieval (sqlite-vss or sqlite-vec).
- Branching UI.
- Drawer-edit on every field.
- Backup tooling improvements.
- Significance review UI.
- Surgical delete + cascade with impact preview.
- Hide-from-view soft delete.
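The Phase 4 status note describes vector retrieval as pure-Python cosine similarity over a JSON-blob embeddings table. A minimal sketch of that idea follows; the table name `embeddings`, its columns, and the toy two-dimensional vectors are assumptions for illustration, not the real schema.

```python
import json
import math
import sqlite3


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(conn, query_vec, k=4):
    """Score every stored vector in Python and return the k best (memory_id, score)."""
    rows = conn.execute("SELECT memory_id, vector_json FROM embeddings").fetchall()
    scored = [(cosine(query_vec, json.loads(blob)), mid) for mid, blob in rows]
    scored.sort(reverse=True)
    return [(mid, score) for score, mid in scored[:k]]


# Hypothetical table layout: one JSON-encoded vector per memory row.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE embeddings (memory_id INTEGER PRIMARY KEY, vector_json TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO embeddings VALUES (?, ?)",
    [(1, json.dumps([1.0, 0.0])), (2, json.dumps([0.0, 1.0])), (3, json.dumps([1.0, 1.0]))],
)
```

A full scan like this is linear in the number of stored memories, which is usually acceptable for a single-user library and needs no loadable SQLite extension.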
14. Open / Deferred Decisions
All round-1 and round-2 brainstorm decisions are resolved and folded into §3–§12 / §16. Honest list of what is still deferred after this brainstorm:
- Embedding model (Phase 4 — pick whatever's cheap and good enough on Featherless or local at the time).
- sqlite-vss vs sqlite-vec (Phase 4 — pick based on each project's state at the time).
- Exact prompt templates for narrative + each classifier job — drafted against real prompts during Phase 1.
- Schema column-level types and FK details — finalized during Phase 1 schema implementation.
- Token-counting accuracy: accept ~5% drift between tiktoken cl100k and the actual Mistral / Llama tokenizers; revisit if drift causes real budget overruns.
- "Snapshots" panel UX in the drawer: Phase 4. Data is written from Phase 1; the read-back UI lands later.
- In-fiction time auto-advance during a scene — Phase 1 freezes the chat clock between user-initiated skips. If this feels stale during Phase 1 play, revisit with model-narrated parsing in Phase 3 (§9).
- Search across chat history — out of scope for v1. Phase 4 if needed.
- Avatars / portraits — out of scope (multimodality is deferred indefinitely).
- Performance targets — measured against real prompts in Phase 1; no preset SLOs.
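One way to reason about the ~5% token-counting drift mentioned above is to inflate the estimated count before comparing it to a budget. The sketch below assumes the narrative budgets from the decisions log (6K soft / 8K hard) mean 6000 and 8000 tokens; the function names and tier labels are hypothetical.

```python
import math

DRIFT = 0.05  # accepted estimate-vs-actual tokenizer drift


def worst_case_tokens(estimated, drift=DRIFT):
    """Upper bound on the real token count if the estimate runs low by `drift`."""
    return math.ceil(estimated * (1 + drift))


def budget_tier(estimated, soft=6000, hard=8000, drift=DRIFT):
    """Classify a prompt against soft/hard budgets using the worst-case count."""
    worst = worst_case_tokens(estimated, drift)
    if worst > hard:
        return "over_hard"
    if worst > soft:
        return "over_soft"
    return "ok"
```

With these numbers, an estimate of 5800 tokens already counts as over the soft budget, because its worst case is roughly 6090.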
15. Non-Negotiables (rules every implementer must respect)
- State changes go through the event log. Never UPDATE a state row directly; append an event, let the projector apply it.
- Witness filter every memory read. A bot must not retrieve memories whose witness bit for them is 0.
- Edges are directed. botA → botB and botB → botA are independent.
- Don't promote event props to memory. Only the four promotion categories (object, knowledge, relationship change, narrative gist via summary).
- Per-POV, not omniscient. Scene summaries are written per witness, from their angle.
- Activity block is always in the prompt. Spatial grounding prevents "leaning on the kitchen counter while in a car" failures.
- Streaming on the inference path; non-blocking bookkeeping runs while the LLM streams.
- No extra services. SQLite + a process. Push back on suggestions to add infrastructure.
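The first and third rules above (append an event and let the projector apply it; edges are directed) can be illustrated with a small sketch. The `events` and `edges` tables, the `edge_delta` event type, and the single `affinity` column are simplified assumptions, not the real schema.

```python
import json
import sqlite3


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE events (
            seq INTEGER PRIMARY KEY AUTOINCREMENT,
            type TEXT NOT NULL,
            payload TEXT NOT NULL
        );
        CREATE TABLE edges (
            src TEXT NOT NULL,
            dst TEXT NOT NULL,
            affinity INTEGER NOT NULL DEFAULT 0,
            PRIMARY KEY (src, dst)   -- (src, dst) is ordered, so edges are directed
        );
        """
    )
    return conn


def project(conn, type_, payload):
    """The projector: the only code path that touches state tables."""
    if type_ == "edge_delta":
        # Upsert syntax needs SQLite 3.24 or newer.
        conn.execute(
            "INSERT INTO edges (src, dst, affinity) VALUES (?, ?, ?) "
            "ON CONFLICT(src, dst) DO UPDATE "
            "SET affinity = edges.affinity + excluded.affinity",
            (payload["src"], payload["dst"], payload["affinity"]),
        )


def append_event(conn, type_, payload):
    """Callers append events; the event row and its projection commit together."""
    with conn:
        conn.execute(
            "INSERT INTO events (type, payload) VALUES (?, ?)",
            (type_, json.dumps(payload)),
        )
        project(conn, type_, payload)


def rebuild(conn):
    """Replay the whole log to regenerate projected state from scratch."""
    with conn:
        conn.execute("DELETE FROM edges")
        rows = conn.execute("SELECT type, payload FROM events ORDER BY seq").fetchall()
        for type_, payload in rows:
            project(conn, type_, json.loads(payload))
```

Because state is derived only from the log, `rebuild` yields the same rows as incremental projection, which is what makes replay and rewind cheap.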
16. UI Shape & Flow
16.1 Top-level navigation
Persistent left rail with three sections:
- Chats (default). List of every authored bot, each row showing: bot name, last message snippet, last-played-at (real time), unread/idle indicator, current chat clock (in-fiction time). Click → opens that chat. Single active chat per browser tab.
- Bots. Library view. List of authored bots with thumbnails (initials/avatar later), edit / reset / delete actions per bot. "New bot" button at the top.
- Settings. Single screen: Featherless API key, model overrides, OOC marker, K, token budgets, "you" entity authoring, theme.
Top-of-rail: "+" button creates a new bot (jumps to authoring); after author + kickoff, you land in the new chat.
16.2 First-run experience
On fresh install (empty DB):
- App boots → "Set up your profile" screen — fills in the "you" entity (name, pronouns, persona blob).
- After save → "Add your first bot" — bot authoring form with inline guides.
- After save → kickoff parse-and-confirm — orchestrator parses kickoff prose, displays the parsed structured form (container, activity, seed edges) with edit affordances; user confirms.
- Lands in the chat, ready for first turn.
Step 1 is skip-able (revisit in Settings); step 2 is required (no chats without a bot).
16.3 Display formatting
- Bot output rendered as lightweight markdown — paragraphs, italics, bold, blockquotes. No headings or code blocks (foreign to RP prose).
- Action segments (*walks over*) rendered as italics; dialogue rendered plain.
- User input uses the same render rules. The textarea is plain, with no live preview.
- OOC ((parens)): shown in the transcript with italic + dimmed + smaller font, set off from surrounding prose. Always visible to you (you should see your own meta-commentary). Stripped from the prompt the bot sees.
- Speaker labels: bot turns prefixed with the bot's name in bold; your turns prefixed with your "you" name. Tight spacing.
- Stream rendering: tokens append as they arrive; markdown re-rendered on each chunk. Cursor indicator at the trailing edge while streaming.
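The OOC rule above (visible in the transcript, stripped from the prompt the bot sees) could be implemented roughly as follows. The helper names are hypothetical, and the marker pair is a parameter because the decisions log describes the marker as configurable.

```python
import re


def make_ooc_pattern(open_marker="((", close_marker="))"):
    """Build a non-greedy pattern for one OOC span with the configured markers."""
    return re.compile(
        re.escape(open_marker) + r".*?" + re.escape(close_marker), re.DOTALL
    )


def strip_ooc(text, pattern=None):
    """Remove OOC spans for prompt assembly; the transcript keeps the original text."""
    pattern = pattern or make_ooc_pattern()
    without = pattern.sub("", text)
    # Collapse the double spaces that removal leaves behind.
    return re.sub(r"[ \t]{2,}", " ", without).strip()
```

For example, `strip_ooc('She nods. ((brb, phone)) "Go on."')` yields `She nods. "Go on."`, while the stored turn still carries the OOC text for display.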
16.4 Streaming UX
- Typing indicator: the bot's row appears immediately with name + "…" pulse, then tokens fill.
- Stop button on the streaming bot row halts mid-stream. On stop:
  - Partial response is committed as a normal assistant_turn event with truncated: true.
  - Downstream classifier passes still run on the partial text (state-update, significance); a partial is fine, since the bot really did "say up to here".
  - You can immediately type your next turn or click Regenerate to redo the whole response.
- Send-while-streaming: disabled. Input box is locked during stream. Stop first.
- Mid-stream disconnect (Featherless drops, network blip, page refresh): treated as an interrupt with truncated: true, partial text committed if any was received. UI surfaces a "connection lost — partial response saved" banner with a Regenerate button.
- Multi-tab streaming: tokens stream to all subscribed tabs simultaneously via the per-chat SSE channel (§3.1). Stop from any tab interrupts for everyone.
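The Stop and mid-stream-disconnect rules above both end in the same outcome: whatever text arrived is committed as an assistant_turn with truncated: true. A synchronous sketch of that consumer loop is below. The function name, the `should_stop` callback, and the use of `ConnectionError` to model a dropped stream are assumptions; returning `None` when a Stop arrives before any token is also an assumption, since the list only specifies the empty case for disconnects.

```python
def consume_stream(tokens, should_stop):
    """Collect streamed tokens and build the event payload to commit.

    Returns None when the stream ended abnormally before any text arrived.
    """
    parts = []
    truncated = False
    try:
        for tok in tokens:
            if should_stop():        # Stop pressed in any tab
                truncated = True
                break
            parts.append(tok)
    except ConnectionError:          # stand-in for a dropped backend connection
        truncated = True

    text = "".join(parts)
    if truncated and not text:
        return None
    return {"type": "assistant_turn", "text": text, "truncated": truncated}
```

Keeping the stop check and the disconnect handler in one loop means both paths produce the same event shape, so downstream classifier passes need no special case.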
16.5 Error UX surface
- Featherless API down / unauthorized / out of credits: failed turn shows an error banner inline ("Featherless: 401 unauthorized" / "rate limited" / "service unavailable") with a Retry button. No event committed on hard failure — the turn never happened.
- Featherless slow (> 30s before first token): warning banner — "Slow response — still waiting…" with the same Stop button. No automatic abort.
- Classifier failure (after fallback ran): silent; fallback values used (per §3.3). Logged to the classifier_failures table. Drawer dev panel surfaces a count. No user-facing notification in normal play.
- DB write failure (rare; disk full, file lock): hard error modal: "Couldn't save your turn — fix and retry". Orchestrator does not advance state.
- Schema-migration failure on startup: app refuses to launch, prints the error and the path to the DB. No automatic repair.
- Missing config / first-run with empty config: redirected to Settings before first chat opens.
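The classifier-failure behaviour above (silent fallback values, with failures logged and counted) pairs with the retry rule in the decisions log: one retry, then the schema default. The sketch below uses a hand-rolled check in place of Pydantic to stay dependency-free. The significance schema, the default of score 0, and the in-memory `failures` list standing in for the classifier_failures table are all assumptions.

```python
import json

DEFAULT_SIGNIFICANCE = {"score": 0}  # hypothetical schema default


def parse_significance(raw):
    """Validate a classifier reply of the assumed form {"score": 0..3}."""
    data = json.loads(raw)
    score = data["score"]
    if isinstance(score, bool) or not isinstance(score, int) or not 0 <= score <= 3:
        raise ValueError(f"score out of range: {score!r}")
    return {"score": score}


def classify_with_fallback(call_model, failures, max_attempts=2):
    """Try the model, retry once on a bad reply, then fall back to the default.

    `call_model` is any zero-argument callable returning the raw reply text.
    Every failed attempt is appended to `failures` for later inspection.
    """
    for attempt in range(max_attempts):
        try:
            return parse_significance(call_model())
        except (ValueError, KeyError, TypeError) as exc:
            failures.append({"attempt": attempt, "error": repr(exc)})
    return dict(DEFAULT_SIGNIFICANCE)
```

The caller never sees an exception, which matches the "no user-facing notification" rule, while the length of `failures` gives the count a dev panel could surface.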
Appendix A — Decisions Log (this brainstorm)
| # | Decision | Choice |
|---|---|---|
| 1 | Primary experience | Relational + companion RP |
| 2 | Content scope | Explicit content allowed |
| 3 | Backend | Featherless (OpenAI-compatible API), Dolphin-Mistral-24B-Venice for narrative |
| 4 | Authoring | Rich structured fields |
| 5 | Input style | Mixed prose, novel-style |
| 6 | Multi-bot turn-taking | Addressee + interjection allowance |
| 7 | Bot autonomy | Strictly turn-based for v1; "meanwhile…" deferred to P2-3 |
| 8 | Sessions/campaigns | Bots are persistent units; one chat per bot; guests addable; bot-owned memory |
| 9 | First co-appearance | Per-pair "have they met?" prompt |
| 10 | Scene starts | Kickoff for new; resume in-place for returning; just-appear for guests |
| 11 | State visibility | Optional drawer, closed by default, read-only with edit-on-demand |
| 12 | Scene close | Hybrid (hard-signal auto + manual button) |
| 13 | Time | Per-chat clock, advanced only by explicit user skips |
| 14 | Model strategy | Small classifier model + large narrative model |
| 15 | Reset | Full wipe + hard confirm; chat sits ready for kickoff |
| 16 | Rollback | Rewind + regenerate (with edit-then-regenerate) |
| 17 | UI framework | FastAPI + HTMX + SSE; multi-tab sync as a Phase 1 requirement |
| 18 | Classifier model | NousResearch/Hermes-3-Llama-3.1-8B (fallbacks: dolphin-2.9.4-llama3-8b, Meta-Llama-3.1-8B-Instruct-abliterated) |
| 19 | Token budgets | Narrative 8K hard / 6K soft; classifier 4K hard. Must/Should/Nice tiers per §3.2 |
| 20 | OOC marker | ((double parens)), configurable |
| 21 | DB location | Project-folder <repo>/data/ tree (DB, backups, snapshots, exports, config). Gitignored. CHAT_DB_PATH env var honored |
| 22 | Significance rubric | 0=Routine, 1=Notable, 2=Significant, 3=Pivotal. Used across retrieval, compression, exports, auto-pin, drawer. Score-3 turns auto-pinned per witness |
| 23 | Edge update granularity | Per-turn deltas (affinity, trust, knowledge_facts, last_interaction); per-scene-close summary rewrite |
| 24 | Classifier failure handling | Pydantic-constrained → 1 retry → schema-default fallback. Refusal triggers fallback model swap for that call. 10s timeout. Failures logged |
| 25 | "You" entity | Name (req) + pronouns (opt) + persona blob (opt). Per-bot relationship handles bot-specific framing |
| 26 | Voice samples | 1-3 samples per bot; always must-include in v1 |
| 27 | Trait list | Free-form list of phrases (3-15 typical); not Big Five / MBTI |
| 28 | Backstory | Free-form prose, 100-500 words target |
| 29 | Pinning | Soft cap 8 / bot. Score-3 auto-pins. Manual pins never auto-evicted. Drawer surface with n/8 counter |
| 30 | Containers | Parse-and-extend: kickoff parse seeds initial container; transitions create new; per-chat scoped |
| 31 | Activity verbs | Open string for verb; classifier extracts interruptible, required_attention, expected_duration alongside |
| 32 | Attention | Optional free-form string; omitted from prompt when empty |
| 33 | Drawer edit cut (v1) | Editable: activity (action/attention/posture), edges (affinity/trust/summary/knowledge_facts), memory (pov_summary/significance/pin). Read-only: container, identity, witness, structural |
| 34 | Snapshots | Periodic every 100 events / 30 min; pre-rewind always; 5 periodic retained, pre-rewind 14 days |
| 35 | Rewind UX | Modal with structured impact preview; pre-rewind snapshot auto-saved; 30s undo toast |
| 36 | Regenerate UX | Inline (no modal). Edit-then-regenerate. Old assistant_turn superseded, not deleted |
| 37 | Top-level nav | Three-section left rail: Chats / Bots / Settings |
| 38 | First-run | You-profile → first-bot author → kickoff parse-and-confirm → chat |
| 39 | Display formatting | Lightweight markdown; *action* italic; OOC dimmed/italic/smaller; speaker labels bold |
| 40 | Chat clock format | Stored ISO 8601 UTC datetime; displayed friendly relative ("Tuesday evening, 9:14pm") |
| 41 | Streaming UX | Typing indicator, Stop button, Send disabled mid-stream, multi-tab sync via SSE, mid-stream disconnect = truncated commit |
| 42 | Error UX | Featherless errors inline w/ Retry; classifier fails silent w/ fallback; DB write fails modal-blocking; schema migration fails launch-blocking |
| 43 | Guest leaves | Auto scene-close on guest exit; per-POV summaries for all 3 incl. guest; new scene immediately opens with you+host; group node persists for chat |
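Row 34 of the log (periodic snapshots every 100 events / 30 min; 5 periodic retained; pre-rewind kept 14 days) can be sketched as two small functions. Reading "100 events / 30 min" as "whichever comes first" is an assumption, as are the dictionary shape of a snapshot record and the function names.

```python
from datetime import datetime, timedelta, timezone


def snapshot_due(events_since, last_at, now,
                 every_events=100, every=timedelta(minutes=30)):
    """A periodic snapshot is due after enough events or enough elapsed time."""
    return events_since >= every_events or now - last_at >= every


def prune_snapshots(snapshots, now, keep_periodic=5,
                    pre_rewind_ttl=timedelta(days=14)):
    """Return the snapshots to keep: newest periodic ones plus unexpired pre-rewind ones."""
    periodic = sorted(
        (s for s in snapshots if s["kind"] == "periodic"),
        key=lambda s: s["created_at"],
        reverse=True,
    )[:keep_periodic]
    pre_rewind = [
        s for s in snapshots
        if s["kind"] == "pre_rewind" and now - s["created_at"] <= pre_rewind_ttl
    ]
    return periodic + pre_rewind
```

The two kinds are pruned by different rules on purpose: periodic snapshots are capped by count, while pre-rewind snapshots expire by age regardless of how many exist.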